×

zbMATH — the first resource for mathematics

Improved haplotype assembly using Xor genotypes. (English) Zbl 1397.92462
Summary: Given a set of aligned fragments, haplotype assembly is the problem of finding the haplotypes from which the fragments have been read. The problem is important because haplotypes contain SNP information, which is essential to many genomic analyses such as the analysis of potential association between certain diseases and genetic variations. The current state-of-the-art haplotype assembly algorithm, HapSAT, does not exploit genotype information and only receives a read matrix as input. However, the imminent importance of haplotypes and inexpensiveness of genotype information motivate for exploiting genotype information to obtain more accurate haplotypes. In this paper, an improved haplotype assembly method, xGenHapSAT, is proposed, which exploits xor genotype information for more accurate haplotype assembly. Xor genotype information is even less expensive than full genotype information, e.g., using the Denaturing High-Performance Liquid Chromatography (DHPLC) technique. It is shown that using this inexpensively obtainable information significantly improves the accuracy of the assembled haplotypes. In addition, a new, more efficient, Max-2-SAT formulation is adopted in xGenHapSAT, which, on average, increases the speed of the algorithm. Moreover, the proposed xGenHapSAT method replaces the current state-of-the-art haplotype assembly method based on genotype information. Finally, our state-of-the-art haplotype assembly software, HapSoft, which includes both xGenHapSAT and HapSAT, is made freely available for research purposes.

MSC:
92D10 Genetics and epigenetics
68Q25 Analysis of algorithms and problem complexity
92-04 Software, source code, etc. for problems pertaining to biology
Software:
HapCUT; UBCSAT
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Altshuler, D., A haplotype map of the human genome, Nature, 437, 1299-1320, (2005)
[2] Bansal, V.; Bafna, V., Hapcut: an efficient and accurate algorithm for the haplotype assembly problem, Bioinformatics, 24, i153-i159, (2008)
[3] Barzuza, T., et al., 2004. Computational problems in perfect phylogeny haplotyping: xor-genotypes and tag SNPs. In: Proceedings of the Combinatorial Pattern Matching, LNCS 3109. Springer-Verlag, Berlin, Heidelberg, pp. 14-31. · Zbl 1104.92044
[4] Bonizzoni, P., Pure parsimony xor haplotyping, IEEE/ACM transactions on computational biology and bioinformatics, 7, (2010), 14-31
[5] Broder, A.Z., Frieze, A.M., Upfal, E., 1993. On the satisfiability and maximum satisfiability of random 3-CNF formulas. In: Proceedings of the Fourth ACM-SIAM Symposium Discrete Algorithms. The Society for Industrial and Applied Mathematics, USA, pp. 322-330 · Zbl 0801.68082
[6] Cook, S.A., 1971. The complexity of theorem proving procedures. In: Proceedings of the ACM Symposium Theory of Computing, ACM, New York, pp. 151-158.
[7] Garey, M.R.; Johnson, D.S., Computers and intractability: A guide to the theory of NP-completeness, (1979), W.H. Freeman New York · Zbl 0411.68039
[8] Geraci, F., A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem, Bioinformatics, 26, 2217-2225, (2010)
[9] He, D., Optimal algorithms for haplotype assembly from whole-genome sequence data, Bioinformatics, 26, i183-i190, (2010)
[10] Kang, S.H., Hapassembler: a web server for haplotype assembly from SNP fragments using genetic algorithm, Biochem. biophys. res. commun., 397, 340-344, (2010)
[11] Kang, S.H., et al., 2008. Haplotype assembly from weighted SNP fragments and related genotype information. In: Proceedings of the Front Algorithmics Workshop, LNCS 5059, Springer-Verlag, Berlin, Heidelberg, pp. 45-54 · Zbl 1143.92323
[12] Karp, R.M., Reducibility among combinatorial problems, (), 85-103 · Zbl 0366.68041
[13] Kitzman, J.O., Haplotype-resolved genome sequencing of a gujarati Indian individual, Nat. biotechnol., 29, 59-63, (2011)
[14] Levy, S., The diploid genome sequence of an individual human, Plos biol., 5, e254, (2007)
[15] Mousavi, S.R., Effective haplotype assembly via maximum Boolean satisfiability, Biochem. biophys. res. commun., 404, 593-598, (2011)
[16] Smyth, K., et al., 2003. Iterated robust tabu search for MAX-SAT. Proc. Artif. Intell., LNCS 2671, Springer-Verlag, Berlin, Heidelberg, pp. 129-144.
[17] Tompkins, D.A.D., Hoos, H.H., 2005. UBCSAT: an implementation and experimentation environment for SLS algorithms for SAT and MAX-SAT. In: Proceedings of the Seventh International Conference on Theory and Applications of SAT, LNCS 3542, Springer-Verlag, Berlin, Heidelberg, pp. 306-320 · Zbl 1122.68620
[18] Wang, J., A practical exact algorithm for the individual haplotyping problem MEC/GI, Algorithmica, 56, 283-296, (2010) · Zbl 1191.68841
[19] Wang, Y., The haplotype assembly model with genotype information and iterative local-exhaustive search algorithm, Comput. biol. chem., 31, 288-293, (2007) · Zbl 1122.92051
[20] Xiao, W.; Oefner, P.J., Denaturing high-performance liquid chromatography: a review, Hum. mutat., 17, 439-474, (2001)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.