×

Haplotyping populations by pure parsimony based on compatible genotypes and greedy heuristics. (English) Zbl 1217.92070

Summary: The population haplotype inference problem based on the pure parsimony criterion (HIPP) infers an \(m \times n\) genotype matrix for a population by a \(2m \times n\) haplotype matrix with the minimum number of distinct haplotypes. Previous integer programming based HIPP solution methods are time-consuming, and their practical effectiveness remains unevaluated. On the other hand, previous heuristic HIPP algorithms are efficient, but their theoretical effectiveness in terms of optimality gaps has not been evaluated, either. We propose two new heuristic HIPP algorithms (MGP and GHI) and conduct more complete computational experiments. In particular, MGP exploits the compatible relations among genotypes to solve a reduced integer linear programming problem so that a solution of good quality can be obtained very quickly; GHI exploits a weight mechanism to selects better candidate haplotypes in a greedy fashion. The computational results show that our proposed algorithms are efficient and effective, especially for solving cases with larger recombination rates.

MSC:

92D10 Genetics and epigenetics
90C59 Approximation methods and heuristics in mathematical programming
92-08 Computational methods for problems pertaining to biology
90C90 Applications of mathematical programming
90C10 Integer programming

Software:

HAPLO; RPOLY; MiniSat
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Helmuth, L., Map of the human genome 3.0, Science, 293, 583-585 (2001)
[2] Bonizzoni, P.; Vedova, G. D.; Dnodi, R., The haplotyping problem: an overview of computational models and solutions, J. Comput. Sci. Technol., 18, 675-688 (2003) · Zbl 1083.68579
[3] Qian, W. Y.; Yang, Y. J.; Yang, N. N.; Chun, L., Particle swarm optimization for SNP haplotype reconstruction problem, Appl. Math. Comput., 196, 266-272 (2008) · Zbl 1136.92015
[4] Wu, J. L.; Wang, J. X.; Chen, J. N., A practical algorithm based on particle swarm optimization for haplotype reconstruction, Appl. Math. Comput., 209, 363-372 (2009) · Zbl 1156.92030
[5] Clark, A. G., Inference of haplotypes from PCR-amplified samples of diploid populations, Mol. Biol. Evol., 7, 111-122 (1990)
[6] Gusfield, D., Inference of haplotypes from samples of diploid populations: complexity and algorithms, J. Comput. Biol., 8, 305-323 (2001)
[7] Excoffier, L.; Slatkin, M., Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population, Mol. Biol. Evol., 12, 921-927 (1995)
[8] Long, J. C.; Williams, R. C.; Urbanek, M., An E-M algorithm and testing strategy for multiple-locus haplotypes, Am. J. Hum. Genet., 56, 799-810 (1995)
[9] Hawley, M. E.; Kidd, K. K., HAPLO: a program using the EM algorithm to estimate the frequencies of multi-site haplotypes, J. Hered., 86, 409-411 (1995)
[10] Stephens, M.; Smith, N. J.; Donnelly, P., A new statistical method for haplotype reconstruction from population data, Am. J. Hum. Genet., 68, 978-989 (2001)
[11] Niu, T.; Qin, Z. S.; Xu, X.; Liu, J. S., Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms, Am. J. Hum. Genet., 70, 157-169 (2002)
[12] Stephens, M.; Donnelly, P., A comparison of Bayesian methods for haplotype reconstruction from population genotype data, Am. J. Hum. Genet., 73, 1162-1169 (2003)
[13] D. Gusfield, Haplotyping as perfect phylogeny: conceptual framework and efficient solutions, in: Annual Conference on Research in Computational Molecular Biology: Proceedings of the Sixth Annual International Conference on Computational Biology, 2002, pp. 166-175.; D. Gusfield, Haplotyping as perfect phylogeny: conceptual framework and efficient solutions, in: Annual Conference on Research in Computational Molecular Biology: Proceedings of the Sixth Annual International Conference on Computational Biology, 2002, pp. 166-175.
[14] D. Gusfield, Haplotype inference by pure parsimony, in: Combinatorial Pattern Matching: 14th Annual Symposium, 2003, pp. 144-155.; D. Gusfield, Haplotype inference by pure parsimony, in: Combinatorial Pattern Matching: 14th Annual Symposium, 2003, pp. 144-155.
[15] Drysdale, C.; McGraw, D.; Stack, C.; Stephens, J.; Judson, R.; Nandabalan, K.; Arnold, K.; Ruano, G.; Liggett, S., Complex promoter and coding region \(β_2\)-adrenergic receptor haplotypes alter receptor expression and predict in vivo responsiveness, Proc. Natl. Acad. Sci. USA, 97, 10483-10488 (2000)
[16] Lancia, G.; Pinotti, C. M.; Rizzi, R., Haplotyping populations by pure parsimony: complexity, exact and approximation algorithms, INFORMS J. Comput., 16, 348-359 (2004) · Zbl 1239.90076
[17] Brown, D. G.; Harrower, I. M., Integer programming approaches to haplotype inference by pure parsimony, IEEE ACM Trans. Comput. Biol., 3, 141-154 (2006)
[18] Huang, Y. T.; Chao, K. M.; Chen, T., An approximation algorithm for haplotype inference by maximum parsimony, J. Comput. Biol., 12, 1261-1274 (2005)
[19] K. Kalpakis, P. Namjoshi, Haplotype phasing using semidefinite programming, in: Proceedings of the Fifth IEEE Symposium on Bioinformatics and Bioengineering, 2005, pp. 145-152.; K. Kalpakis, P. Namjoshi, Haplotype phasing using semidefinite programming, in: Proceedings of the Fifth IEEE Symposium on Bioinformatics and Bioengineering, 2005, pp. 145-152.
[20] Li, Z.; Zhou, W.; Zhang, X.; Chen, L., A parsimonious tree-grow method for haplotype inference, Bioinformatics, 21, 3475-3481 (2005)
[21] Wang, L.; Xu, Y., Haplotype inference by maximum parsimony, Bioinformatics, 19, 1773-1780 (2003)
[22] I. Lynce, J. Marques-Silva, SAT in bioinformatics: Making the case with haplotype inference, in: International Conference on Theory and Applications of Satisfiability Testing, Seattle, USA, 2006.; I. Lynce, J. Marques-Silva, SAT in bioinformatics: Making the case with haplotype inference, in: International Conference on Theory and Applications of Satisfiability Testing, Seattle, USA, 2006.
[23] Lynce, I.; Marques-Silva, J., Haplotype inference with Boolean Satisfiability, Int. J. Artif. Intell. Trans., 17, 2, 355-387 (2008)
[24] Lynce, I.; Marques-Silva, J.; Prestwich, S., Boosting haplotype inference with local search, Constraints Int. J., 13, 1-2 (2008) · Zbl 1142.92030
[25] Graca, A.; Marques-Silva, J.; Lynce, I.; Oliveira, A., Efficient Haplotype Inference with Pseudo-Boolean Optimization, (Algebraic Biology (2007), Hagenberg: Hagenberg Austria) · Zbl 1126.92035
[26] A. Graca, J. Marques-Silva, I. Lynce, A. Oliveira, Efficient haplotype inference with combined CP and OR techniques, in: International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, 2008, pp. 308-312.; A. Graca, J. Marques-Silva, I. Lynce, A. Oliveira, Efficient haplotype inference with combined CP and OR techniques, in: International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, 2008, pp. 308-312.
[27] N. Eén, N. Sörensson, An extensible SAT-solver, in: International Conference on Theory and Applications of Satisfiability Testing (SAT), 2003, pp. 502-518.; N. Eén, N. Sörensson, An extensible SAT-solver, in: International Conference on Theory and Applications of Satisfiability Testing (SAT), 2003, pp. 502-518.
[28] Eén, N.; Sörensson, N., Translating pseudo-Boolean constraints into SAT, J. Satisfiability Boolean Model. Comput., 2, 1-26 (2006) · Zbl 1116.68083
[29] Hudson, R. R., Generating samples under a Wright-Fisher neutral model of genetic variation, Bioinformatics, 18, 337-338 (2002)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.