Modeling substitution and indel processes for AFLP marker evolution and phylogenetic inference. (English) Zbl 1160.62091

Summary: The amplified fragment length polymorphism (AFLP) method produces anonymous genetic markers from throughout a genome. We extend the nucleotide substitution model of AFLP evolution to additionally include insertion and deletion processes. The new Sub-ID model relaxes the common assumption that markers are independent and homologous. We build a Markov chain Monte Carlo methodology tailored for the Sub-ID model to implement a Bayesian approach to infer AFLP marker evolution.
The method allows us to infer both the phylogenies and the subset of markers that are possibly homologous. In addition, we can infer the genome-wide relative rate of indels versus substitutions. In a case study with AFLP markers from sedges, grass-like plants common in North America, we find that accounting for insertion and deletion makes a difference in phylogenetic inference. The inference of topologies is sensitive neither to the prior settings nor to the Jukes-Cantor assumption for nucleotide substitution. The model for insertion and deletion we introduce has potential value in other phylogenetic applications.
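
For readers unfamiliar with the Jukes-Cantor assumption mentioned in the summary, the following is a minimal sketch of the standard Jukes-Cantor (JC69) transition probabilities for a single nucleotide site. It is not the Sub-ID model and not the authors' code. The function names `jc69_transition_probs` and `motif_unchanged_prob`, and the motif length used in the usage note, are hypothetical choices made only for illustration.

```python
import math


def jc69_transition_probs(d):
    """Jukes-Cantor (JC69) probabilities for one nucleotide site.

    d is the branch length in expected substitutions per site.
    Returns (p_same, p_diff): the probability that the site shows the
    same base at both ends of the branch, and the probability that it
    shows one specific different base.
    """
    if d < 0:
        raise ValueError("branch length must be non-negative")
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


def motif_unchanged_prob(d, n_sites):
    """Probability that n_sites independent JC69 sites all show the
    same base at both ends of a branch of length d.

    Illustrative only: assumes sites evolve independently and ignores
    insertions and deletions entirely.
    """
    p_same, _ = jc69_transition_probs(d)
    return p_same ** n_sites
```

As a usage example, `motif_unchanged_prob(0.1, 6)` gives the probability that a hypothetical six-base motif reads the same at both ends of a branch of length 0.1 under substitutions alone. A model with insertions and deletions, such as the one summarized above, would need to account for additional ways a motif can be gained or lost.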


MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D15 Problems related to evolution
62F15 Bayesian inference
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
65C40 Numerical analysis or methods applied to Markov chains


Software: MrBayes; BAli-Phy


[1] Albertson, R. C., Markert, J. A., Danley, P. D. and Kocher, T. D. (1999). Phylogeny of a rapidly evolving clade: The cichlid fishes of Lake Malawi, East Africa. Proc. Natl. Acad. Sci. USA 96 5107-5110.
[2] Felsenstein, J. (1992). Phylogenies from restriction sites: A maximum-likelihood approach. Evolution 46 159-173.
[3] Felsenstein, J. (2004). Inferring Phylogenies 251-256. Sinauer Associates, Inc., Sunderland, MA.
[4] FlyBase (2007). http://flybase.bio.indiana.edu. Downloaded in May, 2007.
[5] Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statist. Sci. 7 457-511. · Zbl 1386.65060
[6] Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711-732. · Zbl 0861.62023 · doi:10.1093/biomet/82.4.711
[7] Green, P. J. (2003). Trans-dimensional Markov chain Monte Carlo. In Highly Structured Stochastic Systems (P. J. Green, N. L. Hjort and S. Richardson, eds.) 179-198. Oxford Univ. Press, Oxford.
[8] Hipp, A. L., Rothrock, P. E., Reznicek, A. A. and Berry, P. E. (2006). Chromosome number changes associated with speciation in sedges: A phylogenetic study in Carex section Ovales (Cyperaceae) using AFLP data. In Monocots: Comparative Biology and Evolution (J. T. Columbus, E. A. Friar, J. M. Porter, L. M. Prince and M. G. Simpson, eds.). Rancho Santa Ana Botanic Garden, Claremont, CA.
[9] Huelsenbeck, J. P. and Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17 754-755.
[10] Jones, C. J., Edwards, K. J., Castaglione, S., Winfield, M. O., Sala, F., van de Wiel, C., Bredemeijer, G., Vosman, B., Matthes, M., Daly, A., Brettschneider, R., Bettini, P., Buiatti, M., Maestri, E., Malcevschi, A., Marmiroli, N., Aert, R., Volckaert, G., Rueda, J., Linacero, R., Vazquez, A. and Karp, A. (1997). Reproducibility testing of RAPD, AFLP and SSR markers in plants by a network of European laboratories. Molecular Breeding 3 381-390.
[11] Jukes, T. H. and Cantor, C. R. (1969). Evolution of protein molecules. In Mammalian Protein Metabolism, Vol. 3 (H. N. Munro, ed.) 21-132. Academic Press, New York.
[12] Luo, R. (2007). Bayesian study of AFLP marker evolution in phylogenetic inference. Ph.D. thesis, Dept. of Statistics, Univ. Wisconsin-Madison. · Zbl 1166.62354
[13] Luo, R., Hipp, A. and Larget, B. (2007). A Bayesian model of AFLP marker evolution and phylogenetic inference. Statist. Appl. Genet. Mol. Biol. 6 Article 11. · Zbl 1166.62354 · doi:10.2202/1544-6115.1152
[14] Luo, R. and Larget, B. (2009). Supplement to “Modeling substitution and indel processes for AFLP marker evolution and phylogenetic inference.” DOI: 10.1214/08-AOAS212SUPP. · Zbl 1160.62091 · doi:10.1214/08-AOAS212
[15] Mau, B. and Newton, M. A. (1997). Phylogenetic inference for binary data on dendograms using Markov chain Monte Carlo. J. Comput. Graph. Statist. 6 122-131.
[16] Miklós, I., Lunter, G. A. and Holmes, I. (2004). A “long indel” model for evolutionary sequence alignment. Mol. Biol. Evol. 21 529-540.
[17] NCBI (2007). http://www.ncbi.nlm.nih.gov/Genomes/. Downloaded in May, 2007.
[18] Powell, W., Morgante, M., Andre, C., Hanafey, M., Vogel, J., Tingey, S. and Rafalski, A. (1996). The comparison of RFLP, RAPD, AFLP and SSR markers for germplasm analysis. Molecular Breeding 2 225-238.
[19] Redelings, B. D. and Suchard, M. A. (2005). Joint Bayesian estimation of alignment and phylogeny. Systematic Biology 54 401-418.
[20] Tamura, K. and Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10 512-526.
[21] Thorne, J. L., Kishino, H. and Felsenstein, J. (1991). An evolutionary model for maximum likelihood alignment of DNA sequences. Journal of Molecular Evolution 33 114-124.
[22] Thorne, J. L., Kishino, H. and Felsenstein, J. (1992). Inching toward reality: An improved likelihood model of sequence evolution. Journal of Molecular Evolution 34 3-16.
[23] Tierney, L. (1994). Markov chains for exploring posterior distributions. Ann. Statist. 22 1701-1728. · Zbl 0829.62080 · doi:10.1214/aos/1176325750
[24] Vos, P., Hogers, R., Bleeker, M., Reijans, M., van de Lee, T., Hornes, M., Frijters, A., Pot, J., Peleman, J., Kuiper, M. and Zabeau, M. (1995). AFLP: A new technique for DNA fingerprinting. Nucleic Acids Research 23 4407-4414.
[25] Wolfe, A. D. and Liston, A. (1998). Molecular Systematics of Plants II 43-86. Kluwer Academic, Boston, MA.
[26] Yang, Z. (1993). Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10 1396-1401.
[27] Yang, Z. (1994). Maximum likelihood estimation of phylogeny from DNA sequences with variable rates over sites: Approximate methods. Journal of Molecular Evolution 39 306-314.