Large-scale multiple sequence alignment and phylogeny estimation. (English) Zbl 1462.92046

Chauve, Cedric (ed.) et al., Models and algorithms for genome evolution. Selected contributions based on the presentations at the MAGE conference, Montréal, Canada, August 23–26, 2013. London: Springer. Comput. Biol. 19, 85-146 (2013).
Summary: With the advent of next-generation sequencing technologies, alignment and phylogeny estimation are being attempted on datasets with thousands of sequences. To address these challenges, new algorithmic approaches have been developed that provide substantial improvements over standard methods. This paper focuses on new approaches for ultra-large tree estimation, including methods for co-estimation of alignments and trees, estimating trees without needing a full sequence alignment, and phylogenetic placement. While the main focus is on methods with empirical performance advantages, we also discuss the theoretical guarantees of methods under Markov models of evolution. Finally, we include a discussion of the future of large-scale phylogenetic analysis.
For the entire collection see [Zbl 1274.92002].


92D20 Protein sequences, DNA sequences
92D15 Problems related to evolution
92-02 Research exposition (monographs, survey articles) pertaining to biology


[1] Dobzhansky, T.: Nothing in biology makes sense except in the light of evolution. Am. Biol. Teach. 35, 125-129 (1973)
[2] de Chardin, P.T.: Le Phénomène Humain. Harper Perennial, New York (1959)
[3] Eisen, J.A.: Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8, 163-167 (1998)
[4] Wang, L.-S., Leebens-Mack, J., Wall, K., Beckmann, K., de Pamphilis, C., et al.: The impact of protein multiple sequence alignment on phylogeny estimation. IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 1108-1119 (2011)
[5] Simmons, M., Freudenstein, J.: The effects of increasing genetic distance on alignment of, and tree construction from, rDNA internal transcribed spacer sequences. Mol. Phylogenet. Evol. 26, 444-451 (2003)
[6] Liu, K., Linder, C.R., Warnow, T.: Multiple sequence alignment: a major challenge to large-scale phylogenetics. PLoS Currents: Tree of Life (2010)
[7] Hall, B.G.: Comparison of the accuracies of several phylogenetic methods using protein and DNA sequences. Mol. Biol. Evol. 22, 792-802 (2005)
[8] Kumar, S., Filipski, A.: Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Res. 17, 127-135 (2007)
[9] Ogden, T., Rosenberg, M.: Multiple sequence alignment accuracy and phylogenetic inference. Syst. Biol. 55, 314-328 (2006)
[10] Liu, K., Raghavan, S., Nelesen, S., Linder, C.R., Warnow, T.: Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324, 1561-1564 (2009)
[11] Morrison, D.: Multiple sequence alignment for phylogenetic purposes. Aust. Syst. Bot. 19, 479-539 (2006)
[12] Graybeal, A.: Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47, 9-17 (1998)
[13] Pollock, D., Zwickl, D., McGuire, J., Hillis, D.: Increased taxon sampling is advantageous for phylogenetic inference. Syst. Biol. 51, 664-671 (2002)
[14] Zwickl, D., Hillis, D.: Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51, 588-598 (2002)
[15] Hillis, D.: Inferring complex phylogenies. Nature 383, 130-131 (1996)
[16] Felsenstein, J.: Inferring Phylogenies. Sinauer Associates, Sunderland (2003)
[17] Kim, J., Warnow, T.: Tutorial on phylogenetic tree estimation. Presented at the ISMB 1999 Conference (1999). Available on-line at http://www.cs.utexas.edu/users/tandy/tutorial.ps
[18] Linder, C.R., Warnow, T.: An overview of phylogeny reconstruction. In: Aluru, S. (ed.) Handbook of Computational Molecular Biology. Chapman and Hall/CRC Computer and Information Science Series, vol. 9. CRC Press, Boca Raton (2005)
[19] Semple, C., Steel, M.: Phylogenetics. Oxford University Press, London (2003) · Zbl 1043.92026
[20] Hillis, D., Moritz, C., Mable, B. (eds.): Molecular Systematics. Sinauer Associates, Sunderland (1996)
[21] Ortuno, F., Valenzuela, O., Pomares, H., Rojas, F., Florido, J., et al.: Predicting the accuracy of multiple sequence alignment algorithms by using computational intelligent techniques. Nucleic Acids Res. 41 (2013)
[22] Whelan, S., Lin, P., Goldman, N.: Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet. 17, 262-272 (2001)
[23] Goldman, N., Yang, Z.: Introduction: statistical and computational challenges in molecular phylogenetics and evolution. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 363, 3889-3892 (2008)
[24] Kemena, C., Notredame, C.: Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics 25, 2455-2465 (2009)
[25] Do, C., Katoh, K.: Protein multiple sequence alignment. In: Methods in Molecular Biology: Functional Proteomics, Methods and Protocols, vol. 484, pp. 379-413. Humana Press, Clifton (2008)
[26] Mokaddem, A., Elloumi, M.: Algorithms for the alignment of biological sequences. In: Elloumi, M., Zomaya, A. (eds.) Algorithms in Computational Molecular Biology. Wiley, New York (2011). doi: 10.1002/9780470892107.ch12
[27] Pei, J.: Multiple protein sequence alignment. Curr. Opin. Struct. Biol. 18, 382-386 (2008)
[28] Sievers, F., Wilm, A., Dineen, D., Gibson, T., Karplus, K., et al.: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7 (2011)
[29] Katoh, K., Toh, H.: PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23(3), 372-374 (2007)
[30] Nelesen, S., Liu, K., Wang, L.S., Linder, C.R., Warnow, T.: DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics 28, i274-i282 (2012)
[31] Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., et al.: ClustalW and ClustalX version 2.0. Bioinformatics 23, 2947-2948 (2007)
[32] Lassmann, T., Frings, O., Sonnhammer, E.: Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res. 37, 858-865 (2009)
[33] Neuwald, A.: Rapid detection, classification, and accurate alignment of up to a million or more related protein sequences. Bioinformatics 25, 1869-1875 (2009)
[34] Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree 2: approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010). doi: 10.1371/journal.pone.0009490
[35] Smith, S., Beaulieu, J., Stamatakis, A., Donoghue, M.: Understanding angiosperm diversification using small and large phylogenetic trees. Am. J. Bot. 98, 404-414 (2011)
[36] Stamatakis, A.: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688-2690 (2006)
[37] Goloboff, P.A., Catalano, S.A., Mirande, J.M., Szumik, C.A., Arias, J.S., et al.: Phylogenetic analysis of 73,060 taxa corroborates major eukaryotic groups. Cladistics 25, 211-230 (2009)
[38] Goloboff, P., Farris, J., Nixon, K.: TNT, a free program for phylogenetic analysis. Cladistics 24, 774-786 (2008)
[39] Liu, K., Warnow, T., Holder, M., Nelesen, S., Yu, J., et al.: SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst. Biol. 61, 90-106 (2011)
[40] Maddison, W.: Gene trees in species trees. Syst. Biol. 46, 523-536 (1997)
[41] Delsuc, F., Brinkmann, H., Philippe, H.: Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6, 361-375 (2005)
[42] Edwards, S.V.: Is a new and general theory of molecular systematics emerging? Evolution 63, 1-19 (2009)
[43] Dunn, C.W., Hejnol, A., Matus, D.Q., Pang, K., Browne, W.E., et al.: Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452, 745-749 (2008)
[44] Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., et al.: A phylogeny-driven genomic encyclopedia of bacteria and archaea. Nature 462, 1056-1060 (2009)
[45] Eisen, J., Fraser, C.: Phylogenomics: intersection of evolution and genomics. Science 300, 1706-1707 (2003)
[46] Bininda-Emonds, O. (ed.): Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Kluwer Academic, Dordrecht (2004) · Zbl 1060.68083
[47] Baum, B., Ragan, M.A.: The MRP method. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, pp. 17-34. Kluwer Academic, Dordrecht (2004)
[48] Chen, D., Eulenstein, O., Fernández-Baca, D., Sanderson, M.: Minimum-flip supertrees: complexity and algorithms. IEEE/ACM Trans. Comput. Biol. Bioinform. 3, 165-173 (2006)
[49] Bininda-Emonds, O.R.P.: The evolution of supertrees. Trends Ecol. Evol. 19, 315-322 (2004) · Zbl 1060.68083
[50] Snir, S., Rao, S.: Quartets MaxCut: a divide and conquer quartets algorithm. IEEE/ACM Trans. Comput. Biol. Bioinform. 7, 704-718 (2010)
[51] Steel, M., Rodrigo, A.: Maximum likelihood supertrees. Syst. Biol. 57, 243-250 (2008)
[52] Swenson, M., Suri, R., Linder, C., Warnow, T.: An experimental study of quartets MaxCut and other supertree methods. Algorithms Mol. Biol. 6(1), 7 (2011)
[53] Swenson, M., Suri, R., Linder, C., Warnow, T.: SuperFine: fast and accurate supertree estimation. Syst. Biol. 61, 214-227 (2012)
[54] Nguyen, N., Mirarab, S., Warnow, T.: MRL and SuperFine+MRL: new supertree methods. Algorithms Mol. Biol. 7(3) (2012)
[55] Than, C.V., Nakhleh, L.: Species tree inference by minimizing deep coalescences. PLoS Comput. Biol. 5 (2009)
[56] Boussau, B., Szollosi, G., Duret, L., Gouy, M., Tannier, E., et al.: Genome-scale co-estimation of species and gene trees. Genome Res. 23(2), 323-330 (2013)
[57] Degnan, J.H., Rosenberg, N.A.: Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24, 332-340 (2009)
[58] Chaudhary, R., Bansal, M.S., Wehe, A., Fernández-Baca, D., Eulenstein, O.: IGTP: a software package for large-scale gene tree parsimony analysis. BMC Bioinform. 11, 574 (2010)
[59] Larget, B., Kotha, S.K., Dewey, C.N., Ané, C.: BUCKy: gene tree/species tree reconciliation with the Bayesian concordance analysis. Bioinformatics 26, 2910-2911 (2010)
[60] Yu, Y., Warnow, T., Nakhleh, L.: Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles. J. Comput. Biol. 18, 1543-1559 (2011)
[61] Yang, J., Warnow, T.: Fast and accurate methods for phylogenomic analyses. BMC Bioinform. 12(Suppl 9), S4 (2011). doi: 10.1186/1471-2105-12-S9-S4
[62] Liu, L., Yu, L., Edwards, S.: A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302 (2010)
[63] Chauve, C., Doyon, J.P., El-Mabrouk, N.: Gene family evolution by duplication, speciation, and loss. J. Comput. Biol. 15, 1043-1062 (2008)
[64] Hallett, M.T., Lagergren, J.: New algorithms for the duplication-loss model. In: Proceedings RECOMB 2000, pp. 138-146. ACM Press, New York (2000)
[65] Doyon, J.P., Chauve, C.: Branch-and-bound approach for parsimonious inference of a species tree from a set of gene family trees. Adv. Exp. Med. Biol. 696, 287-295 (2011)
[66] Ma, B., Li, M., Zhang, L.: From gene trees to species trees. SIAM J. Comput. 30, 729-752 (2000) · Zbl 0968.68057
[67] Zhang, L.: From gene trees to species trees II: species tree inference by minimizing deep coalescence events. IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 1685-1691 (2011)
[68] Arvestad, L., Berglund, A.C., Lagergren, J., Sennblad, B.: Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution. In: Bininda-Emonds, O. (ed.) Proc. RECOMB 2004, pp. 238-252 (2004)
[69] Sennblad, B., Lagergren, J.: Probabilistic orthology analysis. Syst. Biol. 58, 411-424 (2009)
[70] Edwards, S., Liu, L., Pearl, D.: High-resolution species trees without concatenation. Proc. Natl. Acad. Sci. USA 104, 5936-5941 (2007)
[71] Heled, J., Drummond, A.J.: Bayesian inference of species trees from multilocus data. Mol. Biol. Evol. 27, 570-580 (2010)
[72] Roch, S.: An analytical comparison of multilocus methods under the multispecies coalescent: the three-taxon case. In: Proc. Pacific Symposium on Biocomputing, vol. 18, pp. 297-306 (2013)
[73] Kopelman, N.M., Stone, L., Gascuel, O., Rosenberg, N.A.: The behavior of admixed populations in neighbor-joining inference of population trees. In: Proc. Pacific Symposium on Biocomputing, vol. 18 (2013)
[74] Degnan, J.H.: Evaluating variations on the STAR algorithm for relative efficiency and sample sizes needed to reconstruct species trees. In: Proc. Pacific Symposium on Biocomputing, vol. 18, pp. 262-272 (2013)
[75] Bayzid, M., Mirarab, S., Warnow, T.: Inferring optimal species trees under gene duplication and loss. In: Proc. Pacific Symposium on Biocomputing, vol. 18, pp. 250-261 (2013)
[76] Pei, J., Grishin, N.: PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23, 802-808 (2007)
[77] Edgar, R.C., Sjölander, K.: SATCHMO: sequence alignment and tree construction using hidden Markov models. Bioinformatics 19, 1404-1411 (2003)
[78] Hagopian, R., Davidson, J., Datta, R., Jarvis, G., Sjölander, K.: SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction. Nucleic Acids Res. 38(Web Server Issue), W29-W34 (2010)
[79] O’Sullivan, O., Suhre, K., Abergel, C., Higgins, D., Notredame, C.: 3DCoffee: combining protein sequences and structure within multiple sequence alignments. J. Mol. Biol. 340, 385-395 (2004)
[80] Zhou, H., Zhou, Y.: SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics 21, 3615-3621 (2005)
[81] Deng, X., Cheng, J.: MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts. BMC Bioinform. 12, 472 (2011)
[82] Roshan, U., Livesay, D.R.: Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22, 2715-2721 (2006)
[83] Roshan, U., Chikkagoudar, S., Livesay, D.R.: Searching for RNA homologs within large genomic sequences using partition function posterior probabilities. BMC Bioinform. 9, 61 (2008)
[84] Do, C., Mahabhashyam, M., Brudno, M., Batzoglou, S.: PROBCONS: probabilistic consistency-based multiple sequence alignment of amino acid sequences. Software available at http://probcons.stanford.edu/download.html (2006)
[85] Nawrocki, E.P., Kolbe, D.L., Eddy, S.R.: Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335-1337 (2009)
[86] Nawrocki, E.P.: Structural RNA homology search and alignment using covariance models. Ph.D. thesis, Washington University in Saint Louis, School of Medicine (2009)
[87] Gardner, D., Xu, W., Miranker, D., Ozer, S., Cannone, J., et al.: An accurate scalable template-based alignment algorithm. In: Proc. International Conference on Bioinformatics and Biomedicine, 2012, pp. 237-243 (2012)
[88] Edgar, R.C.: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 113 (2004)
[89] Mirarab, S., Warnow, T.: FastSP: linear-time calculation of alignment accuracy. Bioinformatics 27, 3250-3258 (2011)
[90] Blackburne, B., Whelan, S.: Measuring the distance between multiple sequence alignments. Bioinformatics 28, 495-502 (2012)
[91] Stojanovic, N., Florea, L., Riemer, C., Gumucio, D., Slightom, J., et al.: Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. Nucleic Acids Res. 27, 3899-3910 (1999)
[92] Edgar, R.: Quality measures for protein alignment benchmarks. Nucleic Acids Res. 38(7), 2145-2153 (2010)
[93] Thompson, J.D., Plewniak, F., Poch, O.: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27, 2682-2690 (1999)
[94] Thompson, J., Plewniak, F., Poch, O.: BAliBASE: a benchmark alignments database for the evaluation of multiple sequence alignment programs. Bioinformatics 15, 87-88 (1999)
[95] Raghava, G., Searle, S.M., Audley, P.C., Barber, J.D., Barton, G.J.: Oxbench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinform. 4, 47 (2003)
[96] Gardner, P., Wilm, A., Washietl, S.: A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res. 33, 2433-2439 (2005)
[97] Van Walle, I., Lasters, I., Wyns, L.: SABmark: a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 21, 1267-1268 (2005)
[98] Carroll, H., Beckstead, W., O’Connor, T., Ebbert, M., Clement, M., et al.: DNA reference alignment benchmarks based on tertiary structure of encoded proteins. Bioinformatics 23, 2648-2649 (2007)
[99] Blazewicz, J., Formanowicz, P., Wojciechowski, P.: Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark. Int. J. Appl. Math. Comput. Sci. 19, 675-678 (2009)
[100] Iantorno, S., Gori, K., Goldman, N., Gil, M., Dessimoz, C.: Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment. arXiv:1211.2160 [q-bio.QM] (2012)
[101] Aniba, M., Poch, O., Thompson, J.D.: Issues in bioinformatics benchmarking: the case study of multiple sequence alignment. Nucleic Acids Res. 38, 7353-7363 (2010)
[102] Morrison, D.A.: Why would phylogeneticists ignore computerized sequence alignment? Syst. Biol. 58, 150-158 (2009)
[103] Reeck, G., de Haen, C., Teller, D., Doolittle, R., Fitch, W., et al.: “Homology” in proteins and nucleic acids: a terminology muddle and a way out of it. Cell 50, 667 (1987)
[104] Galperin, M., Koonin, E.: Divergence and convergence in enzyme evolution. J. Biol. Chem. 287, 21-28 (2012)
[105] Sjölander, K.: Getting started in structural phylogenomics. PLoS Comput. Biol. 6, e1000621 (2010)
[106] Katoh, K., Kuma, K., Miyata, T., Toh, H.: Improvement in the accuracy of multiple sequence alignment program MAFFT. Genome Inf. 16, 22-33 (2005)
[107] Do, C., Mahabhashyam, M., Brudno, M., Batzoglou, S.: PROBCONS: probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330-340 (2005)
[108] Löytynoja, A., Goldman, N.: An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl. Acad. Sci. USA 102, 10557-10562 (2005)
[109] Nelesen, S., Liu, K., Zhao, D., Linder, C.R., Warnow, T.: The effect of the guide tree on multiple sequence alignments and subsequent phylogenetic analyses. In: Proc. Pacific Symposium on Biocomputing, vol. 13, pp. 15-24 (2008)
[110] Fletcher, W., Yang, Z.: The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol. Biol. Evol. 27, 2257-2267 (2010)
[111] Penn, O., Privman, E., Landan, G., Graur, D., Pupko, T.: An alignment confidence score capturing robustness to guide tree uncertainty. Mol. Biol. Evol. 27, 1759-1767 (2010)
[112] Toth, A., Hausknecht, A., Krisai-Greilhuber, I., Papp, T., Vagvolgyi, C., et al.: Iteratively refined guide trees help improving alignment and phylogenetic inference in the mushroom family Bolbitiaceae. PLoS ONE 8, e56143 (2013)
[113] Capella-Gutiérrez, S., Gabaldón, T.: Measuring guide-tree dependency of inferred gaps for progressive aligners. Bioinformatics 29(8), 1011-1017 (2013)
[114] Pruesse, E., Quast, C., Knittel, K., Fuchs, B., Ludwig, W., et al.: SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35, 7188-7196 (2007)
[115] DeSantis, T., Hugenholtz, P., Keller, K., Brodie, E., Larsen, N., et al.: NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res. 34, W394-W399 (2006)
[116] Löytynoja, A., Vilella, A.J., Goldman, N.: Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm. Bioinformatics 28, 1685-1691 (2012)
[117] Papadopoulos, J.S., Agarwala, R.: COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23, 1073-1079 (2007)
[118] Berger, S.A., Stamatakis, A.: Aligning short reads to reference alignments and trees. Bioinformatics 27, 2068-2075 (2011)
[119] Sievers, F., Dineen, D., Wilm, A., Higgins, D.G.: Making automated multiple alignments of very large numbers of protein sequences. Bioinformatics 29(8), 989-995 (2013)
[120] Smith, S., Beaulieu, J., Donoghue, M.: Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches. BMC Evol. Biol. 9, 37 (2009)
[121] Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406-425 (1987)
[122] Roquet, C., Thuiller, W., Lavergne, S.: Building megaphylogenies for macroecology: taking up the challenge. Ecography 36, 13-26 (2013)
[123] Steel, M.A.: Recovering a tree from the leaf colourations it generates under a Markov model. Appl. Math. Lett. 7, 19-24 (1994) · Zbl 0794.60071
[124] Evans, S., Warnow, T.: Unidentifiable divergence times in rates-across-sites models. IEEE/ACM Trans. Comput. Biol. Bioinform. 1, 130-134 (2005)
[125] Tavaré, S.: Some probabilistic and statistical problems in the analysis of DNA sequences. In: Lectures on Mathematics in the Life Sciences, vol. 17, pp. 57-86 (1986) · Zbl 0587.92015
[126] Dayhoff, M., Schwartz, R., Orcutt, B.: A model of evolutionary change in proteins. In: Dayhoff, M. (ed.) Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, pp. 345-352 (1978)
[127] Lakner, C., Holder, M., Goldman, N., Naylor, G.: What’s in a likelihood? Simple models of protein evolution and the contribution of structurally viable reconstructions to the likelihood. Syst. Biol. 60, 161-174 (2011)
[128] Le, S., Gascuel, O.: An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307-1320 (2008)
[129] Whelan, S., Goldman, N.: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691-699 (2001)
[130] Kosiol, C., Goldman, N.: Different versions of the Dayhoff rate matrix. Mol. Biol. Evol. 22, 193-199 (2005)
[131] Thorne, J.: Models of protein sequence evolution and their applications. Curr. Opin. Genet. Dev. 10, 602-605 (2000)
[132] Thorne, J., Goldman, N.: Probabilistic models for the study of protein evolution. In: Balding, D., Bishop, M., Cannings, C. (eds.) Handbook of Statistical Genetics, pp. 209-226. Wiley, New York (2003)
[133] Adachi, J., Hasegawa, M.: Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol. 42, 459-468 (1996)
[134] Goldman, N., Yang, Z.: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11, 725-736 (1994)
[135] Scherrer, M., Meyer, A., Wilke, C.: Modeling coding-sequence evolution within the context of residue solvent accessibility. BMC Evol. Biol. 12, 179 (2012)
[136] Mayrose, I., Doron-Faigenbom, A., Bacharach, E., Pupko, T.: Towards realistic codon models: among site variability and dependency of synonymous and non-synonymous rates. Bioinformatics 23, i319-i327 (2007)
[137] Abascal, F., Zardoya, R., Posada, D.: ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104-2105 (2005)
[138] Wilke, C.: Bringing molecules back into molecular evolution. PLoS Comput. Biol. 8, e1002572 (2012)
[139] Liberles, D., Teichmann, S., et al.: The inference of protein structure, protein biophysics, and molecular evolution. Protein Sci. 21, 769-785 (2012)
[140] Lopez, P., Casane, D., Philippe, H.: Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19, 1-7 (2002)
[141] Whelan, S.: Spatial and temporal heterogeneity in nucleotide sequence evolution. Mol. Biol. Evol. 25, 1683-1694 (2008)
[142] Tuffley, C., Steel, M.: Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull. Math. Biol. 59, 581-607 (1997) · Zbl 0933.62118
[143] Steel, M.A.: Can we avoid ‘SIN’ in the house of ‘No Common Mechanism’? Syst. Biol. 60, 96-109 (2011)
[144] Lobkovsky, A., Wolf, Y., Koonin, E.: Gene frequency distributions reject a neutral model of genome evolution. Genome Biol. Evol. 5, 233-242 (2013)
[145] Galtier, N., Gouy, M.: Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15, 871-879 (1998)
[146] Foulds, L.R., Graham, R.L.: The Steiner problem in phylogeny is NP-complete. Adv. Appl. Math. 3, 43-49 (1982) · Zbl 0489.92002
[147] Felsenstein, J.: Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368-376 (1981)
[148] Allman, E.S., Ané, C., Rhodes, J.: Identifiability of a Markovian model of molecular evolution with gamma-distributed rates. Adv. Appl. Probab. 40, 229-249 (2008) · Zbl 1139.60335
[149] Allman, E.S., Rhodes, J.: Identifying evolutionary trees and substitution parameters for the general Markov model with invariable sites. Math. Biosci. 211, 18-33 (2008) · Zbl 1130.92039
[150] Allman, E.S., Rhodes, J.A.: The identifiability of tree topology for phylogenetic models, including covarion and mixture models. J. Comput. Biol. 13, 1101-1113 (2006)
[151] Atteson, K.: The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica 25, 251-278 (1999) · Zbl 0938.68747
[152] Chang, J.: Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math. Biosci. 137, 51-73 (1996) · Zbl 1059.92504
[153] Steel, M.A.: Consistency of Bayesian inference of resolved phylogenetic trees. arXiv:1001.2864 [q-bio.PE] (2010) · Zbl 1411.92223
[154] Felsenstein, J.: Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401-410 (1978)
[155] Chang, J.T.: Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. Math. Biosci. 134, 189-215 (1996) · Zbl 0847.92012
[156] Matsen, F., Steel, M.: Phylogenetic mixtures on a single tree can mimic a tree of another topology. Syst. Biol. 56, 767-775 (2007)
[157] Allman, E., Rhodes, J., Sullivant, S.: When do phylogenetic mixture models mimic other phylogenetic models? Syst. Biol. 61, 1049-1059 (2012)
[158] Erdős, P., Steel, M., Székely, L., Warnow, T.: Local quartet splits of a binary tree infer all quartet splits via one dyadic inference rule. Comput. Artif. Intell. 16, 217-227 (1997) · Zbl 0871.68145
[159] Erdős, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (i). Random Struct. Algorithms 14, 153-184 (1999) · Zbl 0945.60004
[160] Erdős, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (ii). Theor. Comput. Sci. 221, 77-118 (1999) · Zbl 0933.68100
[161] Lacey, M.R., Chang, J.T.: A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences. Math. Biosci. 199, 188-215 (2006) · Zbl 1086.92039
[162] Csűrös, M., Kao, M.Y.: Recovering evolutionary trees through harmonic greedy triplets. Proc. SODA 99, 261-270 (1999) · Zbl 0934.68106
[163] Csűrös, M.: Fast recovery of evolutionary trees with thousands of nodes. J. Comput. Biol. 9, 277-297 (2002)
[164] Huson, D., Nettles, S., Warnow, T.: Disk-covering, a fast converging method for phylogenetic tree reconstruction. J. Comput. Biol. 6, 369-386 (1999)
[165] Steel, M.A., Székely, L.A.: Inverting random functions. Ann. Comb. 3, 103-113 (1999) · Zbl 0966.92018
[166] Steel, M.A., Székely, L.A.: Inverting random functions—II: explicit bounds for discrete maximum likelihood estimation, with applications. SIAM J. Discrete Math. 15, 562-575 (2002) · Zbl 1055.62522
[167] King, V., Zhang, L., Zhou, Y.: On the complexity of distance-based evolutionary tree reconstruction. In: SODA: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 444-453 (2003) · Zbl 1094.68614
[168] Mossel, E., Roch, S.: Learning nonsingular phylogenies and hidden Markov models. In: Proc. 37th Symp. on the Theory of Computing (STOC’05), pp. 366-376 (2005) · Zbl 1192.68394
[169] Mossel, E., Roch, S.: Learning nonsingular phylogenies and hidden Markov models. Ann. Appl. Probab. 16, 538-614 (2006) · Zbl 1137.60034
[170] Daskalakis, C., Mossel, E., Roch, S.: Optimal phylogenetic reconstruction. In: STOC’06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pp. 159-168 (2006) · Zbl 1301.92054
[171] Daskalakis, C., Hill, C., Jaffe, A., Mihaescu, R., Mossel, E., et al.: Maximal accurate forests from distance matrices. In: RECOMB, pp. 281-295 (2006) · Zbl 1215.92048
[172] Mossel, E.: Distorted metrics on trees and phylogenetic forests. IEEE/ACM Trans. Comput. Biol. Bioinform. 4, 108-116 (2007)
[173] Gronau, I., Moran, S., Snir, S.: Fast and reliable reconstruction of phylogenetic trees with very short edges. In: SODA (ACM/SIAM Symp. Disc. Alg), pp. 379-388 (2008) · Zbl 1192.05030
[174] Roch, S.: Sequence-length requirement for distance-based phylogeny reconstruction: breaking the polynomial barrier. In: FOCS (Foundations of Computer Science), pp. 729-738 (2008)
[175] Daskalakis, C., Mossel, E., Roch, S.: Phylogenies without branch bounds: contracting the short, pruning the deep. In: RECOMB, pp. 451-465 (2009) · Zbl 1227.92042
[176] Lin, Y., Rajan, V., Moret, B.: A metric for phylogenetic trees based on matching. IEEE/ACM Trans. Comput. Biol. Bioinform. 9, 1014-1022 (2012)
[177] Rannala, B., Huelsenbeck, J., Yang, Z., Nielsen, R.: Taxon sampling and the accuracy of large phylogenies. Syst. Biol. 47, 702-710 (1998)
[178] Robinson, D., Foulds, L.: Comparison of phylogenetic trees. Math. Biosci. 53, 131-147 (1981) · Zbl 0451.92006
[179] Huelsenbeck, J., Hillis, D.: Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42, 247-265 (1993)
[180] Hillis, D.: Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. 47, 3-8 (1998)
[181] Nakhleh, L., Moret, B., Roshan, U., St John, K., Sun, J., et al.: The accuracy of fast phylogenetic methods for large datasets. In: Proc. 7th Pacific Symposium on BioComputing, pp. 211-222. World Scientific, Singapore (2002)
[182] Zwickl, D.J., Hillis, D.M.: Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51, 588-598 (2002)
[183] Pollock, D.D., Zwickl, D.J., McGuire, J.A., Hillis, D.M.: Increased taxon sampling is advantageous for phylogenetic inference. Syst. Biol. 51, 664-671 (2002)
[184] Wiens, J.: Missing data and the design of phylogenetic analyses. J. Biomed. Inform. 39, 36-42 (2006)
[185] Lemmon, A., Brown, J., Stanger-Hall, K., Lemmon, E.: The effect of ambiguous data on phylogenetic estimates obtained by maximum-likelihood and Bayesian inference. Syst. Biol. 58, 130-145 (2009)
[186] Wiens, J., Morrill, M.: Missing data in phylogenetic analysis: reconciling results from simulations and empirical data. Syst. Biol. 60, 719-731 (2011)
[187] Simmons, M.: Misleading results of likelihood-based phylogenetic analyses in the presence of missing data. Cladistics 28, 208-222 (2012)
[188] Moret, B., Roshan, U., Warnow, T.: Sequence-length requirements for phylogenetic methods. In: Guigo, R., Gusfield, D. (eds.) Proc. 2nd International Workshop on Algorithms in Bioinformatics. Lecture Notes in Computer Science, vol. 2452, pp. 343-356. Springer, Berlin (2002) · Zbl 1016.68610
[189] Gascuel, O.: BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. 14, 685-695 (1997)
[190] Bruno, W.J., Socci, N.D., Halpern, A.L.: Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol. 17, 189-197 (2000)
[191] Wheeler, T.: Large-scale neighbor-joining with NINJA. In: Proc. Workshop Algorithms in Bioinformatics (WABI), vol. 5724, pp. 375-389 (2009)
[192] Desper, R., Gascuel, O.: Fast and accurate phylogeny reconstruction algorithm based on the minimum-evolution principle. J. Comput. Biol. 9, 687-705 (2002) · Zbl 1016.68692
[193] Price, M., Dehal, P., Arkin, A.: FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26(7), 1641-1650 (2009)
[194] Brown, D., Truszkowski, J.: Towards a practical O(n log n) phylogeny algorithm. In: Proc. Workshop Algorithms in Bioinformatics (WABI), pp. 14-25 (2011)
[195] Rice, K., Warnow, T.: Parsimony is hard to beat! In: Jiang, T., Lee, D. (eds.) Proceedings, Third Annual International Conference of Computing and Combinatorics (COCOON), pp. 124-133 (1997)
[196] Hillis, D., Huelsenbeck, J., Swofford, D.: Hobgoblin of phylogenetics. Nature 369, 363-364 (1994)
[197] Swofford, D.: PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods), Version 4.0. Sinauer Associates, Sunderland (1996)
[198] Roch, S.: A short proof that phylogenetic tree reconstruction by maximum likelihood is hard. IEEE/ACM Trans. Comput. Biol. Bioinform. 3, 92-94 (2006)
[199] Guindon, S., Gascuel, O.: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696-704 (2003)
[200] Zwickl, D.: Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. thesis, The University of Texas at Austin (2006)
[201] Liu, K., Linder, C., Warnow, T.: RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS ONE 6, e27731 (2011)
[202] Claesson, M.J., Cusack, S., O’Sullivan, O., Greene-Diniz, R., de Weerd, H., et al.: Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proc. Natl. Acad. Sci. 108, 4586-4591 (2011)
[203] McDonald, D., Price, M.N., Goodrich, J., Nawrocki, E.P., DeSantis, T.Z., et al.: An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 6, 610-618 (2012)
[204] Boussau, B., Gouy, M.: Efficient likelihood computations with non-reversible models of evolution. Syst. Biol. 55, 756-768 (2006)
[205] Whelan, S., Money, D.: The prevalence of multifurcations in tree-space and their implications for tree-search. Mol. Biol. Evol. 27, 2674-2677 (2010)
[206] Whelan, S., Money, D.: Characterizing the phylogenetic tree-search problem. Syst. Biol. 61, 228-239 (2012)
[207] Ronquist, F., Huelsenbeck, J.: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572-1574 (2003)
[208] Drummond, A., Rambaut, A.: BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007)
[209] Lartillot, N., Philippe, H.: A Bayesian mixture model for across-site heterogeneities in the amino acid replacement process. Mol. Biol. Evol. 21 (2004)
[210] Foster, P.: Modeling compositional heterogeneity. Syst. Biol. 53, 485-495 (2004)
[211] Pagel, M., Meade, A.: A phylogenetic mixture model for detecting pattern heterogeneity in gene sequence or character state data. Syst. Biol. 53, 571-581 (2004)
[212] Huelsenbeck, J., Ronquist, F.: MrBayes: Bayesian inference of phylogeny. Bioinformatics 17, 754-755 (2001)
[213] Ronquist, F., Deans, A.: Bayesian phylogenetics and its influence on insect systematics. Annu. Rev. Entomol. 55, 189-206 (2010)
[214] Huelsenbeck, J.P., Ronquist, F., Nielsen, R., Bollback, J.P.: Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294, 2310-2314 (2001)
[215] Holder, M., Lewis, P.: Phylogeny estimation: traditional and Bayesian approaches. Nat. Rev. Genet. 4, 275-284 (2003)
[216] Lewis, P., Holder, M., Holsinger, K.: Polytomies and Bayesian phylogenetic inference. Syst. Biol. 54, 241-253 (2005)
[217] Ganapathy, G., Ramachandran, V., Warnow, T.: On contract-and-refine-transformations between phylogenetic trees. In: ACM/SIAM Symposium on Discrete Algorithms (SODA’04), pp. 893-902. SIAM Press, Philadelphia (2004) · Zbl 1318.92037
[218] Ganapathy, G., Ramachandran, V., Warnow, T.: Better hill-climbing searches for parsimony. In: Proceedings of the Third International Workshop on Algorithms in Bioinformatics (WABI), pp. 245-258 (2003)
[219] Bonet, M., Steel, M., Warnow, T., Yooseph, S.: Faster algorithms for solving parsimony and compatibility. J. Comput. Biol. 5, 409-422 (1999)
[220] Nixon, K.C.: The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407-414 (1999)
[221] Vos, R.: Accelerated likelihood surface exploration: the likelihood ratchet. Syst. Biol. 52, 368-373 (2003)
[222] Warnow, T., Moret, B.M.E., St John, K.: Absolute phylogeny: true trees from short sequences. In: Proc. 12th Ann. ACM/SIAM Symp. on Discr. Algs., SODA01, pp. 186-195. SIAM Press, Philadelphia (2001) · Zbl 0982.92026
[223] Nakhleh, L., Roshan, U., St John, K., Sun, J., Warnow, T.: Designing fast converging phylogenetic methods. Bioinformatics 17, 190-198 (2001)
[224] Warnow, T.: Large-scale phylogenetic reconstruction. In: Aluru, S. (ed.) Handbook of Computational Molecular Biology. Chapman and Hall/CRC Computer and Information Science Series, vol. 9. CRC Press, Boca Raton (2005)
[225] Roshan, U., Moret, B., Williams, T., Warnow, T.: Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees. In: Proc. 3rd Computational Systems Biology Conf. (CSB’05). Proceedings of the IEEE, pp. 98-109 (2004)
[226] Steel, M.: The maximum likelihood point for a phylogenetic tree is not unique. Syst. Biol. 43, 560-564 (1994)
[227] Blair, C., Murphy, R.: Recent trends in molecular phylogenetic analysis: where to next? J. Heredity 102, 130-138 (2011)
[228] Nagy, L., Kocsube, S., Csanadi, Z., Kovacs, G., Petkovits, T., et al.: Re-mind the gap! Insertion and deletion data reveal neglected phylogenetic potential of the nuclear ribosomal internal transcribed spacer (ITS) of fungi. PLoS ONE 7, e49794 (2012)
[229] Barriel, V.: Molecular phylogenies and nucleotide insertion-deletions. C. R. Acad. Sci. III 7, 693-701 (1994)
[230] Young, N., Healy, J.: GapCoder automates the use of indel characters in phylogenetic analysis. BMC Bioinform. 4 (2003)
[231] Muller, K.: Incorporating information from length-mutational events into phylogenetic analysis. Mol. Phylogenet. Evol. 38, 667-676 (2006)
[232] Ogden, T., Rosenberg, M.: How should gaps be treated in parsimony? A comparison of approaches using simulation. Mol. Phylogenet. Evol. 42, 817-826 (2007)
[233] Dwivedi, B., Gadagkar, S.: Phylogenetic inference under varying proportions of indel-induced alignment gaps. BMC Evol. Biol. 9, 211 (2009)
[234] Dessimoz, C., Gil, M.: Phylogenetic assessment of alignments reveals neglected tree signal in gaps. Genome Biol. 11, R37 (2010)
[235] Yuri, T., Kimball, R.T., Harshman, J., Bowie, R.C.K., Braun, M.J., et al.: Parsimony and model-based analyses of indels in avian nuclear genes reveal congruent and incongruent phylogenetic signals. Biology 2, 419-444 (2013)
[236] Warnow, T.: Standard maximum likelihood analyses of alignments with gaps can be statistically inconsistent. PLoS Currents Tree of Life (2012)
[237] Daskalakis, C., Roch, S.: Alignment-free phylogenetic reconstruction. In: Berger, B. (ed.) Proc. RECOMB 2010. Lecture Notes in Computer Science, vol. 6044, pp. 123-137. Springer, Berlin (2010). http://dx.doi.org/10.1007/978-3-642-12683-3_9
[238] Thatte, B.: Invertibility of the TKF model of sequence evolution. Math. Biosci. 200, 58-75 (2006) · Zbl 1086.92042
[239] Hartmann, S., Vision, T.: Using ESTs for phylogenomics: can one accurately infer a phylogenetic tree from a gappy alignment? BMC Evol. Biol. 8, 95 (2008)
[240] Mirarab, S., Nguyen, N., Warnow, T.: SEPP: SATé-enabled phylogenetic placement. In: Pacific Symposium on Biocomputing, pp. 247-258 (2012)
[241] Matsen, F.A., Kodner, R.B., Armbrust, E.V.: pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinform. 11, 538 (2010)
[242] Berger, S.A., Krompass, D., Stamatakis, A.: Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Syst. Biol. 60, 291-302 (2011)
[243] Eddy, S.: A new generation of homology search tools based on probabilistic inference. Genome Inform. 23, 205-211 (2009)
[244] Finn, R., Clements, J., Eddy, S.: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29-W37 (2011)
[245] Brown, D.G., Truszkowski, J.: LSHPlace: fast phylogenetic placement using locality-sensitive hashing. In: Pacific Symposium on Biocomputing, vol. 18, pp. 310-319 (2013)
[246] Stark, M., Berger, S., Stamatakis, A., von Mering, C.: MLTreeMap—accurate maximum likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies. BMC Genomics 11, 461 (2010)
[247] Droge, J., McHardy, A.: Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. Brief. Bioinform. (2012)
[248] Giribet, G.: Exploring the behavior of POY, a program for direct optimization of molecular data. Cladistics 17, S60-S70 (2001)
[249] Hartigan, J.: Minimum mutation fits to a given tree. Biometrics 29, 53-65 (1973)
[250] Sankoff, D.: Minimal mutation trees of sequences. SIAM J. Appl. Math. 28, 35-42 (1975) · Zbl 0315.05101
[251] Sankoff, D., Cedergren, R.J.: Simultaneous comparison of three or more sequences related by a tree. In: Sankoff, D., Kruskal, J.B. (eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, pp. 253-263. Addison Wesley, New York (1993)
[252] Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. J. Comput. Biol. 1, 337-348 (1994)
[253] Wang, L., Jiang, T., Lawler, E.: Approximation algorithms for tree alignment with a given phylogeny. Algorithmica 16, 302-315 (1996) · Zbl 0862.68119
[254] Wang, L., Gusfield, D.: Improved approximation algorithms for tree alignment. J. Algorithms 25(2), 255-273 (1997) · Zbl 0895.68061
[255] Wang, L., Jiang, T., Gusfield, D.: A more efficient approximation scheme for tree alignment. SIAM J. Comput. 30(1), 283-299 (2000) · Zbl 0965.05034
[256] Liu, K., Warnow, T.: Treelength optimization for phylogeny estimation. PLoS ONE 7, e33104 (2012)
[257] Varón, A., Vinh, L., Bomash, I., Wheeler, W.: POY software. Documentation by Varon, A., Vinh, L.S., Bomash, I., Wheeler, W., Pickett, K., Temkin, I., Faivovich, J., Grant, T., Smith, W.L. Available for download at http://research.amnh.org/scicomp/projects/poy.php (2007)
[258] Kjer, K., Gillespie, J., Ober, K.: Opinions on multiple sequence alignment, and an empirical comparison on repeatability and accuracy between POY and structural alignment. Syst. Biol. 56, 133-146 (2007)
[259] Ogden, T.H., Rosenberg, M.: Alignment and topological accuracy of the direct optimization approach via POY and traditional phylogenetics via ClustalW+PAUP*. Syst. Biol. 56, 182-193 (2007)
[260] Yoshizawa, K.: Direct optimization overly optimizes data. Syst. Entomol. 35, 199-206 (2010)
[261] Wheeler, W., Giribet, G.: Phylogenetic hypotheses and the utility of multiple sequence alignment. In: Rosenberg, M. (ed.) Sequence Alignment: Methods, Models, Concepts and Strategies, pp. 95-104. University of California Press, Berkeley (2009)
[262] Lehtonen, S.: Phylogeny estimation and alignment via POY versus Clustal + PAUP*: a response to Ogden and Rosenberg. Syst. Biol. 57, 653-657 (2008)
[263] Liu, K., Nelesen, S., Raghavan, S., Linder, C., Warnow, T.: Barking up the wrong treelength: the impact of gap penalty on alignment and tree accuracy. IEEE/ACM Trans. Comput. Biol. Bioinform. 6, 7-21 (2009)
[264] Gu, X., Li, W.H.: The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J. Mol. Evol. 40, 464-473 (1995)
[265] Altschul, S.F.: Generalized affine gap costs for protein sequence alignment. Proteins, Struct. Funct. Genomics 32, 88-96 (1998)
[266] Gill, O., Zhou, Y., Mishra, B.: Aligning sequences with non-affine gap penalty: PLAINS algorithm, a practical implementation, and its biological applications in comparative genomics. In: Proc. ICBA 2004 (2004)
[267] Qian, B., Goldstein, R.: Distribution of indel lengths. Proteins 45, 102-104 (2001)
[268] Chang, M., Benner, S.: Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments. J. Mol. Biol. 341, 617-631 (2004)
[269] Thorne, J.L., Kishino, H., Felsenstein, J.: An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33, 114-124 (1991)
[270] Thorne, J.L., Kishino, H., Felsenstein, J.: Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol. 34, 3-16 (1992)
[271] Thorne, J.L., Kishino, H., Felsenstein, J.: Erratum, an evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 34, 91-92 (1992)
[272] Rivas, E.: Evolutionary models for insertions and deletions in a probabilistic modeling framework. BMC Bioinform. 6, 30 (2005)
[273] Rivas, E., Eddy, S.: Probabilistic phylogenetic inference with insertions and deletions. PLoS Comput. Biol. 4, e1000172 (2008)
[274] Holmes, I., Bruno, W.J.: Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 17, 803-820 (2001)
[275] Miklós, I., Lunter, G.A., Holmes, I.: A “long indel model” for evolutionary sequence alignment. Mol. Biol. Evol. 21, 529-540 (2004)
[276] Redelings, B., Suchard, M.: Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 54, 401-418 (2005)
[277] Suchard, M.A., Redelings, B.D.: BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics 22, 2047-2048 (2006)
[278] Redelings, B., Suchard, M.: Incorporating indel information into phylogeny estimation for rapidly emerging pathogens. BMC Evol. Biol. 7, 40 (2007)
[279] Fleissner, R., Metzler, D., von Haeseler, A.: Simultaneous statistical multiple alignment and phylogeny reconstruction. Syst. Biol. 54, 548-561 (2005)
[280] Novák, A., Miklós, I., Lyngso, R., Hein, J.: StatAlign: an extendable software package for joint Bayesian estimation of alignments and evolutionary trees. Bioinformatics 24, 2403-2404 (2008)
[281] Lunter, G.A., Miklos, I., Song, Y.S., Hein, J.: An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees. J. Comput. Biol. 10, 869-889 (2003)
[282] Lunter, G., Miklós, I., Drummond, A., Jensen, J.L., Hein, J.: Bayesian phylogenetic inference under a statistical indel model. In: Benson, G., Page, R. (eds.) Third International Workshop (WABI 2003). Lecture Notes in Bioinformatics vol. 2812, pp. 228-244. Springer, Berlin (2003)
[283] Lunter, G., Drummond, A., Miklós, I., Hein, J.: Statistical alignment: recent progress, new applications, and challenges. In: Nielsen, R. (ed.) Statistical Methods in Molecular Evolution (Statistics for Biology and Health), pp. 375-406. Springer, Berlin (2005)
[284] Metzler, D.: Statistical alignment based on fragment insertion and deletion models. Bioinformatics 19, 490-499 (2003)
[285] Miklós, I.: Algorithm for statistical alignment of sequences derived from a Poisson sequence length distribution. Discrete Appl. Math. 127, 79-84 (2003) · Zbl 1014.92027
[286] Arunapuram, P., Edvardsson, I., Golden, M., Anderson, J., Novak, A., et al.: StatAlign 2.0: combining statistical alignment with RNA secondary structure prediction. Bioinformatics 29(5), 654-655 (2013)
[287] Lunter, G., Miklós, I., Drummond, A., Jensen, J.L., Hein, J.: Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinform. 6, 83 (2005)
[288] Bouchard-Côté, A., Jordan, M.I.: Evolutionary inference via the Poisson indel process. Proc. Natl. Acad. Sci. 110, 1160-1166 (2013)
[289] Brown, D., Krishnamurthy, N., Sjolander, K.: Automated protein subfamily identification and classification. PLoS Comput. Biol. 3, e160 (2007)
[290] Vinga, S., Almeida, J.: Alignment-free sequence comparison—a review. Bioinformatics 19, 513-523 (2003)
[291] Chan, C., Ragan, M.: Next-generation phylogenomics. Biol. Direct 8 (2013)
[292] Blaisdell, B.: A measure of the similarity of sets of sequences not requiring sequence alignment. Proc. Natl. Acad. Sci. USA 83, 5155-5159 (1986) · Zbl 0592.92011
[293] Sims, G., Jun, S.R., Wu, G., Kim, S.H.: Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc. Natl. Acad. Sci. USA 106, 2677-2682 (2009)
[294] Jun, S.R., Sims, G., Wu, G., Kim, S.H.: Whole-proteome phylogeny of prokaryotes by feature frequency profiles: an alignment-free method with optimal feature resolution. Proc. Natl. Acad. Sci. USA 107, 133-138 (2010)
[295] Liu, X., Wan, L., Li, J., Reinert, G., Waterman, M., et al.: New powerful statistics for alignment-free sequence comparison under a pattern transfer model. J. Theor. Biol. 284, 106-116 (2011) · Zbl 1397.92459
[296] Yang, K., Zhang, L.: Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction. Nucleic Acids Res. 36, e33 (2008)
[297] Roshan, U., Moret, B.M.E., Williams, T.L., Warnow, T.: Performance of supertree methods on various dataset decompositions. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, pp. 301-328. Kluwer Academic, Dordrecht (2004)
[298] Nelesen, S.: Improved methods for phylogenetics. Ph.D. thesis, The University of Texas at Austin (2009)
[299] Swenson, M.: Phylogenetic supertree methods. Ph.D. thesis, The University of Texas at Austin (2008)
[300] Neves, D., Warnow, T., Sobral, J., Pingali, K.: Parallelizing SuperFine. In: 27th Symposium on Applied Computing (ACM-SAC) (2012)
[301] Cannone, J., Subramanian, S., Schnare, M., Collett, J., D’Souza, L., et al.: The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron and other RNAs. BMC Bioinform. 3 (2002)
[302] Roch, S.: Towards extracting all phylogenetic information from matrices of evolutionary distances. Science 327, 1376-1379 (2010) · Zbl 1226.92058
[303] Darling, A., Mau, B., Blatter, F., Perna, N.: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394-1403 (2004)
[304] Darling, A., Mau, B., Perna, N.: progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5, e11147 (2010)
[305] Raphael, B., Zhi, D., Tang, H., Pevzner, P.: A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res. 14, 2336-2346 (2004)
[306] Dubchak, I., Poliakov, A., Kislyuk, A., Brudno, M.: Multiple whole-genome alignments without a reference organism. Genome Res. 19, 682-689 (2009)
[307] Brudno, M., Do, C., Cooper, G., Kim, M., Davydov, E., et al.: LAGAN and multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13, 721-731 (2003)
[308] Phuong, T., Do, C., Edgar, R., Batzoglou, S.: Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Res. 34, 5932-5942 (2006)
[309] Paten, B., Earl, D., Nguyen, N., Diekhans, M., Zerbino, D., et al.: Cactus: algorithms for genome multiple sequence alignment. Genome Res. 21, 1512-1528 (2011)
[310] Angiuoli, S., Salzberg, S.: Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics (2011). doi:10.1093/bioinformatics/btq665
[311] Agren, J., Sundstrom, A., Hafstrom, T., Segerman, B.: Gegenees: fragmented alignment of multiple genomes for determining phylogenomic distances and genetic signatures unique for specified target groups. PLoS ONE 7, e39107 (2012)
[312] Gogarten, J., Doolittle, W., Lawrence, J.: Prokaryotic evolution in light of gene transfer. Mol. Biol. Evol. 19, 2226-2238 (2002)
[313] Gogarten, J., Townsend, J.: Horizontal gene transfer, genome innovation and evolution. Nat. Rev. Microbiol. 3, 679-687 (2005)
[314] Bergthorsson, U., Richardson, A., Young, G., Goertzen, L., Palmer, J.: Massive horizontal transfer of mitochondrial genes from diverse land plant donors to basal angiosperm Amborella. Proc. Natl. Acad. Sci. USA 101, 17747-17752 (2004)
[315] Bergthorsson, U., Adams, K., Thomason, B., Palmer, J.: Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature 424, 197-201 (2003)
[316] Wolf, Y., Rogozin, I., Grishin, N., Koonin, E.: Genome trees and the tree of life. Trends Genet. 18, 472-478 (2002)
[317] Koonin, E., Makarova, K., Aravind, L.: Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55, 709-742 (2001)
[318] Linder, C., Rieseberg, L.: Reconstructing patterns of reticulate evolution in plants. Am. J. Bot. 91, 1700-1708 (2004)
[319] Sessa, E., Zimmer, E., Givnish, T.: Reticulate evolution on a global scale: a nuclear phylogeny for New World Dryopteris (Dryopteridaceae). Mol. Phylogenet. Evol. 64, 563-581 (2012)
[320] Moody, M., Rieseberg, L.: Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus). Mol. Phylogenet. Evol. 64, 145-155 (2012)
[321] Mindell, D.: The tree of life: metaphor, model, and heuristic device. Syst. Biol. 62(3), 479-489 (2013)
[322] Warnow, T., Evans, S., Ringe, D., Nakhleh, L.: A stochastic model of language evolution that incorporates homoplasy and borrowing. In: Phylogenetic Methods and the Prehistory of Languages, pp. 75-90. Cambridge University Press, Cambridge (2006)
[323] Nakhleh, L., Ringe, D.A., Warnow, T.: Perfect phylogenetic networks: a new methodology for reconstructing the evolutionary history of natural languages. Language 81, 382-420 (2005)
[324] Huson, D., Rupp, R., Scornavacca, C.: Phylogenetic Networks: Concepts, Algorithms and Applications. Cambridge University Press, Cambridge (2010)
[325] Morrison, D.: Introduction to Phylogenetic Networks. RJR Productions, Uppsala (2011)
[326] Nakhleh, L.: Evolutionary phylogenetic networks: models and issues. In: Problem Solving Handbook in Computational Biology and Bioinformatics, pp. 125-158. Springer, Berlin (2011)
[327] van Iersel, L., Kelk, S., Rupp, R., Huson, D.: Phylogenetic networks do not need to be complex: using fewer reticulations to represent conflicting clusters. Bioinformatics 26, i124-i131 (2010)
[328] Wu, Y.: An algorithm for constructing parsimonious hybridization networks with multiple phylogenetic trees. In: Proc. RECOMB (2013)
[329] Jin, G., Nakhleh, L., Snir, S., Tuller, T.: Maximum likelihood of phylogenetic networks. Bioinformatics 22, 2604-2611 (2006)
[330] Jin, G., Nakhleh, L., Snir, S., Tuller, T.: Inferring phylogenetic networks by the maximum parsimony criterion: a case study. Mol. Biol. Evol. 24, 324-337 (2007)
[331] Nakhleh, L., Warnow, T., Linder, C.: Reconstructing reticulate evolution in species—theory and practice. In: Proc. 8th Conf. Comput. Mol. Biol. (RECOMB’04), pp. 337-346. ACM Press, New York (2004)
[332] Nakhleh, L., Ruths, D., Wang, L.S.: RIATA-HGT: a fast and accurate heuristic for reconstructing horizontal gene transfer. In: Proc. 11th Conf. Computing and Combinatorics (COCOON’05). Lecture Notes in Computer Science. Springer, Berlin (2005) · Zbl 1128.92321
[333] Yu, Y., Than, C., Degnan, J., Nakhleh, L.: Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst. Biol. 60, 138-149 (2011)
[334] Lapierre, P., Lasek-Nesselquist, E., Gogarten, J.: The impact of HGT on phylogenomic reconstruction methods. Brief. Bioinform. (2012). doi:10.1093/bib/bbs050
[335] Roch, S., Snir, S.: Recovering the tree-like trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. In: Proceedings RECOMB 2012 (2012)
[336] Gerard, D., Gibbs, H., Kubatko, L.: Estimating hybridization in the presence of coalescence using phylogenetic intraspecific sampling. BMC Evol. Biol. 11, 291 (2011)
[337] Yu, Y., Degnan, J., Nakhleh, L.: The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genet. 8, e1002660 (2012)
[338] Chowdhury, R., Ramachandran, V.: Cache-oblivious dynamic programming. In: Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA), pp · Zbl 1192.90241