×

Fast error-tolerant quartet phylogeny algorithms. (English) Zbl 1339.92057

Giancarlo, Raffaele (ed.) et al., Combinatorial pattern matching. 22nd annual symposium, CPM 2011, Palermo, Italy, June 27–29, 2011. Proceedings. Berlin: Springer (ISBN 978-3-642-21457-8/pbk). Lecture Notes in Computer Science 6661, 147-161 (2011).
Summary: We present a quartet-based phylogeny algorithm that returns the correct topology for \(n\) taxa in \(O(n \log n)\) time with high probability, assuming each quartet is inconsistent with the true tree topology with constant probability, independent of other quartets. Our incremental algorithm relies upon a search tree structure for the phylogeny that is balanced, with high probability, no matter the true topology. In experiments, our prototype was as fast as the fastest heuristics, but because real data do not typically satisfy our probabilistic assumptions, its overall performance is not as good as our theoretical results predict.
For the entire collection see [Zbl 1216.68028].

MSC:

92D15 Problems related to evolution
92-08 Computational methods for problems pertaining to biology

Software:

FastTree; Pfam; Scoredist
PDF BibTeX XML Cite
Full Text: DOI arXiv

References:

[1] Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., Sonnhammer, E.L.L.: The Pfam protein families database. Nucleic Acids Research 30(1), 276–280 (2002) · Zbl 05434960
[2] Berry, V., Jiang, T., Kearney, P.E., Li, M., Wareham, H.T.: Quartet cleaning: Improved algorithms and simulations. In: Nešetřil, J. (ed.) ESA 1999. LNCS, vol. 1643, pp. 313–324. Springer, Heidelberg (1999) · Zbl 0943.92027
[3] Brodal, G., Fagerberg, R., Pedersen, C.: Computing the quartet distance between evolutionary trees in time O(n log n). Algorithmica 38, 377–395 (2003) · Zbl 1072.68121
[4] Bryant, D., Tsang, J., Kearney, P.E., Li, M.: Computing the quartet distance between evolutionary trees. In: Proceedings of SODA 2000, pp. 285–286 (2000) · Zbl 0956.68105
[5] Csurös, M.: Fast recovery of evolutionary trees with thousands of nodes. J. Comp. Biol. 9(2), 277–297 (2002)
[6] Daskalakis, C., Mossel, E., Roch, S.: Phylogenies without branch bounds: Contracting the short, pruning the deep. In: Batzoglou, S. (ed.) RECOMB 2009. LNCS, vol. 5541, pp. 451–465. Springer, Heidelberg (2009) · Zbl 05561084
[7] Dubhashi, D.P., Panconesi, A.: Concentration of measure for the analysis of randomized algorithms. Cambridge Univ. Press, Cambridge (2009) · Zbl 1213.60006
[8] Erdös, P.L., Steel, M.A., Székely, L.A., Warnow, T.: A few logs suffice to build (almost) all trees: Part II. Theor. Comput. Sci. 221(1-2), 77–118 (1999) · Zbl 0933.68100
[9] Feige, U., Peleg, D., Raghavan, P., Upfal, E.: Computing with unreliable information. In: Proceedings of STOC 1990, pp. 128–137. · Zbl 0813.68057
[10] Felsenstein, J.: Inferring Phylogenies. Sinauer (2001)
[11] Gronau, I., Moran, S., Snir, S.: Fast and reliable reconstruction of phylogenetic trees with very short edges. In: Proceedings of SODA 2008, pp. 379–388 (2008) · Zbl 1192.05030
[12] Jiang, T., Kearney, P., Li, M.: Orchestrating quartets: Approximation and data correction. In: Proceedings of FOCS 1998, pp. 416–425 (1998)
[13] Kannan, S.K., Lawler, E.L., Warnow, T.J.: Determining the evolutionary tree using experiments. J. Algorithms 21(1), 26–50 (1996) · Zbl 0857.68082
[14] Kao, M.Y., Lingas, A., Östlin, A.: Balanced randomized tree splitting with applications to evolutionary tree constructions. In: Meinel, C., Tison, S. (eds.) STACS 1999. LNCS, vol. 1563, pp. 184–196. Springer, Heidelberg (1999) · Zbl 0930.05090
[15] Karp, R.M., Kleinberg, R.: Noisy binary search and its applications. In: Proceedings of SODA 2007, pp. 881–890 (2007) · Zbl 1302.68107
[16] King, V., Zhang, L., Zhou, Y.: On the complexity of distance-based evolutionary tree reconstruction. In: Proceedings of SODA 2003, pp. 444–453 (2003) · Zbl 1094.68614
[17] Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree: Computing large minimum evolution trees with profiles instead of a distance matrix. Molecular Biology and Evolution 26(7), 1641–1650 (2009)
[18] Ranwez, V., Gascuel, O.: Quartet-based phylogenetic inference: Improvements and limits. Molecular Biology and Evolution 18(6), 1103–1116 (2001)
[19] Snir, S., Warnow, T., Rao, S.: Short quartet puzzling: A new quartet-based phylogeny reconstruction algorithm. Journal of Computational Biology 15(1), 91–103 (2008)
[20] Sonnhammer, E.L.L., Hollich, V.: Scoredist: A simple and robust protein sequence distance estimator. BMC Bioinformatics 6, 108 (2005)
[21] Strimmer, K., von Haeseler, A.: Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. E 13(7), 964–969 (1996)
[22] Wu, G., Kao, M.Y., Lin, G., You, J.H.: Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability. Alg. Mol. Biol. 3 (2008)
[23] Wu, G., You, J.H., Lin, G.: Quartet-based phylogeny reconstruction with answer set programming. IEEE/ACM Trans. Comput. Biol. Bioinf. 4(1), 139–152 (2007) · Zbl 05335700
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.