zbMATH — the first resource for mathematics

Distance to the stochastic part of phylogenetic varieties. (English) Zbl 07312497
Summary: The substitution of nucleotides along a phylogenetic tree is usually modelled by a hidden Markov process. This allows one to define a distribution of characters at the leaves of the tree, and may make it possible to obtain polynomial relationships among the probabilities of different characters. The study of these polynomials, and of the geometry of the algebraic varieties they define, can be used to reconstruct phylogenetic trees. However, not all points in these algebraic varieties are biologically meaningful. In this paper, we explore the extent to which adding semi-algebraic conditions, arising from the restriction to parameters with statistical meaning, can improve existing methods of phylogenetic reconstruction. To this end, we aim to compute the distance from data points to the algebraic varieties and to the stochastic part of these varieties. Computing these distances involves optimization by nonlinear programming algorithms. We use analytical methods to find some of these distances for quartet trees evolving under the Kimura 3-parameter or the Jukes-Cantor models. Numerical algebraic geometry and computational algebra also play a fundamental role in this paper.
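The difference between a parametrized variety and its stochastic part can be illustrated on a much smaller toy model than the quartet trees treated in the paper. The sketch below is not the authors' method. It assumes a two-state symmetric model on a three-leaf star tree, with leaf states coded as ±1, a uniform root, and edge parameters θ_i = 1 − 2a_i, where a_i is the flip probability on edge i. Under these assumptions the pairwise moments are E[X_i X_j] = θ_i θ_j, and the stochastic part corresponds to θ_i in [−1, 1]. All function names are hypothetical, and a brute-force grid search stands in for the nonlinear programming mentioned in the summary.

```python
import itertools
import math


def moments(theta):
    """Pairwise moments (m12, m13, m23) of the toy tripod model."""
    t1, t2, t3 = theta
    return (t1 * t2, t1 * t3, t2 * t3)


def real_preimage(m):
    """Return a real theta with moments(theta) == m, or None if none exists.

    Assumes all three entries of m are nonzero.
    """
    m12, m13, m23 = m
    if m12 * m13 * m23 <= 0:
        return None  # theta1^2 = m12*m13/m23 would be negative
    t1 = math.sqrt(m12 * m13 / m23)
    return (t1, m12 / t1, m13 / t1)


def distance_to_stochastic_part(m, steps=40):
    """Approximate Euclidean distance from m to {moments(theta): |theta_i| <= 1}.

    Grid search over [-1, 1]^3, so the result is an upper bound on the
    true distance whose accuracy depends on `steps`.
    """
    grid = [-1 + 2 * k / steps for k in range(steps + 1)]
    best = min(
        sum((a - b) ** 2 for a, b in zip(m, moments(theta)))
        for theta in itertools.product(grid, repeat=3)
    )
    return math.sqrt(best)


if __name__ == "__main__":
    data = (0.9, 0.9, 0.2)  # hypothetical empirical moments
    print("real parameters:", real_preimage(data))
    print("distance to stochastic part:", distance_to_stochastic_part(data))
```

For the hypothetical data point above, real parameters reproducing the moments exist, but one of them lies outside [−1, 1]. The point is therefore in the image of the real parametrization while staying at positive distance from its stochastic part.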
92D15 Problems related to evolution
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
[1] Allman, E. S.; Degnan, J. H.; Rhodes, J. A., Species tree inference by the star method and its generalizations, J. Comput. Biol., 20, 1, 50-61 (2013)
[2] Allman, E. S.; Kubatko, L. S.; Rhodes, J. A., Split scores: a tool to quantify phylogenetic signal in genome-scale data, Syst. Biol., 66, 4, 620-636 (2017)
[3] Allman, E. S.; Rhodes, J. A., Phylogenetic invariants of the general Markov model of sequence mutation, Math. Biosci., 186, 113-144 (2003) · Zbl 1031.92019
[4] Allman, E. S.; Rhodes, J. A., Mathematical Models in Biology, an Introduction (January 2004), Cambridge University Press
[5] Allman, E. S.; Rhodes, J. A., Quartets and parameter recovery for the general Markov model of sequence mutation, Appl. Math. Res. Express, 2004, 107-132 (2004) · Zbl 1077.92015
[6] Allman, E. S.; Rhodes, J. A., Phylogenetic invariants, (Gascuel, O.; Steel, M. A., Reconstructing Evolution (2007), Oxford University Press)
[7] Allman, E. S.; Rhodes, J. A., The identifiability of covarion models in phylogenetics, IEEE/ACM Trans. Comput. Biol. Bioinform., 6, 76-88 (2009)
[8] Allman, E. S.; Rhodes, J. A.; Taylor, A., A semialgebraic description of the general Markov model on phylogenetic trees, SIAM J. Discrete Math., 28 (2012)
[9] Bosma, W.; Cannon, J.; Playoust, C., The Magma algebra system. I. The user language, Computational Algebra and Number Theory, London, 1993, J. Symb. Comput., 24, 3-4, 235-265 (1997) · Zbl 0898.68039
[10] Casanellas, M.; Fernández-Sánchez, J., Performance of a new invariants method on homogeneous and nonhomogeneous quartet trees, Mol. Biol. Evol., 24, 288-293 (2007)
[11] Casanellas, M.; Fernández-Sánchez, J., Geometry of the Kimura 3-parameter model, Adv. Appl. Math., 41, 3, 265-292 (2008) · Zbl 1222.14110
[12] Casanellas, M.; Fernández-Sánchez, J., Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages, Syst. Biol., 65, 2, 280-291 (2016)
[13] Casanellas, M.; Fernández-Sánchez, J.; Kedzierska, A., The space of phylogenetic mixtures for equivariant models, Algorithms Mol. Biol., 7, 33 (2012)
[14] Casanellas, M.; Fernández-Sánchez, J.; Michałek, M., Low degree equations for phylogenetic group-based models, Collect. Math., 66, 2, 203-225 (2015) · Zbl 1332.92040
[15] Chifman, J.; Kubatko, L., Quartet inference from SNP data under the coalescent model, Bioinformatics, 30, 23, 3317-3324 (2014)
[16] Chifman, J.; Kubatko, L., Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, site-specific rate variation, and invariable sites, J. Theor. Biol., 374, 35-47 (2015) · Zbl 1341.92047
[17] Cox, D. A.; Little, J.; O’Shea, D., Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, Undergraduate Texts in Mathematics (2007), Springer-Verlag: Berlin, Heidelberg · Zbl 1118.13001
[18] Draisma, J.; Horobeţ, E.; Ottaviani, G.; Sturmfels, B.; Thomas, R., The Euclidean distance degree of an algebraic variety, Found. Comput. Math., 1-51 (2015)
[19] Draisma, J.; Kuttler, J., On the ideals of equivariant tree models, Math. Ann., 344, 619-644 (2009) · Zbl 1398.62338
[20] Evans, S. N.; Speed, T. P., Invariants of some probability models used in phylogenetic inference, Ann. Stat., 21, 355-377 (1993) · Zbl 0772.92012
[21] Garrote-López, M., Distance to stochastic phylogenetic region: repository (2019)
[22] Grayson, D. R.; Stillman, M. E., Macaulay2, a software system for research in algebraic geometry, Available at
[23] Gross, E.; Davis, B.; Ho, K. L.; Bates, D. J.; Harrington, H. A., Numerical algebraic geometry for model selection and its application to the life sciences, J. R. Soc. Interface, 13 (2016)
[24] Gross, E.; Petrovic, S.; Verschelde, J., Interfacing with PHCpack, J. Softw. Algebra Geom., 5, 20-25 (2013) · Zbl 1311.13002
[25] Jukes, T.; Cantor, C., Evolution of protein molecules, (Mammalian Protein Metabolism (1969)), 21-132
[26] Kimura, M., Estimation of evolutionary distances between homologous nucleotide sequences, Proc. Natl. Acad. Sci. USA, 78, 1, 454-458 (1981) · Zbl 0511.92013
[27] Klaere, S.; Liebscher, V., An algebraic analysis of the two state Markov model on tripod trees, Math. Biosci., 237, 1, 38-48 (2012) · Zbl 1241.92057
[28] Kosta, D.; Kubjas, K., Maximum likelihood estimation of symmetric group-based models via numerical algebraic geometry, Bull. Math. Biol., 81, 2, 337-360 (2019) · Zbl 1410.92077
[29] Kreinin, A.; Sidelnikova, M., Regularization algorithms for transition matrices, Algo. Res. Q., 4, 23-40 (2001)
[30] Kück, P.; Mayer, C.; Wägele, J.-W.; Misof, B., Long branch effects distort maximum likelihood phylogenies in simulations despite selection of the correct model, PLoS ONE, 7, 10 (2012)
[31] Lake, J. A., A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony, Mol. Biol. Evol., 4, 167-191 (1987)
[32] Matsen, F. A., Fourier transform inequalities for phylogenetic trees, IEEE/ACM Trans. Comput. Biol. Bioinform., 6, 1, 89-95 (2009)
[33] Michelot, C., A finite algorithm for finding the projection of a point onto the canonical simplex of \(\mathbb{R}^n\), J. Optim. Theory Appl., 50, 1, 195-200 (July 1986)
[34] The Sage Developers, SageMath, the sage mathematics software system (version 8.6) (2019)
[35] Verschelde, J., PHCpack: a general-purpose solver for polynomial systems by homotopy continuation, ACM Trans. Math. Softw., 25, 2, 251-276 (1999) · Zbl 0961.65047
[36] Zwiernik, P.; Smith, J. Q., Implicit inequality constraints in a binary tree model, Electron. J. Stat., 5, 1276-1312 (2011) · Zbl 1274.62355
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.