Lie Markov models with purine/pyrimidine symmetry. (English) Zbl 1339.60111

Summary: Continuous-time Markov chains are a standard tool in phylogenetic inference. If homogeneity is assumed, the chain is formulated by specifying time-independent rates of substitutions between states in the chain. In applications, there are usually extra constraints on the rates, depending on the situation. If a model is formulated in this way, it is possible to generalise it and allow for an inhomogeneous process, with time-dependent rates satisfying the same constraints. It is then useful to require that, under some time restrictions, there exists a homogeneous average of this inhomogeneous process within the same model. This leads to the definition of “Lie Markov models” which, as we will show, are precisely the class of models where such an average exists. These models form Lie algebras and hence concepts from Lie group theory are central to their derivation. In this paper, we concentrate on applications to phylogenetics and nucleotide evolution, and derive the complete hierarchy of Lie Markov models that respect the grouping of nucleotides into purines and pyrimidines – that is, models with purine/pyrimidine symmetry. We also discuss how to handle the subtleties of applying Lie group methods, most naturally defined over the complex field, to the stochastic case of a Markov process, where the parameter values are restricted to be real and positive. In particular, we explore the geometric embedding of the cone of stochastic rate matrices within the ambient space of the associated complex Lie algebra.
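The closure property mentioned in the summary ("these models form Lie algebras") can be illustrated with a small check. The sketch below is not taken from the paper. It assumes a model is given as a finite list of basis rate matrices (rows summing to zero) and tests only whether the linear span of that basis is closed under the commutator [A, B] = AB - BA. It ignores the stochastic cone and the symmetry requirements that the paper also treats. The function names (`is_lie_closed`, `rate_generator`) and the state ordering A, G, C, T are assumptions made for illustration. The closed example is the Kimura two-parameter basis (one transition rate, one transversion rate), which respects the purine/pyrimidine grouping; the non-closed example is a hand-picked pair of elementary generators.

```python
from fractions import Fraction
from itertools import combinations


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def bracket(a, b):
    """Commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def rank(rows):
    """Exact rank of a list of vectors via Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def flatten(m):
    return [x for row in m for x in row]


def is_lie_closed(basis):
    """True if every commutator of basis elements lies in the span of the basis."""
    flat = [flatten(m) for m in basis]
    base_rank = rank(flat)
    for a, b in combinations(basis, 2):
        # If appending the commutator raises the rank, it left the span.
        if rank(flat + [flatten(bracket(a, b))]) > base_rank:
            return False
    return True


def rate_generator(n, pairs):
    """Rate matrix (rows sum to zero) with unit rate for each (i, j) in pairs."""
    q = [[0] * n for _ in range(n)]
    for i, j in pairs:
        q[i][j] += 1
        q[i][i] -= 1
    return q


# Assumed state order A, G, C, T: purines {A, G} = {0, 1}, pyrimidines {C, T} = {2, 3}.
transitions = rate_generator(4, [(0, 1), (1, 0), (2, 3), (3, 2)])
transversions = rate_generator(
    4, [(0, 2), (0, 3), (1, 2), (1, 3), (2, 0), (2, 1), (3, 0), (3, 1)])
k2p_basis = [transitions, transversions]

# Two elementary generators whose commutator leaves their span.
not_closed = [rate_generator(4, [(0, 1)]), rate_generator(4, [(1, 2)])]

print(is_lie_closed(k2p_basis))   # True
print(is_lie_closed(not_closed))  # False
```

Exact rational arithmetic is used so that the span-membership test does not depend on a floating-point tolerance. For the Kimura two-parameter basis the two generators commute, so closure holds trivially; larger models in the hierarchy would need their own bases supplied by the user.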

MSC:

60J27 Continuous-time Markov processes on discrete state spaces
60J28 Applications of continuous-time Markov processes on discrete state spaces
22E60 Lie algebras of Lie groups
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D15 Problems related to evolution
92D20 Protein sequences, DNA sequences
52B99 Polytopes and polyhedra
