Assessing phenotypic correlation through the multivariate phylogenetic latent liability model. (English) Zbl 1454.62324

Summary: Understanding which phenotypic traits are consistently correlated throughout evolution is a highly pertinent problem in modern evolutionary biology. Here, we propose a multivariate phylogenetic latent liability model for assessing the correlation between multiple types of data, while simultaneously controlling for their unknown shared evolutionary history informed through molecular sequences. The latent formulation enables us to consider in a single model combinations of continuous traits, discrete binary traits and discrete traits with multiple ordered and unordered states. Previous approaches have entertained a single data type generally along a fixed history, precluding estimation of correlation between traits and ignoring uncertainty in the history. We implement our model in a Bayesian phylogenetic framework, and discuss inference techniques for hypothesis testing. Finally, we showcase the method through applications to columbine flower morphology, antibiotic resistance in Salmonella and epitope evolution in influenza.
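As a rough illustration of the latent formulation described in the summary, the following Python sketch maps unobserved continuous liabilities to mixed data types. It is not the authors' implementation. It assumes, beyond what the summary states, that the liabilities follow a multivariate Brownian diffusion along a fixed tree, that binary traits are thresholded at zero, and that ordered traits use fixed cutpoints. The function names, the 4-taxon tree covariance, the trait covariance and the cutpoints are all hypothetical. Unordered multi-state traits and uncertainty in the tree are not covered.

```python
import numpy as np


def simulate_liabilities(tree_cov, trait_cov, rng):
    """Draw an (n_taxa x n_traits) matrix of latent liabilities.

    Assumption: Brownian diffusion on a fixed tree, so rows covary
    according to tree_cov (shared branch lengths) and columns covary
    according to trait_cov (the between-trait covariance of interest).
    """
    n_taxa = tree_cov.shape[0]
    n_traits = trait_cov.shape[0]
    chol_tree = np.linalg.cholesky(tree_cov)
    chol_trait = np.linalg.cholesky(trait_cov)
    z = rng.standard_normal((n_taxa, n_traits))
    # Matrix-normal draw: row covariance tree_cov, column covariance trait_cov.
    return chol_tree @ z @ chol_trait.T


def liabilities_to_traits(liab, cutpoints):
    """Map latent liabilities to observed traits (hypothetical layout).

    Column 0: continuous trait, observed directly.
    Column 1: binary trait, 1 if the liability exceeds 0 (assumed threshold).
    Column 2: ordered trait, state = number of cutpoints at or below the liability.
    """
    continuous = liab[:, 0]
    binary = (liab[:, 1] > 0.0).astype(int)
    ordered = np.digitize(liab[:, 2], cutpoints)
    return continuous, binary, ordered


# Hypothetical tree ((A,B),(C,D)) with unit height; sister taxa share 0.6 of their path.
tree_cov = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])
# Hypothetical trait covariance: traits 0 and 1 correlated, trait 2 independent.
trait_cov = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
cutpoints = np.array([0.0, 1.0])  # assumed cutpoints giving three ordered states

rng = np.random.default_rng(0)
liab = simulate_liabilities(tree_cov, trait_cov, rng)
continuous, binary, ordered = liabilities_to_traits(liab, cutpoints)
```

In this sketch the correlation between the continuous and binary traits lives entirely in `trait_cov`, which is the quantity the model in the summary aims to estimate. Inference would run in the opposite direction, from observed traits back to the liabilities and their covariance.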


62P10 Applications of statistics to biology and medical sciences; meta analysis

