Distribution of distances between topologies and its effect on detection of phylogenetic recombination. (English) Zbl 1422.62323

Summary: Inferences about the evolutionary history of biological sequence data are greatly influenced by the presence of recombination, which tends to disrupt the phylogenetic signal. Current recombination detection procedures focus on the phylogenetic disagreement of the data along the aligned sequences, but only recently has the link between the quantification of this disagreement and the strength of recombination been recognised. We previously described a hierarchical Bayesian procedure based on the distance between the topologies of neighbouring sites and a Poisson-like prior for these distances. Here, by analysing datasets simulated under a complex evolutionary model, we confirm that this topology distance and its prior improve on existing methods that neglect this information. We also show how to obtain, using a newly developed centroid method, a mosaic structure representative of the posterior sample.
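A minimal Python sketch of the two ingredients named in the summary. It rests on several assumptions. The topology distances between neighbouring sites are taken as already computed; the reference list includes subtree-transfer metrics [3]. The Poisson-like prior is approximated by a Poisson law truncated at a hypothetical maximum distance `d_max`. The mosaic summary uses the Hamming-loss centroid of Carvalho and Lawrence [7] over breakpoint indicators, which is not necessarily the centroid method developed in the paper. All function names are hypothetical and the posterior samples are made up.

```python
import math
from typing import List, Sequence


def truncated_poisson_log_prior(d: int, lam: float, d_max: int) -> float:
    """Log-probability of a topology distance d under a Poisson(lam) law
    truncated to the support {0, 1, ..., d_max}.

    Assumption: "Poisson-like" is read here as a truncated Poisson; d_max
    stands for the largest distance attainable for the given number of taxa.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if not 0 <= d <= d_max:
        return float("-inf")
    # Unnormalised log-weights k*log(lam) - lam - log(k!) for every k in the support.
    log_w = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in range(d_max + 1)]
    # Log-sum-exp gives a numerically stable normalising constant.
    top = max(log_w)
    log_norm = top + math.log(sum(math.exp(w - top) for w in log_w))
    return log_w[d] - log_norm


def log_prior_of_mosaic(distances: Sequence[int], lam: float, d_max: int) -> float:
    """Sum the log-prior over distances between neighbouring sites
    (independence across site pairs is an assumption of this sketch)."""
    return sum(truncated_poisson_log_prior(d, lam, d_max) for d in distances)


def breakpoints(distances: Sequence[int]) -> List[int]:
    """Indicator 1 wherever the topology changes between neighbouring sites."""
    return [1 if d > 0 else 0 for d in distances]


def hamming_centroid(samples: Sequence[Sequence[int]]) -> List[int]:
    """Centroid of posterior breakpoint vectors under Hamming loss: keep a
    breakpoint if and only if it appears in more than half of the samples."""
    n = len(samples)
    return [1 if sum(col) / n > 0.5 else 0 for col in zip(*samples)]


# Usage with made-up posterior samples of neighbouring-site distances.
posterior = [[0, 2, 0, 0], [0, 1, 1, 0], [0, 3, 0, 0]]
print(log_prior_of_mosaic(posterior[0], lam=0.5, d_max=3))
print(hamming_centroid([breakpoints(s) for s in posterior]))  # [0, 1, 0, 0]
```

Under Hamming loss the centroid reduces to a per-position majority vote, so it can be computed from the marginal breakpoint frequencies of the posterior sample alone.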

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
60J85 Applications of branching processes
92D15 Problems related to evolution

References:

[1] Akaike H. (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6): 716–723 · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[2] Al-Awadhi F., Hurn M., Jennison C. (2004) Improving the acceptance rate of reversible jump MCMC proposals. Statistics and Probability Letters 69(2): 189–198 · Zbl 1116.65308 · doi:10.1016/j.spl.2004.06.025
[3] Allen B., Steel M. (2001) Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics 5(1): 1–15 · Zbl 0978.05023 · doi:10.1007/s00026-001-8006-8
[4] Altekar G., Dwarkadas S., Huelsenbeck J.P., Ronquist F. (2004) Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 20(3): 407–415 · doi:10.1093/bioinformatics/btg427
[5] Awadalla P. (2003) The evolutionary genomics of pathogen recombination. Nature Reviews Genetics 4(1): 50–60 · doi:10.1038/nrg964
[6] Beiko R.G., Hamilton N. (2006) Phylogenetic identification of lateral genetic transfer events. BMC Evolutionary Biology 6: 15 · doi:10.1186/1471-2148-6-15
[7] Carvalho L.E., Lawrence C.E. (2008) Centroid estimation in discrete high-dimensional spaces with applications in biology. Proceedings of the National Academy of Sciences USA 105(9): 3209–3214 · doi:10.1073/pnas.0712329105
[8] Dimatteo I., Genovese C., Kass R. (2001) Bayesian curve-fitting with free-knot splines. Biometrika 88(4): 1055–1071 · Zbl 0986.62026 · doi:10.1093/biomet/88.4.1055
[9] Ding Y., Chan C.Y., Lawrence C.E. (2005) RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA 11(8): 1157–1166 · doi:10.1261/rna.2500605
[10] Fang F., Ding J., Minin V.N., Suchard M.A., Dorman K.S. (2007) cBrother: relaxing parental tree assumptions for Bayesian recombination detection. Bioinformatics 23(4): 507–508 · doi:10.1093/bioinformatics/btl613
[11] Felsenstein J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17(6): 368–376 · doi:10.1007/BF01734359
[12] Felsenstein J. (2004) Inferring phylogenies. Sinauer Associates, Sunderland, MA
[13] Gelman A. (2004) Parameterization and Bayesian modeling. Journal of the American Statistical Association 99(466): 537–545 · Zbl 1117.62343 · doi:10.1198/016214504000000458
[14] Gelman A., Carlin J.B., Stern H.S., Rubin D.B. (2003) Bayesian data analysis (2nd ed). Chapman & Hall/CRC, Boca Raton, FL
[15] Hasegawa M., Kishino H., Yano T. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22(2): 160–174 · doi:10.1007/BF02101694
[16] Kass R.E., Raftery A.E. (1995) Bayes factors. Journal of the American Statistical Association 90(430): 773–795 · Zbl 0846.62028 · doi:10.1080/01621459.1995.10476572
[17] Minin V.N., Dorman K.S., Fang F., Suchard M.A. (2005) Dual multiple change-point model leads to more accurate recombination detection. Bioinformatics 21(13): 3034–3042 · doi:10.1093/bioinformatics/bti459
[18] Mitchell T.J., Beauchamp J.J. (1988) Bayesian variable selection in linear regression. Journal of the American Statistical Association 83(404): 1023–1032 · Zbl 0673.62051 · doi:10.1080/01621459.1988.10478694
[19] de Oliveira Martins L., Leal É., Kishino H. (2008) Phylogenetic detection of recombination with a Bayesian prior on the distance between trees. PLoS ONE 3(7): e2651 · doi:10.1371/journal.pone.0002651
[20] Posada D. (2002) Evaluation of methods for detecting recombination from DNA sequences: empirical data. Molecular Biology and Evolution 19: 708–717 · doi:10.1093/oxfordjournals.molbev.a004129
[21] Posada D., Buckley T. (2004) Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology 53(5): 793–808 · doi:10.1080/10635150490522304
[22] Song Y. (2003) On the combinatorics of rooted binary phylogenetic trees. Annals of Combinatorics 7(3): 365–379 · Zbl 1045.05031 · doi:10.1007/s00026-003-0192-0
[23] Spiegelhalter D., Best N., Carlin B., van der Linde A. (2002) Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B 64(4): 583–639 · Zbl 1067.62010 · doi:10.1111/1467-9868.00353
[24] Suchard M., Weiss R., Dorman K., Sinsheimer J. (2003) Inferring spatial phylogenetic variation along nucleotide sequences: a multiple changepoint model. Journal of the American Statistical Association 98(462): 427–438 · Zbl 1041.62095 · doi:10.1198/016214503000215
[25] Tavaré S. (1986) Some probabilistic and statistical problems in the analysis of DNA sequences. In: Miura R.M. (ed) Some Mathematical Questions in Biology: DNA Sequence Analysis. American Mathematical Society, Providence, RI, pp 57–86 · Zbl 0587.92015
[26] Webb-Robertson B.J.M., McCue L.A., Lawrence C.E. (2008) Measuring global credibility with application to local sequence alignment. PLoS Computational Biology 4(5): e1000077 · doi:10.1371/journal.pcbi.1000077
[27] Yang Z. (1993) Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Molecular Biology and Evolution 10(6): 1396–1401
[28] Yang Z. (1994) Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution 39(1): 105–111
[29] Yang Z. (1994) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution 39(3): 306–314 · doi:10.1007/BF00160154
[30] Yang Z. (2007) PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24(8): 1586–1591 · doi:10.1093/molbev/msm088
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.