Bootstrapping phylogenetic trees: theory and methods. (English) Zbl 1331.62244
Summary: This is a survey of the use of the bootstrap in the area of systematic and evolutionary biology. I present the current usage by biologists of the bootstrap as a tool both for making inferences and for evaluating robustness, and propose a framework for thinking about these problems in terms of mathematical statistics.

62G09 Nonparametric statistical resampling methods
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D15 Problems related to evolution
Seq-Gen; bootstrap
[1] Aldous, D. (2001). Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Statist. Sci. 16 23–34. · Zbl 1127.60313
[2] Baker, C., Lento, G., Cipriano, F. and Palumbi, S. (2000). Predicted decline of protected whales based on molecular genetic monitoring of Japanese and Korean markets. Proc. Roy. Soc. London Ser. B 267 1191–1199.
[3] Baker, C. and Palumbi, S. (1994). Which whales are hunted? A molecular genetic approach to monitoring whaling. Science 265 1538–1539.
[4] Berry, V. and Gascuel, O. (1996). On the interpretation of bootstrap trees: Appropriate threshold of clade selection and induced gain. Molecular Biology and Evolution 13 999–1011.
[5] Billera, L., Holmes, S. and Vogtmann, K. (2001). Geometry of the space of phylogenetic trees. Adv. in Appl. Math. 27 733–767. · Zbl 0995.92035
[6] Breiman, L. (1996). Bagging predictors. Machine Learning 24 123–140. · Zbl 0858.68080
[7] Bremer, K. (1988). The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42 795–803.
[8] Cooper, A. and Penny, D. (1997). Mass survival of birds across the Cretaceous–Tertiary boundary: Molecular evidence. Science 275 1109–1113.
[9] Diaconis, P. (1989). A generalization of spectral analysis with application to ranked data. Ann. Statist. 17 949–979. · Zbl 0688.62005
[10] Diaconis, P. and Holmes, S. (1998). Matchings and phylogenetic trees. Proc. Nat. Acad. Sci. U.S.A. 95 14,600–14,602. · Zbl 0908.92023
[11] Diaconis, P. and Holmes, S. (2002). Random walks on trees and matchings. Electronic Journal of Probability 7 17 pages. · Zbl 1007.60071
[12] Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. (1998). Biological Sequence Analysis. Cambridge Univ. Press. · Zbl 0929.92010
[13] Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, London. · Zbl 0835.62038
[14] Efron, B. and Tibshirani, R. (1998). The problem of regions. Ann. Statist. 26 1687–1718. · Zbl 0954.62031
[15] Efron, B., Halloran, E. and Holmes, S. (1996). Bootstrap confidence levels for phylogenetic trees. Proc. Nat. Acad. Sci. U.S.A. 93 13,429–13,434. · Zbl 0871.62092
[16] Felsenstein, J. (1983). Statistical inference of phylogenies (with discussion). J. Roy. Statist. Soc. Ser. A 146 246–272. · Zbl 0528.62090
[17] Felsenstein, J. (2003). Inferring Phylogenies. Sinauer, Boston.
[18] Felsenstein, J. and Churchill, G. A. (1996). A hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution 13 93–104.
[19] Fitch, W. (1971a). The nonidentity of invariable positions in the cytochromes \(c\) of different species. Biochemical Genetics 5 231–241.
[20] Fitch, W. (1971b). Rate of change of concomitantly variable codons. Journal of Molecular Evolution 1 84–96.
[21] Fitch, W. M. and Markowitz, E. (1970). An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochemical Genetics 4 579–593.
[22] Freedman, D. A. and Peters, S. C. (1984a). Bootstrapping a regression equation: Some empirical results. J. Amer. Statist. Assoc. 79 97–106.
[23] Freedman, D. A. and Peters, S. C. (1984b). Bootstrapping an econometric model: Some empirical results. J. Bus. Econom. Statist. 2 150–158.
[24] Gong, G. (1986). Cross-validation, the jackknife, and the bootstrap: Excess error estimation in forward logistic regression. J. Amer. Statist. Assoc. 81 108–113.
[25] Green, P. J. (1981). Peeling bivariate data. In Interpreting Multivariate Data (V. Barnett, ed.) 3–19. Wiley, New York. · Zbl 0597.62002
[26] Hall, P. (1987). On the bootstrap and likelihood-based confidence regions. Biometrika 74 481–493. · Zbl 0635.62033
[27] Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York. · Zbl 0593.62027
[28] Hedges, S. B. (1992). The number of replications needed for accurate estimation of the bootstrap \(p\)-value in phylogenetic studies. Molecular Biology and Evolution 9 366–369.
[29] Hendy, M. D. and Penny, D. (1993). Spectral analysis of phylogenetic data. J. Classification 10 5–23. · Zbl 0772.92014
[30] Hendy, M. D., Penny, D. and Steel, M. A. (1994). A discrete Fourier analysis for evolutionary trees. Proc. Nat. Acad. Sci. U.S.A. 91 3339–3343. · Zbl 0791.92017
[31] Hillis, D. M. and Bull, J. J. (1993). An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42 182–192.
[32] Holmes, S. (1999). Phylogenies: An overview. In Statistics in Genetics (M. E. Halloran and S. Geisser, eds.) 81–119. Springer, New York. · Zbl 0939.92024
[33] Huber, P. J. (1996). Robust Statistical Procedures, 2nd ed. SIAM, Philadelphia. · Zbl 0859.62003
[34] Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17 1217–1241. · Zbl 0684.62035
[35] LANL (2002). HIV database. Available at URL: http://hiv-web.lanl.gov/content/hiv-db/mainpage.html.
[36] Lauritzen, S. L. (1988). Extremal Families and Systems of Sufficient Statistics. Lecture Notes in Statist. 49. Springer, Berlin. · Zbl 0681.62009
[37] Lento, G. M., Cipriano, F., Patenaude, N. J., Palumbi, S. R. and Baker, C. S. (1998). Taking stock of minke whale in the North Pacific: The origins of products for sale in Japan and Korea. Technical Report SC/50/RMP15, Scientific Committee, International Whaling Commission.
[38] Li, S., Pearl, D. K. and Doss, H. (2000). Phylogenetic tree construction using Markov chain Monte Carlo. J. Amer. Statist. Assoc. 95 493–508.
[39] Li, W. H. (1997). Molecular Evolution. Sinauer, Boston. · Zbl 0854.01041
[40] Li, W. H. and Zharkikh, A. (1995). Statistical tests of DNA phylogenies. Systematic Biology 44 49–63. · Zbl 0812.62020
[41] Liu, R. Y., Parelius, J. M. and Singh, K. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion). Ann. Statist. 27 783–858. · Zbl 0984.62037
[42] Lockhart, P. J., Larkum, A. W. D., Steel, M. A., Waddell, P. J. and Penny, D. (1996). Evolution of chlorophyll and bacteriochlorophyll: The problem of invariant sites in sequence analysis. Proc. Nat. Acad. Sci. U.S.A. 93 1930–1934.
[43] Maddison, D. (1991). The discovery and importance of multiple islands of most parsimonious trees. Systematic Zoology 40 315–328.
[44] Mau, B., Newton, M. A. and Larget, B. (1999). Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics 55 1–12. · Zbl 1059.62675
[45] Nei, M., Kumar, S. and Takahashi, K. (1998). The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino acids used is small. Proc. Nat. Acad. Sci. U.S.A. 95 12,390–12,397.
[46] Newton, M. A. (1996). Bootstrapping phylogenies: Large deviations and dispersion effects. Biometrika 83 315–328. · Zbl 0864.62077
[47] Page, R. and Holmes, E. (2000). Molecular Evolution: A Phylogenetic Approach. Blackwell Science, Oxford, UK.
[48] Rambaut, A. and Grassly, N. C. (1997). Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Computer Applications in the Biosciences 13 235–238.
[49] Ramsay, J. O. (1978). Confidence regions for multidimensional scaling analysis. Psychometrika 43 145–160. · Zbl 0384.62045
[50] Robbins, H. (1980). An empirical Bayes estimation problem. Proc. Nat. Acad. Sci. U.S.A. 77 6,988–6,989. · Zbl 0456.62029
[51] Robbins, H. (1983). Some thoughts on empirical Bayes estimation. Ann. Statist. 11 713–723. · Zbl 0522.62024
[52] Robbins, H. (1985). Linear empirical Bayes estimation of means and variances. Proc. Nat. Acad. Sci. U.S.A. 82 1571–1574. · Zbl 0559.62034
[53] Rodrigo, A. G. (1993). Calibrating the bootstrap test of monophyly. International Journal for Parasitology 23 507–514.
[54] Sanderson, M. J. (1995). Objections to bootstrapping phylogenies: A critique. Systematic Biology 44 299–320.
[55] Schröder, E. (1870). Vier combinatorische Probleme. Zeitschrift für Mathematik und Physik 15 361–376. · JFM 02.0108.04
[56] Tang, H. and Lewontin, R. (1999). Locating regions of differential variability in DNA and protein sequences. Genetics 153 485–495.
[57] Tuffley, C. and Steel, M. (1998). Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. 147 63–91. · Zbl 0897.92025
[58] Tukey, J. (1975). Mathematics and the picturing of data. In Proc. International Congress of Mathematicians (R. D. James, ed.) 523–531. Vancouver. · Zbl 0347.62002
[59] Yang, Z. (1994). Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. J. Molecular Evolution 39 306–314.
[60] Yang, Z. and Rannala, B. (1997). Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method. Molecular Biology and Evolution 14 717–724.
[61] Zharkikh, A. and Li, W.-H. (1995). Estimation of confidence in phylogeny: The complete-and-partial bootstrap technique. Molecular Phylogenetics and Evolution 4 44–63.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.