Applying the bootstrap in phylogeny reconstruction. (English) Zbl 1331.62441
Summary: With the increasing emphasis in biology on reconstruction of phylogenetic trees, questions have arisen as to how confident one should be in a given phylogenetic tree and how support for phylogenetic trees should be measured. Felsenstein suggested that bootstrapping be applied across characters of a taxon-by-character data matrix to produce replicate “bootstrap data sets”, each of which is then analyzed phylogenetically, with a consensus tree constructed to summarize the results of all replicates. The proportion of trees/replicates in which a grouping is recovered is presented as a measure of support for that group. Bootstrapping has become a common feature of phylogenetic analysis. However, the interpretation of bootstrap values remains open to discussion, and phylogeneticists have used these values in multiple ways. The usefulness of phylogenetic bootstrapping is potentially limited by a number of features, such as the size of the data matrix and the underlying assumptions of the phylogeny reconstruction program. Recent studies have explored the application of bootstrapping to large data sets and the relative performance of bootstrapping and jackknifing.
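The resampling scheme described in the summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular phylogenetics package: the function names, the example matrix, and especially `toy_clades` (a simple average-linkage clustering on Hamming distance, used here only as a stand-in for a real tree-building method such as parsimony or likelihood) are hypothetical choices made for this example.

```python
import random
from collections import Counter
from itertools import combinations


def resample_characters(matrix, rng):
    """Draw characters (columns) with replacement; the taxa (rows) stay fixed."""
    n_chars = len(next(iter(matrix.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [states[c] for c in cols] for taxon, states in matrix.items()}


def hamming(a, b):
    """Number of characters at which two taxa differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_clades(matrix):
    """Hypothetical stand-in for a tree-building method.

    Average-linkage clustering on Hamming distance; returns the set of
    non-trivial groups (frozensets of taxon names) formed along the way.
    """
    clusters = [frozenset([t]) for t in sorted(matrix)]
    clades = set()

    def dist(pair):
        a, b = pair
        total = sum(hamming(matrix[x], matrix[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 2:
        a, b = min(combinations(clusters, 2), key=dist)
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        clades.add(merged)
    return clades


def bootstrap_support(matrix, n_replicates=200, seed=0):
    """Proportion of bootstrap replicates in which each grouping is recovered."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_replicates):
        replicate = resample_characters(matrix, rng)
        counts.update(toy_clades(replicate))
    return {clade: k / n_replicates for clade, k in counts.items()}


# Made-up taxon-by-character matrix (4 taxa, 10 binary characters).
example = {
    "A": list("0000011111"),
    "B": list("0000011110"),
    "C": list("1111100000"),
    "D": list("1111100001"),
}

support = bootstrap_support(example)
for clade, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), round(value, 3))
```

Each replicate has the same number of characters as the original matrix, and the support value for a group is simply the fraction of replicates whose tree contains it. In practice the clustering step would be replaced by a full phylogenetic analysis of each replicate, which is where most of the computational cost lies.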

62P10 Applications of statistics to biology and medical sciences; meta analysis
62G09 Nonparametric statistical resampling methods
92D15 Problems related to evolution
[1] Bull, J. J., Cunningham, C. W., Molineux, I. J., Badgett, M. R. and Hillis, D. M. (1993). Experimental molecular evolution of bacteriophage T7. Evolution 47 993–1007. · Zbl 0930.92018
[2] Carpenter, J. M. (1992). Random cladistics. Cladistics 8 147–153.
[3] Carpenter, J. M. (1996). Uninformative bootstrapping. Cladistics 12 177–181.
[4] Cavender, J. A. (1978). Taxonomy with confidence. Math. Biosci. 40 271–280. · Zbl 0391.92015 · doi:10.1016/0025-5564(78)90089-5
[5] Cavender, J. A. (1981). Tests of phylogenetic hypotheses under generalized models. Math. Biosci. 54 217–229. · Zbl 0456.92013 · doi:10.1016/0025-5564(81)90087-0
[6] Chase, M. W., Soltis, D. E., Olmstead, R. G., Morgan, D., Les, D. H., Mishler, B. D., Duvall, M. R., Price, R. A., Hills, H. G., Qiu, Y.-L., Kron, K. A., Rettig, J. H., Conti, E., Palmer, J. D., Manhart, J. R., Sytsma, K. J., Michaels, H. J., Kress, W. J., Karol, K. G., Clark, W. D., Hedrén, M., Gaut, B. S., Jansen, R. K., Kim, K. J., Wimpee, C. F., Smith, J. F., Furnier, G. R., Strauss, S. H., Xiang, Q.-Y., Plunkett, G. M., Soltis, P. S., Swensen, S. M., Williams, S. E., Gadek, P. A., Quinn, C. J., Eguiarte, L. E., Golenberg, E., Learn, G. H., Jr., Graham, S. W., Barrett, S. C. H., Dayanandan, S. and Albert, V. A. (1993). Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80 528–580.
[7] Darwin, C. (1859). On the Origin of Species by Means of Natural Selection. J. Murray, London.
[8] DeBry, R. W. and Olmstead, R. G. (2000). A simulation study of reduced tree-search effort in bootstrap resampling analysis. Systematic Biology 49 171–179.
[9] Diaconis, P. and Efron, B. (1983). Computer-intensive methods in statistics. Scientific American 249 116–130.
[10] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1–26. · Zbl 0406.62024 · doi:10.1214/aos/1176344552
[11] Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia. · Zbl 0496.62036
[12] Efron, B. (1985). Bootstrap confidence intervals for a class of parametric problems. Biometrika 72 45–58. · Zbl 0567.62025 · doi:10.1093/biomet/72.1.45
[13] Efron, B. and Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statist. 37 36–48. · doi:10.2307/2685844
[14] Efron, B., Halloran, E. and Holmes, S. (1996). Bootstrap confidence levels for phylogenetic trees. Proc. Nat. Acad. Sci. U.S.A. 93 13429–13434. · Zbl 0871.62092 · doi:10.1073/pnas.93.23.13429
[15] Faith, D. P. and Cranston, P. S. (1991). Could a cladogram this short have arisen by chance alone? On permutation tests for cladistic structure. Cladistics 7 1–28.
[16] Farris, J. S. (1983). The logical basis of phylogenetic analysis. In Advances in Cladistics 2 (N. I. Platnick and V. A. Funk, eds.) 7–36. Columbia Univ. Press.
[17] Farris, J. S., Albert, V. A., Källersjö, M., Lipscomb, D. and Kluge, A. G. (1996). Parsimony jackknifing outperforms neighbor-joining. Cladistics 12 99–124.
[18] Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27 401–410.
[19] Felsenstein, J. (1985). Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39 783–791.
[20] Felsenstein, J. (1988). Phylogenies from molecular sequences: Inference and reliability. Annual Review of Genetics 22 521–565.
[21] Felsenstein, J. and Kishino, H. (1993). Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Systematic Biology 42 193–200.
[22] Graybeal, A. (1998). Is it better to add taxa or characters to a difficult phylogenetic problem? Systematic Biology 47 9–17.
[23] Harshman, J. (1994). The effect of irrelevant characters on bootstrap values. Systematic Biology 43 419–424.
[24] Hedges, S. B. (1992). The number of replications needed for accurate estimation of the bootstrap \(p\)-value in phylogenetic studies. Molecular Biology and Evolution 9 366–369.
[25] Hennig, W. (1966). Phylogenetic Systematics. Univ. Illinois Press, Urbana.
[26] Hillis, D. M. (1996). Inferring complex phylogenies. Nature 383 130–131.
[27] Hillis, D. M. and Bull, J. J. (1993). An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42 182–192.
[28] Hillis, D. M. and Dixon, M. T. (1989). Vertebrate phylogeny: Evidence from 28S ribosomal DNA sequences. In The Hierarchy of Life (B. Fernholm, K. Bremer and H. Jörnvall, eds.) 355–367. Elsevier, Amsterdam.
[29] Huelsenbeck, J. P. (1995). Performance of phylogenetic methods in simulation. Systematic Biology 44 17–48.
[30] Huelsenbeck, J. P. and Crandall, K. A. (1997). Phylogeny estimation and hypothesis testing using maximum likelihood. Annual Review of Ecology and Systematics 28 437–466.
[31] Huelsenbeck, J. P. and Hillis, D. M. (1993). Success of phylogenetic methods in the four-taxon case. Systematic Biology 42 247–264.
[32] Källersjö, M., Farris, J. S., Chase, M. W., Bremer, B., Fay, M. F., Humphries, C. J., Petersen, G., Seberg, O. and Bremer, K. (1998). Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals support for major clades of green plants, land plants, seed plants, and flowering plants. Plant Systematics and Evolution 213 259–287.
[33] Kluge, A. G. (1997). Testability and the refutation and corroboration of cladistic hypotheses. Cladistics 13 81–96.
[34] Kluge, A. G. (1999). The science of phylogenetic systematics: Explanation, prediction, and test. Cladistics 15 429–436.
[35] Kluge, A. G. and Wolf, A. J. (1993). Cladistics: What’s in a word? Cladistics 9 183–199.
[36] Lanyon, S. (1985). Detecting internal inconsistencies in distance data. Systematic Zoology 34 397–403.
[37] Miller, R. G. (1974). The jackknife: a review. Biometrika 61 1–15. · Zbl 0275.62035 · doi:10.2307/2334280
[38] Mort, M. E., Soltis, P. S., Soltis, D. E. and Mabry, M. (2000). Comparison of three methods for estimating internal support on phylogenetic trees. Systematic Biology 49 160–171.
[39] Mueller, L. D. and Ayala, F. J. (1982). Estimation and interpretation of genetic distance in empirical studies. Genetical Research 40 127–137.
[40] Newton, M. A. (1996). Bootstrapping phylogenies: Large deviations and dispersion effects. Biometrika 83 315–328. · Zbl 0864.62077 · doi:10.1093/biomet/83.2.315
[41] Penny, D., Foulds, L. R. and Hendy, M. D. (1982). Testing the theory of evolution by comparing phylogenetic trees constructed from 5 different protein sequences. Nature 297 197–200.
[42] Penny, D. and Hendy, M. D. (1985). Testing methods of evolutionary tree construction. Cladistics 1 266–278.
[43] Platnick, N. I. and Gaffney, E. S. (1977). Review of The Logic of Scientific Discovery and Conjectures and Refutations, by K. R. Popper. Systematic Zoology 26 361–365.
[44] Platnick, N. I. and Gaffney, E. S. (1978). Evolutionary biology: A Popperian perspective. Systematic Zoology 27 138–141.
[45] Rodrigo, A. (1993). Calibrating the bootstrap test of monophyly. International Journal for Parasitology 23 507–514.
[46] Sanderson, M. J. (1989). Confidence limits on phylogenies: The bootstrap revisited. Cladistics 5 113–129.
[47] Sanderson, M. J. (1995). Objections to bootstrapping phylogenies: A critique. Systematic Biology 44 299–320.
[48] Sanderson, M. J. and Wojciechowski, M. F. (2000). Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae). Systematic Biology 49 671–685.
[49] Savolainen, V., Chase, M. W., Morton, C. M., Hoot, S. B., Soltis, D. E., Bayer, C., Fay, M. F., De Bruijn, A., Sullivan, S. and Qiu, Y.-L. (2000). Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Systematic Biology 49 306–362.
[50] Soltis, D. E., Soltis, P. S., Mort, M. E., Chase, M. W., Savolainen, V., Hoot, S. B. and Morton, C. M. (1998). Inferring complex phylogenies using parsimony: An empirical approach using three large DNA data sets for angiosperms. Systematic Biology 47 32–42.
[51] Soltis, D. E., Soltis, P. S., Chase, M. W., Mort, M. E., Albach, D. C., Zanis, M., Savolainen, V., Hahn, W. H., Hoot, S. B., Fay, M. F., Axtell, M., Swensen, S. M., Prince, L. M., Kress, W. J., Nixon, K. C. and Farris, J. S. (2000). Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Botanical Journal of the Linnean Society 133 381–461.
[52] Soltis, P. S. and Novak, S. J. (1997). Polyphyly of the tuberous Lomatiums (Apiaceae): cpDNA evidence for morphological convergence. Systematic Botany 22 99–112.
[53] Soltis, P. S., Soltis, D. E. and Chase, M. W. (1999). Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402 402–404.
[54] Swofford, D. L. (1998). PAUP* 4.0: Phylogenetic analysis using parsimony (and other methods), Beta version 4.0. Sinauer, Sunderland, MA.
[55] Templeton, A. R. (1983). Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37 221–244. · Zbl 0524.92016
[56] Wendel, J. F. and Albert, V. A. (1992). Phylogenetics of the cotton genus (Gossypium): Character-state weighted parsimony analysis of chloroplast-DNA restriction site data and its systematic and biogeographic implications. Systematic Botany 17 115–143.
[57] Wiley, E. O. (1975). Karl R. Popper, systematics, and classification: A reply to Walter Bock and other evolutionary taxonomists. Systematic Zoology 24 233–243.
[58] Zanis, M. J., Soltis, D. E., Soltis, P. S., Mathews, S. and Donoghue, M. J. (2002). The root of the angiosperms revisited. Proc. Nat. Acad. Sci. U.S.A. 99 6848–6853.
[59] Zharkikh, A. and Li, W.-H. (1992a). Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock. Molecular Biology and Evolution 9 1119–1147.
[60] Zharkikh, A. and Li, W.-H. (1992b). Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. II. Four taxa without a molecular clock. J. Molecular Evolution 35 356–366.
[61] Zharkikh, A. and Li, W.-H. (1995). Estimation of confidence in phylogeny: The complete-and-partial bootstrap technique. Molecular Phylogenetics and Evolution 4 44–63.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.