## Distribution and asymptotic behavior of the phylogenetic transfer distance.(English)Zbl 1418.92082

Summary: The transfer distance (TD) was introduced in the classification framework and studied in the context of phylogenetic tree matching. Recently, F. Lemoine et al. [“Renewing Felsenstein’s phylogenetic bootstrap in the era of big data”, Nature 556, No. 7702, 452–456 (2018; doi:10\.1038/s41586-018-0043-0)] showed that TD can be a powerful tool to assess the branch support on large phylogenies, thus providing a relevant alternative to Felsenstein’s bootstrap. This distance allows a reference branch $$\beta$$ in a reference tree $${\mathcal{T}}$$ to be compared to a branch $$b$$ from another tree $$T$$ (typically a bootstrap tree), both on the same set of $$n$$ taxa. The TD between these branches is the number of taxa that must be transferred from one side of $$b$$ to the other in order to obtain $$\beta$$. By taking the minimum TD from $$\beta$$ to all branches in $$T$$ we define the transfer index, denoted by $$\phi (\beta ,T)$$, measuring the degree of agreement of $$T$$ with $$\beta$$. Let us consider a reference branch $$\beta$$ having $$p$$ tips on its light side and define the transfer support (TS) as $$1 - \phi (\beta ,T)/(p-1)$$. Lemoine et al. [loc. cit.] used computer simulations to show that the TS defined in this manner is close to 0 for random “bootstrap” trees. In this paper, we demonstrate that result mathematically: when $$T$$ is randomly drawn, TS converges in probability to 0 when $$n$$ tends to $$\infty$$. Moreover, we fully characterize the distribution of $$\phi (\beta ,T)$$ on caterpillar trees, indicating that the convergence is fast, and that even when $$n$$ is small, moderate levels of branch support cannot appear by chance.

### MSC:

 92D15 Problems related to evolution 62P10 Applications of statistics to biology and medical sciences; meta analysis
Full Text:

### References:

 [1] André, D., Solution directe du problème résolu par M. Bertrand, CR Acad Sci Paris, 105, 436-437, (1887) · JFM 19.0200.05 [2] Boc, A.; Philippe, H.; Makarenkov, V., Inferring and validating horizontal gene transfer events using bipartition dissimilarity, Syst Biol, 59, 195-211, (2010) [3] Bogdanowicz, D.; Giaro, K., Matching split distance for unrooted binary phylogenetic trees, IEEE/ACM Trans Comput Biol Bioinf, 9, 150-160, (2012) [4] Charon, I.; Denœud, L.; Guénoche, A.; Hudry, O., Maximum transfer distance between partitions, J Classif, 23, 103-121, (2006) · Zbl 1244.05023 [5] Day, WHE, The complexity of computing metric distances between partitions, Math Soc Sci, 1, 269-287, (1981) · Zbl 0497.62049 [6] Day, WHE, Optimal algorithms for comparing trees with labeled leaves, J Classif, 2, 7-28, (1985) · Zbl 0589.62044 [7] Denœud, L., Transfer distance between partitions, Adv Data Anal Classif, 2, 279-294, (2008) · Zbl 1284.05319 [8] Dubhashi DP, Panconesi A (2009) Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, Cambridge · Zbl 1213.60006 [9] Farris, JS, Methods for computing wagner trees, Syst Zool, 19, 83-92, (1970) [10] Feller W (1968) An introduction to probability theory and its applications, vol 1. Wiley, New York · Zbl 0155.23101 [11] Felsenstein, J., Confidence limits on phylogenies: an approach using the bootstrap, Evolution, 39, 783-791, (1985) [12] Fitch, WM, Toward defining the course of evolution: minimum change for a specific tree topology, Syst Zool, 20, 406-416, (1971) [13] Fréchet M (1940) Les probabilités associées à un système d’événements compatibles et dépendants. Actualités scientifiques et industrielles, Hermann & Cie · JFM 66.1299.02 [14] Hartigan, JA, Minimum mutation fits to a given tree, Biometrics, 29, 53-65, (1973) [15] Hillis, DM; Bull, JJ, An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis, Syst Biol, 42, 182-192, (1993) [16] Hoeffding, W., Probability inequalities for sums of bounded random variables, J Am Stat Assoc, 58, 13-30, (1963) · Zbl 0127.10602 [17] Lemoine, F.; Domelevo Entfellner, JB; Wilkinson, E.; Correia, D.; Dávila Felipe, M.; Oliveira, T.; Gascuel, O., Renewing Felsenstein’s phylogenetic bootstrap in the era of big data, Nature, 556, 452-456, (2018) [18] Lin, Y.; Rajan, V.; Moret, BME, A metric for phylogenetic trees based on matching, IEEE/ACM Trans Comput Biol Bioinf, 9, 1014-1022, (2012) [19] Mohanty G (1979) Lattice path counting and applications. In: Probability and mathematical statistics. Academic Press, Cambridge · Zbl 0455.60013 [20] Régnier, S., Sur quelques aspects mathématiques des problèmes de classification automatique, ICC Bull, 4, 175-191, (1965) [21] Robinson, D.; Foulds, L., Comparison of phylogenetic trees, Math Biosci, 53, 131-147, (1981) · Zbl 0451.92006 [22] Semple C, Steel M (2003) Phylogenetics. Oxford University Press, Oxford · Zbl 1043.92026 [23] Steel M (2016) Phylogeny: discrete and random processes in evolution. CBMS-NSF Regional Conference Series on Mathematics, Society for Industrial and Applied Mathematics · Zbl 1361.92001 [24] Steel M, Penny D (2005) Maximum parsimony and the phylogenetic information in multi-state characters. In: Albert V (ed) Parsimony, phylogeny and genomics, chap 9. Oxford University Press, Oxford, pp 163-178
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.