Representations of partial leaf sets in phylogenetic tree space. (English) Zbl 1432.92064

The metric space of phylogenetic trees provides a natural geometric setting for describing collections of trees on one set of taxa. However, sometimes it is necessary to analyze collections of trees on non-identical sets of taxa. The authors refine and adapt the algorithm for working with metric trees to give a complete characterization of the subspace of subtree extensions. The algorithm can be used to determine and search for the space of possible super-trees, for a set of tree fragments with different sets of leaves, and also to measure their compatibility. The authors’ approach to this problem uses a reduction in the dimension of the tree, which displays a map from the tree-like space to a lower-dimensional tree-like space containing all the trees with a subset of leaves. This mapping is induced by the natural projection of the subspace. First, a prototype of this map is constructed, which can be used to restore information about the source tree from images. This mapping is also the foundation for previous applications, which are solved by comparing the map with its inverse images in a common domain space. This exact problem associated with the general analysis of trees with a different number of taxa in the BHV tree space (Billera-Holmes-Vogtmann) has already been considered: there is a theory underlying the combinatorial step to compare trees with different sets of taxa. But the article gives an algorithm that refines these results and shows their value for calculating the reduction in the dimension of a tree and its inverse image. Analysis in the BHV space, of course, is not the only way to solve problems of this type. Any tool can be used to analyze sets of trees with the same taxa. In the context of reconstructing a species tree from gene trees, the relationship between these trees is modeled by the merging process, and algorithms and approaches specific to this situation can use this model. In order not to make simplifying assumptions, software packages are also currently available that use Bayesian merge methods to combine several parallel incomplete data samples into one tree, based on the origin data, and not on the trees. Algorithms also exist based on the assumption that differences in topology arise from recombination events that aggregate metric data into phylogenetic networks. These algorithms can often also contain heterogeneous data. However, they have the same drawback as most classical phylogenetic tree algorithms, in that they create a single tree or tree object, but not a region of possible trees in the space of trees. But none of these methods guarantees that the filled distance matrix is additive, and thus, although the matrix can be successfully used in further analysis, it may not correspond directly to the completed tree, as in our structure. There is also the problem of reconstructing a supertree, the purpose of which is to unite partially overlapping phylogenies into a common tree. With the geometric structure proposed in this article, it is possible to identify and calculate some useful objects. To do this, use the error stability parameters for individual trees to determine two parameters that measure the degree of metric distortion for a collection of trees that satisfy the combinatorial compatibility condition. The parameters represent the minimum error (uniform or proportional) needed to build the supertree. These parameters will be obtained as a result of linear optimization problems associated with equations defining spaces of approximate solutions, and can be directly calculated using the most effective linear programming methods. The authors consider the prospect of work an extension of definitions, calculations, and parameters for sets of trees that should not be combinatorially compatible. In addition, the probabilistic framework for random trees on different leaf sets will be able to attach importance to the threshold tests, which are heuristically set in this article.


92D15 Problems related to evolution
52C45 Combinatorial complexity of geometric structures
05C05 Trees
Full Text: DOI Link


[1] W. A. Akanni, M. Wilkinson, C. J. Creevey, P. G. Foster, and D. Pisani, Implementing and testing Bayesian and maximum-likelihood supertree methods in phylogenetics, Royal Soc. Open Sci., 2 (2015), 140436.
[2] M. Bačák, Computing medians and means in Hadamard spaces, SIAM J. Optim., 24 (2014), pp. 1542-1566, https://doi.org/10.1137/140953393. · Zbl 1306.49046
[3] P. Benner, M. Bačák, and P. Y. Bourguignon, Point estimates in phylogenetic reconstructions, Bioinformatics, 30 (2014), pp. i534-i540.
[4] L. Billera, S. Holmes, and K. Vogtmann, Geometry of the space of phylogenetic trees, Adv. in Appl. Math., 27 (2001), pp. 733-767. · Zbl 0995.92035
[5] O. R. Bininda-Emonds, Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Springer Science & Business Media, Dordrecht, 2004. · Zbl 1060.68083
[6] D. G. Brown and M. Owen, Mean and Variance of Phylogenetic Trees, preprint, https://arxiv.org/abs/1708.00294, 2017.
[7] P. Buneman, F. R. Hodson, D. G. Kendall, and P. Tautau, The recovery of trees from measure of dissimilarity, in Mathematics and the Archaeological and Historical Sciences, Edinburgh University Press, Edinburgh, Scotland, 1971, pp. 387-395.
[8] A. J. Drummond and A. Rambaut, BEAST: Bayesian evolutionary analysis by sampling trees, BMC Evol. Biol., 7 (2007), 214.
[9] K. Gori, T. Suchan, N. Alvarez, N. Goldman, and C. Dessimoz, Clustering genes of common evolutionary history, Mol. Biol. Evol., 33 (2016), pp. 1590-1605.
[10] J. Heled and A. J. Drummond, Bayesian inference of species trees from multilocus data, Mol. Biol. Evol., 27 (2009), pp. 570-580.
[11] L. Liu, BEST: Bayesian estimation of species trees under the coalescent model, Bioinformatics, 24 (2008), pp. 2542-2543.
[12] W. P. Maddison, Gene trees in species trees, Syst. Biol., 46 (1997), pp. 523-536.
[13] E. Miller, M. Owen, and J. S. Provan, Polyhedral computational geometry for averaging metric phylogenetic trees, Adv. in Appl. Math., 68 (2015), pp. 51-91. · Zbl 1329.68266
[14] S. Mirarab and T. Warnow, ASTRAL-II: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes, Bioinformatics, 31 (2015), pp. i44-i52.
[15] M. Owen, Computing geodesic distances in tree space, SIAM J. Discrete Math., 25 (2011), pp. 1506-1529, https://doi.org/10.1137/090751396. · Zbl 1237.05045
[16] M. Owen and S. Provan, A fast algorithm for computing geodesic distances in tree space, IEEE/ACM Trans. Comput. Biol. Bioinform., 8 (2011), pp. 2-13.
[17] Y. Ren, S. Zha, J. Bi, J. A. Sanchez, C. Monical, M. Delcourt, R. Guzman, and R. Davidson, A Combinatorial Method for Connecting BHV Spaces Representing Different Numbers of Taxa, preprint, https://arxiv.org/abs/1708.02626, 2017.
[18] J. A. Rhodes, Topological Metrizations of Trees, and New Quartet Methods of Tree Inference, preprint, https://arxiv.org/abs/1704.02004, 2017.
[19] C. Than, D. Ruths, and L. Nakhleh, PhyloNet: A software package for analyzing and reconstructing reticulate evolutionary histories, BMC Bioinform., 9 (2008), 322.
[20] S. Tsukiyama, M. Ide, H. Ariyoshi, and I. Shirakawa, A new algorithm for generating all the maximal independent sets, SIAM J. Comput., 6 (1977), pp. 505-517, https://doi.org/10.1137/0206036. · Zbl 0364.05027
[21] D. M. de Vienne, S. Ollier, and G. Aguileta, Phylo-MCOA: A fast and efficient method to detect outlier genes and species in phylogenomics using multiple co-inertia analysis, Mol. Biol. Evol., 29 (2012), pp. 1587-1598.
[22] T. Warnow, Supertree Construction: Opportunities and Challenges, preprint, https://arxiv.org/abs/1805.03530, 2018.
[23] G. Weyenberg, P. M. Huggins, C. L. Schardl, D. K. Howe, and R. Yoshida, KDETREES: Non-parametric estimation of phylogenetic tree distributions, Bioinformatics, 30 (2014), pp. 2280-2287.
[24] M. Wilkinson, J. A. Cotton, C. Creevey, O. Eulenstein, S. R. Harris, F.-J. Lapointe, C. Levasseur, J. O. Mcinerney, D. Pisani, and J. L. Thorley, The shape of supertrees to come: Tree shape related properties of fourteen supertree methods, Syst. Biol., 54 (2005), pp. 419-431.
[25] A. Willis, Confidence sets for phylogenetic trees, J. Amer. Statist. Assoc., 114 (2019), pp. 235-244. · Zbl 1462.62693
[26] N. Yasui, C. Vogiatzis, R. Yoshida, and K. Fukumizu, imPhy: Imputing phylogenetic trees with missing information using mathematical programming, IEEE/ACM Trans. Comput. Biol. Bioinform., submitted.
[27] S. Zairis, H. Khiabanian, A. J. Blumberg, and R. Rabadan, Moduli spaces of phylogenetic trees describing tumor evolutionary patterns, in Brain Informatics and Health, Lecture Notes in Comput. Sci. 8609, Springer, Cham, 2014, pp. 528-539, https://doi.org/10.1007/978-3-319-09891-3_48.
[28] S. Zairis, H. Khiabanian, A. J. Blumberg, and R. Rabadan, Genomic Data Analysis in Tree Spaces, preprint, https://arxiv.org/abs/1607.07503, 2016.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.