## A partial order and cluster-similarity metric on rooted phylogenetic trees.(English)Zbl 1435.92045

This article presents an alternative metric based on cluster similarity, which has several advantages. The metric is based on graded partial order, which means that the rank can be used to estimate tree distances. The metric also relies on a natural local operation of moving in tree space, which makes it easy to calculate the neighborhood of a given tree – a particularly useful property when examining the MCMC (Markov Chain Monte Carlo) tree space. Finally, trees have much larger neighborhoods than other local operation metrics.
Although the calculation of distances using a metric is nontrivial, the authors give an approximation of the upper boundary, which corresponds to the true distance in most cases in experimental modeling. This approximation takes polynomial time, and the simulation assumes that the upper bound for the metric does not skew (in contrast to the Robinson-Foulds distance), so there is hope that this metric will also not be skewed.
The metric is based on the concept of a hierarchical map that links trees that have similar hierarchies. It is assumed that the new metric will surpass the Robinson-Foulds metric in distinguishing tree sets from real data, as computational experiments have shown that the current metric remains successful in recognizing bifurcated trees in a particular case.
Finally, since the approximation of the upper bound is easy to calculate and relatively accurate, it will also reduce problems with the speed of computation. Section 2 introduces the concept of preserving a hierarchy map between trees; it is shown that maps preserving the hierarchy induce a partial order on the set of root phylogenetic trees. Section 3 proposes a metric based on a partial order Hasse diagram induced by mappings preserving the hierarchy. Section 4 gives an algorithm for calculating the upper boundary of a metric and initial results on its properties. Section 5 shows some results of calculations on the program for calculating the upper bound of a metric. Thus, the new metric on the phylogenetic tree-space presented in this article has a number of useful properties for biological applications: as far as it’s a cluster similarity metric, the concept of distance between two trees corresponds to the similarity of their hierarchies. Unlike other cluster similarity metrics, this metric has a simple local operation for moving through the space of trees, providing easy calculation of neighborhoods.
It can be expected that this function, combined with the property of cluster similarity, will help MCMC searches for the tree-like space around trees of similar hierarchies. In addition, the distribution of distances in a given space of trees seems quite symmetrical and also has a reasonable spread of values. This makes it possible to distinguish trees in the required manner in biological research. Finally, the concept of preserving a hierarchy of cards may be of independent mathematical interest.

### MSC:

 92D15 Problems related to evolution 05C05 Trees 06A07 Combinatorics of partially ordered sets

### Keywords:

phylogenetic tree; partial order; metric; hierarchy; algorithm

### Software:

PhyloNetwork; GitHub; ClusterMetric
Full Text:

### References:

 [1] Alberich, R.; Cardona, G.; Rosselló, F.; Valiente, G., An algebraic metric for phylogenetic trees, Appl Math Lett, 22, 9, 1320-1324 (2009) · Zbl 1173.05314 [2] Bogdanowicz, D.; Giaro, K., On a matching distance between rooted phylogenetic trees, Int J Appl Math Comput Sci, 23, 3, 669-684 (2013) · Zbl 1294.05066 [3] Cardona, G.; Llabrés, M.; Rosselló, F.; Valiente, G., Nodal distances for rooted phylogenetic trees, J Math Biol, 61, 2, 253-276 (2009) · Zbl 1202.92060 [4] Cole, Sr; Wright, Df; Ausich, Wi, Phylogenetic community paleoecology of one of the earliest complex crinoid faunas (Brechin Lagerstätte, Ordovician), Palaeogeogr Palaeoclimatol Palaeoecol, 521, 82-98 (2019) [5] Guo, M-Z; Li, J-F; Liu, Y., A topological transformation in evolutionary tree search methods based on maximum likelihood combining p-ECR and neighbor joining, BMC Bioinform, 9, Suppl 6, S4 (2008) [6] Hein, J., Reconstructing evolution of sequences subject to recombination using parsimony, Math Biosci, 98, 2, 185-200 (1990) · Zbl 0692.92014 [7] Hendriksen M (2019) Clustermetric. https://github.com/mahendriksen/ClusterMetric. GitHub repository [8] Kuhner, Mk; Yamato, J., Practical performance of tree comparison metrics, Syst Biol, 64, 2, 205-214 (2014) [9] Moore, Gw; Goodman, M.; Barnabas, J., An iterative approach from the standpoint of the additive hypothesis to the dendrogram problem posed by molecular data sets, J Theor Biol, 38, 3, 423-457 (1973) [10] Moulton, V.; Taoyang, W., A parsimony-based metric for phylogenetic trees, Adv Appl Math, 66, 22-45 (2015) · Zbl 1315.05034 [11] Okumura, K.; Shimomura, Y.; Murayama, S.; Yagi, J.; Ubukata, K.; Kirikae, T.; Miyoshi-Akiyama, T., Evolutionary paths of streptococcal and staphylococcal superantigens, BMC Genom, 13, 1, 404 (2012) [12] Robinson, Df; Foulds, Lr, Comparison of phylogenetic trees, Math Biosci, 53, 1-2, 131-147 (1981) · Zbl 0451.92006 [13] Sevillya, G.; Snir, S., Synteny footprints provide clearer phylogenetic signal than sequence data for prokaryotic classification, Mol Phylogenet Evol, 136, 128-137 (2019) [14] Shuguang, L.; Zhihui, L., Algorithms for computing cluster dissimilarity between rooted phylogenetic trees, Open Cybern Syst J, 9, 1, 2218-2223 (2015) [15] Steel, Ma, Distribution of the symmetric difference metric on phylogenetic trees, SIAM J Discrete Math, 1, 4, 541-551 (1988) · Zbl 0675.05024 [16] Steel, M., Phylogeny: discrete and random processes in evolution (2016), Philadelphia: Society for Industrial and Applied Mathematics, Philadelphia · Zbl 1361.92001 [17] Zhang, L-N; Ma, P-F; Zhang, Y-X; Zeng, C-X; Zhao, L.; Li, D-Z, Using nuclear loci and allelic variation to disentangle the phylogeny of Phyllostachys (Poaceae, Bambusoideae), Mol Phylogenet Evol, 137, 222-235 (2019)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.