×

zbMATH — the first resource for mathematics

Optimal algorithms for comparing trees with labeled leaves. (English) Zbl 0589.62044
Summary: Let \(R_ n\) denote the set of rooted trees with n leaves in which: the leaves are labeled by the integer in \(\{\) 1,...,n\(\}\) ; and among interior vertices only the root may have degree two. Associated with each interior vertex v in such a tree is the subset, or cluster, of leaf labels in the subtree rooted at v. Cluster \(\{\) 1,...,n\(\}\) is called trivial.
Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any k trees in \(R_ n\), the strict consensus tree \(C(T_ 1,...,T_ k)\) is that tree in \(R_ n\) containing exactly those clusters common to every one of the k trees. Similarity between trees \(T_ 1\) and \(T_ 2\) in \(R_ n\) is measured by the number \(S(T_ 1,T_ 2)\) of nontrivial clusters in both \(T_ 1\) and \(T_ 2\); dissimilarity, by the number \(D(T_ 1,T_ 2)\) of clusters in \(T_ 1\) or \(T_ 2\) but not in both. Algorithms are known to compute \(C(T_ 1,...,T_ k)\) in \(O(kn^ 2)\) time, and \(S(T_ 1,T_ 2)\) and \(D(T_ 1,T_ 2)\) in \(O(n^ 2)\) time.
I propose a special representation of the clusters of any tree T in \(R_ n\), one that permits testing in constant time whether a given cluster exists in T. I describe algorithms that exploit this representation to compute \(C(T_ 1,...,T_ k)\) in O(kn) time, and \(S(T_ 1,T_ 2)\) and \(D(T_ 1,T_ 2)\) in O(n) time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in O(n) time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.

MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68Q25 Analysis of algorithms and problem complexity
62-04 Software, source code, etc. for problems pertaining to statistics
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] ADAMS, E. N., III (1972), ”Consensus Techniques and the Comparison of Taxonomic Trees,”Systematic Zoology, 21, 390–397.
[2] AHO, A. V., HOPCROFT, J. E., and ULLMAN, J. D. (1974),The Design and Analysis of Computer Algorithms, Reading, Massachusetts: Addison-Wesley. · Zbl 0326.68005
[3] BOURQUE, M. (1978), ”Arbres de Steiner et Réseaux dont Certains Sommets sont à Localisation Variable,” Ph.D. dissertation, Université de Montréal, Quebec, Canada.
[4] BROWN, E. K., and DAY, W. H. E. (1984), ”A Computationally Efficient Approximation to the Nearest Neighbor Interchange Metric,”Journal of Classification, 1, 93–124. · Zbl 0563.62033
[5] CAVALLI-SFORZA, L. L., and EDWARDS, A. W. F. (1967), ”Phylogenetic Analysis Models and Estimation Procedures,”American Journal of Human Genetics, 19, 233–257.
[6] COLLESS, D. H. (1980), ”Congruence between Morphometric and Allozyme Data forMenidia Species: A Reappraisal,”Systematic Zoology, 29, 288–299.
[7] DAY, W. H. E. (1983), ”The Role of Complexity in Comparing Classifications,”Mathematical Biosciences, 66, 97–114. · Zbl 0525.62058
[8] HARARY, F. (1969),Graph Theory, Reading, Massachusetts: Addison-Wesley. · Zbl 0182.57702
[9] HENDY, M. D., LITTLE, C. H. C., and PENNY, D. (1984), ”Comparing Trees with Pendant Vertices Labelled,”SIAM Journal on Applied Mathematics Theory, 44, 1054–1065. · Zbl 0557.92010
[10] MARCZEWSKI, E., and STEINHAUS, H. (1958), ”On a Certain Distance of Sets and the Corresponding Distance of Functions,”Colloquium Mathematicum, 6, 319–327. · Zbl 0089.03502
[11] MARGUSH, T. (1982), ”Distances Between Trees,”Discrete Applied Mathematics, 4, 281–290. · Zbl 0504.06002
[12] MARGUSH, T., and McMORRIS, F.R. (1981), ”Consensus n-Trees,”Bulletin of Mathematical Biology, 43, 239–244. · Zbl 0455.92019
[13] McMORRIS, F.R., MERONK, D.B., and NEUMANN, D.A. (1983), ”A View of some Consensus Methods for Trees,” inNumerical Taxonomy: Proceedings of a NATO Advanced Study Institute, ed. J. Felsenstein, Berlin: Springer-Verlag, 122–126.
[14] McMORRIS, F.R., and NEUMANN, D. (1983), ”Consensus Functions Defined on Trees,”Mathematical Social Sciences, 4, 131–136. · Zbl 0511.90010
[15] MICKEVICH, M.F. (1978), ”Taxonomic Congruence,”Systematic Zoology, 27, 143–158.
[16] NELSON, G. (1979), ”Cladistic Analysis and Synthesis: Principles and Definitions, with a Historical Note on Adanson’sFamilles des Plantes (1763–1764),”Systematic Zoology, 28, 1–21.
[17] NELSON, G., and PLATNICK, N. (1981),Systematics and Biogeography: Cladistics and Vicariance, New York: Columbia University Press.
[18] NEUMANN, D.A. (1983), ”Faithful Consensus Methods for n-Trees,”Mathematical Biosciences, 63, 271–287. · Zbl 0539.62073
[19] RESTLE, F. (1959), ”A Metric and an Ordering on Sets,”Psychometrika, 24, 207–220. · Zbl 0087.15903
[20] ROBINSON, D.F. (1971), ”Comparison of Labeled Trees with Valency Three,”Journal of Combinatorial Theory, 11, 105–119. · Zbl 0217.31201
[21] ROBINSON, D.F., and FOULDS, L.R. (1981), ”Comparison of Phylogenetic Trees,”Mathematical Biosciences, 53, 131–147. · Zbl 0451.92006
[22] ROHLF, F.J. (1982), ”Consensus Indices for Comparing Classifications,”Mathematical Biosciences, 59, 131–144.
[23] ROHLF, F.J. (1983), ”Numbering Binary Trees with Labeled Terminal Vertices,”Bulletin of Mathematical Biology, 45, 33–40. · Zbl 0513.05027
[24] SCHUH, R.T., and FARRIS, J.S. (1981), ”Methods for Investigating Taxonomic Congruence and Their Application to the Leptopodomorpha,”Systematic Zoology, 30, 331–351.
[25] SHAO, K. (1983), ”Consensus Methods in Numerical Taxonomy,” Ph.D. dissertation, State University of New York, Stony Brook, New York.
[26] SOKAL, R.R., and ROHLF, F.J. (1981), ”Taxonomic Congruence in the Leptopodomorpha Re-examined,”Systematic Zoology, 30, 309–325.
[27] STANDISH, T.A. (1980),Data Structure Techniques, Reading, Massachusetts: Addison-Wesley.
[28] STINEBRICKNER, R. (1984), ”s-Consensus Trees and Indices,”Bulletin of Mathematical Biology, 46, 923–935. · Zbl 0546.92001
[29] TATENO, Y., NEI, M., and TAJIMA, F. (1982), ”Accuracy of Estimated Phylogenetic Trees from Molecular Data I. Distantly Related Species,”Journal of Molecular Evolution, 18, 387–404.
[30] WATERMAN, M.S., and SMITH, T.F. (1978), ”On the Similarity of Dendrograms,”Journal of Theoretical Biology, 73, 789–800.
[31] WEIDE, B. (1977), ”A Survey of Analysis Techniques for Discrete Algorithms,”Computing Surveys, 9, 291–313. · Zbl 0366.68040
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.