A method for comparing two hierarchical clusterings. (English) Zbl 0545.62042

Summary: This article concerns the derivation and use of a measure of similarity between two hierarchical clusterings. The measure, \(B_k\), is derived from the matching matrix, \([m_{ij}]\), formed by cutting the two hierarchical trees and counting the number of matching entries in the \(k\) clusters in each tree. The mean and variance of \(B_k\) are determined under the assumption that the margins of \([m_{ij}]\) are fixed. Thus, \(B_k\) represents a collection of measures for \(k=2,\dots,n-1\). \((k,B_k)\) plots are found to be useful in portraying the similarity of two clusterings.
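The summary does not reproduce the formula for \(B_k\). The following is a minimal sketch assuming the commonly cited form \(B_k = T_k/\sqrt{P_k Q_k}\), with \(T_k=\sum_{i,j} m_{ij}^2-n\), \(P_k=\sum_i m_{i\cdot}^2-n\) and \(Q_k=\sum_j m_{\cdot j}^2-n\). The function names `matching_matrix` and `fowlkes_mallows_bk` are hypothetical, and the inputs are assumed to be the flat cluster labels obtained by cutting each tree into \(k\) clusters.

```python
from collections import Counter
from math import sqrt


def matching_matrix(labels_a, labels_b):
    """Count the objects shared by cluster i of the first cut and cluster j of the second."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    # Sparse representation: only non-empty cells (i, j) -> m_ij are stored.
    return Counter(zip(labels_a, labels_b))


def fowlkes_mallows_bk(labels_a, labels_b):
    """Assumed form B_k = T_k / sqrt(P_k * Q_k), computed from [m_ij] and its margins."""
    n = len(labels_a)
    m = matching_matrix(labels_a, labels_b)
    row = Counter(labels_a)  # row margins m_i.
    col = Counter(labels_b)  # column margins m_.j
    t_k = sum(v * v for v in m.values()) - n
    p_k = sum(v * v for v in row.values()) - n
    q_k = sum(v * v for v in col.values()) - n
    if p_k == 0 or q_k == 0:
        # Every cluster in one of the cuts is a singleton, so the ratio is 0/0.
        raise ValueError("B_k is undefined when a cut has only singleton clusters")
    return t_k / sqrt(p_k * q_k)
```

Calling `fowlkes_mallows_bk` once per cut level \(k\) and plotting the results against \(k\) would give a \((k,B_k)\) plot of the kind described above; how the trees are cut is left to whatever clustering library is in use.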
\(B_k\) is compared to other measures of similarity proposed by F. B. Baker [Stability of two hierarchical grouping techniques. Case I: Sensitivity to data errors. ibid. 69, 440-445 (1974)] and W. M. Rand [Objective criteria for evaluation of clustering methods. ibid. 66, 846-850 (1971)], respectively. The use of \((k,B_k)\) plots for studying clustering methods is explored by a series of Monte-Carlo sampling experiments. An example of the use of \((k,B_k)\) plots on real data is given.
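For comparison, Rand's measure can be sketched by direct pair counting: it is the fraction of object pairs on which the two clusterings agree, either by placing both objects together or by keeping them apart. This is an illustrative sketch, not code from the paper, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs treated the same way by both clusterings."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair agrees if it is together in both cuts or apart in both cuts.
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total
```

Pairs that are separated in both clusterings count as agreement here, so the value can be well above zero even when no pair is clustered together in both trees.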
The paper is commented upon by D. L. Wallace, J. W. van Ness, I. T. Jolliffe and B. J. T. Morgan, M. A. Wong, and D. W. Turner.


62H30 Classification and discrimination; cluster analysis (statistical aspects)