×

zbMATH — the first resource for mathematics

Efficient algorithms for divisive hierarchical clustering with the diameter criterion. (English) Zbl 0739.62050
Divisive hierarchical clustering algorithms with the diameter criterion proceed by recursively selecting the cluster with largest diameter and partitioning it into two clusters whose largest diameter is the smallest possible. We provide two such algorithms with complexities \(O(\overline{M} N^ 2)\) and \(O(N^ 2\log N)\), respectively, where \(\overline{M}\) denotes the maximum number of clusters in a partition and \(N\) the number of entities to be clustered.
The former algorithm, an efficient implementation of an algorithm of L. Hubert [see J. Am. Stat. Assoc. 69, 698-704 (1974; Zbl 0291.62071)] allows to find all partitions into at most \(\overline{M}\) clusters and is in \(O(N^ 2)\) for fixed \(\overline{M}\). Moreover, if in each partitioning the size of the largest cluster is bounded by \(p\) times the number of entities in the set to be partitioned, with \(1/2\leq p<1\), it provides a complete hierarchy of partitions in \(O(N^ 2 \log N)\) time. The latter algorithm, allows to build a complete hierarchy of partitions in \(O(N^ 2\log N)\) time without any restriction. Comparative computational experiments with both algorithms and with an agglomerative hierarchical algorithm are reported. (From the authors’ abstract).

MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68Q25 Analysis of algorithms and problem complexity
Software:
heapsort
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] AHO, A. V., HOPCROFT, J. F., and ULLMAN, J. D. (1974).The Design and Analysis of Computer Algorithms, Reading, Massachusetts: Addison-Wesley. · Zbl 0326.68005
[2] BENZÉCRI, J. P. (1973),L’Analyse de Données. 1, La Taxinomie, Paris: Dunod.
[3] BENZÉCRI, J. P. (1982), ”Construction d’une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques,”Les Cahiers de l’Analyse des Données, (VII 2, 209–218.
[4] BRUCKER, P. (1978), ”On the Complexity of Clustering Problems,” inOptimization and Operations Research, Lecture Notes in Economics and Mathematical Systems, 157, Eds. M. Beckmann and H. Kunzi, Heidelberg: Springer-Verlag, 45–54.
[5] DAY, H. E., and EDELSBRUNNER, H. (1984), ”Effcient Algorithrns for Agglomerative Hierarchical Clustering Methods,”Journal of Classification, 1, 7–24. · Zbl 0563.62034 · doi:10.1007/BF01890115
[6] DEFAYS, D. (1977), ”An Efficient Algorithm for a Complete Link Method,”The Computer Journal, 20(4) 364–366. · Zbl 0364.68038 · doi:10.1093/comjnl/20.4.364
[7] DELATTRE, M., and HANSEN, P. (1980), ”Bicriterion Cluster Analysis,”IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(4), 277–291. · Zbl 0458.62049 · doi:10.1109/TPAMI.1980.4767027
[8] DIDAY, E., et al. (1979),Optimisation en Classification Automatique, Le Chesnay: INRIA. · Zbl 0471.62056
[9] DIJKSTRA, E. W. (1959), ”A Note on Two Problems in Connexion with Graphs,”Numerische Mathematik, 1, 269–271. · Zbl 0092.16002 · doi:10.1007/BF01386390
[10] DORING, V., HANSEN, P., and JAUMARD, B. (1990), ”Divisive Hierarchical Clustering with the Diameter Criterion: Hubet’s Algorithm,”Cahiers du GERAD (forthcoming).
[11] DORING, V., HANSEN, P., and JAUMARD, B. (1990), ”Divisive Hierarchical Clustering with the Diameter Criterion: Algorithm Alltrees,”Cahiers du GERAD (forthcoming).
[12] EDWARDS, A. F. S., and CAVALLI-SFORZA, L. L. (1965), ”A Method for Cluster Analysis,”Biometrics, 21(2), 362–375. · doi:10.2307/2528096
[13] FISHER, R. A. (1936), ”The Use of Multiple Measurements in Taxonomic Problems,”Annals of Eugenics, VII(II), 179–188.
[14] GAREY, M. R., and JOHNSON, D. S. (1979),Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: Freeman. · Zbl 0411.68039
[15] GROTSCHEL, M., and WAKABAYASHI, Y. (1989), ”A Cutting-plane Algorithm for a Clustering Problem,”Mathematical Programing, 45(1),Series B, 59–96. · Zbl 0675.90072 · doi:10.1007/BF01589097
[16] GUÉNOCHE, A. (1989), ”Partitions with Minimum Diameter,” Conference of the I.F.C.S., Charlottesville. · Zbl 0782.05082
[17] HANSEN, P., and DELATTRE, M. (1978), ”Complete Link Cluster Analysis by Graph Coloring,”Journal of the American Statistical Association, 73, 397–403. · Zbl 0432.05004 · doi:10.2307/2286672
[18] HANSEN, P., and JAUMARD, B. (1987), ”Minimum Sum of Diameters Clustering,”Journal of Classification, 4, 215–226. · Zbl 0689.62042 · doi:10.1007/BF01896987
[19] HARMAN, H. H. (1967),Modern Factor Analysis Chicago: University of Chicago Press. · Zbl 0161.39805
[20] HUBERT, L. (1973), ”Monotone Invariant Clustering Procedures,”Psychometrika, 38(1), 47–62. · Zbl 0251.92013 · doi:10.1007/BF02291173
[21] JAMBU, M. (1978),Classification automatique pour l’analyse des données, Paris: Dunod (Cluster Analysis and Data Analysis, Amsterdam: North Holland, 1983).
[22] JUAN, J. (1982), ”Programme de classification hiérarchique par l’algorithme de la recherche en chaîne des voisns réciproques,”Les Cahiers de l’Analyse des Données, VII(2), 219–225. · Zbl 0505.62042
[23] KRUSKAL, J.B. (1956), ”On the Shortest Spanning Subtree of a Graph,”Proceedings of the American Mathematical Society, 7, 48–50. · Zbl 0070.18404 · doi:10.1090/S0002-9939-1956-0078686-7
[24] LANCE, G. N., and WILLIAMS, W. T. (1967), ”Mixed-data Classificatory Systems: I. Agglomerative Systems,”Australian Computer Journal, 1, 15–20.
[25] LECLERC, B. (1986), ”Caractérisation, construction et dénombrement des ultramétriques supérieures minimales,”Statistique et Analyse des Données, 11(2), 26–50.
[26] McNAUGHTON-SMITH, P., WILLIAMS, W. T., DALE, M. B., and MOCKETT, L. G. (1964), ”Dissimilarity Analysis,” Nature, 202, 1034–1035. · Zbl 0128.38503 · doi:10.1038/2021034a0
[27] MONMA, C., and SURI, S. (1989), ”Partitioning Points and Graphs to Minimize the Maximum or the Sum of Diameters,”Proceedings of the Sixth International Conference on the Theory and Applications of Graphs, New York: Wiley. · Zbl 0841.05048
[28] MURTAGH, F. (1983), ”A Survey of Recent Advances in Hierarchical Algorithms,”The Computer Journal, 26(4), 354–359. · Zbl 0523.68030
[29] MURTAGH, F. (1984), ”Complexities of Hierarchic Clustering Algorithms: State of the Art,”Computational Statistics Quarterly, 1(2), 101–113. · Zbl 0614.62076
[30] PRIM, R.C. (1957), ”Shortest Connection Networks and Some Generalizations”Bell System Technical Journal, 36, 1389–1401.
[31] RAO, M. R. (1971), ”Cluster Analysis and Mathematical Programming,”Journal of the American Statistical Association, 66, 622–626. · Zbl 0238.90042 · doi:10.2307/2283542
[32] VICHI, M. (1985), ”On a Flexible and Computationally Feasible Divisive Clustering Technique,”Revista di Statistics, 18(4), 200–208.
[33] WELCH, W.J. (1982), ”Algorithmic Complexity: Three NP-Hard Problems in Computational Statistics,”Journal of Statistical Computing and Simulation, 15, 17–25. · Zbl 0482.65080 · doi:10.1080/00949658208810560
[34] WILLIAMS, J.W.J. (1964), ”Algorithm 232, Heapsort,”Communications of the Association for Computing Machinery, 7(6), 347–348.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.