×

Clustering on trees. (English) Zbl 0900.62327

Summary: Contiguity-constrained clustering problems in which the underlying graph is a tree arise in a variety of applications: hierarchical database paging, communication and distribution network districting, biological taxonomy, and others. They are formulated here as optimal tree partitioning problems. Their computational complexity is seen to depend strongly on the nature of the objective function. On the one hand, we give a polynomial-time algorithm for split maximization, and on the other, we prove that both the minimum diameter problem and the minimum inner dissimilarity ones are NP-complete even for stars. Finally, we describe two simple heuristics for the minimum inner dissimilarity problem and derive a priori bounds on their relative approximation errors, exploiting the submodularity of an associated set function.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
65Y20 Complexity and performance of numerical algorithms

Software:

clusfind
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Aho, A. V.; Hopcroft, J. E.; Ullman, J. D.: The design and analysis of computer algorithms. (1975) · Zbl 0307.68053
[2] Anderberg, M. R.: Cluster analysis for applications. (1973) · Zbl 0299.62029
[3] Ben-Bassat, M.; Zaidenberg, L.: Contextual template matching: a distance measure for patterns with hierarchically dependent features. IEEE trans. PAMI 6, 201-211 (1984) · Zbl 0528.68068
[4] Day, W. H. E: Complexity theory: an introduction for practitioners of classification. Clustering and classification (1996) · Zbl 0904.62075
[5] Delattre, M.; Hansen, P.: Bicriterion cluster analysis. IEEE trans. Pattern analysis machine intelligence 2, 277-291 (1980) · Zbl 0458.62049
[6] Ferligoj, A.; Batagelj, V.: Clustering with relational constraints. Psychometrika 47, 413-426 (1982) · Zbl 0568.62059
[7] Fisher, R. D.: On grouping for maximum homogeneity. J. amer. Statist. assoc. 53, 789-798 (1958) · Zbl 0084.35904
[8] Fisher, M. L.; Nemhauser, G. L.; Wolsey, L. A.: Analysis of approximations for maximizing a sub-modular set function II. Math. programming 14, 265-294 (1978) · Zbl 0374.90045
[9] Garey, M. R.; Johnson, D. S.: Computers and intractability: a guide to the theory of NP-completeness. (1979) · Zbl 0411.68039
[10] Golumbic, M. C.; Jamison, R. E.: Edge and vertex intersection of paths in trees. Discrete math. 55, 151-159 (1985) · Zbl 0568.05023
[11] Gordon, A. D.: Classification: methods for exploratory analysis of multivariate data. (1981) · Zbl 0507.62057
[12] Guthery, S. B.: Partition regression. J. amer. Statist. assoc. 69, 945-947 (1974) · Zbl 0296.62059
[13] Hansen, P.; Jaumard, B.; Musitu, K.: Weight-constrained maximum split clustering. J. classification 7, 217-240 (1990)
[14] Hansen, P.; Jaumard, B.; Simeone, B.; Doring, V.: Maximum split clustering under connectivity constraints. Cahier G-93-06 (1993) · Zbl 1083.91063
[15] Hartigan, J. A.: Clustering algorithms. (1975) · Zbl 0372.62040
[16] Jardine, N.; Sibson, R.: Mathematical taxonomy. (1971) · Zbl 0322.62065
[17] Kaufman, L.; Rousseeuw, P. T.: Finding groups in data: an introduction to cluster analysis. (1989) · Zbl 1345.62009
[18] Lefkovitch, L. P.: Conditional clustering. Biometrics 36, 43-58 (1980) · Zbl 0424.62041
[19] Lovász, L.: Combinatorial problems and exercies. (1979) · Zbl 0439.05001
[20] Maravalle, M.; Simeone, B.: A greedy algorithm for maximum split clustering on trees. Research report 4 (Series A) (1984) · Zbl 0825.62291
[21] Maravalle, M.; Simeone, B.: A spanning tree heuristic for regional clustering. Comm. statist. Theory methods 24, 625-639 (1995) · Zbl 0825.62291
[22] Mulvey, J. M.; Beck, M. P.: Solving capacitated clustering problems. Technical report EES-82-2 (1982) · Zbl 0547.62039
[23] Mulvey, J. M.; Crowder, H. P.: Cluster analysis: an application of Lagrangian relaxation. Management sci. 25, 329-340 (1979) · Zbl 0415.90085
[24] Murtagh, F.: A survey of algorithms for contiguity-constrained clustering and related problems. Comput. J. 28, 82-88 (1985)
[25] Perruchet, C.: Constrained agglomerative hierarchical classification. Pattern recognition 16, 213-217 (1983)
[26] Rao, M. R.: Cluster analysis and mathematical programming. J. amer. Statist. assoc. 66, 622-626 (1971) · Zbl 0238.90042
[27] Schkolnick, M.: A clustering algorithm for hierarchical structures. ACM trans. Database systems 2, 27-44 (1977)
[28] Scott, A. J.: Combinatorial programming, spatial analysis and planning. (1971) · Zbl 0249.90047
[29] Späth, H.: Cluster analysis algorithms for data reduction and classification of objects. (1980) · Zbl 0435.62059
[30] Späth, H.: Anticlustering: maximizing the variance criterion. Control cybernet. 15, 213-218 (1986) · Zbl 0621.62066
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.