zbMATH — the first resource for mathematics

Asymptotics of a clustering criterion for smooth distributions. (English) Zbl 1336.62172
Summary: We develop a clustering framework for observations from a population with a smooth probability distribution function and derive its asymptotic properties. A clustering criterion based on a linear combination of order statistics is proposed. The asymptotic behavior of the point at which the observations are split into two clusters is examined. These results can then be used to construct an interval estimate of the split point and to develop tests for bimodality and for the presence of clusters.

MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62E20 Asymptotic distribution theory in statistics
62F05 Asymptotic properties of parametric tests
62G30 Order statistics; empirical distribution functions
60F17 Functional limit theorems; invariance principles
References:
[1] Adler, R. J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. Lecture Notes-Monograph Series 12. · Zbl 0747.60039
[2] Arnold, S. J. (1979). A Test for Clusters. Journal of Marketing Research 16 545-551.
[3] Bharath, K., Pozdnyakov, V. and Dey, D. K. (2012). Asymptotics of Empirical Cross-over Function. Unpublished manuscript. · Zbl 1334.62089
[4] Billingsley, P. (1968). Convergence of Probability Measures. John Wiley and Sons, New York. · Zbl 0172.21201
[5] Bock, H. H. (1985). On Some Significance Tests in Cluster Analysis. Journal of Classification 2 77-108. · Zbl 0587.62048
[6] Cuesta-Albertos, J. A., Gordaliza, A. and Matrán, C. (1997). Trimmed \(k\)-means: An Attempt to Robustify Quantizers. Annals of Statistics 25 553-576. · Zbl 0878.62045
[7] Devroye, L. (1981). Laws of Iterated Logarithm for Order Statistics of Uniform Spacings. Annals of Probability 9 860-867. · Zbl 0465.60038
[8] Engleman, L. and Hartigan, J. A. (1969). Percentage Points of a Test for Clusters. Journal of the American Statistical Association 64 1647-1648.
[9] García-Escudero, L. A., Gordaliza, A. and Matrán, C. (1999). A Central Limit Theorem for Multivariate Generalized Trimmed \(k\)-means. Annals of Statistics 27 1061-1079. · Zbl 0984.62042
[10] Hartigan, J. (1978). Asymptotic Distributions for Clustering Criteria. Annals of Statistics 6 117-131. · Zbl 0377.62033
[11] Hartigan, J. A. and Hartigan, P. M. (1985). A Dip Test of Unimodality. Annals of Statistics 13 70-84. · Zbl 0575.62045
[12] Holzmann, H. and Vollmer, S. (2008). A Likelihood Ratio Test for Bimodality in Two-component Mixtures with Application to Regional Income Distribution in the EU. Advances in Statistical Analysis 92 57-69. · Zbl 1171.62013
[13] Pollard, D. (1981). Strong Consistency for \(K\)-Means Clustering. Annals of Statistics 9 135-140. · Zbl 0451.62048
[14] Pollard, D. (1982). A Central Limit Theorem for \(k\)-means Clustering. Annals of Statistics 10 919-926. · Zbl 0502.62055
[15] Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York. · Zbl 0544.60045
[16] Schwab, J., Podsiadlowski, P. H. and Rappaport, S. (2012). Further Evidence for the Bimodal Distribution of Neutron-Star Masses. The Astrophysical Journal 719 722-727.
[17] Serfling, R. (1980). Approximation Theorems for Mathematical Statistics. John Wiley, New York. · Zbl 0538.62002
[18] Serinko, R. J. and Babu, G. J. (1992). Weak Limit Theorems for Univariate \(k\)-means Clustering under Nonregular Conditions. Journal of Multivariate Analysis 49 188-203. · Zbl 0753.60026
[19] Stigler, S. M. (1973). The Asymptotic Distribution of the Trimmed Mean. Annals of Statistics 1 472-477. · Zbl 0261.62016
[20] Wolfe, J. H. (1970). Pattern Clustering by Multivariate Mixture Analysis. Multivariate Behavioral Research 5 329-350.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.