
Trimmed \(k\)-means: An attempt to robustify quantizers. (English) Zbl 0878.62045

Summary: A class of procedures based on “impartial trimming” (trimming self-determined by the data) is introduced with the aim of robustifying \(k\)-means and, hence, the associated cluster analysis. We include a detailed study of optimal regions, showing that only nonpathological regions can arise from impartial trimming procedures. The asymptotic results provided in the paper focus on the strong consistency of the suggested methods under very general conditions. A section is devoted to exploring the performance of the procedure in detecting anomalous data in simulated data sets.
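The idea can be illustrated with a minimal sketch of the empirical trimmed \(k\)-means criterion, assuming squared Euclidean distance. Given observations \(x_1,\dots,x_n\), a trimming level \(\alpha\) and \(k\) centers, keep a subset \(H\) of \(h=\lceil n(1-\alpha)\rceil\) observations and centers \(m_1,\dots,m_k\) minimizing \(\frac{1}{h}\sum_{x\in H}\min_j\|x-m_j\|^2\). The trimming is impartial in the sense of the summary: the data, not a fixed region, decide which points are discarded. The search strategy is an assumption and a swapped-in technique. It alternates nearest-center assignment, retention of the \(h\) best-fitting points, and recomputation of the centers as means, with several random starts. This is a commonly used heuristic that only finds local optima. It is not an algorithm taken from the summary above. The names `trimmed_kmeans`, `_assign`, `n_init`, `max_iter` and `seed` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: concentration-step heuristic for trimmed k-means with
# squared Euclidean loss; not an algorithm stated in the paper's summary.


def _assign(x, centers, h):
    """Nearest-center labels plus impartial trimming: keep the h points
    whose squared distance to their nearest center is smallest."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    labels = d2.argmin(axis=1)
    nearest = d2[np.arange(x.shape[0]), labels]
    kept = np.zeros(x.shape[0], dtype=bool)
    kept[np.argsort(nearest, kind="stable")[:h]] = True
    return labels, kept, nearest[kept].mean()  # trimmed objective


def trimmed_kmeans(x, k, alpha, n_init=10, max_iter=100, seed=0):
    """Return (centers, labels, kept, objective); kept[i] is False for the
    roughly alpha * n points that were trimmed away."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    h = int(np.ceil(n * (1.0 - alpha)))  # number of untrimmed points
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):  # several random starts; keep the best one
        centers = x[rng.choice(n, size=k, replace=False)].copy()
        labels, kept, obj = _assign(x, centers, h)
        for _ in range(max_iter):
            # move each center to the mean of its untrimmed points
            for j in range(k):
                members = kept & (labels == j)
                if members.any():
                    centers[j] = x[members].mean(axis=0)
            new_labels, new_kept, obj = _assign(x, centers, h)
            done = (np.array_equal(new_labels, labels)
                    and np.array_equal(new_kept, kept))
            labels, kept = new_labels, new_kept
            if done:  # assignment and trimmed set no longer change
                break
        if best is None or obj < best[3]:
            best = (centers, labels, kept, obj)
    return best


if __name__ == "__main__":
    # two Gaussian clusters plus ten planted far-away points
    rng = np.random.default_rng(1)
    x = np.vstack([
        rng.normal(0, 1, (100, 2)),
        rng.normal(10, 1, (100, 2)),
        np.column_stack([np.linspace(40, 49, 10), np.full(10, -40.0)]),
    ])
    centers, labels, kept, obj = trimmed_kmeans(x, k=2, alpha=0.05)
    print("centers:", centers)
    print("trimmed indices:", np.flatnonzero(~kept))
```

On data like the demo, the planted points should be among the trimmed ones. With a single start the heuristic can stall in a local optimum (for instance, a center stuck on the outliers), which is why the sketch keeps the best of several starts.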

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F35 Robustness and adaptive procedures (parametric inference)
60F15 Strong limit theorems

Software:

clusfind
