×

A graph-based estimator of the number of clusters. (English) Zbl 1187.62114

Summary: Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given \(n\) independent observations \(X_1,\dots,X_n\) drawn from an unknown multivariate probability density \(f\), we propose a new approach to estimate the number of connected components, or clusters, of the \(t\)-level set \(\mathcal L(t)=\{x:f(x) \geq t\}\). The basic idea is to form a rough skeleton of the set \(\mathcal L(t)\) using any preliminary estimator of \(f\), and to count the number of connected components of the resulting graph. Under mild analytic conditions on \(f\), and using tools from differential geometry, we establish the consistency of our method.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G05 Nonparametric estimation
05C90 Applications of graph theory
68T05 Learning and adaptive systems in artificial intelligence
62G20 Asymptotic properties of nonparametric inference

Software:

ElemStatLearn
PDFBibTeX XMLCite
Full Text: DOI Numdam EuDML

References:

[1] G.E. Bredon , Topology and Geometry , Springer-Verlag, New York, Graduate Texts in Mathematics 139 ( 1993 ). MR 1224675 | Zbl 0791.55001 · Zbl 0791.55001
[2] M.R. Brito , E.L. Chavez , A.J. Quiroz and J.E. Yukich , Connectivity of the mutual \(k\)-nearest neighbor graph in clustering and outlier detection . Statist. Probab. Lett. 35 ( 1997 ) 33 - 42 . Zbl 0898.62077 · Zbl 0898.62077 · doi:10.1016/S0167-7152(96)00213-1
[3] B. Cadre , Kernel estimation of density level sets . J. Multivariate Anal. 97 ( 2006 ) 999 - 1023 . Zbl 1085.62039 · Zbl 1085.62039 · doi:10.1016/j.jmva.2005.05.004
[4] I. Chavel , Riemannian Geometry: A Modern Introduction . Cambridge University Press, Cambridge ( 1993 ). MR 1271141 | Zbl 0810.53001 · Zbl 0810.53001
[5] T.H. Cormen , C.E. Leiserson and R.L. Rivest , Introduction to Algorithms . The MIT Press, Cambridge ( 1990 ). MR 1066870 | Zbl 1047.68161 · Zbl 1047.68161
[6] A. Cuevas , M. Febrero and R. Fraiman , Estimating the number of clusters . Canad. J. Statist. 28 ( 2000 ) 367 - 382 . Zbl 0981.62054 · Zbl 0981.62054 · doi:10.2307/3315985
[7] A. Cuevas , M. Febrero and R. Fraiman , Cluster analysis: a further approach based on density estimation . Comput. Statist. Data Anal. 36 ( 2001 ) 441 - 459 . Zbl 1053.62537 · Zbl 1053.62537 · doi:10.1016/S0167-9473(00)00052-9
[8] L. Devroye and G. Wise , Detection of abnormal behavior via nonparametric estimation of the support . SIAM J. Appl. Math. 38 ( 1980 ) 480 - 488 . Zbl 0479.62028 · Zbl 0479.62028 · doi:10.1137/0138038
[9] R.O. Duda , P.E. Hart and D.G. Stork , Pattern Classification , 2nd edition. Wiley-Interscience, New York ( 2000 ). MR 1802993 | Zbl 0968.68140 · Zbl 0968.68140
[10] L. Györfi , M. Kohler , A. Krzyżak and H. Walk , A Distribution-Free Theory of Nonparametric Regression . Springer-Verlag, New York ( 2002 ). MR 1920390 | Zbl 1021.62024 · Zbl 1021.62024 · doi:10.1007/b97848
[11] J.A. Hartigan , Clustering Algorithms . John Wiley, New York ( 1975 ). MR 405726 | Zbl 0372.62040 · Zbl 0372.62040
[12] T. Hastie , R. Tibshirani and J. Friedman , The Elements of Statistical Learning: Data Mining , Inference, and Prediction. Springer, New York ( 2001 ). MR 1851606 | Zbl 0973.62007 · Zbl 0973.62007
[13] S. Kobayashi and K. Nomizu , Foundations of Differential Geometry , Vol. I & II, 2nd edition. Wiley, New York ( 1996 ). MR 1393940 | Zbl 0119.37502 · Zbl 0119.37502
[14] U. von Luxburg and S. Ben-David , Towards a statistical theory of clustering . PASCAL Workshop on Statistics and Optimization of Clustering ( 2005 ).
[15] M.D. Penrose , A strong law for the longest edge of the minimal spanning tree . Ann. Probab. 27 ( 1999 ) 246 - 260 . Article | Zbl 0944.60015 · Zbl 0944.60015 · doi:10.1214/aop/1022677261
[16] A. Polonik , Measuring mass concentrations and estimating density contour clusters-an excess mass approach . Ann. Statist. 23 ( 1995 ) 855 - 881 . Article | Zbl 0841.62045 · Zbl 0841.62045 · doi:10.1214/aos/1176324626
[17] B.L.S. Prakasa Rao , Nonparametric Functional Estimation . Academic Press, Orlando ( 1983 ). MR 740865 | Zbl 0542.62025 · Zbl 0542.62025
[18] A.B. Tsybakov , On nonparametric estimation of density level sets . Ann. Statist. 25 ( 1997 ) 948 - 969 . Article | Zbl 0881.62039 · Zbl 0881.62039 · doi:10.1214/aos/1069362732
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.