A population background for nonparametric density-based clustering. (English) Zbl 1426.62181

Summary: Despite its popularity, it is widely recognized that the investigation of some theoretical aspects of clustering has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, whereas for other statistical problems the theoretical population goal is clearly defined (as in regression or classification), for some of the clustering methodologies it is difficult to specify the population goal to which the data-based clustering algorithms should try to get close. This paper aims to provide some insight into the theoretical foundations of clustering by focusing on two main objectives: to provide an explicit formulation for the ideal population goal of the modal clustering methodology, which understands clusters as regions of high density; and to present two new loss functions, applicable in fact to any clustering methodology, to evaluate the performance of a data-based clustering algorithm with respect to the ideal population goal. In particular, it is shown that only mild conditions on a sequence of density estimators are needed to ensure that the sequence of modal clusterings that they induce is consistent.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation
Full Text: DOI arXiv Euclid


[1] Ackerman, M. and Ben-David, S. (2009). Measures of clustering quality: A working set of axioms for clustering. In Advances in Neural Information Processing Systems21 (D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, eds.) 121-128. Curran Associates, Red Hook, NY. Available at http://papers.nips.cc/paper/3491-measures-of-clustering-quality-a-working-set-of-axioms-for-clustering.pdf.
[2] Agrachev, A. A., Pallaschke, D. and Scholtes, S. (1997). On Morse theory for piecewise smooth functions. J. Dyn. Control Syst.3 449-469. · Zbl 0951.49024
[3] Arabie, P. and Boorman, S. A. (1973). Multidimensional scaling of measures of distance between partitions. J. Math. Psych.10 148-203. · Zbl 0263.92003
[4] Arias-Castro, E., Mason, D. and Pelletier, B. (2013). On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm. Preprint. · Zbl 1318.62125
[5] Arnold, V. I., Goryunov, V. V., Lyashko, O. V. and Vasil’ev, V. A. (1998). Singularity Theory. I. Springer, Berlin. · Zbl 0901.58001
[6] Azzalini, A. and Torelli, N. (2007). Clustering via nonparametric density estimation. Stat. Comput.17 71-80.
[7] Ben-David, S., von Luxburg, U. and Pál, D. (2006). A sober look at clustering stability. In Learning Theory (G. Lugosi and H.-U. Simon, eds.). Lecture Notes in Computer Science4005 5-19. Springer, Berlin. · Zbl 1143.68520
[8] Bertrand-Retali, M. (1978). Convergence uniforme d’un estimateur de la densité par la méthode du noyau. Rev. Roumaine Math. Pures Appl.23 361-385. · Zbl 0375.62080
[9] Burkard, R., Dell’Amico, M. and Martello, S. (2009). Assignment Problems. SIAM, Philadelphia, PA.
[10] Cadre, B., Pelletier, B. and Pudlo, P. (2013). Estimation of density level sets with a given probability content. J. Nonparametr. Stat.25 261-272. · Zbl 1297.62070
[11] Carlsson, G. (2009). Topology and data. Bull. Amer. Math. Soc. (N.S.) 46 255-308. · Zbl 1172.62002
[12] Carlsson, G. and Mémoli, F. (2013). Classifying clustering schemes. Found. Comput. Math.13 221-252. · Zbl 1358.62057
[13] Chacón, J. E. (2009). Data-driven choice of the smoothing parametrization for kernel density estimators. Canad. J. Statist.37 249-265. · Zbl 1176.62028
[14] Chacón, J. E. (2012). Clusters and water flows: A novel approach to modal clustering through Morse theory. Preprint. Available at arXiv:1212.1384.
[15] Chacón, J. E. and Duong, T. (2013). Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electron. J. Stat.7 499-532. · Zbl 1337.62067
[16] Chacón, J. E. and Monfort, P. (2014). A comparison of bandwidth selectors for mean shift clustering. In Theoretical and Applied Issues in Statistics and Demography (C. H. Skiadas, ed.) 47-59. International Society for the Advancement of Science and Technology (ISAST), Athens.
[17] Charon, I., Denœud, L., Guénoche, A. and Hudry, O. (2006). Maximum transfer distance between partitions. J. Classification23 103-121. · Zbl 1244.05023
[18] Chaudhuri, K. and Dasgupta, S. (2010). Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems (J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel and A. Culotta, eds.) 23 343-351. Curran Associates, Red Hook, NY.
[19] Chazal, F., Guibas, L. J., Oudot, S. Y. and Skraba, P. (2013). Persistence-based clustering in Riemannian manifolds. J. ACM60 Art. 41, 38. · Zbl 1281.68176
[20] Cuevas, A., Febrero, M. and Fraiman, R. (2000). Estimating the number of clusters. Canad. J. Statist.28 367-382. · Zbl 0981.62054
[21] Cuevas, A., Febrero, M. and Fraiman, R. (2001). Cluster analysis: A further approach based on density estimation. Comput. Statist. Data Anal.36 441-459. · Zbl 1053.62537
[22] Cuevas, A. and Fraiman, R. (2010). Set estimation. In New Perspectives in Stochastic Geometry (W. Kendall and I. Molchanov, eds.) 374-397. Oxford Univ. Press, Oxford. · Zbl 1192.62164
[23] Cuevas, A. and González Manteiga, W. (1991). Data-driven smoothing based on convexity properties. In Nonparametric Functional Estimation and Related Topics (Spetses, 1990) (G. Roussas, ed.). NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci.335 225-240. Kluwer Academic, Dordrecht. · Zbl 0806.62024
[24] Day, W. H. E. (1980/81). The complexity of computing metric distances between partitions. Math. Social Sci.1 269-287. · Zbl 0497.62049
[25] Deheuvels, P. (1974). Conditions nécessaires et suffisantes de convergence ponctuelle presque sûre et uniforme presque sûre des estimateurs de la densité. C. R. Acad. Sci. Paris Sér. A278 1217-1220. · Zbl 0281.62041
[26] Denœud, L. (2008). Transfer distance between partitions. Adv. Data Anal. Classif.2 279-294. · Zbl 1284.05319
[27] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. Springer, New York. · Zbl 0853.68150
[28] Donoho, D. L. (1988). One-sided inference about functionals of a density. Ann. Statist.16 1390-1420. · Zbl 0665.62040
[29] Edelsbrunner, H. and Harer, J. (2008). Persistent homology—A survey. In Surveys on Discrete and Computational Geometry. Contemp. Math.453 257-282. Amer. Math. Soc., Providence, RI. · Zbl 1145.55007
[30] Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: A unified approach via self-coverage. Journal of Pattern Recognition Research6 175-192.
[31] Everitt, B. S., Landau, S., Lesse, M. and Stahl, D. (2011). Cluster Analysis, 5th ed. Wiley, Chichester.
[32] Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S. and Singh, A. (2014). Statistical inference for persistent homology: Confidence sets for persistence diagrams. Available at arXiv:1303.7117v2. · Zbl 1310.62059
[33] Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc.97 611-631. · Zbl 1073.62545
[34] Fukunaga, K. and Hostetler, L. D. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inform. TheoryIT-21 32-40. · Zbl 0297.62025
[35] Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2015). Non-parametric inference for density modes. J. R. Stat. Soc. Ser. B. Stat. Methodol. To appear. DOI:10.1111/rssb.12111. · Zbl 1191.62062
[36] Graf, S. and Luschgy, H. (2000). Foundations of Quantization for Probability Distributions. Lecture Notes in Math.1730. Springer, Berlin. · Zbl 0951.60003
[37] Hand, D., Mannila, H. and Smyth, P. (2001). Principles of Data Mining. MIT Press, Cambridge, MA.
[38] Hartigan, J. A. (1975). Clustering Algorithms. Wiley, New York. · Zbl 0372.62040
[39] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York. · Zbl 1273.62005
[40] Izenman, A. J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer, New York. · Zbl 1155.62040
[41] Jost, J. (2011). Riemannian Geometry and Geometric Analysis, 6th ed. Universitext. Springer, Heidelberg. · Zbl 1227.53001
[42] Klemelä, J. (2009). Smoothing of Multivariate Data: Density Estimation and Visualization. Wiley, Hoboken, NJ. · Zbl 1218.62027
[43] Li, J., Ray, S. and Lindsay, B. G. (2007). A nonparametric statistical approach to clustering via mode identification. J. Mach. Learn. Res.8 1687-1723. · Zbl 1222.62076
[44] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66) 281-297. Univ. California Press, Berkeley. · Zbl 0214.46201
[45] Mason, D. M. and Polonik, W. (2009). Asymptotic normality of plug-in level set estimates. Ann. Appl. Probab.19 1108-1142. · Zbl 1180.62048
[46] Matsumoto, Y. (2002). An Introduction to Morse Theory. Translations of Mathematical Monographs208. Amer. Math. Soc., Providence, RI. · Zbl 0990.57001
[47] Meilă, M. (2005). Comparing clusterings—an axiomatic view. In Proceedings of the International Machine Learning Conference (ICML) (S. Wrobel and L. De Raedt, eds.) 577-584. ACM Press, New York.
[48] Meilă, M. (2007). Comparing clusterings—an information based distance. J. Multivariate Anal.98 873-895. · Zbl 1298.91124
[49] Meilă, M. (2012). Local equivalences of distances between clusterings—a geometric perspective. Mach. Learn.86 369-389. · Zbl 1267.68187
[50] Menardi, G. and Azzalini, A. (2014). An advancement in clustering via nonparametric density estimation. Stat. Comput.24 753-767. · Zbl 1322.62175
[51] Milnor, J. (1963). Morse Theory. Princeton Univ. Press, Princeton, NJ. · Zbl 0108.10401
[52] Nugent, R. and Stuetzle, W. (2010). Clustering with confidence: A low-dimensional binning approach. In Classification as a Tool for Research (H. Locarek-Junge and C. Weihs, eds.) 117-125. Springer, Berlin.
[53] Pollard, D. (1981). Strong consistency of \(k\)-means clustering. Ann. Statist.9 135-140. · Zbl 0451.62048
[54] Ray, S. and Lindsay, B. G. (2005). The topography of multivariate normal mixtures. Ann. Statist.33 2042-2065. · Zbl 1086.62066
[55] Rinaldo, A., Singh, A., Nugent, R. and Wasserman, L. (2012). Stability of density-based clustering. J. Mach. Learn. Res.13 905-948. · Zbl 1283.62130
[56] Romano, J. P. (1988). On weak convergence and optimality of kernel density estimates of the mode. Ann. Statist.16 629-647. · Zbl 0658.62053
[57] Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Statist.6 177-184. · Zbl 0376.62024
[58] Stuetzle, W. (2003). Estimating the cluster type of a density by analyzing the minimal spanning tree of a sample. J. Classification20 25-47. · Zbl 1055.62075
[59] Thom, R. (1949). Sur une partition en cellules associée à une fonction sur une variété. C. R. Acad. Sci. Paris228 973-975. · Zbl 0034.20802
[60] Tsybakov, A. B. (1997). On nonparametric estimation of density level sets. Ann. Statist.25 948-969. · Zbl 0881.62039
[61] von Luxburg, U. and Ben-David, S. (2005). Towards a statistical theory for clustering. In PASCAL Workshop on Statistics and Optimization of Clustering.
[62] Wand, M. P. and Jones, M. C. (1993). Comparison of smoothing parameterizations in bivariate kernel density estimation. J. Amer. Statist. Assoc.88 520-528. · Zbl 0775.62105
[63] Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Monographs on Statistics and Applied Probability60. Chapman & Hall, London. · Zbl 0854.62043
[64] Wang, X., Qiu, W. and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Comput. Statist. Data Anal.52 286-298. · Zbl 1452.62474
[65] Wishart, D. (1969). Mode analysis: A generalization of nearest neighbor which reduces chaining effects. In Numerical Taxonomy (A. J. Cole, ed.) 282-311. Academic Press, New York.
[66] Zadeh, R.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.