×

zbMATH — the first resource for mathematics

Generalized cluster trees and singular measures. (English) Zbl 1421.62050
Summary: In this paper we study the \(\alpha\)-cluster tree (\(\alpha\)-tree) under both singular and nonsingular measures. The \(\alpha\)-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an \(\alpha\)-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but pointwise consistent after rescaling. We further prove that the estimated \(\alpha\)-tree fails to converge in the \(L_{\infty}\) metric but is still consistent under the integrated distance. We also observe a new type of critical points – the dimensional critical points (DCPs) – of a singular measure. DCPs are points that contribute to cluster tree topology but cannot be defined using density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated \(\alpha\)-tree.

MSC:
62G20 Asymptotic properties of nonparametric inference
62G05 Nonparametric estimation
62G07 Density estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)
PDF BibTeX XML Cite
Full Text: DOI Euclid arXiv
References:
[1] Balakrishnan, S., Narayanan, S., Rinaldo, A., Singh, A. and Wasserman, L. (2012). Cluster trees on manifolds. In Advances in Neural Information Processing Systems (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K. Q. Weinberger, eds.) 26 2679-2687. Curran Associates, Red Hook, NY.
[2] Baryshnikov, Y., Bubenik, P. and Kahle, M. (2014). Min-type Morse theory for configuration spaces of hard spheres. Int. Math. Res. Not. IMRN9 2577-2592. · Zbl 1315.55011
[3] Bobrowski, O., Mukherjee, S. and Taylor, J. E. (2017). Topological consistency via kernel estimation. Bernoulli23 288-328. · Zbl 1395.62073
[4] Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res.16 77-102. · Zbl 1337.68221
[5] Cadre, B. (2006). Kernel estimation of density level sets. J. Multivariate Anal.97 999-1023. · Zbl 1085.62039
[6] Cadre, B., Pelletier, B. and Pudlo, P. (2009). Clustering by estimation of density level sets at a fixed probability. Available at https://hal.archives-ouvertes.fr/file/index/docid/397437/filename/tlevel.pdf. · Zbl 1297.62070
[7] Carlsson, G. (2009). Topology and data. Bull. Amer. Math. Soc. (N.S.) 46 255-308. · Zbl 1172.62002
[8] Chaudhuri, K. and Dasgupta, S. (2010). Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems 343-351.
[9] Chaudhuri, K., Dasgupta, S., Kpotufe, S. and von Luxburg, U. (2014). Consistent procedures for cluster tree estimation and pruning. IEEE Trans. Inform. Theory60 7900-7912. · Zbl 1359.62234
[10] Chazal, F., Fasy, B., Lecci, F., Michel, B., Rinaldo, A. and Wasserman, L. (2017). Robust topological inference: Distance to a measure and kernel distance. J. Mach. Learn. Res.18 Paper No. 159, 40. · Zbl 1435.62452
[11] Chen, Y.-C. (2017). A tutorial on kernel density estimation and recent advances. ArXiv preprint. Available at arXiv:1704.03924.
[12] Chen, Y.-C. (2019). Supplement to “Generalized cluster trees and singular measures”. DOI:10.1214/18-AOS1744SUPP.
[13] Chen, Y.-C. and Dobra, A. (2017). Measuring human activity spaces with density ranking based on GPS data. ArXiv preprint. Available at arXiv:1708.05017.
[14] Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2015). Asymptotic theory for density ridges. Ann. Statist.43 1896-1928. · Zbl 1327.62303
[15] Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2016). A comprehensive approach to mode clustering. Electron. J. Stat.10 210-241. · Zbl 1332.62200
[16] Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2017). Density level sets: Asymptotics, inference, and visualization. J. Amer. Statist. Assoc.112 1684-1696.
[17] Chen, Y.-C., Kim, J., Balakrishnan, S., Rinaldo, A. and Wasserman, L. (2016). Statistical inference for cluster trees. ArXiv preprint. Available at arXiv:1605.06416.
[18] Cohen-Steiner, D., Edelsbrunner, H. and Harer, J. (2007). Stability of persistence diagrams. Discrete Comput. Geom.37 103-120. · Zbl 1117.54027
[19] Edelsbrunner, H. and Harer, J. (2008). Persistent homology—A survey. In Surveys on Discrete and Computational Geometry. Contemp. Math.453 257-282. Amer. Math. Soc., Providence, RI. · Zbl 1145.55007
[20] Edelsbrunner, H. and Morozov, D. (2013). Persistent homology: Theory and practice. In European Congress of Mathematics 31-50. Eur. Math. Soc., Zürich. · Zbl 1364.55008
[21] Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist.33 1380-1403. · Zbl 1079.62040
[22] Eldridge, J., Belkin, M. and Wang, Y. (2015). Beyond hartigan consistency: Merge distortion metric for hierarchical clustering. In Proceedings of the 28th Conference on Learning Theory 588-606.
[23] Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S. and Singh, A. (2014). Confidence sets for persistence diagrams. Ann. Statist.42 2301-2339. · Zbl 1310.62059
[24] Federer, H. (1959). Curvature measures. Trans. Amer. Math. Soc.93 418-491. · Zbl 0089.38402
[25] Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2009). On the path density of a gradient field. Ann. Statist.37 3236-3271. · Zbl 1191.62062
[26] Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2014). Nonparametric ridge estimation. Ann. Statist.42 1511-1545. · Zbl 1310.62045
[27] Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. Henri Poincaré B, Probab. Stat.38 907-921. · Zbl 1011.62034
[28] Goresky, M. and MacPherson, R. (1980). Intersection homology theory. Topology19 135-162. · Zbl 0448.55004
[29] Goresky, M. and MacPherson, R. (1988). Stratified Morse Theory. Ergebnisse der Mathematik und Ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 14. Springer, Berlin. · Zbl 0639.14012
[30] Hartigan, J. A. (1981). Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc.76 388-394. · Zbl 0468.62053
[31] Klemelä, J. (2004). Visualization of multivariate density estimates with level set trees. J. Comput. Graph. Statist.13 599-620.
[32] Klemelä, J. (2006). Visualization of multivariate density estimates with shape trees. J. Comput. Graph. Statist.15 372-397.
[33] Klemelä, J. (2009). Smoothing of Multivariate Data: Density Estimation and Visualization. Wiley, Hoboken, NJ. · Zbl 1218.62027
[34] Kpotufe, S. and Luxburg, U. V. (2011). Pruning nearest neighbor cluster trees. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (L. Getoor and T. Scheffer, eds.) 225-232. International Machine Learning Society, Madison, WI.
[35] Laloe, T. and Servien, R. (2013). Nonparametric estimation of regression level sets. J. Korean Statist. Soc. · Zbl 1294.62064
[36] Lee, J. M. (2013). Introduction to Smooth Manifolds, 2nd ed. Graduate Texts in Mathematics218. Springer, New York. · Zbl 1258.53002
[37] Mammen, E. and Polonik, W. (2013). Confidence regions for level sets. J. Multivariate Anal.122 202-214. · Zbl 1280.62056
[38] Mason, D. M. and Polonik, W. (2009). Asymptotic normality of plug-in level set estimates. Ann. Appl. Probab.19 1108-1142. · Zbl 1180.62048
[39] Mattila, P. (1995). Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability. Cambridge Studies in Advanced Mathematics44. Cambridge Univ. Press, Cambridge. · Zbl 0819.28004
[40] Milnor, J. (1963). Morse Theory. Based on Lecture Notes by M. Spivak and R. Wells. Annals of Mathematics Studies, No. 51. Princeton Univ. Press, Princeton, NJ.
[41] Molchanov, I. S. (1990). Empirical estimation of quantiles of distributions of random closed sets. Teor. Veroyatn. Primen.35 586-592. · Zbl 0727.62041
[42] Morse, M. (1925). Relations between the critical points of a real function of \(n\) independent variables. Trans. Amer. Math. Soc.27 345-396. · JFM 51.0451.01
[43] Morse, M. (1930). The foundations of a theory of the calculus of variations in the large in \(m\)-space. II. Trans. Amer. Math. Soc.32 599-631. · JFM 56.1079.01
[44] Polonik, W. (1995). Measuring mass concentrations and estimating density contour clusters—An excess mass approach. Ann. Statist.23 855-881. · Zbl 0841.62045
[45] Preiss, D. (1987). Geometry of measures in \(\textbf{R}^{n}\): Distribution, rectifiability, and densities. Ann. of Math. (2) 125 537-643. · Zbl 0627.28008
[46] Rinaldo, A. and Wasserman, L. (2010). Generalized density clustering. Ann. Statist.38 2678-2722. · Zbl 1200.62066
[47] Rinaldo, A., Singh, A., Nugent, R. and Wasserman, L. (2012). Stability of density-based clustering. J. Mach. Learn. Res.13 905-948. · Zbl 1283.62130
[48] Scott, D. W. (2015). Multivariate Density Estimation: Theory, Practice, and Visualization, 2nd ed. Wiley, Hoboken, NJ. · Zbl 1311.62004
[49] Singh, A., Scott, C. and Nowak, R. (2009). Adaptive Hausdorff estimation of density level sets. Ann. Statist.37 2760-2782. · Zbl 1173.62019
[50] Steinwart, I. (2011). Adaptive density level set clustering. In COLT 703-738.
[51] Stuetzle, W. (2003). Estimating the cluster type of a density by analyzing the minimal spanning tree of a sample. J. Classification20 25-47. · Zbl 1055.62075
[52] Tsybakov, A. B. (1997). On nonparametric estimation of density level sets. Ann. Statist.25 948-969. · Zbl 0881.62039
[53] Tu, L. W. (2008). An Introduction to Manifolds. Springer, New York. · Zbl 1144.58001
[54] Walther, G. (1997). Granulometric smoothing. Ann. Statist.25 2273-2299. · Zbl 0919.62026
[55] Wasserman, L. (2006). All of Nonparametric Statistics. Springer Texts in Statistics. Springer, New York. · Zbl 1099.62029
[56] Wasserman, L. (2018). Topological data analysis. Ann. Rev. Stat. Appl.5 501-535.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.