×

Local angles and dimension estimation from data on manifolds. (English) Zbl 1422.62182

Summary: For data living in a manifold \(M \subseteq \mathbb{R}^m\) and a point \(p \in M\), we consider a statistic \(U_{k, n}\) which estimates the variance of the angle between pairs \((X_i - p, X_j - p)\) of vectors, for data points \(X_i, X_j\), near \(p\), and we evaluate this statistic as a tool for estimation of the intrinsic dimension of \(M\) at \(p\). Consistency of the local dimension estimator is established and the asymptotic distribution of \(U_{k, n}\) is found under minimal regularity assumptions. Performance of the proposed methodology is compared against state-of-the-art methods on simulated data and real datasets.

MSC:

62H12 Estimation in multivariate analysis
62F12 Asymptotic properties of parametric estimators
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Arcones, M. A.; Gine, E., Limit theorems for \(U\)-processes, Ann. Probab., 21, 1494-1542 (1993) · Zbl 0789.60031
[2] Atkinson, K.; Han, W., Spherical Harmonics and Approximations on the Unit Sphere: An Introduction (2012), Springer: Springer Heidelberg · Zbl 1254.41015
[3] Belkin, M.; Niyogi, P., Semi-supervised learning on Riemannian manifolds, Mach. Learn., 56, 209-239 (2004) · Zbl 1089.68086
[4] Borwein, J. M.; Chamberland, M., Integer powers of arcsin, Int. J. Math. Math. Sci., Article 19381 pp. (2007), 10 · Zbl 1137.33300
[5] Bowman, A. W.; Foster, P. J., Adaptive smoothing and density-based tests of multivariate normality, J. Amer. Statist. Assoc., 88, 529-537 (1993) · Zbl 0775.62086
[6] Breiding, P.; Kališnik, S.; Sturmfels, B.; Weinstein, M., Learning algebraic varieties from samples, Rev. Mat. Complut., 31, 545-593 (2018) · Zbl 1481.13040
[7] Brito, M. R.; Quiroz, A. J.; Yukich, J. E., Graph-theoretic procedures for dimension identification, J. Multivariate Anal., 81, 67-84 (2002) · Zbl 1006.60024
[8] Brito, M. R.; Quiroz, A. J.; Yukich, J. E., Intrinsic dimension identification via graph-theoretic methods, J. Multivariate Anal., 116, 263-277 (2013) · Zbl 1359.62279
[9] Cai, T.; Fan, J.; Jiang, T., Distributions of angles in random packing on spheres, J. Mach. Learn. Res., 14, 1837-1864 (2013) · Zbl 1318.60017
[10] Campadelli, P.; Casiraghi, E.; Ceruti, C.; Rozza, A., Intrinsic dimension estimation: relevant techniques and a benchmark framework, Math. Probl. Eng., 2015 (2015) · Zbl 1395.68244
[11] Ceruti, C.; Bassis, S.; Rozza, A.; Lombardi, G.; Casiraghi, E.; Campadelli, P., : an intrinsic dimensionality estimator exploiting angle and norm concentration, Pattern Recognit., 47, 2569-2581 (2014) · Zbl 1339.68219
[12] Costa, J. A.; Girotra, A.; Hero, A., Estimating local intrinsic dimension with k-nearest neighbor graphs, (Statistical Signal Processing, 2005 IEEE/SP 13th Workshop on (2005), IEEE), 417-422
[13] Devroye, L.; Györfi, L.; Lugosi, G., A Probabilistic Theory of Pattern Recognition (1996), Springer: Springer New York · Zbl 0853.68150
[14] Farahmand, A.m.; Szepesvári, C.; Audibert, J.-Y., Manifold-adaptive dimension estimation, (Proceedings of the 24th International Conference on Machine Learning. Proceedings of the 24th International Conference on Machine Learning, ICML ’07 (2007), ACM: ACM New York, NY, USA), 265-272
[15] Grassberger, P.; Procaccia, I., Measuring the strangeness of strange attractors, Physica D, 9, 189-208 (1983) · Zbl 0593.58024
[16] Hein, M.; Audibert, J.-Y., Intrinsic dimensionality estimation of submanifolds in \(R^d\), (Proceedings of the 22nd International Conference on Machine Learning. Proceedings of the 22nd International Conference on Machine Learning, ICML ’05 (2005), ACM: ACM New York, NY, USA), 289-296
[17] Janson, S., On concentration of probability, (Contemporary Combinatorics, János Bolyai Math. Soc. (2002)), 289-301 · Zbl 0998.60019
[18] Kaufmann, E.; Reiß, R.-D., On conditional distributions of nearest neighbors, J. Multivariate Anal., 42, 67-76 (1992) · Zbl 0773.60037
[19] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278-2324 (1998)
[20] Levina, E.; Bickel, P. J., Maximum likelihood estimation of intrinsic dimension, (Advances in neural information processing systems (2005)), 777-784
[21] Lombardi, G.; Rozza, A.; Ceruti, C.; Casiraghi, E.; Campadelli, P., Minimum neighbor distance estimators of intrinsic dimension, (Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2011), Springer), 374-389
[22] Penrose, M. D.; Yukich, J. E., Central limit theorems for some graphs in computational geometry, Ann. Appl. Probab., 11, 1005-1041 (2001) · Zbl 1044.60016
[23] Penrose, M. D.; Yukich, J. E., Limit theory for point processes in manifolds, Ann. Appl. Probab., 23, 2161-2211 (2013) · Zbl 1285.60021
[24] Pettis, K. W.; Bailey, T. A.; Jain, A. K.; Dubes, R. C., An intrinsic dimensionality estimator from near-neighbor information, IEEE Trans. Pattern Anal. Mach. Intell., 25-37 (1979) · Zbl 0418.68074
[25] Randles, R. H.; Wolfe, D. A., Introduction to the Theory of Nonparametric Statistics (1979), John Wiley & Sons: John Wiley & Sons New York · Zbl 0529.62035
[26] Roweis, S. T.; Saul, L. K., Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 2323-2326 (2000)
[27] Serfling, R. J., Approximation Theorems of Mathematical Statistics (1980), Wiley: Wiley New York · Zbl 0538.62002
[28] Sindhwani, V.; Belkin, M.; Niyogi, P., The geometric basis of semi-supervised learning, (Chapelle, O.; Schlkopf, B.; Zien, A., Semi-Supervised Learning (2006), The MIT Press), 35-54
[29] Södergren, A., On the distribution of angles between the \(N\) shortest vectors in a random lattice, J. Lond. Math. Soc. (2), 84, 749-764 (2011) · Zbl 1268.60068
[30] Sricharan, K.; Raich, R.; Hero, A. O., Optimized intrinsic dimension estimator using nearest neighbor graphs, (Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on (2010), IEEE), 5418-5421
[31] Tenenbaum, J. B.; De Silva, V.; Langford, J. C., A global geometric framework for nonlinear dimensionality reduction, Science, 290, 2319-2323 (2000)
[32] Villani, C., Optimal Transport: Old and New (2009), Springer: Springer Berlin · Zbl 1156.53003
[33] J.A. Wellner, Lecture notes for course in advanced theory of statistical inference, https://www.stat.washington.edu/jaw/COURSES/580s/581/lectnotes.18html; J.A. Wellner, Lecture notes for course in advanced theory of statistical inference, https://www.stat.washington.edu/jaw/COURSES/580s/581/lectnotes.18html
[34] Yukich, J. E., Probability Theory of Classical Euclidean Optimization Problems (1998), Springer-Verlag: Springer-Verlag Berlin · Zbl 0902.60001
[35] Zhang, K., Spherical cap packing asymptotics and rank-extreme detection, IEEE Trans. Inform. Theory, 63, 4572-4584 (2017) · Zbl 1370.94579
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.