×

A weighted \(k\)-nearest neighbor density estimate for geometric inference. (English) Zbl 1274.62264

Summary: Motivated by a broad range of potential applications in topological and geometric inference, we introduce a weighted version of the \(k\)-nearest neighbor density estimate. Various pointwise consistency results of this estimate are established. We present a general central limit theorem under the lightest possible conditions. In addition, a strong approximation result is obtained and the choice of the optimal set of weights is discussed. In particular, the classical \(k\)-nearest neighbor estimate is not optimal in a sense described in the manuscript. The proposed method has been implemented to recover level sets in both simulated and real-life data.

MSC:

62G07 Density estimation
62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference

Software:

CGAL; ANN
PDFBibTeX XMLCite
Full Text: DOI Euclid

References:

[1] E. Arias-Castro, D.L. Donoho, and X. Huo. Adaptive multiscale detection of filamentary structures in a background of uniform random points., The Annals of Statistics , 34:326-349, 2006. · Zbl 1091.62095 · doi:10.1214/009053605000000787
[2] M.S. Bartlett. Statistical estimation of density functions., Sankhyā. Series A , 25:245-254, 1963. · Zbl 0129.32302
[3] P.K. Bhattacharya and Y.P. Mack. Weak convergence of, k -NN density and regression estimators with varying k and applications. The Annals of Statistics , 15:976-994, 1987. · Zbl 0643.62027 · doi:10.1214/aos/1176350487
[4] G. Biau, B. Cadre, D.M. Mason, and B. Pelletier. Asymptotic normality in density support estimation., Electronic Journal of Probability , 14 :2617-2635, 2009. · Zbl 1185.62071 · doi:10.1214/EJP.v14-722
[5] G. Biau, B. Cadre, and B. Pelletier. Exact rates in density support estimation., Journal of Multivariate Analysis , 99 :2185-2207, 2008. · Zbl 1151.62027 · doi:10.1016/j.jmva.2008.02.021
[6] P. Billingsley., Probability and Measure. Third Edition. John Wiley and Sons, New York, 1995. · Zbl 0822.60002
[7] F. Bolley, A. Guillin, and C. Villani. Quantitative concentration inequalities for empirical measures on non-compact spaces., Probability Theory and Related Fields , 137:541-593, 2007. · Zbl 1113.60093 · doi:10.1007/s00440-006-0004-7
[8] CGAL. Computational Geometry Algorithms Library., . · Zbl 1322.68279
[9] F. Chazal, D. Cohen-Steiner, and A. Lieutier. Normal cone approximation and offset shape isotopy., Computational Geometry: Theory and Applications , 42:566-581, 2009. · Zbl 1169.65015 · doi:10.1016/j.comgeo.2008.12.002
[10] F. Chazal, D. Cohen-Steiner, and A. Lieutier. A sampling theory for compact sets in Euclidean space., Discrete and Computational Geometry , 41:461-479, 2009. · Zbl 1165.68061 · doi:10.1007/s00454-009-9144-8
[11] F. Chazal, D. Cohen-Steiner, A. Lieutier, and B. Thibert. Stability of curvature measures., Computer Graphics Forum , 28 :1485-1496, 2009.
[12] F. Chazal, D. Cohen-Steiner, and Q. Mérigot. Geometric inference for measures based on distance functions. Research Report 6930, INRIA, 2009., .
[13] F. Chazal, D. Cohen-Steiner, and Q. Mérigot. Boundary measures for geometric inference., Journal on Foundations of Computational Mathematics , 10:221-240, 2010. · Zbl 1196.52006 · doi:10.1007/s10208-009-9056-2
[14] A. Cuevas, R. Fraiman, and A. Rodríguez-Casal. A nonparametric approach to the estimation of lengths and surface areas., The Annals of Statistics , 35 :1031-1051, 2007. · Zbl 1124.62017 · doi:10.1214/009053606000001532
[15] A. Cuevas and A. Rodríguez-Casal. Set estimation: An overview and some recent developments. In M.G. Akritas and D.N. Politis, editors, Recent Advances and Trends in Nonparametric Statistics , pages 251-264, Amsterdam, 2003. North-Holland. · doi:10.1016/B978-044451378-6/50017-X
[16] P. Deheuvels. Estimation non paramétrique de la densité par histogrammes généralisés., Publications de l’Institut de Statistique de l’Université de Paris , 22:1-24, 1977. · Zbl 0375.62038
[17] L. Devroye., Non-Uniform Random Variate Generation . Springer-Verlag, New York, 1986. · Zbl 0593.65005
[18] L. Devroye, L. Györfi, and G. Lugosi., A Probabilistic Theory of Pattern Recognition . Springer-Verlag, New York, 1996. · Zbl 0853.68150
[19] L. Devroye and G. Wise. Detection of abnormal behavior via nonparametric estimation of the support., SIAM Journal on Applied Mathematics , 38:480-488, 1980. · Zbl 0479.62028 · doi:10.1137/0138038
[20] L.P. Devroye and T.J. Wagner., Nonparametric discrimination and density estimation . Technical Report 183, Information Systems Research Laboratory, Electronics Research Center, The University of Texas at Austin, Austin, 1976. · Zbl 0345.62044
[21] L.P. Devroye and T.J. Wagner. The strong uniform consistency of nearest neighbor density estimates., The Annals of Statistics , 5:536-540, 1977. · Zbl 0367.62061 · doi:10.1214/aos/1176343851
[22] V.A. Epanechnikov. Non-parametric estimation of a multivariate probability density., Theory of Probability and its Applications , 14:153-158, 1969. · Zbl 0194.50001
[23] E. Fix and J.L. Hodges., Discriminatory analysis. Nonparametric discrimination: Consistency properties . Technical Report 4, Project Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas, 1951. · Zbl 0715.62080
[24] K. Fukunaga and T.E. Flick. An optimal global nearest neighbor metric., IEEE Transactions on Pattern Analysis and Machine Intelligence , 6:314-318, 1984. · Zbl 0534.62041 · doi:10.1109/TPAMI.1984.4767523
[25] K. Fukunaga and L.D. Hostetler. Optimization of, k -nearest neighbor density estimates. IEEE Transactions on Information Theory , 19:320-326, 1973. · Zbl 0276.62039 · doi:10.1109/TIT.1973.1055003
[26] C.R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman. On the path density of a gradient field., The Annals of Statistics , 37 :3236-3271, 2009. · Zbl 1191.62062 · doi:10.1214/08-AOS671
[27] C.R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman., Nonparametric filament Estimation , 2010. . · Zbl 1261.62030
[28] T. Hastie and W. Stuetzle. Principal curves., Journal of the American Statistical Association , 84:502-516, 1989. · Zbl 0679.62048 · doi:10.2307/2289936
[29] J. Komlós, P. Major, and G. Tusnády. An approximation of partial sums of independent RV’-s, and the sample DF. I., Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete , 32:111-131, 1975. · Zbl 0308.60029 · doi:10.1007/BF00533093
[30] A.P. Korostelev and A.B. Tsybakov., Minimax Theory of Image Reconstruction , volume 82 of Lecture Notes in Statistics . Springer-Verlag, New York, 1993. · Zbl 0833.62039
[31] D.O. Loftsgaarden and C.P. Quesenberry. A nonparametric estimate of a multivariate density function., The Annals of Mathematical Statistics , 36 :1049-1051, 1965. · Zbl 0132.38905 · doi:10.1214/aoms/1177700079
[32] Y.P. Mack. Asymptotic normality of multivariate, k -NN density estimates. Sankhyā. Series A , 42:53-63, 1980. · Zbl 0492.62038
[33] Y.P. Mack and M. Rosenblatt. Multivariate, k -nearest neighbor density estimates. Journal of Multivariate Analysis , 9:1-15, 1979. · Zbl 0406.62023 · doi:10.1016/0047-259X(79)90065-4
[34] D.M. Mason., Coupling via the KMT quantile inequality . Technical Report, University of Delaware, Newark, 2011.
[35] K.S. Miller., Multidimensional Gaussian Distributions . Wiley, New York, 1964. · Zbl 0231.62066
[36] D.S. Moore and J.W. Yackel. Consistency properties of nearest neighbor density function estimators., The Annals of Statistics , 5:143-154, 1977. · Zbl 0358.60053 · doi:10.1214/aos/1176343747
[37] D.S. Moore and J.W. Yackel. Large sample properties of nearest neighbor density function estimators. In S.S. Gupta and D.S. Moore, editors, Statistical Decision Theory and Related Topics. II , pages 269-279, New York, 1977. Academic Press. · Zbl 0419.62036
[38] D.M. Mount and S. Arya. ANN: A Library for Approximate Nearest Neighbor Searching., . · Zbl 1204.68103
[39] P. Niyogi, S. Smale, and S. Weinberger., A topological view of unsupervised learning from noisy data . Technical Report, University of Chicago, Chicago, 2008. . · Zbl 1230.62085
[40] A. Petrunin. Semiconcave functions in Alexandrov’s geometry. In, Surveys in Differential Geometry. 11 , pages 137-201. International Press, Somerville, 2007. · Zbl 1166.53001
[41] C. Rodríguez and J. Van Ryzin. Maximum entropy histograms., Statistics and Probability Letters , 3:117-120, 1985. · Zbl 0573.62035 · doi:10.1016/0167-7152(85)90047-1
[42] C. Rodríguez and J. Van Ryzin. Large sample properties of maximum entropy histograms., IEEE Transactions on Information Theory , 32:751-759, 1986. · Zbl 0631.62048 · doi:10.1109/TIT.1986.1057231
[43] C.C. Rodríguez., On a new class of density estimators . Technical Report, Department of Mathematics and Statistics, State University of New York at Albany, New York, 1986. .
[44] C.C. Rodríguez., Weak convergence of multivariate k-nn . Technical Report, Department of Mathematics and Statistics, State University of New York at Albany, New York, 1992. .
[45] C.C. Rodríguez. Optimal recovery of local truth. In, Bayesian Inference and Maximum Entropy Methods in Science and Engineering: 19th International Workshop , volume 567, pages 80-115. AIP Conference Proceedings, 2001.
[46] R.J. Samworth., Optimal weighted nearest neighbour classifiers . Technical Report, University of Cambridge, Cambridge, 2011. . · Zbl 1373.62317
[47] R.D. Short and K. Fukunaga. The optimal distance measure for nearest neighbor classification., IEEE Transactions on Information Theory , 27:622-627, 1981. · Zbl 0464.62047 · doi:10.1109/TIT.1981.1056403
[48] USGS. US Geological Survey., .
[49] A.W. van der Vaart and J.A. Wellner., Weak Convergence and Empirical Processes. With Applications to Statistics . Springer-Verlag, New York, 1996. · Zbl 0862.60002
[50] C. Villani., Topics in Optimal Transportation . American Mathematical Society, Providence, 2003. · Zbl 1106.90001
[51] R.L. Wheeden and A. Zygmund., Measure and Integral. An Introduction to Real Analysis . Marcel Dekker, New York, 1977. · Zbl 0362.26004
[52] R. Willink. Relationships between central moments and cumulants, with formulae for the central moments of gamma distributions., Communications in Statistics - Theory and Methods , 32:701-704, 2003. · Zbl 1048.62017 · doi:10.1081/STA-120018823
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.