A nearest neighbor estimate of the residual variance. (English) Zbl 1395.62088
Summary: We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of \(Y\) on \(X\in\mathbb R^d\). We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when \(X\) has an absolutely continuous distribution, without any condition on the density. We also compute the asymptotic variance explicitly and derive a non-asymptotic bound on the variance that does not depend on the dimension \(d\). The asymptotic variance does not depend on the smoothness of the density of \(X\) or of the regression function. A non-asymptotic exponential concentration inequality is also proved. We illustrate the use of the new estimate through testing whether a component of the vector \(X\) carries information for predicting \(Y\).
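
The equivalence stated in the summary follows from the identity \(L^* = \mathbb E[(Y-m(X))^2] = \mathbb E[Y^2] - \mathbb E[m(X)^2]\), where \(m(x)=\mathbb E[Y\mid X=x]\) is the regression function: the first term is estimated by a sample mean, so only \(\mathbb E[m(X)^2]\) needs a nonparametric estimate. The summary does not give the estimator's formula. The sketch below assumes a first-nearest-neighbor product form, \(\frac1n\sum_i Y_i^2 - \frac1n\sum_i Y_i Y_{N(i)}\), with \(N(i)\) the index of the nearest neighbor of \(X_i\) among the other sample points. This is one plausible reading, not necessarily the paper's exact definition; the function name and the brute-force neighbor search are illustrative choices.

```python
import math
import random


def nn_residual_variance(xs, ys):
    """Sketch of a 1-nearest-neighbor residual variance estimate.

    Returns mean(Y_i^2) - mean(Y_i * Y_{N(i)}), where N(i) is the index of
    the nearest neighbor of xs[i] among the other points (Euclidean
    distance, ties broken by lowest index). Assumed form, see lead-in.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")

    second_moment = sum(y * y for y in ys) / n

    cross = 0.0
    for i in range(n):
        best_j, best_d = -1, float("inf")
        # Brute-force O(n^2) search; a k-d tree would be used for large n.
        for j in range(n):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            if d < best_d:
                best_j, best_d = j, d
        cross += ys[i] * ys[best_j]

    return second_moment - cross / n


if __name__ == "__main__":
    # Simulated data (assumption for illustration): X uniform on [0,1]^2,
    # Y = sin(2*pi*X_1) + noise with standard deviation 0.5, so the true
    # residual variance is 0.25 and the second component is uninformative.
    rng = random.Random(0)
    xs = [(rng.random(), rng.random()) for _ in range(500)]
    ys = [math.sin(2 * math.pi * x[0]) + rng.gauss(0.0, 0.5) for x in xs]

    print("all components:", nn_residual_variance(xs, ys))
    # Dropping a component and re-estimating gives a rough check of whether
    # that component carries information for predicting Y.
    print("first only:    ", nn_residual_variance([(x[0],) for x in xs], ys))
    print("second only:   ", nn_residual_variance([(x[1],) for x in xs], ys))
```

Comparing the estimate computed on all components with the estimate computed after dropping one component mirrors, informally, the testing application mentioned at the end of the summary; a formal test would need the limit law and asymptotic variance from the paper.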

62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference