A nearest neighbor estimate of the residual variance. (English) Zbl 1395.62088
Summary: We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of $$Y$$ on $$X\in\mathbb R^d$$. We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when $$X$$ has an absolutely continuous distribution, without any condition on the density. We also compute the asymptotic variance explicitly and derive a non-asymptotic bound on the variance that does not depend on the dimension $$d$$. The asymptotic variance does not depend on the smoothness of the density of $$X$$ or of the regression function. A non-asymptotic exponential concentration inequality is also proved. We illustrate the use of the new estimate through testing whether a component of the vector $$X$$ carries information for predicting $$Y$$.
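
The equivalence mentioned in the summary follows from the identity $$L^* = \mathbb E[(Y-m(X))^2] = \mathbb E[Y^2] - \mathbb E[m(X)^2]$$, where $$m(x)=\mathbb E[Y\mid X=x]$$ is the regression function and $$L^*$$ is the smallest achievable mean-squared error. The sketch below shows one natural first-nearest-neighbor estimate of this kind. It is an illustration under stated assumptions, not necessarily the exact statistic analysed in the paper: it replaces $$\mathbb E[m(X)^2]$$ by the average of $$Y_i Y_{N(i)}$$, where $$N(i)$$ is the index of the nearest neighbor of $$X_i$$ among the other sample points. The function names, the brute-force search, and the simulated model are all hypothetical choices made for this example.

```python
import random


def nearest_neighbor_index(points, i):
    """Index j != i minimizing the squared Euclidean distance to points[i].

    Brute-force search; ties are broken by the smallest index (ties have
    probability zero when X has an absolutely continuous distribution).
    """
    best_j, best_d2 = -1, float("inf")
    xi = points[i]
    for j, xj in enumerate(points):
        if j == i:
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
        if d2 < best_d2:
            best_j, best_d2 = j, d2
    return best_j


def residual_variance_1nn(points, ys):
    """Illustrative estimate of L* = E[Y^2] - E[m(X)^2].

    E[Y^2] is estimated by the sample mean of Y_i^2, and E[m(X)^2] by the
    sample mean of Y_i * Y_{N(i)}, with N(i) the first nearest neighbor.
    """
    n = len(ys)
    if n < 2 or len(points) != n:
        raise ValueError("need at least two (x, y) pairs of equal length")
    second_moment_y = sum(y * y for y in ys) / n
    cross_term = sum(
        ys[i] * ys[nearest_neighbor_index(points, i)] for i in range(n)
    ) / n
    return second_moment_y - cross_term


def drop_component(points, k):
    """Return the points with coordinate k removed (hypothetical helper)."""
    return [tuple(v for idx, v in enumerate(x) if idx != k) for x in points]


def simulate(n=600, sigma=0.5, seed=0):
    """Assumed toy model: X uniform on [0,1]^2, Y = 2*X_1 + N(0, sigma^2).

    The true residual variance is sigma^2; the second coordinate of X is
    irrelevant for predicting Y.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    ys = [2.0 * x[0] + rng.gauss(0.0, sigma) for x in points]
    return residual_variance_1nn(points, ys)


if __name__ == "__main__":
    print("estimate of residual variance (true value 0.25):", simulate())
```

For the testing application mentioned at the end of the summary, one could compare `residual_variance_1nn(points, ys)` with `residual_variance_1nn(drop_component(points, k), ys)`: if component `k` carries no information about $$Y$$, the two values should be close. The abstract does not state how the paper calibrates such a comparison, so no threshold is suggested here.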

##### MSC:
62G08 Nonparametric regression and quantile regression

62G20 Asymptotic properties of nonparametric inference
