# zbMATH — the first resource for mathematics

On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. (English) Zbl 1198.62048
Summary: Let $$X_1, \dots , X_n$$ be identically distributed random vectors in $$\mathbb R^d$$, independently drawn according to some probability density. An observation $$X_i$$ is said to be a layered nearest neighbour (LNN) of a point $$x$$ if the hyperrectangle defined by $$x$$ and $$X_i$$ contains no other data points. We first establish consistency results on $$L_n(x)$$, the number of LNNs of $$x$$. Then, given a sample $$(X, Y), (X_1, Y_1), \dots , (X_n, Y_n)$$ of independent identically distributed random vectors from $$\mathbb R^d \times \mathbb R$$, one may estimate the regression function $$r(x) = \mathbb E[Y|X = x]$$ by the LNN estimate $$r_n(x)$$, defined as an average over the $$Y_i$$’s corresponding to those $$X_i$$ which are LNNs of $$x$$. Under mild conditions on $$r$$, we establish the convergence of $$\mathbb E|r_n(x) -r(x)|^p$$ to 0 as $$n \rightarrow \infty$$, for almost all $$x$$ and all $$p\geq 1$$, and discuss the links between $$r_n$$ and the random forest estimates of L. Breiman [Mach. Learn. 45, No. 1, 5–32 (2001; Zbl 1007.68152)]. We finally show the universal consistency of the bagged (bootstrap-aggregated) nearest neighbour method for regression and classification.
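The LNN definition and the estimate $$r_n(x)$$ described in the summary can be illustrated with a small brute-force sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `lnn_indices` and `lnn_estimate` are hypothetical, the hyperrectangle is taken to be closed (ties have probability zero when the $$X_i$$ have a density), and the average is assumed to be the plain unweighted mean.

```python
from typing import List, Sequence


def lnn_indices(points: Sequence[Sequence[float]], x: Sequence[float]) -> List[int]:
    """Return the indices i such that points[i] is a layered nearest neighbour of x.

    Assumption: the hyperrectangle spanned by x and points[i] is closed, so a
    point lying on its boundary counts as "inside".
    """
    result = []
    for i, p in enumerate(points):
        # Coordinate-wise bounds of the hyperrectangle defined by x and p.
        lo = [min(a, b) for a, b in zip(p, x)]
        hi = [max(a, b) for a, b in zip(p, x)]
        # p is blocked if any other data point falls inside that hyperrectangle.
        blocked = any(
            j != i and all(l <= c <= h for l, c, h in zip(lo, q, hi))
            for j, q in enumerate(points)
        )
        if not blocked:
            result.append(i)
    return result


def lnn_estimate(
    points: Sequence[Sequence[float]],
    responses: Sequence[float],
    x: Sequence[float],
) -> float:
    """Average the responses Y_i over the layered nearest neighbours of x."""
    idx = lnn_indices(points, x)
    if not idx:
        # Can only happen with tied or duplicated points under the closed-rectangle assumption.
        raise ValueError("x has no layered nearest neighbour in this sample")
    return sum(responses[i] for i in idx) / len(idx)


if __name__ == "__main__":
    sample = [(1.0, 1.0), (2.0, 2.0), (3.0, -1.0), (-1.0, 0.5)]
    y = [10.0, 20.0, 30.0, 40.0]
    print(lnn_indices(sample, (0.0, 0.0)))
    print(lnn_estimate(sample, y, (0.0, 0.0)))
```

In this toy sample the point (2, 2) is not an LNN of the origin because (1, 1) lies inside the hyperrectangle it spans with the origin. The check costs on the order of $$n^2 d$$ operations per query point, which is adequate for illustration only.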

##### MSC:
- 62H12 Estimation in multivariate analysis
- 62G20 Asymptotic properties of nonparametric inference
- 62G08 Nonparametric regression and quantile regression
- 62H30 Classification and discrimination; cluster analysis (statistical aspects)
