Biased and unbiased cross-validation in density estimation. (English) Zbl 0648.62037

We introduce some biased cross-validation criteria for selecting the smoothing parameters of kernel and histogram density estimators. They are closely related to a criterion investigated by the first author and L. E. Factor [ibid. 76, 9-15 (1981; Zbl 0465.62036)]. These criteria are obtained by estimating \(L_2\) norms of derivatives of the unknown density, and they provide slightly biased estimates of the average squared \(L_2\) error, or mean integrated squared error. They are roughly analogous to G. Wahba’s [Ann. Stat. 9, 146-156 (1981; Zbl 0463.62034)] generalized cross-validation procedure for orthogonal series density estimators.
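The following is a minimal sketch of a biased cross-validation (BCV) score for a kernel estimator. It assumes a standard normal kernel; the review names no kernel, and the histogram version is omitted. The function names `bcv_gaussian` and `select_bandwidth` are hypothetical. Under these assumptions the asymptotic MISE is \(R(K)/(nh) + h^4 R(f'')/4\), where \(R(g) = \int g^2\). The unknown \(R(f'')\) is replaced by \(R(\hat f'')\) with its diagonal terms removed. With \(\Delta_{ij} = (X_i - X_j)/h\), the criterion reduces to a sum over pairs of \((\Delta^4 - 12\Delta^2 + 12)e^{-\Delta^2/4}\).

```python
import math
import random

SQRT_PI = math.sqrt(math.pi)


def bcv_gaussian(data, h):
    """Biased cross-validation score BCV(h) for a Gaussian-kernel KDE.

    Assumes a standard normal kernel K, for which
    AMISE(h) = R(K)/(n h) + h**4 * R(f'')/4.  The unknown R(f'') is
    replaced by R(f_hat'') with its diagonal (i == j) terms dropped,
    which leaves a sum over pairs i < j.
    """
    n = len(data)
    r_k = 1.0 / (2.0 * SQRT_PI)  # R(K) = integral of K^2 for N(0, 1)
    pair_sum = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = ((data[i] - data[j]) / h) ** 2  # Delta_ij squared
            # (K'' * K'')(Delta), up to the constant 1/(32 sqrt(pi))
            pair_sum += (d2 * d2 - 12.0 * d2 + 12.0) * math.exp(-d2 / 4.0)
    return r_k / (n * h) + pair_sum / (64.0 * n * n * h * SQRT_PI)


def select_bandwidth(score, data, grid):
    """Return the grid value that minimizes a cross-validation score."""
    return min(grid, key=lambda h: score(data, h))


# Usage on simulated N(0, 1) data (illustrative only).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
grid = [0.05 * k for k in range(2, 41)]  # candidate h from 0.10 to 2.00
h_bcv = select_bandwidth(bcv_gaussian, sample, grid)
print(f"BCV bandwidth: {h_bcv:.2f}")
```

In this Gaussian-kernel form, both terms of BCV(h) decay like \(1/h\) as \(h\) grows. The minimizer sought is therefore a local one, and restricting the search to a bounded grid is one simple way to find it.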
We describe the relationship between the biased cross-validation procedure and the least squares cross-validation procedure. The latter provides unbiased estimates of the mean integrated squared error, up to the term \(\int f^2\), which does not depend on the smoothing parameter. Both methods are shown to be based on U-statistics. We compare the two methods in three ways: by theoretical calculation of the noise in the cross-validation functions and in the corresponding cross-validated smoothing parameters, by Monte Carlo simulation, and by example.
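For comparison, here is a minimal sketch of the least squares (unbiased) cross-validation score under the same assumptions: a standard normal kernel, no histogram version, and a hypothetical function name `ucv_gaussian`. The pairwise loop makes the U-statistic structure explicit. Both \(R(\hat f)\) and the average leave-one-out density at the data points reduce to sums over pairs \(i < j\).

```python
import math
import random

SQRT_PI = math.sqrt(math.pi)
SQRT_2PI = math.sqrt(2.0 * math.pi)


def ucv_gaussian(data, h):
    """Least squares (unbiased) cross-validation score UCV(h), Gaussian kernel.

    UCV(h) = R(f_hat) - (2/n) * sum_i f_hat_{-i}(X_i), where f_hat_{-i} leaves
    out X_i.  Its expectation is MISE(h) - R(f), and R(f) does not depend on h.
    Both pieces reduce to sums over pairs i < j (U-statistic form).
    """
    n = len(data)
    conv_sum = 0.0  # pair sums of (K * K)(Delta), the N(0, 2) density sans constant
    kern_sum = 0.0  # pair sums of K(Delta), the N(0, 1) density sans constant
    for i in range(n):
        for j in range(i + 1, n):
            d2 = ((data[i] - data[j]) / h) ** 2
            conv_sum += math.exp(-d2 / 4.0)
            kern_sum += math.exp(-d2 / 2.0)
    # R(f_hat): n diagonal terms (exp(0) = 1) plus each off-diagonal pair twice.
    r_fhat = (n + 2.0 * conv_sum) / (2.0 * SQRT_PI * n * n * h)
    # Average of the leave-one-out density estimates at the data points.
    mean_loo = 2.0 * kern_sum / (SQRT_2PI * n * (n - 1) * h)
    return r_fhat - 2.0 * mean_loo


# Usage on simulated N(0, 1) data (illustrative only).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
grid = [0.05 * k for k in range(2, 41)]  # candidate h from 0.10 to 2.00
h_ucv = min(grid, key=lambda h: ucv_gaussian(sample, h))
print(f"UCV bandwidth: {h_ucv:.2f}")
```

UCV(h) can be negative because it omits the constant \(\int f^2\). Only the location of its minimum matters.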
When the underlying density is sufficiently smooth, biased cross-validation shows surprisingly large gains in asymptotic efficiency over unbiased cross-validation. The theoretical results explain some of the small-sample behavior of cross-validation functions: we show that cross-validation algorithms can be unreliable for sample sizes that are “too small.”


62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference