We introduce some biased cross-validation criteria for selection of smoothing parameters for kernel and histogram density estimators, closely related to one investigated by the first author and L. E. Factor [ibid. 76, 9-15 (1981; Zbl 0465.62036)]. These criteria are obtained by estimating norms of derivatives of the unknown density and provide slightly biased estimates of the average squared error or mean integrated squared error. These criteria are roughly the analog of G. Wahba’s [Ann. Stat. 9, 146-156 (1981; Zbl 0463.62034)] generalized cross-validation procedure for orthogonal series density estimators.
We present the relationship of the biased cross-validation procedure to the least squares cross-validation procedure, which provides unbiased estimates of the mean integrated squared error. Both methods are shown to be based on U-statistics. We compare the two methods by theoretical calculation of the noise in the cross-validation functions and in the corresponding cross-validated smoothing parameters, by Monte Carlo simulation, and by example.
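To make the comparison concrete, the following is a minimal sketch of the two criteria for a kernel estimator, assuming a Gaussian kernel (an assumption made here for illustration; the abstract does not fix a kernel). The unbiased score is the usual least squares form, the integral of the squared estimate minus twice the average leave-one-out density. The biased score plugs an estimate of the integrated squared second derivative, built from off-diagonal pairs only, into the asymptotic mean integrated squared error. The function names `ucv_gaussian`, `bcv_gaussian` and `argmin_on_grid` are hypothetical, and the normalizing constants may differ slightly from those used in the paper.

```python
import math
import random


def _scaled_pair_diffs(x, h):
    """Return (x_i - x_j) / h for all pairs i < j."""
    n = len(x)
    return [(x[i] - x[j]) / h for i in range(n) for j in range(i + 1, n)]


def ucv_gaussian(x, h):
    """Least squares (unbiased) cross-validation score, Gaussian kernel.

    Computes int fhat^2 - (2/n) * sum_i fhat_{-i}(x_i) in closed form.
    """
    n = len(x)
    deltas = _scaled_pair_diffs(x, h)
    root_pi = math.sqrt(math.pi)
    # Integral of the squared estimate: diagonal term plus pairwise terms.
    int_fhat_sq = 1.0 / (2.0 * n * h * root_pi) + sum(
        math.exp(-d * d / 4.0) for d in deltas
    ) / (n * n * h * root_pi)
    # Twice the average leave-one-out density evaluated at the data points.
    leave_one_out = 4.0 * sum(math.exp(-d * d / 2.0) for d in deltas) / (
        n * (n - 1) * h * math.sqrt(2.0 * math.pi)
    )
    return int_fhat_sq - leave_one_out


def bcv_gaussian(x, h):
    """Biased cross-validation score, Gaussian kernel (sketch).

    Asymptotic MISE with the integrated squared second derivative of the
    density replaced by an estimate using only pairs i != j.
    """
    n = len(x)
    deltas = _scaled_pair_diffs(x, h)
    root_pi = math.sqrt(math.pi)
    pair_sum = sum(
        (d ** 4 - 12.0 * d * d + 12.0) * math.exp(-d * d / 4.0) for d in deltas
    )
    return 1.0 / (2.0 * n * h * root_pi) + pair_sum / (64.0 * n * n * h * root_pi)


def argmin_on_grid(score, x, grid):
    """Pick the bandwidth on a grid that minimizes the given score."""
    return min(grid, key=lambda h: score(x, h))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    grid = [0.05 * k for k in range(2, 41)]  # bandwidths 0.10 .. 2.00
    print("UCV bandwidth:", argmin_on_grid(ucv_gaussian, sample, grid))
    print("BCV bandwidth:", argmin_on_grid(bcv_gaussian, sample, grid))
```

Both scores are sums over pairs of observations, which reflects the U-statistic structure mentioned above. The grid search is only a convenience for the sketch; in practice either score may have several local minima, so the range of the grid matters.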
When the underlying density is sufficiently smooth, biased cross-validation shows surprisingly large gains in asymptotic efficiency over unbiased cross-validation. The theoretical results explain some of the small sample behavior of cross-validation functions: we show that cross-validation algorithms can be unreliable for sample sizes that are “too small.”