## Consistency of cross validation for comparing regression procedures. (English) Zbl 1129.62039

Summary: Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about the consistency of cross validation when it is applied to compare parametric with nonparametric methods, or to compare nonparametric methods with each other. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1.
Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property.
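The selection step described in the summary can be illustrated with a minimal sketch. It assumes a single random split, simulated data, and two hypothetical candidate procedures (a least-squares line and a Nadaraya-Watson kernel smoother with an arbitrarily fixed bandwidth); the function names, the sample size, and the split sizes are illustrative choices, not taken from the paper.

```python
import numpy as np


def fit_linear(x, y):
    """Parametric candidate: least-squares line. Returns a prediction function."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda t: slope * t + intercept


def fit_kernel(x, y, bandwidth=0.05):
    """Nonparametric candidate: Nadaraya-Watson smoother with a Gaussian kernel.

    The bandwidth is an arbitrary illustrative value, not a tuned one.
    """
    def predict(t):
        t = np.asarray(t, dtype=float)
        # Kernel weights between every prediction point and every training point.
        w = np.exp(-0.5 * ((t[:, None] - x[None, :]) / bandwidth) ** 2)
        return (w @ y) / np.maximum(w.sum(axis=1), 1e-12)
    return predict


def cv_select(x, y, procedures, n_eval, rng):
    """Single-split CV: fit each procedure on n - n_eval points, then
    compare mean squared prediction errors on the held-out n_eval points."""
    perm = rng.permutation(len(x))
    eval_idx, est_idx = perm[:n_eval], perm[n_eval:]
    errors = {}
    for name, fit in procedures.items():
        predict = fit(x[est_idx], y[est_idx])
        residuals = y[eval_idx] - predict(x[eval_idx])
        errors[name] = float(np.mean(residuals ** 2))
    # The selected procedure is the one with the smallest evaluation error.
    return min(errors, key=errors.get), errors


# Simulated data with a clearly nonlinear regression function (an assumption).
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

procedures = {"linear": fit_linear, "kernel": fit_kernel}
selected, errors = cv_select(x, y, procedures, n_eval=n // 2, rng=rng)
print(selected, errors)
```

The argument `n_eval` plays the role of the data splitting ratio: changing it shifts observations between the estimation part and the evaluation part, which is the quantity whose required size the summary discusses.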

### MSC:

- 62G08 Nonparametric regression and quantile regression
- 62G20 Asymptotic properties of nonparametric inference
- 65C60 Computational problems in statistics (MSC2010)
- 62G05 Nonparametric estimation

### Keywords:

model selection

### Software:

alr3
