Asymptotic optimality and efficient computation of the leave-subject-out cross-validation. (English) Zbl 1296.62096

Summary: Although leave-subject-out cross-validation (CV) has been widely used in practice to select tuning parameters for various nonparametric and semiparametric models of longitudinal data, its theoretical properties are unknown, and solving the associated optimization problem is computationally expensive, especially when there are multiple tuning parameters. In this paper, by focusing on the penalized spline method, we show that leave-subject-out CV is optimal in the sense that it is asymptotically equivalent to minimizing the empirical squared error loss. An efficient Newton-type algorithm is developed to compute the penalty parameters that optimize the CV criterion. Simulated and real data are used to demonstrate the effectiveness of leave-subject-out CV in selecting both the penalty parameters and the working correlation matrix.
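The idea behind the criterion in the summary can be illustrated with a minimal sketch. It is not the paper's method: it assumes a working-independence fit, a single penalty parameter, a linear truncated power basis, and a brute-force refit over a grid of penalty values in place of the Newton-type algorithm. All function names and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def truncated_power_basis(t, knots, degree=1):
    """Design matrix: polynomial terms plus truncated power terms at each knot."""
    t = np.asarray(t, dtype=float)
    poly = np.vander(t, degree + 1, increasing=True)
    trunc = np.maximum(t[:, None] - knots[None, :], 0.0) ** degree
    return np.hstack([poly, trunc])

def fit_penalized(X, y, lam, penalty):
    """Minimize ||y - X b||^2 + lam * b' P b (working independence, an assumption)."""
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

def leave_subject_out_cv(X, y, subject, lam, penalty):
    """Mean squared prediction error with each subject's rows held out in turn."""
    total = 0.0
    for s in np.unique(subject):
        held_out = subject == s
        # Refit without any observation from subject s, then predict that subject.
        beta = fit_penalized(X[~held_out], y[~held_out], lam, penalty)
        total += np.sum((y[held_out] - X[held_out] @ beta) ** 2)
    return total / len(y)

def simulate(n_subjects=30, n_obs=5, seed=0):
    """Hypothetical longitudinal data: smooth mean plus a subject-level random intercept."""
    rng = np.random.default_rng(seed)
    subject = np.repeat(np.arange(n_subjects), n_obs)
    t = rng.uniform(0.0, 1.0, size=subject.size)
    intercept = rng.normal(0.0, 0.5, size=n_subjects)[subject]
    y = np.sin(2 * np.pi * t) + intercept + rng.normal(0.0, 0.3, size=subject.size)
    return t, y, subject

t, y, subject = simulate()
knots = np.linspace(0.1, 0.9, 9)
X = truncated_power_basis(t, knots)
# Penalize only the truncated power coefficients, not the intercept and slope.
penalty = np.diag([0.0, 0.0] + [1.0] * len(knots))

lambdas = 10.0 ** np.arange(-4, 3)
scores = np.array([leave_subject_out_cv(X, y, subject, lam, penalty) for lam in lambdas])
best_lambda = lambdas[np.argmin(scores)]
print("selected penalty parameter:", best_lambda)
```

Holding out whole subjects, rather than single observations, keeps correlated measurements from the same subject out of the training fit. The grid search refits the model once per subject and per candidate value, which is the cost that grows quickly with several tuning parameters.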

MSC:

62G08 Nonparametric regression and quantile regression
62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference
62H12 Estimation in multivariate analysis
41A15 Spline approximation

Software:

gamair; gss
