Ma, Shujie; Song, Qiongxia; Wang, Li
Simultaneous variable selection and estimation in semiparametric modeling of longitudinal/clustered data. (English) Zbl 1259.62021
Bernoulli 19, No. 1, 252-274 (2013).

Summary: We consider the problem of simultaneous variable selection and estimation in additive, partially linear models for longitudinal/clustered data. We propose an estimation procedure that uses polynomial splines to estimate the nonparametric components and applies proper penalty functions to achieve sparsity in the linear part. Under reasonable conditions, we obtain the asymptotic normality of the estimators for the linear components and the consistency of the estimators for the nonparametric components. We further demonstrate that, with a proper choice of the regularization parameter, the penalized estimators of the non-zero coefficients achieve the asymptotic oracle property. The finite-sample behavior of the penalized estimators is evaluated through simulation studies and illustrated with a longitudinal CD4 cell count data set.

Cited in 24 Documents

MSC:
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
62H12 Estimation in multivariate analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
65C60 Computational problems in statistics (MSC2010)

Keywords: additive partially linear model; longitudinal data; model selection; penalized least squares; splines
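The summary describes the procedure only in words: the nonparametric components are approximated by polynomial splines, and a penalty on the linear coefficients produces sparsity. The Python sketch below illustrates that idea for a model of the form y = x'beta + g_1(z_1) + ... + g_d(z_d) + error, under assumptions that are not taken from the paper: a working-independence least-squares loss (observations from all clusters are simply stacked), a cubic truncated-power spline basis with fixed knots, the SCAD penalty with a = 3.7 as the "proper penalty function", and the local quadratic approximation (LQA) of Fan and Li (2001) as the solver. The helper names (`penalized_spline_fit`, `spline_basis`, `scad_derivative`, `solve`) are hypothetical, and λ is fixed by hand rather than chosen by any data-driven rule.

```python
import math
import random


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan and Li, 2001) at t >= 0 (assumed penalty)."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def spline_basis(z, knots):
    """Cubic truncated-power basis without a constant term, for one covariate."""
    return [z, z ** 2, z ** 3] + [max(z - k, 0.0) ** 3 for k in knots]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def penalized_spline_fit(clusters, lam, knots=(0.25, 0.5, 0.75), max_iter=100, tol=1e-6):
    """Hypothetical helper. clusters: list of clusters, each a list of (y, x, z),
    x = linear covariates, z = additive covariates in [0, 1].
    Working independence (assumption): all observations are stacked."""
    design, ys = [], []
    for cluster in clusters:
        for y, x, z in cluster:
            spline = [1.0]  # unpenalized intercept
            for zl in z:
                spline += spline_basis(zl, knots)
            design.append(list(x) + spline)  # linear columns first
            ys.append(y)
    n, p, q = len(ys), len(clusters[0][0][1]), len(design[0])
    DtD = [[sum(r[i] * r[j] for r in design) for j in range(q)] for i in range(q)]
    Dty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(q)]
    theta = solve(DtD, Dty)  # unpenalized least-squares start
    active = set(range(p))   # linear coefficients not yet set to zero
    for _ in range(max_iter):
        idx = sorted(active) + list(range(p, q))
        A = [[DtD[i][j] for j in idx] for i in idx]
        for k, i in enumerate(idx):
            if i < p:  # LQA ridge term n * p'(|b|) / |b|, linear coefficients only
                t = abs(theta[i])
                A[k][k] += n * scad_derivative(t, lam) / t
        sol = solve(A, [Dty[i] for i in idx])
        new = [0.0] * q
        for k, i in enumerate(idx):
            new[i] = sol[k]
        for i in list(active):  # drop coefficients that have collapsed to zero
            if abs(new[i]) < tol:
                active.discard(i)
                new[i] = 0.0
        change = max(abs(u - v) for u, v in zip(new, theta))
        theta = new
        if change < 1e-10:
            break
    return theta[:p], theta[p:]


# Simulated clustered data (illustrative only, not the paper's design).
rng = random.Random(2013)
beta_true = [2.0, 0.0, 0.0, 1.5]
clusters = []
for _ in range(50):
    u = rng.gauss(0.0, 0.5)  # shared effect: correlation within a cluster
    cluster = []
    for _ in range(4):
        x = [rng.gauss(0.0, 1.0) for _ in beta_true]
        z = (rng.random(), rng.random())
        y = (sum(b * xi for b, xi in zip(beta_true, x))
             + math.sin(2 * math.pi * z[0]) + (z[1] - 0.5) ** 2
             + u + rng.gauss(0.0, 0.5))
        cluster.append((y, x, z))
    clusters.append(cluster)

beta_hat, spline_coef = penalized_spline_fit(clusters, lam=0.3)
print([round(b, 3) for b in beta_hat])
```

In this simulated setting the two zero coefficients are expected to be set exactly to zero, while the two nonzero ones are left essentially unpenalized because their estimates exceed aλ, where the SCAD derivative vanishes. This only mirrors, informally, the sparsity and oracle behavior stated in the summary; it ignores the within-cluster correlation in the loss and is not a reproduction of the paper's estimator or simulations.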