
One-step sparse estimates in nonconcave penalized likelihood models. (English) Zbl 1142.62027

Summary: J. Fan and R. Li [J. Am. Stat. Assoc. 96, No. 456, 1348–1360 (2001; Zbl 1073.62547)] proposed a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging because the objective function is nondifferentiable and nonconcave. We propose a new unified algorithm based on the local linear approximation (LLA) for maximizing the penalized likelihood for a broad class of concave penalty functions. Convergence and other theoretical properties of the LLA algorithm are established. A distinguishing feature of the LLA algorithm is that at each LLA step, the LLA estimator naturally adopts a sparse representation. Thus, we suggest using the one-step LLA estimator from the LLA algorithm as the final estimate.
Statistically, we show that if the regularization parameter is appropriately chosen, the one-step LLA estimates enjoy the oracle properties provided that the initial estimators are good enough. Computationally, the one-step LLA estimation methods dramatically reduce the cost of maximizing the nonconcave penalized likelihood. We conduct Monte Carlo simulations to assess the finite-sample performance of the one-step sparse estimation methods. The results are very encouraging.
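The one-step LLA idea can be illustrated with a minimal sketch for a linear model with the SCAD penalty: starting from an initial estimate, the concave penalty is linearized at that estimate, which turns one LLA step into a weighted lasso problem that is solvable by coordinate descent. This is an illustrative implementation under stated assumptions (pure Python, a tiny orthogonal design, SCAD with the conventional a = 3.7), not the authors' code; the function names `scad_deriv`, `weighted_lasso_cd`, and `one_step_lla` are hypothetical.

```python
def scad_deriv(t, lam, a=3.7):
    """Derivative p'_lam(|t|) of the SCAD penalty (Fan and Li, 2001)."""
    t = abs(t)
    if t <= lam:
        return lam
    # Linearly decaying part: max(a*lam - t, 0) / (a - 1)
    return max(a * lam - t, 0.0) / (a - 1.0)

def weighted_lasso_cd(X, y, w, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + sum_j w_j |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residuals with coordinate j removed.
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            z = sum(X[i][j] * r[i] for i in range(n)) / n
            d = sum(X[i][j] ** 2 for i in range(n)) / n
            # Soft-thresholding update -> exact zeros, i.e. sparsity.
            beta[j] = max(abs(z) - w[j], 0.0) * (1.0 if z > 0 else -1.0) / d
    return beta

def one_step_lla(X, y, beta_init, lam):
    """One LLA step: weighted lasso with SCAD-derivative weights at beta_init."""
    w = [scad_deriv(b, lam) for b in beta_init]
    return weighted_lasso_cd(X, y, w)
```

Note how the weights encode the oracle behavior: a coefficient whose initial estimate is large gets a weight near zero (essentially unpenalized), while a coefficient near zero keeps the full lasso-type weight `lam` and is shrunk exactly to zero.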

MSC:

62G08 Nonparametric regression and quantile regression
65C60 Computational problems in statistics (MSC2010)
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
65C05 Monte Carlo methods
62G20 Asymptotic properties of nonparametric inference

Citations:

Zbl 1073.62547

References:

[1] Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations. J. Amer. Statist. Assoc. 96 939-967. JSTOR: · Zbl 1072.62561 · doi:10.1198/016214501753208942
[2] Bickel, P. J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70 428-434. JSTOR: · Zbl 0322.62038 · doi:10.2307/2285834
[3] Blake, A. and Zisserman, A. (1987). Visual Reconstruction . MIT Press, Cambridge, MA.
[4] Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Ann. Statist. 24 2350-2383. · Zbl 0867.62055 · doi:10.1214/aos/1032181158
[5] Cai, J., Fan, J., Li, R. and Zhou, H. (2005). Variable selection for multivariate failure time data. Biometrika 92 303-316. · Zbl 1094.62123 · doi:10.1093/biomet/92.2.303
[6] Cai, J., Fan, J., Zhou, H. and Zhou, Y. (2007). Marginal hazard models with varying-coefficients for multivariate failure time data. Ann. Statist. 35 324-354. · Zbl 1114.62104 · doi:10.1214/009053606000001145
[7] Cai, Z., Fan, J. and Li, R. (2000). Efficient estimation and inferences for varying-coefficient models. J. Amer. Statist. Assoc. 95 888-902. JSTOR: · Zbl 0999.62052 · doi:10.2307/2669472
[8] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[9] Fan, J. and Chen, J. (1999). One-step local quasi-likelihood estimation. J. Roy. Statist. Soc. Ser. B 61 927-943. JSTOR: · Zbl 0940.62039 · doi:10.1111/1467-9868.00211
[10] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. JSTOR: · Zbl 1073.62547 · doi:10.1198/016214501753382273
[11] Fan, J. and Li, R. (2002). Variable selection for Cox’s proportional hazards model and frailty model. Ann. Statist. 30 74-99. · Zbl 1012.62106 · doi:10.1214/aos/1015362185
[12] Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Amer. Statist. Assoc. 99 710-723. · Zbl 1117.62329 · doi:10.1198/016214504000001060
[13] Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In Proceedings of the Madrid International Congress of Mathematicians 2006 3 595-622. EMS, Zürich. · Zbl 1117.62137
[14] Fan, J., Lin, H. and Zhou, Y. (2006). Local partial likelihood estimation for life time data. Ann. Statist. 34 290-325. · Zbl 1091.62099 · doi:10.1214/009053605000000796
[15] Fan, J. and Peng, H. (2004). On non-concave penalized likelihood with diverging number of parameters. Ann. Statist. 32 928-961. · Zbl 1092.62031 · doi:10.1214/009053604000000256
[16] Frank, I. and Friedman, J. (1993). A statistical view of some chemometrics regression tools. Technometrics 35 109-148. · Zbl 0775.62288 · doi:10.2307/1269656
[17] Fu, W. (1998). Penalized regression: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397-416. JSTOR: · doi:10.2307/1390712
[18] Geyer, C. (1994). On the asymptotics of constrained M-estimation. Ann. Statist. 22 1993-2010. · Zbl 0829.62029 · doi:10.1214/aos/1176325768
[19] Heiser, W. (1995). Convergent Computation by Iterative Majorization : Theory and Applications in Multidimensional Data Analysis . Clarendon Press, Oxford.
[20] Hunter, D. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617-1642. · Zbl 1078.62028 · doi:10.1214/009053605000000200
[21] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[22] Lange, K. (1995). A gradient algorithm locally equivalent to the EM algorithm. J. Roy. Statist. Soc. Ser. B 57 425-437. JSTOR: · Zbl 0813.62021
[23] Lange, K., Hunter, D. and Yang, I. (2000). Optimization transfer using surrogate objective functions (with discussion). J. Comput. Graph. Statist. 9 1-59. JSTOR: · doi:10.2307/1390605
[24] Lehmann, E. and Casella, G. (1998). Theory of Point Estimation , 2nd ed. Springer, New York. · Zbl 0916.62017
[25] Leng, C., Lin, Y. and Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statist. Sinica 16 1273-1284. · Zbl 1109.62056
[26] Li, R. and Liang, H. (2008). Variable selection in semiparametric regression modeling. Ann. Statist. 36 261-286. · Zbl 1132.62027 · doi:10.1214/009053607000000604
[27] West, M. (1984). Outlier models and prior distributions in Bayesian linear regression. J. Roy. Statist. Soc. Ser. B 46 431-439. JSTOR: · Zbl 0567.62022
[28] Miller, A. (2002). Subset Selection in Regression , 2nd ed. Chapman and Hall, London. · Zbl 1051.62060
[29] Osborne, M., Presnell, B. and Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-403. · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
[30] Robinson, P. (1988). The stochastic difference between econometric statistics. Econometrica 56 531-548. JSTOR: · Zbl 0722.62067 · doi:10.2307/1911699
[31] Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. Ann. Statist. 35 1012-1030. · Zbl 1194.62094 · doi:10.1214/009053606000001370
[32] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. JSTOR: · Zbl 0850.62538
[33] Wu, Y. (2000). Optimization transfer using surrogate objective functions: Discussion. J. Comput. Graph. Statist. 9 32-34. JSTOR: · doi:10.2307/1390605
[34] Yuan, M. and Lin, Y. (2005). Efficient empirical Bayes variable selection and estimation in linear models. J. Amer. Statist. Assoc. 100 1215-1225. · Zbl 1117.62453 · doi:10.1198/016214505000000367
[35] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49-67. · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[36] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x