
A selective review of group selection in high-dimensional models. (English) Zbl 1331.62347

Summary: Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect the grouping structure in the variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection, focusing on methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome-wide association studies. We also highlight some issues that require further study.
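The group LASSO mentioned in the summary (Yuan and Lin [69]) penalizes the Euclidean norm of each coefficient group, so that an entire group of variables is selected or dropped together. The key computational building block is blockwise soft-thresholding, the proximal operator of the penalty \(\lambda \sum_g \sqrt{|g|}\,\|\beta_g\|_2\). A minimal illustrative sketch (not code from the reviewed article; the function name and toy data are our own):

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Blockwise soft-thresholding: proximal operator of the group
    lasso penalty lam * sum_g sqrt(|g|) * ||beta_g||_2.
    `groups` is a list of index arrays partitioning the coefficients."""
    out = beta.copy()
    for g in groups:
        norm = np.linalg.norm(beta[g])
        scale = lam * np.sqrt(len(g))
        if norm <= scale:
            out[g] = 0.0  # the whole group is set to zero at once
        else:
            out[g] = (1.0 - scale / norm) * beta[g]  # shrink the group
    return out

# Toy example with two groups of two coefficients each.
beta = np.array([0.5, -0.5, 3.0, 4.0])
groups = [np.array([0, 1]), np.array([2, 3])]
shrunk = group_soft_threshold(beta, groups, lam=1.0)
# Group 1 has norm sqrt(0.5) <= sqrt(2), so it is dropped entirely;
# group 2 has norm 5 > sqrt(2), so it is shrunk but retained.
```

This groupwise operator is the analogue of the scalar soft-thresholding used by the LASSO, and it is the update applied within the (block) coordinate descent algorithms discussed in the review (e.g., [20, 66]).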

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62G08 Nonparametric regression and quantile regression

Software:

hgam; sparsenet

References:

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory ( Tsahkadsor , 1971) 267-281. Akadémiai Kiadó, Budapest. · Zbl 0283.62006
[2] Antoniadis, A. (1996). Smoothing noisy data with tapered coiflets series. Scand. J. Statist. 23 313-330. · Zbl 0861.62028
[3] Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations. J. Amer. Statist. Assoc. 96 939-967. · Zbl 1072.62561 · doi:10.1198/016214501753208942
[4] Argyriou, A., Evgeniou, T. and Pontil, M. (2008). Convex multi-task feature learning. Mach. Learn. 73 243-272.
[5] Bach, F. R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179-1225. · Zbl 1225.68147
[6] Bakin, S. (1999). Adaptive regression and model selection in data mining problems. Ph.D. thesis, Australian National Univ., Canberra.
[7] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[8] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models . Johns Hopkins Univ. Press, Baltimore, MD. · Zbl 0786.62001
[9] Breheny, P. and Huang, J. (2009). Penalized methods for bi-level variable selection. Stat. Interface 2 369-380. · Zbl 1245.62034 · doi:10.4310/SII.2009.v2.n3.a10
[10] Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232-253. · Zbl 1220.62095 · doi:10.1214/10-AOAS388
[11] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-dimensional Data : Methods , Theory and Applications . Springer, Heidelberg. · Zbl 1273.62015 · doi:10.1007/978-3-642-20192-9
[12] Caruana, R. (1997). Multitask learning: A knowledge-based source of inductive bias. Machine Learning 28 41-75.
[13] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425-455. · Zbl 0815.62019 · doi:10.1093/biomet/81.3.425
[14] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[15] Engle, R. F., Granger, C. W. J., Rice, J. and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81 310-320.
[16] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[17] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928-961. · Zbl 1092.62031 · doi:10.1214/009053604000000256
[18] Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109-148. · Zbl 0775.62288 · doi:10.2307/1269656
[19] Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Preprint, Dept. Statistics, Stanford Univ.
[20] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302-332. · Zbl 1378.90064 · doi:10.1214/07-AOAS131
[21] Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397-416.
[22] Härdle, W., Liang, H. and Gao, J. (2000). Partially Linear Models. Contributions to Statistics . Physica, Heidelberg. · Zbl 0968.62006
[23] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Monographs on Statistics and Applied Probability 43 . Chapman & Hall, London. · Zbl 0747.62061
[24] Hoover, D. R., Rice, J. A., Wu, C. O. and Yang, L.-P. (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika 85 809-822. · Zbl 0921.62045 · doi:10.1093/biomet/85.4.809
[25] Huang, J., Horowitz, J. L. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Statist. 36 587-613. · Zbl 1133.62048 · doi:10.1214/009053607000000875
[26] Huang, J., Horowitz, J. L. and Wei, F. (2010). Variable selection in nonparametric additive models. Ann. Statist. 38 2282-2313. · Zbl 1202.62051 · doi:10.1214/09-AOS781
[27] Huang, J., Wei, F. and Ma, S. (2011). Semiparametric regression pursuit. Statist. Sinica. · Zbl 1253.62024
[28] Huang, J. and Zhang, T. (2010). The benefit of group sparsity. Ann. Statist. 38 1978-2004. · Zbl 1202.62052 · doi:10.1214/09-AOS778
[29] Huang, J., Ma, S., Xie, H. and Zhang, C.-H. (2009). A group bridge approach for variable selection. Biometrika 96 339-355. · Zbl 1163.62050 · doi:10.1093/biomet/asp020
[30] Jacob, L., Obozinski, G. and Vert, J. P. (2009). Group lasso with overlap and graph lasso. In Proceedings of the 26 th Annual International Conference on Machine Learning 433-440. ACM, New York.
[31] Koltchinskii, V. (2009). The Dantzig selector and sparsity oracle inequalities. Bernoulli 15 799-828. · Zbl 1452.62486 · doi:10.3150/09-BEJ187
[32] Lange, K., Hunter, D. R. and Yang, I. (2000). Optimization transfer using surrogate objective functions. J. Comput. Graph. Statist. 9 1-59.
[33] Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 1302-1338. · Zbl 1105.62328 · doi:10.1214/aos/1015957395
[34] Leng, C., Lin, Y. and Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statist. Sinica 16 1273-1284. · Zbl 1109.62056
[35] Lin, Y. and Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. Ann. Statist. 34 2272-2297. · Zbl 1106.62041 · doi:10.1214/009053606000000722
[36] Liu, J. and Ye, J. (2010). Fast overlapping group Lasso. Available at .
[37] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. (2009). Taking advantage of sparsity in multi-task learning. Knowledge and Information Systems 20 109-348.
[38] Lounici, K., Pontil, M., van de Geer, S. and Tsybakov, A. B. (2011). Oracle inequalities and optimal inference under group sparsity. Ann. Statist. 39 2164-2204. · Zbl 1306.62156 · doi:10.1214/11-AOS896
[39] Ma, S. and Huang, J. (2009). Regularized gene selection in cancer microarray meta-analysis. BMC Bioinformatics 10 1.
[40] Ma, S., Huang, J. and Moran, M. S. (2009). Identification of genes associated with multiple cancers via integrative analysis. BMC Genomics 10 535.
[41] Ma, S., Huang, J. and Song, X. (2010). Integrative analysis and variable selection with multiple high-dimensional datasets. Biostatistics 12 763-775. · Zbl 1314.62243 · doi:10.1093/biostatistics/kxr004
[42] Ma, S., Huang, J., Wei, F., Xie, Y. and Fang, K. (2011). Integrative analysis of multiple cancer prognosis studies with gene expression measurements. Stat. Med. 30 3361-3371. · doi:10.1002/sim.4337
[43] Mazumder, R., Friedman, J. H. and Hastie, T. (2011). SparseNet: Coordinate descent with nonconvex penalties. J. Amer. Statist. Assoc. 106 1125-1138. · Zbl 1229.62091 · doi:10.1198/jasa.2011.tm09738
[44] Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 53-71. · Zbl 1400.62276 · doi:10.1111/j.1467-9868.2007.00627.x
[45] Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist. 37 3779-3821. · Zbl 1360.62186 · doi:10.1214/09-AOS692
[46] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[47] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 417-473. · doi:10.1111/j.1467-9868.2010.00740.x
[48] Nardi, Y. and Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Stat. 2 605-633. · Zbl 1320.62167 · doi:10.1214/08-EJS200
[49] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2011). Support union recovery in high-dimensional multivariate regression. Ann. Statist. 39 1-47. · Zbl 1373.62372 · doi:10.1214/09-AOS776
[50] Pan, W., Xie, B. and Shen, X. (2010). Incorporating predictor network in penalized regression with application to microarray data. Biometrics 66 474-484. · Zbl 1192.62235 · doi:10.1111/j.1541-0420.2009.01296.x
[51] Peng, J., Zhu, J., Bergamaschi, A., Han, W., Noh, D.-Y., Pollack, J. R. and Wang, P. (2010). Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Ann. Appl. Stat. 4 53-77. · Zbl 1189.62174 · doi:10.1214/09-AOAS271
[52] Percival, D. (2011). Theoretical properties of the overlapping groups lasso. Available at . · Zbl 1334.62131 · doi:10.1214/12-EJS672
[53] Puig, A., Wiesel, A. and Hero, A. (2011). A multidimensional shrinkage-thresholding operator. IEEE Signal Process. Lett. 18 363-366.
[54] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 1009-1030. · doi:10.1111/j.1467-9868.2009.00718.x
[55] Rice, J. A. (2004). Functional and longitudinal data analysis: Perspectives on smoothing. Statist. Sinica 14 631-647. · Zbl 1073.62033
[56] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461-464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[57] Shen, X., Zhu, Y. and Pan, W. (2011). Necessary and sufficient conditions towards feature selection consistency and sharp parameter estimation. Preprint, School of Statistics, Univ. Minnesota.
[58] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[59] Tseng, P. (2001). Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109 475-494. · Zbl 1006.65062 · doi:10.1023/A:1017501703105
[60] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[61] Wang, L., Chen, G. and Li, H. (2007). Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23 1486-1494.
[62] Wang, H. and Xia, Y. (2009). Shrinkage estimation of the varying coefficient model. J. Amer. Statist. Assoc. 104 747-757. · Zbl 1388.62213 · doi:10.1198/jasa.2009.0138
[63] Wei, F. and Huang, J. (2010). Consistent group selection in high-dimensional linear regression. Bernoulli 16 1369-1384. · Zbl 1207.62146 · doi:10.3150/10-BEJ252
[64] Wei, F., Huang, J. and Li, H. (2011). Variable selection and estimation in high-dimensional varying-coefficient models. Statist. Sinica 21 1515-1540. · Zbl 1225.62056 · doi:10.5705/ss.2009.316
[65] Wei, Z. and Li, H. (2007). Nonparametric pathway-based regression models for analysis of genomic data. Biostatistics 8 265-284. · Zbl 1129.62107 · doi:10.1093/biostatistics/kxl007
[66] Wu, T. T. and Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2 224-244. · Zbl 1137.62045 · doi:10.1214/07-AOAS174SUPP
[67] Xue, L., Qu, A. and Zhou, J. (2010). Consistent model selection for marginal generalized additive model for correlated data. J. Amer. Statist. Assoc. 105 1518-1530. · Zbl 1388.62223 · doi:10.1198/jasa.2010.tm10128
[68] Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the \(\ell_{q}\) loss in \(\ell_{r}\) balls. J. Mach. Learn. Res. 11 3519-3540. · Zbl 1242.62074
[69] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49-67. · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[70] Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Statist. Assoc. 57 348-368. · Zbl 0113.34902 · doi:10.2307/2281644
[71] Zhang, T. (2009). Some sharp performance bounds for least squares regression with \(L_{1}\) regularization. Ann. Statist. 37 2109-2144. · Zbl 1173.62029 · doi:10.1214/08-AOS659
[72] Zhang, C.-H. (2010a). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942. · Zbl 1183.62120 · doi:10.1214/09-AOS729
[73] Zhang, T. (2010b). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081-1107. · Zbl 1242.68262
[74] Zhang, C.-H. and Zhang, T. (2011). General theory of concave regularization for high dimensional sparse estimation problems. Preprint, Dept. Statistics and Biostatistics, Rutgers Univ.
[75] Zhang, H. H., Cheng, G. and Liu, Y. (2011). Linear or nonlinear? Automatic structure discovery for partially linear models. J. Amer. Statist. Assoc. 106 1099-1112. · Zbl 1229.62051 · doi:10.1198/jasa.2011.tm10281
[76] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[77] Zhang, H. H. and Lin, Y. (2006). Component selection and smoothing for nonparametric regression in exponential families. Statist. Sinica 16 1021-1041. · Zbl 1107.62036
[78] Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468-3497. · Zbl 1369.62164 · doi:10.1214/07-AOS584
[79] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008
[80] Zhou, N. and Zhu, J. (2010). Group variable selection via a hierarchical lasso and its oracle property. Stat. Interface 3 557-574. · Zbl 1245.62183 · doi:10.4310/SII.2010.v3.n4.a13
[81] Zhou, H., Sehl, M. E., Sinsheimer, J. S. and Lange, K. (2010). Association screening of common and rare genetic variants by penalized regression. Bioinformatics 26 2375-2382.
[82] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735
[83] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509-1533. · Zbl 1142.62027 · doi:10.1214/009053607000000802
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or perfect matching.