×

A general theory of concave regularization for high-dimensional sparse estimation problems. (English) Zbl 1331.62353

Summary: Concave regularization methods provide natural procedures for sparse recovery. However, they are difficult to analyze in the high-dimensional setting. Only recently a few sparse recovery results have been established for some specific local solutions obtained via specialized numerical procedures. Still, the fundamental relationship between these solutions such as whether they are identical or their relationship to the global minimizer of the underlying nonconvex formulation is unknown. The current paper fills this conceptual gap by presenting a general theoretical framework showing that, under appropriate conditions, the global solution of nonconvex regularization leads to desirable recovery performance; moreover, under suitable conditions, the global solution corresponds to the unique sparse local solution, which can be obtained via different numerical procedures. Under this unified framework, we present an overview of existing results and discuss their connections. The unified view of this work leads to a more satisfactory treatment of concave high-dimensional sparse estimation procedures, and serves as a guideline for developing further numerical procedures for concave regularization.

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)

Software:

foba; PDCO; sparsenet
PDFBibTeX XMLCite
Full Text: DOI arXiv Euclid

References:

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory ( Tsahkadsor , 1971) 267-281. Akadémiai Kiadó, Budapest. · Zbl 0283.62006
[2] Antoniadis, A. (2010). Comments on: \(\ell_{1}\)-penalization for mixture regression models. TEST 19 257-258. · Zbl 1203.62124 · doi:10.1007/s11749-010-0198-y
[3] Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98 791-806. · Zbl 1228.62083 · doi:10.1093/biomet/asr043
[4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[5] Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232-253. · Zbl 1220.62095 · doi:10.1214/10-AOAS388
[6] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data : Methods , Theory and Applications . Springer, Heidelberg. · Zbl 1273.62015 · doi:10.1007/978-3-642-20192-9
[7] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[8] Cai, T. T., Wang, L. and Xu, G. (2010). Shifting inequality and recovery of sparse signals. IEEE Trans. Signal Process. 58 1300-1308. · Zbl 1392.94117 · doi:10.1109/TSP.2009.2034936
[9] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when \(p\) is much larger than \(n\). Ann. Statist. 35 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[10] Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by \(\ell_{1}\) minimization. Ann. Statist. 37 2145-2177. · Zbl 1173.62053 · doi:10.1214/08-AOS653
[11] Candes, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203-4215. · Zbl 1264.94121 · doi:10.1109/TIT.2005.858979
[12] Chen, S. S., Donoho, D. L. and Saunders, M. A. (2001). Atomic decomposition by basis pursuit. SIAM Rev. 43 129-159. · Zbl 0979.94010 · doi:10.1137/S003614450037906X
[13] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[14] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[15] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928-961. · Zbl 1092.62031 · doi:10.1214/009053604000000256
[16] Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109-148. · Zbl 0775.62288 · doi:10.2307/1269656
[17] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[18] Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statist. Sinica 18 1603-1618. · Zbl 1255.62198
[19] Hunter, D. R. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617-1642. · Zbl 1078.62028 · doi:10.1214/009053605000000200
[20] Kim, Y., Choi, H. and Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1665-1673. · Zbl 1286.62062 · doi:10.1198/016214508000001066
[21] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[22] Koltchinskii, V. (2009). The Dantzig selector and sparsity oracle inequalities. Bernoulli 15 799-828. · Zbl 1452.62486 · doi:10.3150/09-BEJ187
[23] Liu, J., Wonka, P. and Ye, J. (2010). Multi-stage Dantzig selector. In Advances in Neural Information Processing Systems (J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel and A. Culotta, eds.) 23 1450-1458.
[24] Mallows, C. L. (1973). Some comments on Cp. Technometrics 12 661-675. · Zbl 0269.62061 · doi:10.2307/1267380
[25] Mazumder, R., Friedman, J. and Hastie, T. (2012). SparseNet: Coordinate descent with non-convex penalties. J. Amer. Statist. Assoc. 106 1125-1138. · Zbl 1229.62091 · doi:10.1198/jasa.2011.tm09738
[26] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[27] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246-270. · Zbl 1155.62050 · doi:10.1214/07-AOS582
[28] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-403. · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
[29] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319-337.
[30] Raskutti, G., Wainwright, M. J. and Yu, B. (2009). Minimax rates of estimation for high-dimensional linear regression over \(\ell _{q}\)-balls. IEEE Trans. Inform. Theory 57 6976-6994. · Zbl 1365.62276 · doi:10.1109/TIT.2011.2165799
[31] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461-464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[32] Städler, N., Bühlmann, P. and van de Geer, S. (2010). \(\ell_{1}\)-penalization for mixture regression models. TEST 19 209-256. · Zbl 1203.62128 · doi:10.1007/s11749-010-0197-z
[33] Sun, T. and Zhang, C.-H. (2010). Comments on: \(\ell _{1}\)-penalization for mixture regression models. Test 19 270-275. · Zbl 1203.62130 · doi:10.1007/s11749-010-0201-7
[34] Sun, T. and Zhang, C.-H. (2011). Scaled sparse linear regression. Technical report. Available at . arXiv:1104.4595 · Zbl 1452.62515
[35] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[36] Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030-1051. · Zbl 1288.94025 · doi:10.1109/TIT.2008.2009806
[37] van de Geer, S. (2007). The deterministic lasso. Technical Report 140. ETH Zurich, Switzerland.
[38] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614-645. · Zbl 1138.62323 · doi:10.1214/009053607000000929
[39] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[40] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_{1}\)-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[41] Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the \(\ell_{q}\) loss in \(\ell_{r}\) balls. J. Mach. Learn. Res. 11 3519-3540. · Zbl 1242.62074
[42] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942. · Zbl 1183.62120 · doi:10.1214/09-AOS729
[43] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[44] Zhang, C. H. and Zhang, T. (2012). Supplement to “A general theory of concave regularization for high dimensional sparse estimation problems.” .
[45] Zhang, T. (2009). Some sharp performance bounds for least squares regression with \(L_{1}\) regularization. Ann. Statist. 37 2109-2144. · Zbl 1173.62029 · doi:10.1214/08-AOS659
[46] Zhang, T. (2010). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081-1107. · Zbl 1242.68262
[47] Zhang, T. (2011). Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Trans. Inform. Theory 57 4689-4708. · Zbl 1365.62288 · doi:10.1109/TIT.2011.2146690
[48] Zhang, T. (2011). Multi-stage convex relaxation for feature selection. Technical report. Available at . arXiv:1106.0565
[49] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008
[50] Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735
[51] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509-1533. · Zbl 1142.62027 · doi:10.1214/009053607000000802
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.