I-LAMM for sparse learning: simultaneous control of algorithmic complexity and statistical error. (English) Zbl 1392.62215

This paper proposes a general computational framework for solving nonconvex optimization problems of the penalized M-estimation form $$\mathrm{argmin}_{\beta\in{\mathbb R}^d}\{ {\mathcal L}(\beta) + {\mathcal R}_{\lambda}(\beta)\}$$, where $${\mathcal L}(\beta)$$ is a smooth loss function and $${\mathcal R}_{\lambda}(\beta)$$ is a sparsity-inducing penalty with regularization parameter $$\lambda$$. The proposed strategy enables simultaneous control of the algorithmic complexity and the statistical error when fitting high-dimensional models arising in a variety of problems, including low-rank matrix completion, high-dimensional graphical models, and quantile regression.
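To illustrate the kind of scheme the paper studies, the following is a minimal sketch of an I-LAMM-style two-loop solver, specialized to a least-squares loss with the SCAD penalty of Fan and Li (2001). It is not the authors' implementation: the outer loop uses a local linear approximation of the folded-concave penalty, and the inner loop takes local adaptive majorize-minimization (proximal gradient) steps, doubling an isotropic quadratic parameter `phi` until the surrogate majorizes the loss. All function names and stopping rules here are illustrative choices.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    # Derivative of the SCAD penalty (Fan and Li, 2001), elementwise in |t|.
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def soft_threshold(z, w):
    # Proximal operator of the weighted l1 norm: elementwise soft-thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - w, 0.0)

def ilamm_sketch(X, y, lam, n_outer=5, n_inner=200, tol=1e-7):
    """Illustrative I-LAMM-style solver for least squares + SCAD (a sketch,
    not the procedure of the paper).

    Outer loop: local linear approximation of the concave penalty around the
    current iterate, giving a weighted l1 problem.  Inner loop: majorize the
    smooth loss by an isotropic quadratic with parameter phi and minimize the
    surrogate in closed form (a proximal gradient step).
    """
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_outer):
        # Weights from the SCAD derivative; with beta = 0 this is the lasso.
        w = scad_deriv(beta, lam)
        phi = 1.0
        for _ in range(n_inner):
            grad = X.T @ (X @ beta - y) / n
            while True:
                beta_new = soft_threshold(beta - grad / phi, w / phi)
                diff = beta_new - beta
                # Accept phi once the quadratic surrogate majorizes the loss.
                lhs = 0.5 * np.sum((X @ beta_new - y) ** 2) / n
                rhs = (0.5 * np.sum((X @ beta - y) ** 2) / n
                       + grad @ diff + 0.5 * phi * diff @ diff)
                if lhs <= rhs + 1e-12:
                    break
                phi *= 2.0
            converged = np.max(np.abs(beta_new - beta)) < tol
            beta = beta_new
            if converged:
                break
            phi = max(phi / 2.0, 1.0)  # let the step size grow again
    return beta
```

On a synthetic sparse regression problem, the reweighting in the outer loop removes most of the lasso shrinkage bias on large coefficients, which is the statistical motivation for folded-concave penalties.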

MSC:

 62J07 Ridge regression; shrinkage estimators (Lasso)
 62C20 Minimax procedures in statistical decision theory
 62G08 Nonparametric regression and quantile regression
References:

 [1] Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Statist. 40 2452-2482. · Zbl 1373.62244
 [2] Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183-202. · Zbl 1175.94009
 [3] Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli 19 521-547. · Zbl 1456.62066
 [4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022
 [5] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge. · Zbl 1058.90049
 [6] Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232-253. · Zbl 1220.62095
 [7] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer, Heidelberg.
 [8] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169-194. · Zbl 1146.62028
 [9] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547
 [10] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 849-911. · Zbl 1411.62187
 [11] Fan, J. and Lv, J. (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inform. Theory 57 5467-5484. · Zbl 1365.62277
 [12] Fan, J., Xue, L. and Zou, H. (2014). Strong oracle optimality of folded concave penalized estimation. Ann. Statist. 42 819-849. · Zbl 1305.62252
 [13] Fan, J., Liu, H., Sun, Q. and Zhang, T. (2018). Supplement to “I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error.” DOI:10.1214/17-AOS1568SUPP.
 [14] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302-332. · Zbl 1378.90064
 [15] Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist. 58 30-37.
 [16] Kim, Y., Choi, H. and Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1665-1673. · Zbl 1286.62062
 [17] Kim, Y. and Kwon, S. (2012). Global optimality of nonconvex penalized estimators. Biometrika 99 315-325. · Zbl 1318.62240
 [18] Lange, K., Hunter, D. R. and Yang, I. (2000). Optimization transfer using surrogate objective functions. J. Comput. Graph. Statist. 9 1-59.
 [19] Loh, P.-L. (2017). Statistical consistency and asymptotic normality for high-dimensional robust $$M$$-estimators. Ann. Statist. 45 866-896. · Zbl 1371.62023
 [20] Loh, P.-L. and Wainwright, M. J. (2014). Support recovery without incoherence: A case for nonconvex regularization. To appear. Available at arXiv:1412.5632. · Zbl 1385.62008
 [21] Loh, P.-L. and Wainwright, M. J. (2015). Regularized $$M$$-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16 559-616. · Zbl 1360.62276
 [22] Lozano, A. C. and Meinshausen, N. (2013). Minimum distance estimation for robust high-dimensional regression. Available at arXiv:1307.3227.
 [23] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $$M$$-estimators with decomposable regularizers. Statist. Sci. 27 538-557. · Zbl 1331.62350
 [24] Nesterov, Y. (2013). Gradient methods for minimizing composite functions. Math. Program. 140 125-161. · Zbl 1287.90067
 [25] Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241-2259. · Zbl 1242.62071
 [26] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
 [27] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425
 [28] Wang, L., Kim, Y. and Li, R. (2013). Calibrating nonconvex penalized regression in ultra-high dimension. Ann. Statist. 41 2505-2536. · Zbl 1281.62106
 [29] Wang, Z., Liu, H. and Zhang, T. (2014). Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Statist. 42 2164-2201. · Zbl 1302.62066
 [30] Zhang, T. (2009). Some sharp performance bounds for least squares regression with $$L_{1}$$ regularization. Ann. Statist. 37 2109-2144. · Zbl 1173.62029
 [31] Zhang, C.-H. (2010a). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942. · Zbl 1183.62120
 [32] Zhang, T. (2010b). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081-1107. · Zbl 1242.68262
 [33] Zhang, C.-H. and Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27 576-593. · Zbl 1331.62353
 [34] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326
 [35] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509-1533. · Zbl 1142.62027
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.