
Fast global convergence of gradient methods for high-dimensional statistical recovery. (English) Zbl 1373.62244

Summary: Many statistical \(M\)-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the ambient dimension \(d\) to grow with (and possibly exceed) the sample size \(n\). Our theory identifies conditions under which projected gradient descent enjoys globally linear convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter \(\theta^{*}\) and an optimal solution \(\widehat{\theta}\). By establishing these conditions with high probability for numerous statistical models, our analysis applies to a wide range of \(M\)-estimators, including sparse linear regression using Lasso; group Lasso for block sparsity; log-linear models with regularization; low-rank matrix recovery using nuclear norm regularization; and matrix decomposition using a combination of the nuclear and \(\ell_{1}\) norms. Overall, our analysis reveals interesting connections between statistical and computational efficiency in high-dimensional estimation.
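For concreteness, the following is a minimal sketch (not taken from the paper) of the composite gradient update the summary refers to, specialized to the Lasso: a gradient step on the squared loss followed by the proximal operator of the \(\ell_{1}\) regularizer (soft-thresholding). The step size \(1/L\), the synthetic data, and the choice of regularization level are illustrative assumptions, chosen only to show the iteration converging in a regime with \(d > n\).

```python
# Minimal sketch of a composite (proximal) gradient method for the Lasso.
# All data sizes and the regularization level below are illustrative assumptions.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def composite_gradient_lasso(X, y, lam, n_iters=500):
    """Iterate theta <- prox_{(lam/L)||.||_1}(theta - (1/L) grad f(theta)),
    where f(theta) = (1/2n) ||y - X theta||_2^2."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n           # Lipschitz constant of grad f
    theta = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n        # gradient of the squared loss
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

# Illustrative use: sparse linear regression with ambient dimension d > sample size n.
rng = np.random.default_rng(0)
n, d, s = 100, 200, 5
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:s] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)
lam = 0.1 * np.sqrt(np.log(d) / n)              # on the order of the statistical precision
theta_hat = composite_gradient_lasso(X, y, lam)
```

In the regime described in the summary, the theory guarantees that iterates of this kind contract geometrically toward \(\widehat{\theta}\) until they reach a ball of radius comparable to the statistical error \(\|\widehat{\theta} - \theta^{*}\|\), after which further iterations yield no statistically meaningful improvement.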

MSC:

62H12 Estimation in multivariate analysis
62F30 Parametric inference under constraints
62J07 Ridge regression; shrinkage estimators (Lasso)
90C25 Convex programming

Software:

PDCO; NESTA
Full Text: DOI arXiv Euclid

References:

[1] Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Supplement to “Fast global convergence of gradient methods for high-dimensional statistical recovery.” · Zbl 1373.62244
[2] Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. Ann. Statist. 40 1171-1197. · Zbl 1274.62219 · doi:10.1214/12-AOS1000
[3] Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877-2921. · Zbl 1173.62049 · doi:10.1214/08-AOS664
[4] Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183-202. · Zbl 1175.94009 · doi:10.1137/080716542
[5] Becker, S., Bobin, J. and Candès, E. J. (2011). NESTA: A fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4 1-39. · Zbl 1209.90265 · doi:10.1137/090756855
[6] Bertsekas, D. P. (1995). Nonlinear Programming . Athena Scientific, Belmont, MA. · Zbl 0935.90037
[7] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[8] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization . Cambridge Univ. Press, Cambridge. · Zbl 1058.90049
[9] Bredies, K. and Lorenz, D. A. (2008). Linear convergence of iterative soft-thresholding. J. Fourier Anal. Appl. 14 813-837. · Zbl 1175.65061 · doi:10.1007/s00041-008-9041-1
[10] Candès, E. J., Li, X., Ma, Y. and Wright, J. (2011). Robust principal component analysis? J. ACM 58 Art. 11, 37. · Zbl 1327.62369
[11] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717-772. · Zbl 1219.90124 · doi:10.1007/s10208-009-9045-5
[12] Chandrasekaran, V., Sanghavi, S., Parrilo, P. A. and Willsky, A. S. (2011). Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21 572-596. · Zbl 1226.90067 · doi:10.1137/090761793
[13] Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002 · doi:10.1137/S1064827596304010
[14] Duchi, J., Shalev-Shwartz, S., Singer, Y. and Chandra, T. (2008). Efficient projections onto the \(\ell _1\)-ball for learning in high dimensions. In ICML. Omnipress, Helsinki, Finland.
[15] Fazel, M. (2002). Matrix rank minimization with applications. Ph.D. thesis, Stanford Univ.
[16] Garg, R. and Khandekar, R. (2009). Gradient descent with sparsification: An iterative algorithm for sparse recovery with restricted isometry property. In ICML. Omnipress, Montreal, Canada.
[17] Hale, E. T., Yin, W. and Zhang, Y. (2008). Fixed-point continuation for \(l_1\)-minimization: Methodology and convergence. SIAM J. Optim. 19 1107-1130. · Zbl 1180.65076 · doi:10.1137/070698920
[18] Hsu, D., Kakade, S. M. and Zhang, T. (2011). Robust matrix decomposition with sparse corruptions. IEEE Trans. Inform. Theory 57 7221-7234. · Zbl 1365.15018 · doi:10.1109/TIT.2011.2158250
[19] Huang, J. and Zhang, T. (2010). The benefit of group sparsity. Ann. Statist. 38 1978-2004. · Zbl 1202.62052 · doi:10.1214/09-AOS778
[20] Ji, S. and Ye, J. (2009). An accelerated gradient method for trace norm minimization. In ICML. Omnipress, Montreal, Canada.
[21] Koltchinskii, V., Lounici, K. and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39 2302-2329. · Zbl 1231.62097 · doi:10.1214/11-AOS894
[22] Lee, K. and Bresler, Y. (2009). Guaranteed minimum rank approximation from linear observations by nuclear norm minimization with an ellipsoidal constraint. Technical report, UIUC.
[23] Loh, P. and Wainwright, M. J. (2012). High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Ann. Statist. 40 1637-1664. · Zbl 1257.62063
[24] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. (2009). Taking advantage of sparsity in multi-task learning. In COLT. Omnipress, Montreal, Canada.
[25] Luo, Z.-Q. and Tseng, P. (1993). Error bounds and convergence analysis of feasible descent methods: A general approach. Ann. Oper. Res. 46/47 157-178. · Zbl 0793.90076 · doi:10.1007/BF02096261
[26] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[27] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2009). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In NIPS. MIT Press, Vancouver, Canada. · Zbl 1331.62350
[28] Negahban, S. and Wainwright, M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Statist. 39 1069-1097. · Zbl 1216.62090 · doi:10.1214/10-AOS850
[29] Negahban, S. and Wainwright, M. J. (2012). Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. J. Mach. Learn. Res. 13 1665-1697. · Zbl 1436.62204
[30] Nesterov, Y. (2004). Introductory Lectures on Convex Optimization : A Basic Course. Applied Optimization 87 . Kluwer Academic, Boston, MA. · Zbl 1086.90045
[31] Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. Technical Report 76, Center for Operations Research and Econometrics (CORE), Catholic Univ. Louvain (UCL).
[32] Ngai, H. V. and Penot, J.-P. (2008). Paraconvex functions and paraconvex sets. Studia Math. 184 1-29. · Zbl 1181.49014 · doi:10.4064/sm184-1-1
[33] Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241-2259. · Zbl 1242.62071
[34] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over \(\ell_q\)-balls. IEEE Trans. Inform. Theory 57 6976-6994. · Zbl 1365.62276 · doi:10.1109/TIT.2011.2165799
[35] Recht, B. (2011). A simpler approach to matrix completion. J. Mach. Learn. Res. 12 3413-3430. · Zbl 1280.68141
[36] Recht, B., Fazel, M. and Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471-501. · Zbl 1198.90321 · doi:10.1137/070697835
[37] Rohde, A. and Tsybakov, A. B. (2011). Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 887-930. · Zbl 1215.62056 · doi:10.1214/10-AOS860
[38] Rudelson, M. and Zhou, S. (2011). Reconstruction from anisotropic random measurements. Technical report, Univ. Michigan. · Zbl 1364.94158
[39] Srebro, N., Alon, N. and Jaakkola, T. S. (2005). Generalization error bounds for collaborative prediction with low-rank matrices. In NIPS. MIT Press, Vancouver, Canada.
[40] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[41] Tropp, J. A. and Gilbert, A. C. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inform. Theory 53 4655-4666. · Zbl 1288.94022 · doi:10.1109/TIT.2007.909108
[42] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[43] Xu, H., Caramanis, C. and Sanghavi, S. (2012). Robust PCA via outlier pursuit. IEEE Trans. Inform. Theory 58 3047-3064. · Zbl 1365.62228 · doi:10.1109/TIT.2011.2173156
[44] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[45] Zhao, P., Rocha, G. and Yu, B. (2009). Grouped and hierarchical model selection through composite absolute penalties. Ann. Statist. 37 3468-3497. · Zbl 1369.62164 · doi:10.1214/07-AOS584