
A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. (English) Zbl 1331.62350

Summary: High-dimensional statistical inference deals with models in which the number of parameters \(p\) is comparable to or larger than the sample size \(n\). Since it is usually impossible to obtain consistent procedures unless \(p/n\to 0\), a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse and structured matrices, low-rank matrices and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with some regularization function that encourages the assumed structure. This paper provides a unified framework for establishing consistency and convergence rates for such regularized \(M\)-estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive some existing results, and also to obtain a number of new results on consistency and convergence rates, in both \(\ell_{2}\)-error and related norms. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure corresponding regularized \(M\)-estimators have fast convergence rates and which are optimal in many well-studied cases.
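To make the general recipe concrete, the following is a minimal illustrative sketch (not taken from the paper or its supplement) of one instance of the framework: the Lasso, which pairs a least-squares loss with the decomposable \(\ell_{1}\) regularizer, solved here by proximal gradient descent. The data-generating process, step size and regularization level \(\lambda\asymp\sqrt{\log p/n}\) are hypothetical choices used only for illustration.

```python
# Illustrative sketch (not from the paper): the Lasso as a regularized M-estimator,
# i.e. (1/2n)||y - X beta||_2^2 + lam * ||beta||_1, solved by proximal gradient (ISTA).
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - X beta||_2^2 + lam * ||beta||_1 via ISTA."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L = ||X||_2^2 / n bounds the Lipschitz constant of the loss gradient.
    L = np.linalg.norm(X, ord=2) ** 2 / n
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n                  # gradient of the quadratic loss
        beta = soft_threshold(beta - grad / L, lam / L)  # proximal (regularization) step
    return beta

# Hypothetical usage: sparse ground truth with p > n.
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = 1.0
y = X @ beta_star + 0.1 * rng.standard_normal(n)
lam = 0.1 * np.sqrt(np.log(p) / n)   # regularization level scaling as sqrt(log p / n)
beta_hat = lasso_ista(X, y, lam)
print("l2 estimation error:", np.linalg.norm(beta_hat - beta_star))
```

Other instances covered by the framework, such as the group Lasso or nuclear-norm regularization, follow the same pattern with a different decomposable regularizer and the corresponding proximal operator.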

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)

Software:

hgam; PDCO

References:

[1] Agarwal, A., Negahban, S. and Wainwright, M. J. (2011). Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. Ann. Statist. 40 1171-1197. · Zbl 1216.62090
[2] Bach, F. (2010). Self-concordant analysis for logistic regression. Electron. J. Stat. 4 384-414. · Zbl 1329.62324
[3] Bach, F. R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179-1225. · Zbl 1225.68147
[4] Bach, F. R. (2008). Consistency of trace norm minimization. J. Mach. Learn. Res. 9 1019-1048. · Zbl 1225.68146
[5] Baraniuk, R. G., Cevher, V., Duarte, M. F. and Hegde, C. (2008). Model-based compressive sensing. Technical report, Rice Univ. Available at arXiv:0808.3572. · Zbl 1366.94215
[6] Bickel, P. J., Brown, J. B., Huang, H. and Li, Q. (2009). An overview of recent developments in genomics and associated statistical methods. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 4313-4337. · Zbl 1185.62184
[7] Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577-2604. · Zbl 1196.62062
[8] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022
[9] Bunea, F. (2008). Honest variable selection in linear and logistic regression models via \(l_{1}\) and \(l_{1}+l_{2}\) penalization. Electron. J. Stat. 2 1153-1194. · Zbl 1320.62170
[10] Bunea, F., She, Y. and Wegkamp, M. (2010). Adaptive rank penalized estimators in multivariate regression. Technical report, Florida State Univ. Available at arXiv:1004.2995.
[11] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169-194. · Zbl 1146.62028
[12] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674-1697. · Zbl 1209.62065
[13] Cai, T. and Zhou, H. (2010). Optimal rates of convergence for sparse covariance matrix estimation. Technical report, Wharton School of Business, Univ. Pennsylvania. · Zbl 1202.62073
[14] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when \(p\) is much larger than \(n\). Ann. Statist. 35 2313-2351. · Zbl 1139.62019
[15] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717-772. · Zbl 1219.90124
[16] Candes, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203-4215. · Zbl 1264.94121
[17] Candès, E. J., Li, X., Ma, Y. and Wright, J. (2010). Stable principal component pursuit. In IEEE International Symposium on Information Theory, Austin, TX.
[18] Chandrasekaran, V., Sanghavi, S., Parrilo, P. A. and Willsky, A. S. (2011). Rank-sparsity incoherence for matrix decomposition. SIAM J. Optimiz. 21 572-596. · Zbl 1226.90067
[19] Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002
[20] Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289-1306. · Zbl 1288.94016
[21] Donoho, D. L. and Tanner, J. (2005). Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA 102 9452-9457 (electronic). · Zbl 1135.60300
[22] El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717-2756. · Zbl 1196.62064
[23] Fazel, M. (2002). Matrix rank minimization with applications. Ph.D. thesis, Stanford Univ.
[24] Girko, V. L. (1995). Statistical Analysis of Observations of Increasing Dimension. Theory and Decision Library. Series B: Mathematical and Statistical Methods 28. Kluwer Academic, Dordrecht. Translated from the Russian. · Zbl 0867.62047
[25] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078
[26] Hsu, D., Kakade, S. M. and Zhang, T. (2011). Robust matrix decomposition with sparse corruptions. IEEE Trans. Inform. Theory 57 7221-7234. · Zbl 1365.15018
[27] Huang, J. and Zhang, T. (2010). The benefit of group sparsity. Ann. Statist. 38 1978-2004. · Zbl 1202.62052
[28] Jacob, L., Obozinski, G. and Vert, J. P. (2009). Group Lasso with overlap and graph Lasso. In International Conference on Machine Learning (ICML) 433-440, Haifa, Israel.
[29] Jenatton, R., Mairal, J., Obozinski, G. and Bach, F. (2011). Proximal methods for hierarchical sparse coding. J. Mach. Learn. Res. 12 2297-2334. · Zbl 1280.94029
[30] Kakade, S. M., Shamir, O., Sridharan, K. and Tewari, A. (2010). Learning exponential families in high-dimensions: Strong convexity and sparsity. In AISTATS, Sardinia, Italy.
[31] Keshavan, R. H., Montanari, A. and Oh, S. (2010). Matrix completion from noisy entries. J. Mach. Learn. Res. 11 2057-2078. · Zbl 1242.62069
[32] Kim, Y., Kim, J. and Kim, Y. (2006). Blockwise sparse regression. Statist. Sinica 16 375-390. · Zbl 1096.62076
[33] Koltchinskii, V. and Yuan, M. (2008). Sparse recovery in large ensembles of kernel machines. In Proceedings of COLT, Helsinki, Finland.
[34] Koltchinskii, V. and Yuan, M. (2010). Sparsity in multiple kernel learning. Ann. Statist. 38 3660-3695. · Zbl 1204.62086
[35] Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254-4278. · Zbl 1191.62101
[36] Landgrebe, D. (2008). Hyperspectral image data analysis as a high-dimensional signal processing problem. IEEE Signal Processing Magazine 19 17-28.
[37] Lee, K. and Bresler, Y. (2009). Guaranteed minimum rank approximation from linear observations by nuclear norm minimization with an ellipsoidal constraint. Technical report, UIUC. Available at arXiv:0903.4742.
[38] Liu, Z. and Vandenberghe, L. (2009). Interior-point method for nuclear norm approximation with application to system identification. SIAM J. Matrix Anal. Appl. 31 1235-1256. · Zbl 1201.90151
[39] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. (2009). Taking advantage of sparsity in multi-task learning. Technical report, ETH Zurich. Available at arXiv:0903.1468. · Zbl 1306.62156
[40] Lustig, M., Donoho, D., Santos, J. and Pauly, J. (2008). Compressed sensing MRI. IEEE Signal Processing Magazine 27 72-82.
[41] McCoy, M. and Tropp, J. (2011). Two proposals for robust PCA using semidefinite programming. Electron. J. Stat. 5 1123-1160. · Zbl 1329.62276
[42] Mehta, M. L. (1991). Random Matrices, 2nd ed. Academic Press, Boston, MA. · Zbl 0780.60014
[43] Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist. 37 3779-3821. · Zbl 1360.62186
[44] Meinshausen, N. (2008). A note on the Lasso for Gaussian graphical model selection. Statist. Probab. Lett. 78 880-884. · Zbl 1144.62326
[45] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082
[46] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246-270. · Zbl 1155.62050
[47] Nardi, Y. and Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Stat. 2 605-633. · Zbl 1320.62167
[48] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2009). A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. In NIPS Conference, Vancouver, Canada. · Zbl 1331.62350
[49] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). Supplement to “A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers.” · Zbl 1331.62350
[50] Negahban, S. and Wainwright, M. J. (2011). Simultaneous support recovery in high-dimensional regression: Benefits and perils of \(\ell _{1,\infty}\)-regularization. IEEE Trans. Inform. Theory 57 3841-3863. · Zbl 1365.62274
[51] Negahban, S. and Wainwright, M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Statist. 39 1069-1097. · Zbl 1216.62090
[52] Negahban, S. and Wainwright, M. J. (2012). Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. J. Mach. Learn. Res. 13 1665-1697. · Zbl 1436.62204
[53] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2011). Support union recovery in high-dimensional multivariate regression. Ann. Statist. 39 1-47. · Zbl 1373.62372
[54] Pastur, L. A. (1972). The spectrum of random matrices. Teoret. Mat. Fiz. 10 102-112.
[55] Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241-2259. · Zbl 1242.62071
[56] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over \(\ell_{q}\)-balls. IEEE Trans. Inform. Theory 57 6976-6994. · Zbl 1365.62276
[57] Raskutti, G., Wainwright, M. J. and Yu, B. (2012). Minimax-optimal rates for sparse additive models over kernel classes via convex programming. J. Mach. Learn. Res. 13 389-427. · Zbl 1283.62071
[58] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 1009-1030.
[59] Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using \(\ell_{1}\)-regularized logistic regression. Ann. Statist. 38 1287-1319. · Zbl 1189.62115
[60] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing \(\ell_{1}\)-penalized log-determinant divergence. Electron. J. Stat. 5 935-980. · Zbl 1274.62190
[61] Recht, B. (2011). A simpler approach to matrix completion. J. Mach. Learn. Res. 12 3413-3430. · Zbl 1280.68141
[62] Recht, B., Fazel, M. and Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471-501. · Zbl 1198.90321
[63] Rohde, A. and Tsybakov, A. B. (2011). Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 887-930. · Zbl 1215.62056
[64] Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494-515. · Zbl 1320.62135
[65] Rudelson, M. and Zhou, S. (2011). Reconstruction from anisotropic random measurements. Technical report, Univ. Michigan. · Zbl 1364.94158
[66] Stojnic, M., Parvaresh, F. and Hassibi, B. (2009). On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Trans. Signal Process. 57 3075-3085. · Zbl 1391.94402
[67] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267-288. · Zbl 0850.62538
[68] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 91-108. · Zbl 1060.62049
[69] Tropp, J. A., Gilbert, A. C. and Strauss, M. J. (2006). Algorithms for simultaneous sparse approximation. Signal Process. 86 572-602. Special issue on “Sparse approximations in signal and image processing”. · Zbl 1163.94396
[70] Turlach, B. A., Venables, W. N. and Wright, S. J. (2005). Simultaneous variable selection. Technometrics 47 349-363.
[71] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614-645. · Zbl 1138.62323
[72] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425
[73] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_{1}\)-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220
[74] Wainwright, M. J. (2009). Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inform. Theory 55 5728-5741. · Zbl 1367.94106
[75] Wigner, E. P. (1955). Characteristic vectors of bordered matrices with infinite dimensions. Ann. of Math. (2) 62 548-564. · Zbl 0067.08403
[76] Xu, H., Caramanis, C. and Sanghavi, S. (2012). Robust PCA via outlier pursuit. IEEE Trans. Inform. Theory 58 3047-3064. · Zbl 1365.62228
[77] Yuan, M., Ekici, A., Lu, Z. and Monteiro, R. (2007). Dimension reduction and coefficient estimation in multivariate linear regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 329-346.
[78] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49-67. · Zbl 1141.62030
[79] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044
[80] Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468-3497. · Zbl 1369.62164
[81] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008
[82] Zhou, S., Lafferty, J. and Wasserman, L. (2008). Time-varying undirected graphs. In 21st Annual Conference on Learning Theory (COLT), Helsinki, Finland.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.