
Slope meets Lasso: improved oracle bounds and optimality. (English) Zbl 1405.62056

Summary: We show that two polynomial-time methods, a Lasso estimator with adaptively chosen tuning parameter and a Slope estimator, adaptively achieve the minimax prediction and \(\ell_{2}\) estimation rate \((s/n)\log(p/s)\) in high-dimensional linear regression on the class of \(s\)-sparse vectors in \(\mathbb{R}^{p}\). This is done under the restricted eigenvalue (RE) condition for the Lasso and under a slightly more constraining assumption on the design for the Slope. The main results have the form of sharp oracle inequalities accounting for the model misspecification error. Minimax optimal bounds are also obtained for the \(\ell_{q}\) estimation errors with \(1\leq q\leq2\) when the model is well specified. The results are nonasymptotic and hold both in probability and in expectation. The assumptions that we impose on the design are satisfied with high probability for a large class of random matrices with independent and possibly anisotropically distributed rows. We give a comparative analysis of the conditions under which oracle bounds for the Lasso and Slope estimators can be obtained. In particular, we show that several known conditions, such as the RE condition and the sparse eigenvalue condition, are equivalent if the \(\ell_{2}\)-norms of the regressors are uniformly bounded.
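
For concreteness, below is a minimal sketch (not the authors' implementation) of the Slope estimator discussed above, i.e., a minimizer of \((2n)^{-1}\|y-X\beta\|_2^2+\sum_j\lambda_j|\beta|_{(j)}\), where \(|\beta|_{(1)}\geq\dots\geq|\beta|_{(p)}\) are the ordered absolute entries. The proximal step for the sorted-\(\ell_1\) penalty uses the stack-based pool-adjacent-violators routine of Bogdan et al. [5]; the plain proximal-gradient solver, the constants \(A\) and \(\sigma\) in the weights \(\lambda_j=A\sigma\sqrt{\log(2p/j)/n}\), and the iteration budget are illustrative assumptions, not choices made in the paper.

import numpy as np

def prox_sorted_l1(v, lam):
    # Proximal operator of the sorted-l1 (Slope) penalty with weights
    # lam[0] >= lam[1] >= ... >= lam[p-1] >= 0, via the stack-based
    # pool-adjacent-violators scheme (Algorithm 4 in Bogdan et al. [5]).
    sgn = np.sign(v)
    av = np.abs(v)
    order = np.argsort(-av)              # indices sorting |v| in decreasing order
    z = av[order] - lam                  # shifted sorted magnitudes
    sums, counts = [], []
    for val in z:
        sums.append(val)
        counts.append(1)
        # merge blocks until block averages are strictly decreasing
        while len(sums) > 1 and sums[-2] / counts[-2] <= sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    x = np.concatenate([np.full(c, max(s / c, 0.0)) for s, c in zip(sums, counts)])
    out = np.empty_like(av)
    out[order] = x                       # undo the sorting
    return sgn * out

def slope(X, y, lam, n_iter=1000):
    # Slope estimate via plain proximal gradient (ISTA) on
    # (1/2n)||y - X b||_2^2 + sum_j lam_j |b|_(j).
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n    # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = prox_sorted_l1(b - grad / L, lam / L)
    return b

# Illustrative use with weights lam_j = A * sigma * sqrt(log(2p/j)/n);
# A = 2 and sigma = 0.5 are assumed here purely for the example.
rng = np.random.default_rng(0)
n, p, s = 100, 500, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)
j = np.arange(1, p + 1)
lam = 2 * 0.5 * np.sqrt(np.log(2 * p / j) / n)
beta_hat = slope(X, y, lam)

On such a draw the support of beta_hat concentrates on the first \(s\) coordinates; any off-the-shelf Slope solver could replace the ISTA loop above.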

MSC:

62H12 Estimation in multivariate analysis
62J05 Linear regression; mixed models
62G08 Nonparametric regression and quantile regression
62C20 Minimax procedures in statistical decision theory
62G05 Nonparametric estimation

References:

[1] Abramovich, F. and Grinshtein, V. (2010). MAP model selection in Gaussian regression. Electron. J. Stat. 4 932–949. · Zbl 1329.62051 · doi:10.1214/10-EJS573
[2] Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series 55. Washington, DC. · Zbl 0171.38503
[3] Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815. · Zbl 1277.62155 · doi:10.1214/13-AOS1127
[4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[5] Bogdan, M., van den Berg, E., Sabatti, C., Su, W. and Candès, E. J. (2015). SLOPE—Adaptive variable selection via convex optimization. Ann. Appl. Stat. 9 1103–1140. · Zbl 1454.62212 · doi:10.1214/15-AOAS842
[6] Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford Univ. Press, Oxford. · Zbl 1279.60005
[7] Candès, E. J. and Davenport, M. A. (2013). How well can we estimate a sparse vector? Appl. Comput. Harmon. Anal. 34 317–323. · Zbl 1315.94019 · doi:10.1016/j.acha.2012.08.010
[8] Candès, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406–5425. · Zbl 1309.94033 · doi:10.1109/TIT.2006.885507
[9] Chafaï, D., Guédon, O., Lecué, G. and Pajor, A. (2012). Interactions Between Compressed Sensing, Random Matrices and High Dimensional Geometry. Panoramas et Synthèses 37. Société Mathématique de France, Paris.
[10] Dalalyan, A. S., Hebiri, M. and Lederer, J. (2017). On the prediction performance of the Lasso. Bernoulli 23 552–581. · Zbl 1359.62295 · doi:10.3150/15-BEJ756
[11] Dirksen, S. (2015). Tail bounds via generic chaining. Electron. J. Probab. 20 no. 53, 29 pp. · Zbl 1327.60048 · doi:10.1214/EJP.v20-3760
[12] Giraud, C. (2015). Introduction to High-Dimensional Statistics. Monographs on Statistics and Applied Probability 139. CRC Press, Boca Raton, FL. · Zbl 1341.62011
[13] Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. École d'Été de Probabilités de Saint-Flour XXXVIII–2008. Springer, New York. · Zbl 1223.91002
[14] Koltchinskii, V., Lounici, K. and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39 2302–2329. · Zbl 1231.62097 · doi:10.1214/11-AOS894
[15] Koltchinskii, V. and Mendelson, S. (2015). Bounding the smallest singular value of a random matrix without concentration. Int. Math. Res. Not. IMRN 23 12991–13008. · Zbl 1331.15027
[16] Lecué, G. and Mendelson, S. (2017). Sparse recovery under weak moment assumptions. J. Eur. Math. Soc. (JEMS) 19 881–904. · Zbl 1414.62135 · doi:10.4171/JEMS/682
[17] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) 23. Springer, Berlin. · Zbl 0748.60004
[18] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. A. (2011). Oracle inequalities and optimal inference under group sparsity. Ann. Statist. 39 2164–2204. · Zbl 1306.62156 · doi:10.1214/11-AOS896
[19] Mendelson, S. (2014). Learning without concentration. In Proceedings of the 27th Annual Conference on Learning Theory (COLT 2014) 25–39.
[20] Mendelson, S. (2016). Upper bounds on product and multiplier empirical processes. Stochastic Process. Appl. 126 3652–3680. · Zbl 1386.60077 · doi:10.1016/j.spa.2016.04.019
[21] Mendelson, S. (2015). Learning without concentration. J. ACM 62 Art. 21, 25 pp. · Zbl 1333.68232 · doi:10.1145/2699439
[22] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over \(\ell_{q}\)-balls. IEEE Trans. Inform. Theory 57 6976–6994. · Zbl 1365.62276 · doi:10.1109/TIT.2011.2165799
[23] Rigollet, P. and Tsybakov, A. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771. · Zbl 1215.62043 · doi:10.1214/10-AOS854
[24] Su, W. and Candès, E. (2016). SLOPE is adaptive to unknown sparsity and asymptotically minimax. Ann. Statist. 44 1038–1068. · Zbl 1338.62032 · doi:10.1214/15-AOS1397
[25] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York. · Zbl 1176.62032
[26] Verzelen, N. (2012). Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electron. J. Stat. 6 38–90. · Zbl 1334.62120 · doi:10.1214/12-EJS666
[27] Bednorz, W. (2013). Concentration via chaining method and its applications. Technical report, Univ. Warsaw. Available at arXiv:1405.0676.
[28] Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the \(\ell_{q}\) loss in \(\ell_{r}\) balls. J. Mach. Learn. Res. 11 3519–3540. · Zbl 1242.62074