
The distribution of the Lasso: uniform control over sparse balls and adaptive parameter tuning. (English) Zbl 1480.62145

Summary: The Lasso is a popular regression method for high-dimensional problems in which the number of parameters \({\theta_1},\dots ,{\theta_N}\) is larger than the number \(n\) of samples: \(N > n\). A useful heuristic relates the statistical properties of the Lasso estimator to those of a simple soft-thresholding denoiser in a denoising problem in which the parameters \({({\theta_i})_{i\le N}}\) are observed in Gaussian noise with a carefully tuned variance. Earlier work confirmed this picture in the limit \(n,N\to \infty \), pointwise in the parameters \(\theta\) and in the value of the regularization parameter.
Here, we consider a standard random design model and prove exponential concentration of the empirical distribution of the Lasso estimate around the prediction provided by the Gaussian denoising model. Crucially, our results are uniform with respect to \(\theta\) belonging to \({\ell_q}\) balls, \(q\in [0,1]\), and with respect to the regularization parameter. This allows us to derive sharp results for the performance of various data-driven procedures for tuning the regularization.
Our proofs make use of Gaussian comparison inequalities, and in particular of a version of Gordon’s minimax theorem developed by Thrampoulidis, Oymak and Hassibi, which controls the optimum value of the Lasso optimization problem. Crucially, we prove a stability property of the minimizer in Wasserstein distance that allows one to characterize properties of the minimizer itself.
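The soft-thresholding denoiser at the heart of the heuristic described above can be illustrated concretely. The sketch below is not taken from the paper; the signal, noise level, and threshold choice are illustrative assumptions (in particular, the threshold is not the carefully tuned value the theory prescribes), meant only to show how soft-thresholding denoises a sparse vector observed in Gaussian noise.

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft-thresholding denoiser: eta(x; tau) = sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(0)
N = 1000
theta = np.zeros(N)
theta[:50] = 3.0                  # sparse signal: 50 nonzero coordinates
sigma = 0.5                       # illustrative noise level (not the paper's tuned variance)
y = theta + sigma * rng.standard_normal(N)   # observe theta in Gaussian noise

tau = 2.0 * sigma                 # illustrative threshold choice
theta_hat = soft_threshold(y, tau)

mse_denoised = np.mean((theta_hat - theta) ** 2)
mse_raw = np.mean((y - theta) ** 2)
```

Because most coordinates of \(\theta\) are exactly zero, thresholding sets the corresponding noisy observations to zero, so the denoised estimate typically has much smaller mean-squared error than the raw observations.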

MSC:

62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)

References:

[1] Amelunxen, D., Lotz, M., McCoy, M. B. and Tropp, J. A. (2014). Living on the edge: Phase transitions in convex programs with random data. Inf. Inference 3 224-294. · Zbl 1339.90251 · doi:10.1093/imaiai/iau005
[2] Barbier, J., Dia, M., Macris, N. and Krzakala, F. (2016). The mutual information in random linear estimation. In 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton) 625-632. IEEE. · Zbl 1446.94018
[3] Barbier, J., Krzakala, F., Macris, N., Miolane, L. and Zdeborová, L. (2018). Optimal errors and phase transitions in high-dimensional generalized linear models. Proc. Natl. Acad. Sci. USA 116 5451-5460. · Zbl 1416.62421
[4] Bayati, M., Erdogdu, M. A. and Montanari, A. (2013). Estimating lasso risk and noise level. In Advances in Neural Information Processing Systems 944-952.
[5] Bayati, M. and Montanari, A. (2011). The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inf. Theory 57 764-785. · Zbl 1366.94079 · doi:10.1109/TIT.2010.2094817
[6] Bayati, M. and Montanari, A. (2012). The LASSO risk for Gaussian matrices. IEEE Trans. Inf. Theory 58 1997-2017. · Zbl 1365.62196 · doi:10.1109/TIT.2011.2174612
[7] Bellec, P. C. (2020). Out-of-sample error estimate for robust M-estimators with convex penalty. Available at arXiv:2008.11840.
[8] Berthier, R., Montanari, A. and Nguyen, P.-M. (2020). State evolution for approximate message passing with non-separable functions. Inf. Inference 9 33-79. · Zbl 1470.94003 · doi:10.1093/imaiai/iay021
[9] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[10] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Springer, Heidelberg. · Zbl 1273.62015 · doi:10.1007/978-3-642-20192-9
[11] Cai, T. T. and Guo, Z. (2018). Accuracy assessment for high-dimensional linear regression. Ann. Statist. 46 1807-1836. · Zbl 1403.62131 · doi:10.1214/17-AOS1604
[12] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when \(p\) is much larger than \(n\). Ann. Statist. 35 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[13] Candes, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inf. Theory 51 4203-4215. · Zbl 1264.94121 · doi:10.1109/TIT.2005.858979
[14] Celentano, M., Montanari, A. and Wei, Y. (2020). The lasso with general gaussian designs with applications to hypothesis testing. Available at arXiv:2007.13716.
[15] Chen, S. and Donoho, D. L. (1995). Examples of basis pursuit. In Wavelet Applications in Signal and Image Processing III 2569 564-575. International Society for Optics and Photonics.
[16] Chetverikov, D., Liao, Z. and Chernozhukov, V. (2016). On cross-validated lasso. Available at arXiv:1605.02214. · Zbl 1475.62209
[17] Donoho, D. and Montanari, A. (2016). High dimensional robust M-estimation: Asymptotic variance via approximate message passing. Probab. Theory Related Fields 166 935-969. · Zbl 1357.62220 · doi:10.1007/s00440-015-0675-z
[18] Donoho, D. L. (2006). High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom. 35 617-652. · Zbl 1095.52500 · doi:10.1007/s00454-005-1220-0
[19] Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over \({l_p}\)-balls for \({l_q}\)-error. Probab. Theory Related Fields 99 277-303. · Zbl 0802.62006 · doi:10.1007/BF01199026
[20] Donoho, D. L., Maleki, A. and Montanari, A. (2009). Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. USA 106 18914-18919.
[21] Donoho, D. L., Maleki, A. and Montanari, A. (2011). The noise-sensitivity phase transition in compressed sensing. IEEE Trans. Inf. Theory 57 6920-6941. · Zbl 1365.94094 · doi:10.1109/TIT.2011.2165823
[22] Donoho, D. L. and Tanner, J. (2005). Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA 102 9452-9457. · Zbl 1135.60300 · doi:10.1073/pnas.0502258102
[23] Efron, B. (2004). The estimation of prediction error: Covariance penalties and cross-validation. J. Amer. Statist. Assoc. 99 619-642. · Zbl 1117.62324 · doi:10.1198/016214504000000692
[24] El Karoui, N. (2013). Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: Rigorous results. Available at arXiv:1311.2445.
[25] El Karoui, N. (2018). On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators. Probab. Theory Related Fields 170 95-175. · Zbl 1407.62060 · doi:10.1007/s00440-016-0754-9
[26] Gordon, Y. (1985). Some inequalities for Gaussian processes and applications. Israel J. Math. 50 265-289. · Zbl 0663.60034 · doi:10.1007/BF02759761
[27] Gordon, Y. (1988). On Milman’s inequality and random subspaces which escape through a mesh in \({\mathbf{R}^n}\). In Geometric Aspects of Functional Analysis (1986/87). Lecture Notes in Math. 1317 84-106. Springer, Berlin. · Zbl 0651.46021 · doi:10.1007/BFb0081737
[28] Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 15 2869-2909. · Zbl 1319.62145
[29] Javanmard, A. and Montanari, A. (2014). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. IEEE Trans. Inf. Theory 60 6522-6554. · Zbl 1360.62074 · doi:10.1109/TIT.2014.2343629
[30] Johnstone, I. M. (2002). Function estimation and Gaussian sequence models. Unpublished manuscript.
[31] Ma, J., Xu, J. and Maleki, A. (2019). Optimization-based AMP for phase retrieval: The impact of initialization and \({\ell_2}\) regularization. IEEE Trans. Inf. Theory 65 3600-3629. · Zbl 1432.94032 · doi:10.1109/TIT.2019.2893254
[32] Miolane, L. and Montanari, A. (2021). Supplement to “The distribution of the Lasso: Uniform control over sparse balls and adaptive parameter tuning.” https://doi.org/10.1214/20-AOS2038SUPP
[33] Montanari, A. and Richard, E. (2016). Non-negative principal component analysis: Message passing algorithms and sharp asymptotics. IEEE Trans. Inf. Theory 62 1458-1484. · Zbl 1359.62224 · doi:10.1109/TIT.2015.2457942
[34] Mousavi, A., Maleki, A. and Baraniuk, R. G. (2017). Consistent parameter estimation for LASSO and approximate message passing. Ann. Statist. 45 2427-2454. · Zbl 1390.62123 · doi:10.1214/16-AOS1529
[35] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. Statist. Sci. 27 538-557. · Zbl 1331.62350 · doi:10.1214/12-STS400
[36] Panahi, A. and Hassibi, B. (2017). A universal analysis of large-scale regularized least squares solutions. In Advances in Neural Information Processing Systems 3381-3390.
[37] Rangan, S. (2011). Generalized approximate message passing for estimation with random linear mixing. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on 2168-2172. IEEE, New York.
[38] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over \({\ell_q}\)-balls. IEEE Trans. Inf. Theory 57 6976-6994. · Zbl 1365.62276 · doi:10.1109/TIT.2011.2165799
[39] Reeves, G. and Pfister, H. D. (2016). The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact. In Information Theory (ISIT), 2016 IEEE International Symposium on 665-669. IEEE, New York.
[40] Schniter, P. and Rangan, S. (2015). Compressive phase retrieval via generalized approximate message passing. IEEE Trans. Signal Process. 63 1043-1055. · Zbl 1394.94506 · doi:10.1109/TSP.2014.2386294
[41] Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9 1135-1151. · Zbl 0476.62035
[42] Stojnic, M. (2010). Recovery thresholds for \({\ell_1}\) optimization in binary compressed sensing. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on 1593-1597. IEEE, New York.
[43] Stojnic, M. (2013). A framework to characterize performance of lasso algorithms. Preprint. Available at arXiv:1303.7291.
[44] Sur, P. and Candès, E. J. (2019). A modern maximum-likelihood theory for high-dimensional logistic regression. Proc. Natl. Acad. Sci. USA 116 14516-14525. · Zbl 1431.62084 · doi:10.1073/pnas.1810420116
[45] Takahashi, T. and Kabashima, Y. (2018). A statistical mechanics approach to de-biasing and uncertainty estimation in LASSO for random measurements. J. Stat. Mech. Theory Exp. 7 073405, 25. · Zbl 1456.62151 · doi:10.1088/1742-5468/aace2e
[46] Thrampoulidis, C., Abbasi, E. and Hassibi, B. (2018). Precise error analysis of regularized \(M\)-estimators in high dimensions. IEEE Trans. Inf. Theory 64 5592-5628. · Zbl 1401.94051 · doi:10.1109/TIT.2018.2840720
[47] Thrampoulidis, C., Oymak, S. and Hassibi, B. (2015). Regularized linear regression: A precise analysis of the estimation error. In Conference on Learning Theory 1683-1709.
[48] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[49] Tibshirani, R. J. and Taylor, J. (2012). Degrees of freedom in Lasso problems. Ann. Statist. 40 1198-1232. · Zbl 1274.62469 · doi:10.1214/12-AOS1003
[50] Tropp, J. A. (2015). Convex recovery of a structured signal from independent random linear measurements. In Sampling Theory, a Renaissance. Appl. Numer. Harmon. Anal. 67-101. Birkhäuser/Springer, Cham. · Zbl 1358.94034
[51] van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166-1202. · Zbl 1305.62259 · doi:10.1214/14-AOS1221
[52] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[53] Villani, C. (2009). Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 338. Springer, Berlin. · Zbl 1156.53003 · doi:10.1007/978-3-540-71050-9
[54] Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217-242. · Zbl 1411.62196 · doi:10.1111/rssb.12026