The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). (English) Zbl 1274.62471

Summary: We revisit the adaptive Lasso as well as the thresholded Lasso with refitting, in a high-dimensional linear model, and study prediction error, \(\ell _{q}\)-error (\(q\in \{1,2\}\)), and the number of false positive selections. Our theoretical results for the two methods are, at a rather fine scale, comparable. The differences show up only in terms of the (minimal) restricted and sparse eigenvalues, favoring thresholding over the adaptive Lasso. As regards prediction and estimation, the difference is virtually negligible, but our bound for the number of false positives is larger for the adaptive Lasso than for thresholding. We also study the adaptive Lasso under beta-min conditions, which are conditions on the size of the coefficients. We show that for exact variable selection, the adaptive Lasso generally needs more severe beta-min conditions than thresholding. Both two-stage methods add value to the one-stage Lasso in the sense that, under appropriate restricted and sparse eigenvalue conditions, they have prediction and estimation error similar to the one-stage Lasso but substantially fewer false positives. Regarding the latter, we provide a lower bound for the Lasso with respect to false positive selections.
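To make the two two-stage procedures concrete: starting from an initial Lasso estimate \(\hat{\beta}^{\mathrm{init}}\), thresholding with refitting keeps the set \(\{j: |\hat{\beta}^{\mathrm{init}}_{j}|>\tau\}\) and refits it by least squares, while the adaptive Lasso solves a second \(\ell _{1}\)-penalized problem with weights \(1/|\hat{\beta}^{\mathrm{init}}_{j}|\). The following is a minimal sketch in Python with scikit-learn (rather than the glmnet package listed below); the sample sizes, regularization levels, and the threshold tau are illustrative choices, not values prescribed by the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, s = 100, 200, 5                        # high-dimensional: p > n, true sparsity s
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0                               # true active set S0 = {0, ..., s-1}
y = X @ beta + rng.standard_normal(n)

# Stage 1: initial (one-stage) Lasso.
b_init = Lasso(alpha=0.1, max_iter=10_000).fit(X, y).coef_

# Two-stage method 1: thresholded Lasso with refitting.
tau = 0.5                                    # illustrative threshold
S_thr = np.flatnonzero(np.abs(b_init) > tau)
b_thr = np.zeros(p)
if S_thr.size:
    b_thr[S_thr] = LinearRegression().fit(X[:, S_thr], y).coef_

# Two-stage method 2: adaptive Lasso, i.e. a reweighted l1 penalty with
# weights 1/|b_init_j|, implemented by rescaling the columns of X, which
# reduces the weighted problem to a plain Lasso.
w = 1.0 / np.maximum(np.abs(b_init), 1e-10)  # near-infinite weight ~ exclusion
b_adapt = Lasso(alpha=0.1, max_iter=10_000).fit(X / w, y).coef_ / w

# Compare false positive selections (coefficients outside the true support).
print("false positives, thresholding:", np.count_nonzero(b_thr[s:]))
print("false positives, adaptive:   ", np.count_nonzero(b_adapt[s:]))
```

The column-rescaling trick used for the adaptive Lasso is a standard reduction: dividing column \(j\) of \(X\) by the weight \(w_{j}\) and rescaling the fitted coefficients back turns a weighted \(\ell _{1}\) penalty into an ordinary one.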

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62G08 Nonparametric regression and quantile regression

Software:

glmnet
