zbMATH — the first resource for mathematics

Which bridge estimator is the best for variable selection? (English) Zbl 1456.62147
From the abstract: “We study the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations \(n\) grows at the same rate as the number of predictors \(p\). We consider two-stage variable selection techniques (TVS) in which the first stage uses bridge estimators to obtain an estimate of the regression coefficients, and the second stage simply thresholds this estimate to select the ‘important’ predictors. The asymptotic false discovery proportion (AFDP) and true positive proportion (ATPP) of these TVS are evaluated. We prove that for a fixed ATPP, in order to obtain a smaller AFDP, one should pick a bridge estimator with smaller asymptotic mean square error in the first stage of TVS. Based on such principled discovery, we present a sharp comparison of different TVS, via an in-depth investigation of the estimation properties of bridge estimators. Rather than ‘orderwise’ error bounds with loose constants, our analysis focuses on precise error characterization. Various interesting signal-to-noise ratio and sparsity settings are studied. Our results offer new and thorough insights into high-dimensional variable selection.”
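The two-stage procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the bridge estimator is taken here as \(\hat\beta \in \arg\min_\beta \tfrac12\|y-X\beta\|_2^2+\lambda\sum_i|\beta_i|^q\), only the cases \(q=1\) (Lasso, solved by plain iterative soft thresholding) and \(q=2\) (ridge, closed form) are handled, and the design has i.i.d. \(N(0,1/n)\) entries with \(n/p\) held fixed. The helper names bridge_estimate, two_stage_select and fdp_tpp, as well as the values of \(\lambda\), the threshold and the noise level, are hypothetical choices for illustration. On a single draw, the empirical FDP and TPP are finite-sample counterparts of AFDP and ATPP.

```python
import numpy as np


def soft_threshold(u, t):
    """Proximal map of t*|.|: shrink every entry of u toward zero by t."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)


def bridge_estimate(X, y, lam, q, n_iter=2000):
    """Stage 1 (sketch): minimize 0.5*||y - X b||^2 + lam * sum_i |b_i|^q.

    Only q = 2 (ridge, closed form) and q = 1 (Lasso, plain ISTA) are handled.
    """
    p = X.shape[1]
    if q == 2:
        # Set the gradient X^T (X b - y) + 2 * lam * b to zero.
        return np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(p), X.T @ y)
    if q == 1:
        # Step 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
        step = 1.0 / np.linalg.norm(X, 2) ** 2
        b = np.zeros(p)
        for _ in range(n_iter):
            b = soft_threshold(b + step * X.T @ (y - X @ b), step * lam)
        return b
    raise ValueError("this sketch only covers q = 1 (Lasso) and q = 2 (ridge)")


def two_stage_select(X, y, lam, q, threshold):
    """Stage 2: keep the predictors whose stage-1 coefficient exceeds the threshold."""
    beta_hat = bridge_estimate(X, y, lam, q)
    return np.abs(beta_hat) > threshold


def fdp_tpp(selected, beta_true):
    """Empirical false discovery proportion and true positive proportion."""
    support = beta_true != 0
    fdp = np.sum(selected & ~support) / max(np.sum(selected), 1)
    tpp = np.sum(selected & support) / max(np.sum(support), 1)
    return fdp, tpp


# Usage on one synthetic draw (all sizes and tuning values are arbitrary choices).
rng = np.random.default_rng(0)
n, p, k = 200, 400, 20  # n/p fixed at 0.5, mimicking the proportional regime
X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
beta = np.zeros(p)
beta[:k] = 5.0
y = X @ beta + 0.5 * rng.normal(size=n)
for q in (1, 2):
    selected = two_stage_select(X, y, lam=1.0, q=q, threshold=1.0)
    print("q =", q, "(FDP, TPP) =", fdp_tpp(selected, beta))
```

Using one common threshold for both values of \(q\) is only a starting point. A comparison in the spirit of the abstract would tune each threshold so that both procedures reach the same TPP and then compare their FDPs.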
Specific conclusions include the following: the variable selection performance of the Lasso can be improved by debiasing and thresholding; a TVS with ridge in its first stage outperforms TVS with other bridge estimators when the noise level is large; and, for very sparse signals, two-stage Lasso is optimal among two-stage bridge estimators as long as the signal strength stays above a certain threshold. The authors conduct numerical experiments to support their theoretical findings and to examine the validity of their main conclusions for general design matrices.
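The remark that Lasso-based selection improves after debiasing and thresholding can be illustrated with a short sketch. It rests on an explicit assumption: it uses the degrees-of-freedom-adjusted debiasing known from the approximate message passing literature for designs with i.i.d. \(N(0,1/n)\) entries, \(\hat\beta^{d}=\hat\beta+X^\top(y-X\hat\beta)/(1-\|\hat\beta\|_0/n)\), which need not coincide with the exact construction analyzed by the authors. The function names lasso_ista, debiased_lasso and select_by_threshold, and all tuning values, are hypothetical. Thresholding \(\hat\beta^{d}\) instead of \(\hat\beta\) gives the debiased two-stage rule.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=3000):
    """Lasso by plain ISTA: minimize 0.5*||y - X b||^2 + lam*||b||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L with L = ||X||_2^2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = b + step * X.T @ (y - X @ b)  # gradient step on the squared loss
        b = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # soft threshold
    return b


def debiased_lasso(X, y, lam):
    """Degrees-of-freedom-adjusted debiasing (assumes X has i.i.d. N(0, 1/n) entries).

    beta_d = beta_hat + X^T (y - X beta_hat) / (1 - ||beta_hat||_0 / n)
    """
    n = X.shape[0]
    beta_hat = lasso_ista(X, y, lam)
    df = np.count_nonzero(beta_hat)  # number of variables the Lasso selected
    if df >= n:
        raise ValueError("adjustment undefined when the Lasso selects n or more variables")
    return beta_hat + X.T @ (y - X @ beta_hat) / (1.0 - df / n)


def select_by_threshold(estimate, threshold):
    """Second stage: keep coordinates whose absolute value exceeds the threshold."""
    return np.abs(estimate) > threshold


# Usage on one synthetic draw (sizes, lam and threshold are arbitrary choices).
rng = np.random.default_rng(0)
n, p, k = 200, 400, 20
X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
beta = np.zeros(p)
beta[:k] = 5.0
y = X @ beta + 0.5 * rng.normal(size=n)
support = beta != 0
for name, est in [("Lasso", lasso_ista(X, y, 1.0)), ("debiased Lasso", debiased_lasso(X, y, 1.0))]:
    sel = select_by_threshold(est, 2.0)
    print(name, "selected:", int(sel.sum()), "true positives:", int(np.sum(sel & support)))
```

The factor \(1/(1-\|\hat\beta\|_0/n)\) enlarges the residual correction when the Lasso selects many variables; the guard in debiased_lasso rejects the degenerate case \(\|\hat\beta\|_0\ge n\).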
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
[1] Aeron, S., Saligrama, V. and Zhao, M. (2010). Information theoretic bounds for compressed sensing. IEEE Trans. Inf. Theory 56 5111-5130. Zentralblatt MATH: 1366.94179
Digital Object Identifier: doi:10.1109/TIT.2010.2059891
[2] Barber, R. F. and Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. Ann. Statist. 43 2055-2085. Zentralblatt MATH: 1327.62082
Digital Object Identifier: doi:10.1214/15-AOS1337
Project Euclid: euclid.aos/1438606853
[3] Bayati, M., Lelarge, M. and Montanari, A. (2015). Universality in polytope phase transitions and message passing algorithms. Ann. Appl. Probab. 25 753-822. Zentralblatt MATH: 1322.60207
Digital Object Identifier: doi:10.1214/14-AAP1010
Project Euclid: euclid.aoap/1424355130
[4] Bayati, M. and Montanari, A. (2011). The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inf. Theory 57 764-785. Zentralblatt MATH: 1366.94079
Digital Object Identifier: doi:10.1109/TIT.2010.2094817
[5] Bayati, M. and Montanari, A. (2012). The LASSO risk for Gaussian matrices. IEEE Trans. Inf. Theory 58 1997-2017. Zentralblatt MATH: 1365.62196
Digital Object Identifier: doi:10.1109/TIT.2011.2174612
[6] Bogdan, M., van den Berg, E., Su, W. and Candès, E. J. (2013). Supplementary materials for “Statistical estimation and testing via the sorted \(\ell_1\) norm”. Ann. Statist.
[7] Butucea, C., Ndaoud, M., Stepanova, N. A. and Tsybakov, A. B. (2018). Variable selection with Hamming loss. Ann. Statist. 46 1837-1875. Zentralblatt MATH: 1414.62126
Digital Object Identifier: doi:10.1214/17-AOS1572
Project Euclid: euclid.aos/1534492821
[8] Candès, E., Fan, Y., Janson, L. and Lv, J. (2018). Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 551-577. Zentralblatt MATH: 1398.62335
Digital Object Identifier: doi:10.1111/rssb.12265
[9] Cho, H. and Fryzlewicz, P. (2012). High dimensional variable selection via tilting. J. R. Stat. Soc. Ser. B. Stat. Methodol. 74 593-622. Zentralblatt MATH: 1411.62183
Digital Object Identifier: doi:10.1111/j.1467-9868.2011.01023.x
[10] Gamarnik, D. and Zadik, I. (2017). High dimensional regression with binary coefficients. Estimating squared error and a phase transition. In Conference on Learning Theory 948-953.
[11] Dobriban, E. and Wager, S. (2018). High-dimensional asymptotics of prediction: Ridge regression and classification. Ann. Statist. 46 247-279. Zentralblatt MATH: 1428.62307
Digital Object Identifier: doi:10.1214/17-AOS1549
Project Euclid: euclid.aos/1519268430
[12] Donoho, D. and Jin, J. (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statist. Sci. 30 1-25. Zentralblatt MATH: 1332.62019
Digital Object Identifier: doi:10.1214/14-STS506
Project Euclid: euclid.ss/1425492437
[13] Donoho, D. and Montanari, A. (2016). High dimensional robust M-estimation: Asymptotic variance via approximate message passing. Probab. Theory Related Fields 166 935-969. Zentralblatt MATH: 1357.62220
Digital Object Identifier: doi:10.1007/s00440-015-0675-z
[14] Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object. J. Roy. Statist. Soc. Ser. B 54 41-81. With discussion and a reply by the authors. Zentralblatt MATH: 0788.62103
Digital Object Identifier: doi:10.1111/j.2517-6161.1992.tb01864.x
[15] Donoho, D. L., Maleki, A. and Montanari, A. (2009). Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. USA 106 18914-18919.
[16] Donoho, D. L., Maleki, A. and Montanari, A. (2011). The noise-sensitivity phase transition in compressed sensing. IEEE Trans. Inf. Theory 57 6920-6941. Zentralblatt MATH: 1365.94094
Digital Object Identifier: doi:10.1109/TIT.2011.2165823
[17] Donoho, D. L. and Tanner, J. (2005). Sparse nonnegative solution of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci. USA 102 9446-9451. Zentralblatt MATH: 1135.90368
Digital Object Identifier: doi:10.1073/pnas.0502269102
[18] El Karoui, N. (2010). High-dimensionality effects in the Markowitz problem and other quadratic programs with linear constraints: Risk underestimation. Ann. Statist. 38 3487-3566. Zentralblatt MATH: 1274.62365
Digital Object Identifier: doi:10.1214/10-AOS795
Project Euclid: euclid.aos/1291126965
[19] El Karoui, N., Bean, D., Bickel, P. J., Lim, C. and Yu, B. (2013). On robust regression with high-dimensional predictors. Proc. Natl. Acad. Sci. USA 110 14557-14562. Zentralblatt MATH: 1359.62184
Digital Object Identifier: doi:10.1073/pnas.1307842110
[20] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. Zentralblatt MATH: 1073.62547
Digital Object Identifier: doi:10.1198/016214501753382273
[21] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 849-911. Zentralblatt MATH: 1411.62187
Digital Object Identifier: doi:10.1111/j.1467-9868.2008.00674.x
[22] Fletcher, A. K., Rangan, S. and Goyal, V. K. (2009). Necessary and sufficient conditions for sparsity pattern recovery. IEEE Trans. Inf. Theory 55 5758-5772. Zentralblatt MATH: 1367.94090
Digital Object Identifier: doi:10.1109/TIT.2009.2032726
[23] Frank, L. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics 35 109-135. Zentralblatt MATH: 0775.62288
Digital Object Identifier: doi:10.1080/00401706.1993.10485033
[24] Genovese, C. R., Jin, J., Wasserman, L. and Yao, Z. (2012). A comparison of the lasso and marginal regression. J. Mach. Learn. Res. 13 2107-2143. Zentralblatt MATH: 1435.62270
[25] Hastie, T., Tibshirani, R. and Tibshirani, R. J. (2017). Extended comparisons of best subset selection, forward stepwise selection, and the lasso. arXiv preprint arXiv:1707.08692.
[26] Huang, J., Horowitz, J. L. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Statist. 36 587-613. Zentralblatt MATH: 1133.62048
Digital Object Identifier: doi:10.1214/009053607000000875
Project Euclid: euclid.aos/1205420512
[27] Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1 799-821. Zentralblatt MATH: 0289.62033
Digital Object Identifier: doi:10.1214/aos/1176342503
Project Euclid: euclid.aos/1176342503
[28] Ji, P. and Jin, J. (2012). UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40 73-103. Zentralblatt MATH: 1246.62160
Digital Object Identifier: doi:10.1214/11-AOS947
Project Euclid: euclid.aos/1331830775
[29] Jin, J., Zhang, C.-H. and Zhang, Q. (2014). Optimality of graphlet screening in high dimensional variable selection. J. Mach. Learn. Res. 15 2723-2772. Zentralblatt MATH: 1319.62139
[30] Ke, Z. T., Jin, J. and Fan, J. (2014). Covariate assisted screening and estimation. Ann. Statist. 42 2202-2242. Zentralblatt MATH: 1310.62085
Digital Object Identifier: doi:10.1214/14-AOS1243
Project Euclid: euclid.aos/1413810726
[31] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356-1378. Zentralblatt MATH: 1105.62357
Digital Object Identifier: doi:10.1214/aos/1015957397
Project Euclid: euclid.aos/1015957397
[32] Loh, P.-L. and Wainwright, M. J. (2017). Support recovery without incoherence: A case for nonconvex regularization. Ann. Statist. 45 2455-2482. Zentralblatt MATH: 1385.62008
Digital Object Identifier: doi:10.1214/16-AOS1530
Project Euclid: euclid.aos/1513328579
[33] Luo, S. and Chen, Z. (2014). Sequential lasso cum EBIC for feature selection with ultra-high dimensional feature space. J. Amer. Statist. Assoc. 109 1229-1240. Zentralblatt MATH: 1368.62205
Digital Object Identifier: doi:10.1080/01621459.2013.877275
[34] Maleki, A., Anitori, L., Yang, Z. and Baraniuk, R. G. (2013). Asymptotic analysis of complex LASSO via complex approximate message passing (CAMP). IEEE Trans. Inf. Theory 59 4290-4308. Zentralblatt MATH: 1364.62188
Digital Object Identifier: doi:10.1109/TIT.2013.2252232
[35] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. Zentralblatt MATH: 1113.62082
Digital Object Identifier: doi:10.1214/009053606000000281
Project Euclid: euclid.aos/1152540754
[36] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246-270. Zentralblatt MATH: 1155.62050
Digital Object Identifier: doi:10.1214/07-AOS582
Project Euclid: euclid.aos/1232115934
[37] Miller, A. (2002). Subset Selection in Regression, 2nd ed. Monographs on Statistics and Applied Probability 95. Chapman & Hall/CRC, Boca Raton, FL. Zentralblatt MATH: 1051.62060
[38] Mousavi, A., Maleki, A. and Baraniuk, R. G. (2018). Consistent parameter estimation for LASSO and approximate message passing. Ann. Statist. 46 119-148. Zentralblatt MATH: 1401.62116
Digital Object Identifier: doi:10.1214/17-AOS1544
Project Euclid: euclid.aos/1519268426
[39] Ndaoud, M. and Tsybakov, A. B. (2018). Optimal variable selection and adaptive noisy compressed sensing. arXiv preprint arXiv:1809.03145. Zentralblatt MATH: 1448.94081
[40] Oymak, S. and Hassibi, B. (2016). Sharp MSE bounds for proximal denoising. Found. Comput. Math. 16 965-1029. Zentralblatt MATH: 1380.90221
Digital Object Identifier: doi:10.1007/s10208-015-9278-4
[41] Rad, K. R. (2011). Nearly sharp sufficient conditions on exact sparsity pattern recovery. IEEE Trans. Inf. Theory 57 4672-4679. Zentralblatt MATH: 1365.62203
Digital Object Identifier: doi:10.1109/TIT.2011.2145670
[42] Rangan, S., Goyal, V. K. and Fletcher, A. K. (2009). Asymptotic analysis of MAP estimation via the replica method and compressed sensing. In Advances in Neural Information Processing Systems 1545-1553.
[43] Reeves, G. and Gastpar, M. C. (2013). Approximate sparsity pattern recovery: Information-theoretic lower bounds. IEEE Trans. Inf. Theory 59 3451-3465. Zentralblatt MATH: 1364.94250
Digital Object Identifier: doi:10.1109/TIT.2013.2253852
[44] Su, W., Bogdan, M. and Candès, E. (2017). False discoveries occur early on the Lasso path. Ann. Statist. 45 2133-2150. Zentralblatt MATH: 06821121
Digital Object Identifier: doi:10.1214/16-AOS1521
Project Euclid: euclid.aos/1509436830
· Zbl 1459.62142
[45] Sur, P. and Candès, E. J. (2019). A modern maximum-likelihood theory for high-dimensional logistic regression. Proc. Natl. Acad. Sci. USA 116 14516-14525. Zentralblatt MATH: 1431.62084
Digital Object Identifier: doi:10.1073/pnas.1810420116
[46] Sur, P., Chen, Y. and Candès, E. J. (2019). The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square. Probab. Theory Related Fields 175 487-558. Zentralblatt MATH: 1431.62319
Digital Object Identifier: doi:10.1007/s00440-018-00896-9
[47] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. Zentralblatt MATH: 0850.62538
Digital Object Identifier: doi:10.1111/j.2517-6161.1996.tb02080.x
[48] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (Lasso). IEEE Trans. Inf. Theory 55 2183-2202. Zentralblatt MATH: 1367.62220
Digital Object Identifier: doi:10.1109/TIT.2009.2016018
[49] Wainwright, M. J. (2009). Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inf. Theory 55 5728-5741. Zentralblatt MATH: 1367.94106
Digital Object Identifier: doi:10.1109/TIT.2009.2032816
[50] Wang, S., Weng, H. and Maleki, A. (2020). Which bridge estimator is optimal for variable selection? arXiv preprint arXiv:1705.08617.
[51] Wang, S., Weng, H. and Maleki, A. (2020). Supplement to “Which bridge estimator is the best for variable selection?” https://doi.org/10.1214/19-AOS1906SUPP
[52] Wang, W., Wainwright, M. J. and Ramchandran, K. (2010). Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices. IEEE Trans. Inf. Theory 56 2967-2979. Zentralblatt MATH: 1366.94130
Digital Object Identifier: doi:10.1109/TIT.2010.2046199
[53] Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178-2201. Zentralblatt MATH: 1173.62054
Digital Object Identifier: doi:10.1214/08-AOS646
Project Euclid: euclid.aos/1247663752
[54] Weng, H., Feng, Y. and Qiao, X. (2019). Regularization after retention in ultrahigh dimensional linear regression models. Statist. Sinica 29 387-407. Zentralblatt MATH: 1412.62098
[55] Weng, H., Maleki, A. and Zheng, L. (2018). Overcoming the limitations of phase transition by higher order analysis of regularization techniques. Ann. Statist. 46 3099-3129. Zentralblatt MATH: 1411.62194
Digital Object Identifier: doi:10.1214/17-AOS1651
Project Euclid: euclid.aos/1536307244
[56] Yang, E., Lozano, A. and Ravikumar, P. (2014). Elementary estimators for high-dimensional linear regression. In International Conference on Machine Learning 388-396.
[57] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. Zentralblatt MATH: 1142.62044
Digital Object Identifier: doi:10.1214/07-AOS520
Project Euclid: euclid.aos/1216237292
[58] Zhang, T. (2009). Some sharp performance bounds for least squares regression with \(L_1\) regularization. Ann. Statist. 37 2109-2144. Zentralblatt MATH: 1173.62029
[59] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. Zentralblatt MATH: 1222.62008
[60] Zheng, L., Maleki, A., Weng, H., Wang, X. and Long, T. (2017). Does \(\ell_p\)-minimization outperform \(\ell_1\)-minimization? IEEE Trans. Inf. Theory 63 6896-6935. Zentralblatt MATH: 1390.94511
Digital Object Identifier: doi:10.1109/TIT.2017.2717585
[61] Zhou, S. (2009). Thresholding procedures for high dimensional variable selection and statistical estimation. In Advances in Neural Information Processing Systems 2304-2312.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.