
High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking. (English) Zbl 1437.62151

Summary: Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a “no panacea” view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
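As a concrete illustration of the kind of comparison the summary describes, the following minimal R sketch (not the authors' code; the sample size, dimension, sparsity and signal strength below are arbitrary illustrative choices) simulates a sparse linear model and contrasts the Lasso, the Elastic Net and Ridge Regression via the glmnet package, touching all three goals: prediction, variable selection and variable ranking.

## Minimal sketch, assuming only the glmnet package; all scenario
## parameters are illustrative, not those used in the paper.
library(glmnet)

set.seed(1)
n <- 100; p <- 500; s <- 10           # sample size, dimension, true support size
beta <- c(rep(1, s), rep(0, p - s))   # sparse coefficient vector (signal strength 1)
X <- matrix(rnorm(n * p), n, p)       # independent Gaussian covariates
y <- drop(X %*% beta + rnorm(n))

## alpha = 1 gives the Lasso, alpha = 0 Ridge, intermediate values the Elastic Net
fits <- lapply(c(lasso = 1, enet = 0.5, ridge = 0),
               function(a) cv.glmnet(X, y, alpha = a))

## Variable selection: nonzero coefficients at the cross-validated lambda
## (Ridge shrinks but never sets coefficients exactly to zero)
sel <- lapply(fits, function(f)
  which(as.matrix(coef(f, s = "lambda.min"))[-1, 1] != 0))
sapply(sel, length)

## Variable ranking: order covariates by absolute estimated coefficient
b <- abs(as.matrix(coef(fits$lasso, s = "lambda.min"))[-1, 1])
head(order(b, decreasing = TRUE), s)  # ideally recovers indices 1..s

## Prediction: cross-validated mean-squared error at lambda.min
sapply(fits, function(f) f$cvm[f$lambda == f$lambda.min])

Repeating such a run across grids of (n, p, s), signal strengths and correlated covariate designs, and scoring each method on the three goals separately, reproduces in miniature the structure of the study's 2300+ scenarios.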

MSC:

62G08 Nonparametric regression and quantile regression
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
62-08 Computational methods for problems pertaining to statistics

Software:

flare; c060; glmnet; R

References:

[1] Bickel, P. J.; Ritov, Y.; Tsybakov, A. B., Simultaneous analysis of Lasso and Dantzig selector, Ann. Stat., 37, 1705-1732 (2009) · Zbl 1173.62022
[2] Bondell, H. D.; Reich, B. J., Consistent high-dimensional Bayesian variable selection via penalized credible regions, J. Am. Stat. Assoc., 107, 1610-1624 (2012) · Zbl 1258.62026
[3] Breheny, P.; Huang, J., Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, Ann. Appl. Stat., 5, 232-253 (2011) · Zbl 1220.62095
[4] Bühlmann, P.; van de Geer, S., Statistics for High-Dimensional Data: Methods, Theory and Applications (2011), Berlin: Springer · Zbl 1273.62015
[5] Bühlmann, P.; Mandozzi, J., High-dimensional variable screening and bias in subsequent inference, with an empirical comparison, Comput. Stat., 29, 407-430 (2014) · Zbl 1306.65035
[6] Candès, E.; Tao, T., The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\), Ann. Stat., 35, 2313-2351 (2007) · Zbl 1139.62019
[7] Celeux, G.; El Anbari, M.; Marin, J.-M.; Robert, C. P., Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation, Bayesian Anal., 7, 477-502 (2012) · Zbl 1330.62284
[8] Efron, B.; Hastie, T.; Tibshirani, R., Discussion: The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\), Ann. Stat., 35, 2358-2364 (2007)
[9] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., 96, 1348-1360 (2001) · Zbl 1073.62547
[10] Fan, J.; Lv, J., A selective overview of variable selection in high dimensional feature space, Stat. Sin., 20, 101-148 (2010) · Zbl 1180.62080
[11] Fan, J.; Peng, H., Nonconcave penalized likelihood with a diverging number of parameters, Ann. Stat., 32, 928-961 (2004) · Zbl 1092.62031
[12] Friedman, J. H.; Hastie, T.; Tibshirani, R., Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw., 33, 1-22 (2010)
[13] Hastie, T.; Tibshirani, R.; Tibshirani, R. J., Extended comparisons of best subset selection, forward stepwise selection, and the lasso (2017). arXiv preprint arXiv:1707.08692
[14] Hoerl, A. E.; Kennard, R. W., Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55-67 (1970) · Zbl 0202.17205
[15] James, G. M.; Radchenko, P.; Lv, J., DASSO: connections between the Dantzig selector and Lasso, J. R. Stat. Soc. Ser. B, 71, 127-142 (2009) · Zbl 1231.62129
[16] Li, X.; Zhao, T.; Yuan, X.; Liu, H., The flare package for high dimensional linear regression and precision matrix estimation in R, J. Mach. Learn. Res., 16, 553-557 (2015) · Zbl 1337.62007
[17] Lim, C.; Yu, B., Estimation stability with cross-validation (ESCV), J. Comput. Graph. Stat., 25, 464-492 (2016)
[18] Meinshausen, N.; Bühlmann, P., High-dimensional graphs and variable selection with the lasso, Ann. Stat., 34, 1436-1462 (2006) · Zbl 1113.62082
[19] Meinshausen, N.; Bühlmann, P., Stability selection, J. R. Stat. Soc. Ser. B, 72, 417-473 (2010) · Zbl 1411.62142
[20] Meinshausen, N.; Rocha, G.; Yu, B., Discussion: a tale of three cousins: Lasso, L2 Boosting and Dantzig, Ann. Stat., 35, 2373-2384 (2007)
[21] Perrakis, K.; Mukherjee, S.; The Alzheimer’s Disease Neuroimaging Initiative, Scalable Bayesian regression in high dimensions with multiple data sources, J. Comput. Graph. Stat., advance online publication (2019)
[22] R Core Team, R: A Language and Environment for Statistical Computing (2018), Vienna: R Foundation for Statistical Computing
[23] Sill, M.; Hielscher, T.; Becker, N.; Zucknick, M., c060: extended inference with lasso and elastic-net regularized Cox and generalized linear models, J. Stat. Softw., 62, 1-22 (2014)
[24] The Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma, Nature, 474, 609-615 (2011)
[25] Tibshirani, R., Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B, 58, 267-288 (1996) · Zbl 0850.62538
[26] Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K., Sparsity and smoothness via the fused lasso, J. R. Stat. Soc. Ser. B, 67, 91-108 (2005) · Zbl 1060.62049
[27] Tucker, S. L.; Gharpure, K.; Herbrich, S. M.; Unruh, A. K.; Nick, A. M.; Crane, E. K.; Coleman, R. L.; Guenthoer, J.; Dalton, H. J.; Wu, S. Y.; Rupaimoole, R.; Lopez-Berestein, G.; Ozpolat, B.; Ivan, C.; Hu, W.; Baggerly, K. A.; Sood, A. K., Molecular biomarkers of residual disease after surgical debulking of high-grade serous ovarian cancer, Clin. Cancer Res., 20, 3280-3288 (2014)
[28] Wainwright, M. J., Sharp thresholds for high-dimensional and noisy sparsity recovery using \(L_1\)-constrained quadratic programming (Lasso), IEEE Trans. Inf. Theory, 55, 2183-2202 (2009) · Zbl 1367.62220
[29] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B, 68, 49-67 (2006) · Zbl 1141.62030
[30] Zhang, C.-H.; Huang, J., The sparsity and bias of the Lasso selection in high-dimensional linear regression, Ann. Stat., 36, 1567-1594 (2008) · Zbl 1142.62044
[31] Zhao, P.; Yu, B., On model selection consistency of lasso, J. Mach. Learn. Res., 7, 2541-2563 (2006) · Zbl 1222.62008
[32] Zou, H., The adaptive lasso and its oracle properties, J. Am. Stat. Assoc., 101, 1418-1429 (2006) · Zbl 1171.62326
[33] Zou, H., Discussion of “Stability selection” by Nicolai Meinshausen and Peter Bühlmann, J. R. Stat. Soc. Ser. B, 72, 468 (2010)
[34] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B, 67, 301-320 (2005) · Zbl 1069.62054