
Shrinkage priors for Bayesian penalized regression. (English) Zbl 1431.62564

Summary: In linear regression problems with many predictors, penalized regression techniques are often used to guard against overfitting and to select variables relevant for predicting an outcome variable. Recently, Bayesian penalization has become increasingly popular; in this approach, the prior distribution performs a function similar to that of the penalty term in classical penalization. Specifically, the so-called shrinkage priors in Bayesian penalization aim to shrink small effects to zero while leaving truly large effects intact. Compared to classical penalization techniques, Bayesian penalization techniques perform similarly or sometimes even better, and they offer additional advantages such as readily available uncertainty estimates, automatic estimation of the penalty parameter, and greater flexibility in the penalties that can be considered. However, many different shrinkage priors exist, and the available, often quite technical, literature typically presents a single shrinkage prior and compares it with at most one or two alternatives. This can make it difficult for researchers to navigate the many prior options and choose a shrinkage prior for the problem at hand. The aim of this paper is therefore to provide a comprehensive overview of the literature on Bayesian penalization. We provide a theoretical and conceptual comparison of nine different shrinkage priors and, where possible, parametrize the priors as scale mixtures of normal distributions to facilitate comparison. We illustrate different characteristics and behaviors of the shrinkage priors and compare their performance in terms of prediction and variable selection in a simulation study. Additionally, we provide two empirical examples to illustrate the application of Bayesian penalization. Finally, an R package is available online (https://github.com/sara-vanerp/bayesreg) which allows researchers to easily perform Bayesian penalized regression with novel shrinkage priors.
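
As background for the scale-mixture parametrization mentioned in the summary, the standard representation (a known result in this literature, not taken from this entry) writes each coefficient's prior as a normal distribution with a random variance:

\[
p(\beta_j) = \int_0^\infty \mathcal{N}(\beta_j \mid 0, \tau_j^2)\,\pi(\tau_j^2)\,\mathrm{d}\tau_j^2 .
\]

For example, \(\tau_j^2 \sim \text{Exponential}(\lambda^2/2)\) yields the Laplace marginal \(p(\beta_j) = (\lambda/2)\exp(-\lambda|\beta_j|)\) of the Bayesian lasso, while a half-Cauchy scale \(\tau_j \sim \mathcal{C}^+(0,\tau)\) yields the horseshoe prior. In code, a minimal sketch of Bayesian penalized regression with a horseshoe shrinkage prior, using the general-purpose rstanarm interface rather than the paper's own bayesreg package, and assuming a data frame df with outcome y and predictor columns, could look like:

    # Minimal sketch (assumptions: rstanarm is installed; df is a hypothetical
    # data frame with numeric outcome y and predictor columns; this is NOT the
    # API of the paper's bayesreg package)
    library(rstanarm)

    fit <- stan_glm(
      y ~ .,                                     # regress outcome on all predictors
      data   = df,
      family = gaussian(),
      prior  = hs(df = 1, global_scale = 0.01),  # (regularized) horseshoe prior
      chains = 4, iter = 2000
    )
    summary(fit)  # posterior summaries for all coefficients

The posterior intervals reported by summary(fit) illustrate the readily available uncertainty estimates that the summary lists as an advantage of Bayesian penalization.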

MSC:

62P15 Applications of statistics to psychology
62F15 Bayesian inference
62J05 Linear regression; mixed models
