
Skinny Gibbs: a consistent and scalable Gibbs sampler for model selection. (English) Zbl 1428.62116

Summary: We consider the computational and statistical issues that arise in high-dimensional Bayesian model selection under Gaussian spike and slab priors. To avoid the large matrix computations needed in a standard Gibbs sampler, we propose a novel Gibbs sampler called “Skinny Gibbs”, which scales much better to high-dimensional problems in both memory and computational efficiency. In particular, its computational complexity grows only linearly in \(p\), the number of predictors, while it retains strong model selection consistency even when \(p\) is much greater than the sample size \(n\). The present article focuses on logistic regression because of its broad applicability as a representative member of the generalized linear models. We compare the proposed method with several leading variable selection methods in a simulation study, which shows that Skinny Gibbs performs strongly, as our theoretical work indicates.
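
For readers unfamiliar with the Gaussian spike and slab prior mentioned in the summary, the following is a minimal sketch of the conditional inclusion probability of a single coefficient under that prior. It is the textbook two-component normal mixture calculation, not the Skinny Gibbs update itself, whose details are not given in this record. The function names and the parameters `q` (prior inclusion probability), `tau0` (spike standard deviation) and `tau1` (slab standard deviation) are assumptions chosen for illustration.

```python
import math


def normal_logpdf(x, sd):
    """Log density of a N(0, sd^2) distribution evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * (x / sd) ** 2


def inclusion_probability(beta_j, q, tau0, tau1):
    """P(Z_j = 1 | beta_j) under an illustrative Gaussian spike and slab prior.

    Assumed prior (hypothetical parameter names):
      beta_j | Z_j = 0 ~ N(0, tau0^2)   (spike, small variance)
      beta_j | Z_j = 1 ~ N(0, tau1^2)   (slab, large variance)
      P(Z_j = 1) = q
    """
    log_slab = math.log(q) + normal_logpdf(beta_j, tau1)
    log_spike = math.log(1.0 - q) + normal_logpdf(beta_j, tau0)
    d = log_spike - log_slab
    # Numerically stable evaluation of 1 / (1 + exp(d)).
    if d >= 0.0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


if __name__ == "__main__":
    for b in (0.0, 0.3, 3.0):
        print(b, inclusion_probability(b, q=0.5, tau0=0.1, tau1=1.0))
```

Each such evaluation costs a constant amount of work per coefficient, so one pass over all indicators in this simplified setting is linear in the number of predictors. The expensive part of a standard Gibbs sampler is the joint update of the coefficient vector, which is the step the summary says Skinny Gibbs is designed to avoid.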

MSC:

62F15 Bayesian inference
62J12 Generalized linear models (logistic models)

Software:

EMVS; SSS; pi-MASS; glmnet
