Bayesian variable selection for Gaussian copula regression models. (English) Zbl 07499903

Summary: We develop a novel Bayesian method to select important predictors in regression models with multiple responses of diverse types. A sparse Gaussian copula regression model is used to account for the multivariate dependencies between any combination of discrete and/or continuous responses and their association with a set of predictors. We use the parameter expansion for data augmentation strategy to construct a Markov chain Monte Carlo algorithm for the estimation of the parameters and the latent variables of the model. Based on a centered parameterization of the Gaussian latent variables, we design a fixed-dimensional proposal distribution to update jointly the latent binary vectors of important predictors and the corresponding nonzero regression coefficients. For Gaussian responses and for outcomes that can be modeled as a dependent version of a Gaussian response, this proposal leads to a Metropolis-Hastings step that allows an efficient exploration of the predictors’ model space. The proposed strategy is tested on simulated data and applied to real datasets in which the responses consist of low-intensity counts, binary, ordinal and continuous variables.
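
The model class named in the summary can be illustrated from its generative side. The following is a minimal sketch, not the authors' algorithm: it only simulates one binary, one count and one continuous response that share correlated Gaussian latent variables, and it performs no variable selection or MCMC. The one-factor (exchangeable) correlation structure, the probit, Poisson and Gaussian margins, and the names `simulate`, `poisson_quantile`, `rho` and `beta` are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def poisson_quantile(u, rate, max_count=10_000):
    """Return the smallest k with P(Y <= k) >= u for Y ~ Poisson(rate)."""
    k, pmf = 0, math.exp(-rate)
    cdf = pmf
    while cdf < u and k < max_count:
        k += 1
        pmf *= rate / k
        cdf += pmf
    return k


def simulate(n, rho=0.5, beta=(1.0, 0.0, -0.5), seed=1):
    """Draw n rows (x, binary, count, continuous) from a toy Gaussian copula model.

    beta holds one slope per response; a zero slope mimics an excluded predictor.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Assumed structure: one shared factor gives every pair of latent
        # variables correlation rho while keeping unit variances.
        f = rng.gauss(0.0, 1.0)
        z = [math.sqrt(rho) * f + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(3)]
        u = [PHI(zk) for zk in z]
        # Probit margin: invert the Bernoulli CDF at u[0].
        p = PHI(beta[0] * x)
        y_bin = 1 if u[0] > 1.0 - p else 0
        # Poisson margin with log link: invert the Poisson CDF at u[1].
        y_cnt = poisson_quantile(u[1], math.exp(beta[1] * x))
        # Gaussian margin with unit variance: the inverse CDF is mean + z.
        y_con = beta[2] * x + z[2]
        rows.append((x, y_bin, y_cnt, y_con))
    return rows
```

Calling `simulate(200)` returns 200 tuples. With the default `beta`, the count response does not depend on `x`, which is the kind of sparsity pattern a variable selection method would aim to recover.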


MSC: 62-XX Statistics


Software: SSS; BDgraph; bfa; pi-MASS; loo

