Sparse linear mixed model selection via streamlined variational Bayes. (English) Zbl 07603106

Summary: Linear mixed models are a versatile statistical tool for studying data by accounting for fixed effects and random effects from multiple sources of variability. In many situations, a large number of candidate fixed effects are available and it is of interest to select a parsimonious subset of those that are actually relevant for predicting the response variable. Variational approximations facilitate fast approximate Bayesian inference for the parameters of a variety of statistical models, including linear mixed models. However, for models with a large number of fixed or random effects, naive application of standard variational inference principles does not lead to fast approximate inference algorithms, owing to the size of the model design matrices and the inefficient treatment of the sparse matrix problems that arise in the updates of the approximating density parameters.
We illustrate how recently developed streamlined variational inference procedures can be generalized to make fast and accurate inference for the parameters of linear mixed models with nested random effects and global-local priors for Bayesian fixed effects selection. Our variational inference algorithms converge to the same optima as their standard implementations, but with significantly lower computational effort, memory usage and time, especially for large numbers of random effects. Using simulated and real data examples, we assess the quality of automated procedures for fixed effects selection that are free from hyperparameter tuning and rely only upon variational posterior approximations. Moreover, we show that the variational approximations are highly accurate when compared with model fitting via Markov chain Monte Carlo sampling.
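
To make the "sparse matrix problems" mentioned in the summary concrete, the following is a minimal sketch of why exploiting structure helps. It is not the authors' algorithm. It assumes a two-level model in which each group has its own random effects, so that the coefficient matrix of the linear system has an "arrowhead" pattern: a small dense block for the fixed effects, a block-diagonal part for the random effects, and coupling blocks between them. The function name `arrowhead_solve`, the simulated data, the identity ridge terms standing in for prior precisions, and the use of a Schur complement are all illustrative assumptions.

```python
import numpy as np


def arrowhead_solve(A11, A12_blocks, A22_blocks, a1, a2_blocks):
    """Solve [[A11, A12], [A12^T, blockdiag(A22_i)]] x = [a1, a2]
    via a Schur complement, without forming the full matrix.

    Hypothetical helper for illustration only.
    """
    S = A11.copy()  # becomes the Schur complement of the block-diagonal part
    r = a1.copy()   # becomes the reduced right-hand side
    cache = []
    for A12_i, A22_i, a2_i in zip(A12_blocks, A22_blocks, a2_blocks):
        H = np.linalg.solve(A22_i, A12_i.T)  # A22_i^{-1} A12_i^T
        h = np.linalg.solve(A22_i, a2_i)     # A22_i^{-1} a2_i
        S -= A12_i @ H
        r -= A12_i @ h
        cache.append((H, h))
    x1 = np.linalg.solve(S, r)                  # fixed-effects part
    x2_blocks = [h - H @ x1 for H, h in cache]  # per-group random-effects parts
    return x1, x2_blocks


# Simulated two-level data: m groups, n observations per group,
# p fixed effects and q random effects per group (all values assumed).
rng = np.random.default_rng(0)
m, n, p, q = 50, 8, 3, 2
X = [rng.normal(size=(n, p)) for _ in range(m)]
Z = [rng.normal(size=(n, q)) for _ in range(m)]
y = [rng.normal(size=n) for _ in range(m)]

A11 = sum(Xi.T @ Xi for Xi in X) + np.eye(p)
A12 = [Xi.T @ Zi for Xi, Zi in zip(X, Z)]
A22 = [Zi.T @ Zi + np.eye(q) for Zi in Z]
a1 = sum(Xi.T @ yi for Xi, yi in zip(X, y))
a2 = [Zi.T @ yi for Zi, yi in zip(Z, y)]

x1, x2 = arrowhead_solve(A11, A12, A22, a1, a2)

# Dense reference: assemble the full (p + m*q) x (p + m*q) system and solve it.
d = p + m * q
A = np.zeros((d, d))
a = np.zeros(d)
A[:p, :p] = A11
a[:p] = a1
for i in range(m):
    s = slice(p + i * q, p + (i + 1) * q)
    A[:p, s] = A12[i]
    A[s, :p] = A12[i].T
    A[s, s] = A22[i]
    a[s] = a2[i]
x_dense = np.linalg.solve(A, a)
```

The structured solve only ever factorizes one p-by-p matrix and m matrices of size q-by-q, so its cost grows linearly in the number of groups, whereas the dense solve grows cubically in p + m*q. Both return the same solution up to floating-point error, which mirrors the summary's claim that the streamlined algorithms reach the same optima as the standard implementations.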

MSC:

62F15 Bayesian inference
62H12 Estimation in multivariate analysis
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
