zbMATH — the first resource for mathematics

Variational inference for generalized linear mixed models using partially noncentered parametrizations. (English) Zbl 1331.62167
Summary: The effects of different parametrizations on the convergence of Bayesian computational algorithms for hierarchical models are well explored. Techniques such as centering, noncentering and partial noncentering can be used to accelerate convergence in MCMC and EM algorithms but are still not well studied for variational Bayes (VB) methods. As a fast deterministic approach to posterior approximation, VB is attracting increasing interest due to its suitability for large high-dimensional data. Use of different parametrizations for VB has not only computational but also statistical implications, as different parametrizations are associated with different factorized posterior approximations. We examine the use of partially noncentered parametrizations in VB for generalized linear mixed models (GLMMs). Our paper makes four contributions. First, we show how to implement an algorithm called nonconjugate variational message passing for GLMMs. Second, we show that the partially noncentered parametrization can adapt to the quantity of information in the data and determine a parametrization close to optimal. Third, we show that partial noncentering can accelerate convergence and produce more accurate posterior approximations than centering or noncentering. Finally, we demonstrate how the variational lower bound, produced as part of the computation, can be useful for model selection.
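The partially noncentered parametrization discussed in the summary can be illustrated on a toy normal hierarchical model. The sketch below is not the paper's GLMM formulation; the scalar setup, names, and numerical values are illustrative only. It writes the group-level effect as a weighted combination of the centered (w = 0) and noncentered (w = 1) forms and checks empirically that every intermediate weight induces the same marginal model, which is why the weight is free to be tuned for computational efficiency.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
m, s = 2.0, 1.5      # hyperprior on the global mean:  mu ~ N(m, s^2)
sigma_theta = 0.8    # group-level sd:  theta_i | mu ~ N(mu, sigma_theta^2)

def draw_theta(w):
    """Draw theta_i via a partially noncentered parametrization with weight w.

    theta_i = w * mu + eta_i,  where  eta_i | mu ~ N((1 - w) * mu, sigma_theta^2).
    w = 0 recovers the centered form (eta_i = theta_i) and w = 1 the
    noncentered form (eta_i independent of mu a priori); for every w the
    implied model is the same:  theta_i | mu ~ N(mu, sigma_theta^2).
    """
    mu = rng.normal(m, s, size=n)
    eta = rng.normal((1 - w) * mu, sigma_theta)
    return w * mu + eta

for w in (0.0, 0.5, 1.0):
    theta = draw_theta(w)
    # Analytically, E[theta] = m and Var(theta) = s^2 + sigma_theta^2
    # regardless of w; the Monte Carlo moments should agree for every weight.
    print(f"w = {w}: mean = {theta.mean():.3f}, var = {theta.var():.3f}")
```

The practical point, in the paper's setting, is that although all weights give the same model, they give different posterior dependence structures, so a factorized variational approximation can be markedly more accurate and faster-converging under a well-chosen partial weight than under pure centering or noncentering.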

62F15 Bayesian inference
62J12 Generalized linear models (logistic models)
Full Text: DOI Euclid
[1] Attias, H. (1999). Inferring parameters and structure of latent variable models by variational Bayes. In Proceedings of the 15 th Conference on Uncertainty in Artificial Intelligence 21-30. Morgan Kaufmann, San Francisco, CA.
[2] Attias, H. (2000). A variational Bayesian framework for graphical models. In Advances in Neural Information Processing Systems 12 209-215. MIT Press, Cambridge, MA.
[3] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York. · Zbl 1107.68072
[4] Blocker, A. W. (2011). Fast Rcpp implementation of Gauss-Hermite quadrature. R package “fastGHQuad” version 0.1-1.
[5] Braun, M. and McAuliffe, J. (2010). Variational inference for large-scale models of discrete choice. J. Amer. Statist. Assoc. 105 324-335. · Zbl 1397.62103
[6] Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc. 88 9-25. · Zbl 0775.62195
[7] Brown, P. and Zhou, L. (2010). MCMC for generalized linear mixed models with glmmBUGS. The R Journal 2 13-16.
[8] Browne, W. J. and Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Anal. 1 473-513 (electronic). · Zbl 1331.62125
[9] Cai, B. and Dunson, D. B. (2008). Bayesian variable selection in generalized linear mixed models. In Random Effect and Latent Variable Model Selection. Lecture Notes in Statistics 192 63-91. Springer, New York.
[10] Christensen, O. F., Roberts, G. O. and Sköld, M. (2006). Robust Markov chain Monte Carlo methods for spatial generalized linear mixed models. J. Comput. Graph. Statist. 15 1-17.
[11] Corduneanu, A. and Bishop, C. M. (2001). Variational Bayesian model selection for mixture distributions. In Artificial Intelligence and Statistics 27-34. Morgan Kaufmann, San Francisco, CA.
[12] De Backer, M., De Vroey, C., Lesaffre, E., Scheys, I. and De Keyser, P. (1998). Twelve weeks of continuous oral therapy for toenail onychomycosis caused by dermatophytes: A double-blind comparative trial of terbinafine 250 mg/day versus itraconazole 200 mg/day. Journal of the American Academy of Dermatology 38 57-63.
[13] Fitzmaurice, G. and Laird, N. (1993). A likelihood-based method for analysing longitudinal binary responses. Biometrika 80 141-151. · Zbl 0775.62296
[14] Fong, Y., Rue, H. and Wakefield, J. (2010). Bayesian inference for generalised linear mixed models. Biostatistics 11 397-412.
[15] Gelfand, A. E., Sahu, S. K. and Carlin, B. P. (1995). Efficient parameterisations for normal linear mixed models. Biometrika 82 479-488. · Zbl 0832.62064
[16] Gelfand, A. E., Sahu, S. K. and Carlin, B. P. (1996). Efficient parametrizations for generalized linear mixed models. In Bayesian Statistics 5 (Alicante, 1994) 165-180. Oxford Univ. Press, New York.
[17] Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian Data Analysis, 2nd ed. Chapman & Hall/CRC, Boca Raton, FL. · Zbl 1039.62018
[18] Ghahramani, Z. and Beal, M. J. (2001). Propagation algorithms for variational Bayesian learning. In Advances in Neural Information Processing Systems 13 507-513. MIT Press, Cambridge, MA.
[19] Hoffman, M. D., Blei, D. M., Wang, C. and Paisley, J. (2012). Stochastic variational inference. Preprint, arXiv:1206.7051. · Zbl 1317.68163
[20] Jaakkola, T. S. and Jordan, M. I. (2000). Bayesian parameter estimation via variational methods. Statist. Comput. 10 25-37.
[21] Kass, R. E. and Natarajan, R. (2006). A default conjugate prior for variance components in generalized linear mixed models (comment on article by Browne and Draper). Bayesian Anal. 1 535-542 (electronic). · Zbl 1331.62148
[22] Knowles, D. A. and Minka, T. P. (2011). Non-conjugate variational message passing for multinomial and binary regression. In Advances in Neural Information Processing Systems 24 1701-1709.
[23] Liu, Q. and Pierce, D. A. (1994). A note on Gauss-Hermite quadrature. Biometrika 81 624-629. · Zbl 0813.65053
[24] Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. J. Amer. Statist. Assoc. 94 1264-1274. · Zbl 1069.62514
[25] Lunn, D. J., Thomas, A., Best, N. and Spiegelhalter, D. (2000). WinBUGS-A Bayesian modelling framework: Concepts, structure, and extensibility. Statist. Comput. 10 325-337.
[26] Magnus, J. R. and Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester. · Zbl 0651.15001
[27] Meng, X.-L. and van Dyk, D. (1997). The EM algorithm-An old folk-song sung to a fast new tune (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 59 511-567. · Zbl 1090.62518
[28] Meng, X.-L. and van Dyk, D. A. (1999). Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86 301-320. · Zbl 1054.62505
[29] O’Hagan, A. and Forster, J. (2004). Kendall’s Advanced Theory of Statistics, Vol. 2B: Bayesian Inference, 2nd ed. Arnold, London. · Zbl 1058.62002
[30] Ormerod, J. T. and Wand, M. P. (2010). Explaining variational approximations. Amer. Statist. 64 140-153. · Zbl 1200.65007
[31] Ormerod, J. T. and Wand, M. P. (2012). Gaussian variational approximate inference for generalized linear mixed models. J. Comput. Graph. Statist. 21 2-17.
[32] Overstall, A. M. and Forster, J. J. (2010). Default Bayesian model determination methods for generalised linear mixed models. Comput. Statist. Data Anal. 54 3269-3288. · Zbl 1284.62462
[33] Papaspiliopoulos, O., Roberts, G. O. and Sköld, M. (2003). Non-centered parameterizations for hierarchical models and data augmentation. In Bayesian Statistics 7 (Tenerife, 2002) 307-326. Oxford Univ. Press, New York.
[34] Papaspiliopoulos, O., Roberts, G. O. and Sköld, M. (2007). A general framework for the parametrization of hierarchical models. Statist. Sci. 22 59-73. · Zbl 1246.62195
[35] Qi, Y. and Jaakkola, T. S. (2006). Parameter expanded variational Bayesian methods. In Advances in Neural Information Processing Systems 19 1097-1104. MIT Press, Cambridge, MA.
[36] Raudenbush, S. W., Yang, M.-L. and Yosef, M. (2000). Maximum likelihood for generalized linear models with nested random effects via high-order, multivariate Laplace approximation. J. Comput. Graph. Statist. 9 141-157.
[37] Rijmen, F. and Vomlel, J. (2008). Assessing the performance of variational methods for mixed logistic regression models. J. Stat. Comput. Simul. 78 765-779. · Zbl 1145.62052
[38] Roos, M. and Held, L. (2011). Sensitivity analysis in Bayesian generalized linear mixed models for binary data. Bayesian Anal. 6 259-278. · Zbl 1330.62150
[39] Roulin, A. and Bersier, L. F. (2007). Nestling barn owls beg more intensely in the presence of their mother than in the presence of their father. Animal Behaviour 74 1099-1106.
[40] Saul, L. K. and Jordan, M. I. (1998). A mean field learning algorithm for unsupervised neural networks. In Learning in Graphical Models 541-554. Kluwer Academic, Dordrecht. · Zbl 0910.68180
[41] Sturtz, S., Ligges, U. and Gelman, A. (2005). R2WinBUGS: A package for running WinBUGS from R. Journal of Statistical Software 12 1-16.
[42] Tan, S. L. and Nott, D. J. (2013). Variational approximation for mixtures of linear mixed models. J. Comput. Graph. Statist.
[43] Thall, P. F. and Vail, S. C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics 46 657-671. · Zbl 0712.62048
[44] Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S, 4th ed. Springer, New York. · Zbl 1006.62003
[45] Wand, M. P. (2013). Fully simplified multivariate normal updates in non-conjugate variational message passing. Unpublished manuscript. · Zbl 1319.62066
[46] Winn, J. and Bishop, C. M. (2005). Variational message passing. J. Mach. Learn. Res. 6 661-694. · Zbl 1222.68332
[47] Yu, Y. and Meng, X.-L. (2011). To center or not to center: That is not the question-An ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency. J. Comput. Graph. Statist. 20 531-570.
[48] Yu, D. and Yau, K. K. W. (2012). Conditional Akaike information criterion for generalized linear mixed models. Comput. Statist. Data Anal. 56 629-644. · Zbl 1239.62087
[49] Zhao, Y., Staudenmayer, J., Coull, B. A. and Wand, M. P. (2006). General design Bayesian generalized linear mixed models. Statist. Sci. 21 35-51. · Zbl 1129.62063
[50] Zuur, A. F., Ieno, E. N., Walker, N. J., Saveliev, A. A. and Smith, G. M. (2009). Mixed Effects Models and Extensions in Ecology with R. Springer, New York. · Zbl 1284.62024
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.