Durante, Daniele; Rigon, Tommaso
Conditionally conjugate mean-field variational Bayes for logistic models. (English) Zbl 1429.62318
Stat. Sci. 34, No. 3, 472-485 (2019).

Summary: Variational Bayes (VB) is a common strategy for approximate Bayesian inference, but simple methods are only available for specific classes of models including, in particular, representations having conditionally conjugate constructions within an exponential family. Models with logit components are an apparently notable exception to this class, due to the absence of conjugacy among the logistic likelihood and the Gaussian priors for the coefficients in the linear predictor. To facilitate approximate inference within this widely used class of models, T. S. Jaakkola and M. I. Jordan [“Bayesian parameter estimation via variational methods”, Stat. Comput. 10, 25–37 (2000; doi:10.1023/A:1008932416310)] proposed a simple variational approach which relies on a family of tangent quadratic lower bounds of the logistic log-likelihood, thus restoring conjugacy between these approximate bounds and the Gaussian priors. This strategy is still implemented successfully, but few attempts have been made to formally understand the reasons underlying its excellent performance. Following a review on VB for logistic models, we cover this gap by providing a formal connection between the above bound and a recent Pólya-gamma data augmentation for logistic regression. Such a result places the computational methods associated with the aforementioned bounds within the framework of variational inference for conditionally conjugate exponential family models, thereby allowing recent advances for this class to be inherited also by the methods relying on [loc. cit.].
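The tangent quadratic lower bound mentioned in the summary can be illustrated numerically. The sketch below is not taken from the paper or from the listed software; the function names (`log_sigmoid`, `jj_lambda`, `jj_lower_bound`, `pg_mean`) are hypothetical, and it assumes the standard form of the Jaakkola-Jordan bound, log σ(x) ≥ log σ(ξ) + (x − ξ)/2 − λ(ξ)(x² − ξ²) with λ(ξ) = tanh(ξ/2)/(4ξ). It also assumes the known mean of a Pólya-gamma PG(1, ξ) variable, tanh(ξ/2)/(2ξ), which equals 2λ(ξ); this identity is one way to see the kind of link between the bound and the data augmentation that the summary refers to.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def jj_lambda(xi):
    """Curvature term lambda(xi) = tanh(xi / 2) / (4 * xi); its limit at xi -> 0 is 1/8."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def jj_lower_bound(x, xi):
    """Quadratic (in x) lower bound on log_sigmoid(x), tangent at x = +xi and x = -xi."""
    return log_sigmoid(xi) + (x - xi) / 2.0 - jj_lambda(xi) * (x * x - xi * xi)


def pg_mean(xi):
    """Mean of a Polya-gamma PG(1, xi) variable, assumed here to be tanh(xi / 2) / (2 * xi)."""
    return 2.0 * jj_lambda(xi)


if __name__ == "__main__":
    xi = 2.0
    for x in (-3.0, -2.0, 0.0, 2.0, 4.0):
        gap = log_sigmoid(x) - jj_lower_bound(x, xi)
        # The gap is non-negative everywhere and vanishes at x = +xi and x = -xi.
        print(f"x = {x:5.1f}  exact = {log_sigmoid(x):.6f}  "
              f"bound = {jj_lower_bound(x, xi):.6f}  gap = {gap:.6f}")
```

Because the bound is quadratic in x, replacing each logistic term by it yields a Gaussian-shaped function of the regression coefficients, which is why conjugacy with a Gaussian prior is restored; the variational parameter ξ only controls where the bound touches the exact log-likelihood.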
Cited in 9 Documents

MSC:
62J12 Generalized linear models (logistic models)
62F15 Bayesian inference

Keywords: EM; logistic regression; Pólya-gamma data augmentation; quadratic approximation; variational Bayes; Bayesian inference

Software: BayesLogit; PRMLT

References:
[1] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981-2014. · Zbl 1225.68143
[2] Beal, M. J. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: With application to scoring graphical model structures. In Bayesian Statistics, 7 (Tenerife, 2002) 453-463. Oxford Univ. Press, New York.
[3] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York. · Zbl 1107.68072
[4] Bishop, C. M. and Svensén, M. (2003). Bayesian hierarchical mixtures of experts. Proc. Conf. Uncertain. Artif. Intell. 57-64.
[5] Blei, D. M., Kucukelbir, A. and McAuliffe, J. D. (2017). Variational inference: A review for statisticians. J. Amer. Statist. Assoc. 112 859-877.
[6] Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993-1022. · Zbl 1112.68379
[7] Böhning, D. and Lindsay, B. G. (1988). Monotonicity of quadratic-approximation algorithms. Ann. Inst. Statist. Math. 40 641-663. · Zbl 0723.65150 · doi:10.1007/BF00049423
[8] Braun, M. and McAuliffe, J. (2010). Variational inference for large-scale models of discrete choice. J. Amer. Statist. Assoc. 105 324-335. · Zbl 1397.62103 · doi:10.1198/jasa.2009.tm08030
[9] Browne, R. P. and McNicholas, P. D. (2015). Multivariate sharp quadratic bounds via \(\boldsymbol{\Sigma} \)-strong convexity and the Fenchel connection. Electron. J. Stat. 9 1913-1938. · Zbl 1336.62126 · doi:10.1214/15-EJS1061
[10] Carbonetto, P. and Stephens, M. (2012). Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Anal. 7 73-107. · Zbl 1330.62089 · doi:10.1214/12-BA703
[11] Choi, H. M. and Hobert, J. P. (2013). The Polya-gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electron. J. Stat. 7 2054-2064. · Zbl 1349.60123 · doi:10.1214/13-EJS837
[12] de Leeuw, J. and Lange, K. (2009). Sharp quadratic majorization in one dimension. Comput. Statist. Data Anal. 53 2471-2484. · Zbl 1453.62078 · doi:10.1016/j.csda.2009.01.002
[13] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1-38. · Zbl 0364.62022 · doi:10.1111/j.2517-6161.1977.tb01600.x
[14] Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 398-409. · Zbl 0702.62020 · doi:10.1080/01621459.1990.10476213
[15] Giordano, R. J., Broderick, T. and Jordan, M. I. (2015). Linear response methods for accurate covariance estimates from mean field variational Bayes. Adv. Neural Inf. Process. Syst. 1441-1449.
[16] Hoffman, M. D., Blei, D. M., Wang, C. and Paisley, J. (2013). Stochastic variational inference. J. Mach. Learn. Res. 14 1303-1347. · Zbl 1317.68163
[17] Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist. 58 30-37.
[18] Jaakkola, T. S. and Jordan, M. I. (2000). Bayesian parameter estimation via variational methods. Stat. Comput. 10 25-37.
[19] Jordan, M. I., Ghahramani, Z., Jaakkola, T. S. and Saul, L. K. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37 183-233. · Zbl 0945.68164 · doi:10.1023/A:1007665907178
[20] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Stat. 22 79-86. · Zbl 0042.38403 · doi:10.1214/aoms/1177729694
[21] Lee, S., Huang, J. Z. and Hu, J. (2010). Sparse logistic principal components analysis for binary data. Ann. Appl. Stat. 4 1579-1601. · Zbl 1202.62084 · doi:10.1214/10-AOAS327
[22] McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley Series in Probability and Statistics: Applied Probability and Statistics. Wiley, New York. · Zbl 0882.62012
[23] Ormerod, J. T. and Wand, M. P. (2010). Explaining variational approximations. Amer. Statist. 64 140-153. · Zbl 1200.65007 · doi:10.1198/tast.2010.09058
[24] Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Amer. Statist. Assoc. 108 1339-1349. · Zbl 1283.62055 · doi:10.1080/01621459.2013.829001
[25] Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA. · Zbl 1177.68165
[26] Ren, L., Du, L., Carin, L. and Dunson, D. B. (2011). Logistic stick-breaking process. J. Mach. Learn. Res. 12 203-239. · Zbl 1280.62079
[27] Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Stat. 22 400-407. · Zbl 0054.05901 · doi:10.1214/aoms/1177729586
[28] Scott, J. G. and Sun, L. (2013). Expectation-maximization for logistic regression. Available at arXiv:1306.0040.
[29] Spall, J. C. (2003). Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley Interscience, Hoboken, NJ. · Zbl 1088.90002
[30] Tang, Y., Browne, R. P. and McNicholas, P. D. (2015). Model based clustering of high-dimensional binary data. Comput. Statist. Data Anal. 87 84-101. · Zbl 1468.62191 · doi:10.1016/j.csda.2014.12.009
[31] Wand, M. P. (2017). Fast approximate inference for arbitrarily large semiparametric regression models via message passing. J. Amer. Statist. Assoc. 112 137-156.
[32] Wand, M. P., Ormerod, J. T., Padoan, S. A. and Frühwirth, R. (2011). Mean field variational Bayes for elaborate distributions. Bayesian Anal. 6 847-900. · Zbl 1330.62158 · doi:10.1214/11-BA631
[33] Wang, C. and Blei, D. M. (2013). Variational inference in nonconjugate models. J. Mach. Learn. Res. 14 1005-1031. · Zbl 1320.62057
[34] Wang, B. and Titterington, D. M. (2004). Convergence and asymptotic normality of variational Bayesian approximations for exponential family models with missing values. Proc. Conf. Uncertain. Artif. Intell. 577-584.
[35] Zhu, L. (2012). New inequalities for hyperbolic functions and their applications. J. Inequal. Appl. 303 1-29. · Zbl 1279.26067

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.