
Probabilistic programming with stochastic variational message passing. (English) Zbl 07581224

Summary: Stochastic approximation methods for variational inference have recently gained popularity in the probabilistic programming community, since these methods are amenable to automation and allow online, scalable, and universal approximate Bayesian inference. Unfortunately, common Probabilistic Programming Languages (PPLs) with stochastic approximation engines lack the efficiency of message passing-based inference algorithms with deterministic update rules, such as Belief Propagation (BP) and Variational Message Passing (VMP). Still, Stochastic Variational Inference (SVI) and Conjugate-Computation Variational Inference (CVI) provide principled methods to integrate fast deterministic inference techniques with broadly applicable stochastic approximate inference. However, implementing SVI and CVI requires manually derived variational update rules, which are not yet available in most PPLs. In this paper, we cast SVI and CVI explicitly in a message passing-based inference context. We provide an implementation of SVI and CVI in ForneyLab, an automated message passing-based probabilistic programming package in the open-source Julia language. Through a number of experiments, we demonstrate how SVI and CVI extend the automated inference capabilities of message passing-based probabilistic programming.
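
For orientation, the following is a minimal sketch of the ForneyLab workflow that the paper extends: plain variational message passing on a conjugate Gaussian model, using ForneyLab's documented @RV/placeholder interface. The model, priors, data, and variable names are illustrative assumptions, not the authors' code; the SVI/CVI update rules proposed in the paper plug into the same automatic algorithm-generation step.

    using ForneyLab

    g = FactorGraph()

    # Illustrative conjugate Gaussian model with unknown mean m and precision w
    @RV m ~ GaussianMeanVariance(0.0, 100.0)   # vague prior on the mean
    @RV w ~ Gamma(0.01, 0.01)                  # vague prior on the precision
    @RV y ~ GaussianMeanPrecision(m, w)        # observation model
    placeholder(y, :y)                         # data enters here

    # Mean-field factorization q(m)q(w); ForneyLab derives the VMP
    # update rules automatically and emits them as Julia source code
    q = PosteriorFactorization(m, w, ids=[:M, :W])
    algo = messagePassingAlgorithm(free_energy=true)
    eval(Meta.parse(algorithmSourceCode(algo, free_energy=true)))

    # One coordinate-wise sweep of the generated updates; the function
    # names stepM!/stepW! are generated by ForneyLab from the ids above
    data = Dict(:y => 1.2)                     # single illustrative observation
    marginals = Dict(:m => vague(GaussianMeanVariance),
                     :w => vague(Gamma))
    stepM!(data, marginals)                    # update q(m)
    stepW!(data, marginals)                    # update q(w)

In a non-conjugate model, the deterministic VMP updates above are no longer available in closed form; the paper's contribution is to substitute SVI/CVI-style stochastic natural-gradient updates at exactly those factors while keeping the rest of the generated message passing schedule intact.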

MSC:

68T37 Reasoning under uncertainty in the context of artificial intelligence
