zbMATH — the first resource for mathematics

Scaling up Bayesian variational inference using distributed computing clusters. (English) Zbl 1420.68171
Summary: In this paper we present an approach for scaling up Bayesian learning using variational methods by exploiting distributed computing clusters managed by modern big data processing tools like Apache Spark or Apache Flink, which efficiently support iterative map-reduce operations. Our approach is defined as a distributed projected natural gradient ascent algorithm, has excellent convergence properties, and covers a wide range of conjugate exponential family models. We evaluate the proposed algorithm on three real-world datasets from different domains (the PubMed abstracts dataset, a GPS trajectory dataset, and a financial dataset) using several models (LDA, factor analysis, mixture of Gaussians, and linear regression models). Our approach compares favorably to stochastic variational inference and streaming variational Bayes, two of the main current proposals for scaling up variational methods. For the scalability analysis, we evaluate our approach over a network with more than one billion nodes, approximately \(75\%\) of which are latent variables, using a computer cluster with 128 processing units (AWS). The proposed methods are released as part of an open-source toolbox for scalable probabilistic machine learning (http://www.amidsttoolbox.com); see our work [“AMIDST: a Java toolbox for scalable probabilistic machine learning”, Preprint, arXiv:1704.01427].
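
To make the phrase "distributed projected natural gradient ascent" more concrete, the following is a minimal sketch on a toy conjugate model (a Gaussian mean with known noise variance and a Gaussian prior). It is not the algorithm from the paper or the AMIDST API; all function names, the step size, and the clipping-based projection are assumptions chosen for illustration. It relies only on the standard fact that, for conjugate exponential family models, the natural gradient of the variational objective with respect to the global natural parameters is the prior parameters plus the summed data contributions, minus the current parameters. The map step computes each shard's contribution and the reduce step sums them, which is the pattern that Spark or Flink would run across a cluster.

```python
from functools import reduce


def shard_stats(shard, noise_var):
    """Map step: one shard's contribution to the natural parameters of q(mu)."""
    return (sum(shard) / noise_var, -len(shard) / (2.0 * noise_var))


def add_pairs(a, b):
    """Reduce step: component-wise sum of two contributions."""
    return (a[0] + b[0], a[1] + b[1])


def project(lam, eps=1e-8):
    """Illustrative projection: keep the second natural parameter negative,
    so that lam always describes a Gaussian with positive variance."""
    return (lam[0], min(lam[1], -eps))


def natural_gradient_ascent(shards, prior_mean, prior_var, noise_var,
                            step=0.5, iters=60):
    """Projected natural gradient ascent on the natural parameters of q(mu)."""
    prior = (prior_mean / prior_var, -1.0 / (2.0 * prior_var))
    lam = prior  # start the variational posterior at the prior
    for _ in range(iters):
        # In this toy model the shard statistics do not depend on lam; in
        # models with local latent variables the map step would use lam.
        total = reduce(add_pairs, (shard_stats(s, noise_var) for s in shards))
        target = add_pairs(prior, total)
        grad = (target[0] - lam[0], target[1] - lam[1])  # natural gradient
        lam = project((lam[0] + step * grad[0], lam[1] + step * grad[1]))
    return lam


def to_mean_var(lam):
    """Convert Gaussian natural parameters to (mean, variance)."""
    var = -1.0 / (2.0 * lam[1])
    return lam[0] * var, var


if __name__ == "__main__":
    shards = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]  # hypothetical data partitions
    print(to_mean_var(natural_gradient_ascent(shards, 0.0, 100.0, 1.0)))
```

With a step size of 1 the update would jump straight to the exact conjugate posterior in this toy model; a smaller step is used here only to show the iterative structure.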

68T05 Learning and adaptive systems in artificial intelligence
62F15 Bayesian inference