Recursive estimation for sparse Gaussian process regression. (English) Zbl 1448.93320

Summary: Gaussian processes (GPs) are powerful kernelized methods for nonparametric regression used in many applications. However, their use is limited to a few thousand training samples due to their cubic time complexity. To scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. In this work, we investigate the connection between a general class of sparse inducing-point GP regression methods and Bayesian recursive estimation, which enables Kalman-filter-like updates for online learning. The majority of previous work has focused on the batch setting, in particular for learning the model parameters and the positions of the inducing points; here, instead, we focus on training with mini-batches. By exploiting the Kalman filter formulation, we propose a novel approach that estimates these parameters by recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state-of-the-art methods, our method keeps analytic updates for the mean and covariance of the posterior, thus drastically reducing the size of the optimization problem. We show that our method achieves faster convergence and superior performance than state-of-the-art sequential GP regression on synthetic GP data as well as on real-world data with up to a million samples.
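To make the Kalman-filter view of sparse GP regression concrete, the following is a minimal sketch (not the authors' implementation): the posterior over the M inducing outputs u ~ N(m, P) is initialized from the GP prior and then updated in closed form per mini-batch, treating each batch as a linear measurement of u through the Nyström projection A = K_xz K_zz^{-1}. The RBF kernel, fixed inducing locations, and all names below are illustrative assumptions; the paper's key contribution, recursively propagating analytical gradients of this posterior to learn hyperparameters and inducing points, is omitted here.

```python
import numpy as np

def rbf(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

class RecursiveSparseGP:
    """Kalman-style recursive posterior over inducing outputs u ~ N(m, P).

    Hypothetical sketch: fixed inducing inputs and kernel hyperparameters.
    """

    def __init__(self, Z, noise_var=0.1):
        self.Z = Z                                     # inducing inputs (M x D)
        self.noise_var = noise_var
        Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))        # jitter for stability
        self.Kzz_inv = np.linalg.inv(Kzz)
        self.m = np.zeros(len(Z))                      # prior mean of u
        self.P = Kzz                                   # prior covariance of u

    def update(self, Xb, yb):
        """Absorb one mini-batch (Xb, yb) via a Kalman filter step."""
        A = rbf(Xb, self.Z) @ self.Kzz_inv             # projection K_xz K_zz^{-1}
        R = self.noise_var * np.eye(len(Xb))           # measurement noise cov.
        S = A @ self.P @ A.T + R                       # innovation covariance
        K = self.P @ A.T @ np.linalg.inv(S)            # Kalman gain
        self.m = self.m + K @ (yb - A @ self.m)        # analytic mean update
        self.P = self.P - K @ A @ self.P               # analytic covariance update

    def predict_mean(self, Xs):
        """Sparse (projected-process) posterior predictive mean at Xs."""
        return rbf(Xs, self.Z) @ self.Kzz_inv @ self.m

# Usage: stream synthetic data in mini-batches of 100.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(1000)
gp = RecursiveSparseGP(Z=np.linspace(-3, 3, 20)[:, None])
for i in range(0, 1000, 100):
    gp.update(X[i:i + 100], y[i:i + 100])
```

Because each update touches only the M-dimensional state (m, P) rather than the full dataset, the per-batch cost is independent of the number of samples seen so far, which is what makes the recursive formulation attractive for streaming data.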

MSC:

93E10 Estimation and detection in stochastic control theory
93E11 Filtering in stochastic control theory
62P30 Applications of statistics in engineering and industry; control charts
62G08 Nonparametric regression and quantile regression
