zbMATH — the first resource for mathematics

Stochastic quasi-Newton with line-search regularisation. (English) Zbl 1461.93556
Summary: In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these ideas to the stochastic setting by employing a highly flexible model for the Hessian and infer its value based on observing noisy gradients. In addition, we propose a stochastic counterpart to standard line-search procedures and demonstrate the utility of this combination on maximum likelihood identification for general nonlinear state space models.
93E20 Optimal stochastic control
93E12 Identification in stochastic control theory
93C10 Nonlinear systems in control theory
90C53 Methods of quasi-Newton type
Full Text: DOI
[1] Armijo, L., Minimization of functions having Lipschitz continuous first partial derivatives, Pacific Journal of Mathematics, 16, 1, 1-3 (1966) · Zbl 0202.46105
[2] Asi, H.; Duchi, J. C., Stochastic (approximate) proximal point methods: convergence, optimality, and adaptivity, SIAM Journal on Optimization, 29, 3, 2257-2290 (2019) · Zbl 07105236
[3] Bertsekas, Dimitri P.; Tsitsiklis, John N., Neuro-dynamic programming (vol. 5) (1996), Athena Scientific Belmont: Athena Scientific Belmont MA · Zbl 0924.68163
[4] Bollapragada, R., Mudigere, D., Nocedal, J., Shi, H.-J. M., & Tang, P. T. P. (2018). A progressive batching L-BFGS method for machine learning. In Proceedings of the 35th international conference on machine learning. Stockholm, Sweden.
[5] Bordes, A.; Bottou, L.; Gallinari, P., SGD-QN: Careful quasi-Newton stochastic gradient descent, Journal of Machine Learning Research (JMLR), 10, 1737-1754 (2009) · Zbl 1235.68130
[6] Bottou, L.; Curtis, F. E.; Nocedal, J., Optimization methods for large-scale machine learning, SIAM Review, 60, 2, 223-311 (2018) · Zbl 1397.65085
[7] Broyden, C. G., A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation, 19, 92, 577-593 (1965) · Zbl 0131.13905
[8] Broyden, C. G., Quasi-Newton methods and their application to function minimization, Mathematics of Computation, 21, 368-381 (1967) · Zbl 0155.46704
[9] Broyden, C. G., The convergence of a class of double-rank minimization algorithms, Journal of the Institute of Mathematics and its Applications, 6, 1, 76-90 (1970) · Zbl 0223.65023
[10] Del Moral, P., Feynman-Kac formulae: Genealogical and interacting particle systems with applications (2004), Springer: Springer New York, USA · Zbl 1130.60003
[11] Duchi, J.; Hazan, E.; Singer, Y., Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research (JMLR), 12, 2121-2159 (2011) · Zbl 1280.68164
[12] Fletcher, R., A new approach to variable metric algorithms, The Computer Journal, 13, 3, 317-322 (1970) · Zbl 0207.17402
[13] Fletcher, R., Practical methods of optimization (1987), John Wiley & Sons: John Wiley & Sons Chichester, UK · Zbl 0905.65002
[14] Fletcher, R.; Powell, M. J.D., A rapidly convergent descent method for minimization, The Computer Journal, 6, 2, 163-168 (1963) · Zbl 0132.11603
[15] Goldfarb, D., A family of variable metric updates derived by variational means, Mathematics of Computation, 24, 109, 23-26 (1970) · Zbl 0196.18002
[16] Goodwin, Graham C.; Ramadge, Peter J.; Caines, Peter E., Discrete time stochastic adaptive control, SIAM Journal on Control and Optimization, 19, 6, 829-853 (1981) · Zbl 0473.93075
[17] Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. In IEE proceedings on radar and signal processing (vol. 140) (pp. 107-113).
[18] Hendriks, J. N.; Jidling, C.; Wills, A.; Schön, T. B., Evaluating the squared-exponential covariance function in gaussian processes with integral observationsTechnical report (2018)
[19] Hennig, P., Probabilistic interpretation of linear solvers, SIAM Journal on Optimization, 25, 1, 234-260 (2015) · Zbl 1356.49042
[20] Hennig, P.; Kiefel, M., Quasi-Newton methods: A new direction, Journal of Machine Learning Research (JMLR), 14, 843-865 (2013) · Zbl 1320.62049
[21] Kantas, N.; Doucet, A.; Singh, S. S.; Maciejowski, J. M.; Chopin, N., On particle methods for parameter estimation in state-space models, Statistical Science, 30, 3, 328-351 (2015) · Zbl 1332.62096
[22] Kiefer, Jack; Wolfowitz, Jacob, Stochastic estimation of the maximum of a regression function, The Annals of Mathematical Statistics, 23, 3, 462-466 (1952) · Zbl 0049.36601
[23] Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd international conference on learning representations. San Diego, CA, USA.
[24] Kitagawa, G. (1993). A Monte Carlo filtering and smoothing method for non-Gaussian nonlinear state space models. In Proceedings of the 2nd US-Japan joint seminar on statistical time series analysis (pp. 110-131).
[25] Lindsten, F.; Schön, T. B., Backward simulation methods for Monte Carlo statistical inference, Foundations and Trends in Machine Learning, 6, 1, 1-143 (2013) · Zbl 1278.68016
[26] Ljung, Lennart, Analysis of recursive stochastic algorithms, IEEE Transactions on Automatic Control, 22, 4, 551-575 (1977) · Zbl 0362.93031
[27] Ljung, Lennart, Strong convergence of a stochastic approximation algorithm, The Annals of Statistics, 680-696 (1978) · Zbl 0402.62060
[28] Ljung, L., Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems, IEEE Transactions on Automatic Control, AC-24, 1, 36-50 (1979) · Zbl 0399.93054
[29] Ljung, Lennart; Pflug, Georg; Walk, Harro, Stochastic approximation and optimization of random systems (vol. 17) (2012), Birkhäuser · Zbl 0747.62090
[30] Ljung, L.; Söderström, T., (Theory and practice of recursive identification. Theory and practice of recursive identification, The MIT press series in signal processing, optimization, and control (1983), The MIT Press: The MIT Press Cambridge, Massachusetts)
[31] Luo, Liangchen, Xiong, Yuanhao, & Liu, Yan (2019). Adaptive gradient methods with dynamic bound of learning rate. In International conference on learning representations. New Orleans, LA, USA.
[32] Magnus, J. R.; Neudecker, H., The elimination matrix: Some lemmas and applications, SIAM Journal on Algebraic Discrete Methods, 1, 4, 422-449 (1980) · Zbl 0497.15014
[33] Mahsereci, M.; Hennig, P., Probabilistic line searches for stochastic optimization, Journal of Machine Learning Research (JMLR), 18, 119, 1-59 (2017) · Zbl 1441.90110
[34] Malik, S.; Pitt, M. K., Particle filters for continuous likelihood evaluation and maximisation, Journal of Econometrics, 165, 2, 190-209 (2011) · Zbl 1441.62807
[35] Mokhtari, A.; Ribeiro, A., RES: Regularized stochastic BFGS algorithm, IEEE Transactions on Signal Processing, 62, 23, 6089-6104 (2014) · Zbl 1394.94405
[36] Moulines, E., & Bach, F. (2011). Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Advances in neural information processing systems. Granada, Spain.
[37] Nocedal, J.; Wright, S. J., (Numerical optimization. Numerical optimization, Springer series in operations research (2006), Springer: Springer New York, USA) · Zbl 1104.65059
[38] Pitt, M. K.; dos Santos Silva, R.; Giordani, R.; Kohn, R., On some properties of Markov chain Monte Carlo simulation methods based on the particle filter, Journal of Econometrics, 171, 2, 134-151 (2012) · Zbl 1443.62499
[39] Poyiadjis, G.; Doucet, A.; Singh, S. S., Particle approximations of the score and observed information matrix in state space models with application to parameter estimation, Biometrika, 98, 1, 65-80 (2011) · Zbl 1214.62093
[40] Rasmussen, C. E.; Williams, C. K.I., Gaussian processes for machine learning (2006), MIT Press · Zbl 1177.68165
[41] Robbins, H.; Monro, S., A stochastic approximation method, The Annals of Mathematical Statistics, 22, 3, 400-407 (1951) · Zbl 0054.05901
[42] Schön, T. B., Lindsten, F., Dahlin, J., Wågberg, J., Naesseth, A. C., & Svensson, A., et al. Sequential Monte Carlo methods for system identification. In Proceedings of the 17th IFAC symposium on system identification. Beijing, China.
[43] Schön, T. B.; Wills, A.; Ninness, B., System identification of nonlinear state-space models, Automatica, 47, 1, 39-49 (2011) · Zbl 1209.93155
[44] Schraudolph, N. N., Yu, J., & Günter, S. (2007). A stochastic quasi-Newton method for online convex optimization. In Proceedings of the 11th international conference on artificial intelligence and statistics.
[45] Shah, A., Wilson, A. G., & Ghahramani, Z. (2014). Student-t processs as alternatives to Gaussian processes. In Proceedings of the 17th international conference on artificial intelligence and statistics. Reykjavik, Iceland.
[46] Shanno, D. F., Conditioning of quasi-Newton methods for function minimization, Mathematics of Computation, 24, 111, 647-656 (1970) · Zbl 0225.65073
[47] Spall, James. C., Introduction to stochastic search and optimization: Estimation, simulation, and control (vol. 65) (2005), John Wiley & Sons
[48] Stewart, L., & McCarty, P. (1992). The use of Bayesian belief networks to fuse continuous and discrete information for target recognition and discrete information for target recognition, tracking, and situation assessment. In Proceedings of SPIE signal processing, sensor fusion and target recognition (vol. 1699) (pp. 177-185).
[49] Wills, A. G., & Schön, T. B. (2017). On the construction of probabilistic Newton-type algorithms. In Proceedings of the 56th IEEE conference on decision and control. Melbourne, Australia.
[50] Wills, A. G., Schön, T. B., & Jidling, C. (2020). A fast quasi-newton-type method for large-scale stochastic optimisation. In IFAC world congress.
[51] Wills, A.; Schön, T. B.; Ljung, L.; Ninness, B., Identification of Hammerstein-Wiener models, Automatica, 49, 1, 70-81 (2013) · Zbl 1257.93109
[52] Wolfe, P., Convergence conditions for ascent methods, SIAM Review, 11, 2, 226-235 (1969) · Zbl 0177.20603
[53] Wolfe, P., Convergence conditions for ascent methods II: Some corrections, SIAM Review, 13, 2, 185-188 (1971) · Zbl 0216.26901
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.