zbMATH — the first resource for mathematics

A convergence analysis of the perturbed compositional gradient flow: averaging principle and normal deviations. (English) Zbl 1397.34098
Summary: We consider in this work a system of two stochastic differential equations named the perturbed compositional gradient flow. By introducing a separation of fast and slow scales of the two equations, we show that the limit of the slow motion is given by an averaged ordinary differential equation. We then demonstrate that the deviation of the slow motion from the averaged equation, after proper rescaling, converges to a stochastic process with Gaussian inputs. This indicates that the slow motion can be approximated in the weak sense by a standard perturbed gradient flow or the continuous-time stochastic gradient descent algorithm that solves the optimization problem for a composition of two functions. As an application, the perturbed compositional gradient flow corresponds to the diffusion limit of the Stochastic Composite Gradient Descent (SCGD) algorithm for minimizing a composition of two expected-value functions in the optimization literatures. For the strongly convex case, such an analysis implies that the SCGD algorithm has the same convergence time asymptotic as the classical stochastic gradient descent algorithm. Thus it validates, at the level of continuous approximation, the effectiveness of using the SCGD algorithm in the strongly convex case.
34F05 Ordinary differential equations and systems with randomness
34C29 Averaging method for ordinary differential equations
90C30 Nonlinear programming
Full Text: DOI
[1] L. Arnold, Hasslemann’s program revisited: The analysis of stochasticity in deterministic climate models, in Progress in Probability Book Series, Springer, 49, Stochastic Climate Models, 141-157. · Zbl 1015.34037
[2] V. I. Bakhtin, Asymptotics of superregular perturbations of fiber ergodic semigroups, Stochastics and Stochastic Reports, 75, 295, (2003) · Zbl 1039.60055
[3] V. Bakhtin; Y. Kifer, Diffusion approximation for slow motion in fully coupled averaging, Probability Theory and Related Fields, 129, 157, (2004) · Zbl 1069.34070
[4] A. Benveniste, M. Metivier and P. Priouret, Adaptive algorithms and stochastic approximations, Applications of Mathematics, Springer, 22 (1990), xii+365pp. · Zbl 0752.93073
[5] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge University Press, 2008. · Zbl 1181.62119
[6] S. Cerrai, A Khasminskii’s averaging principle for stochastic reaction-diffusion equations, Annals of Applied Probability, 19, 899, (2009) · Zbl 1191.60076
[7] S. Cerrai, Normal deviations from the averaged motion for some reaction-diffusion equations with fast oscillating perturbation, Journal de Mathématiques Pures et Appliquées, 91, 614, (2009) · Zbl 1174.60034
[8] Y. Ermoliev, Methods of Stochastic Programming. Monographs in Optimization and OR, Nauka, Moscow, 1976.
[9] M. I. Freidlin, Functional Integration and Partial Differential Equations, Princeton University Press, Princeton, 1985. · Zbl 0568.60057
[10] M. Freidlin and A. Wentzell, Random Perturbations of Dynamical Systems, 2\begindocument\(^{nd}\)\enddocument Edition, Springer, 1998. · Zbl 0922.60006
[11] M. Hairer; G. Pavliotis, Periodic homogenization for hypoelliptic diffusions, Journal of Statistical Physics, 117, 261, (2004) · Zbl 1130.82026
[12] K. Hasselmann, Stochastic climate models, part ⅰ, theory, Tellus, 28, 473, (1976)
[13] W. Hu and C. J. Li, On the fast convergence of random perturbations of the gradient flow, preprint, arXiv: 1706.00837
[14] W. Hu, C. J. Li, L. Li and J. Liu, On the diffusion approximation of nonconvex stochastic gradient descent, Annals of Mathematical Science and Applications, to appear. arXiv: 1705.07562
[15] R. Khasminskii, On stochastic processes defined by differential equations with a small parameter, Theory of Probability and its Applications, 11, 211, (1966) · Zbl 0168.16002
[16] R. Khasminskii, On the principle of averaging the it ô’s stochastic differential equations, Kybernetika (Prague), 4, 260, (1968)
[17] Y. Kifer, The exit problem for small random perturbations of dynamical systems with a hyperbolic fixed point, Israel Journal of Mathematics, 40, 74, (1981) · Zbl 0473.60067
[18] C. Kipnis; S. R. S. Varadhan, Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions, Communications in Mathematical Physics, 104, 1, (1986) · Zbl 0588.60058
[19] H. Kushner and G. George Yin, Stochastic approximations and recursive algorithms and applications, Applications of Mathematics (Stochastic Modeling and Applied Probability), 35, 2\begindocument\(^{nd}\)\enddocument Edition, Springer, 2003. · Zbl 1026.62084
[20] R. J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, Inc., New York, 1982. · Zbl 0556.62028
[21] B. Oksendal, Stochastic Differential Equations, Springer-Verlag, 1992.
[22] E. Pardoux; A. Yu. Veretennikov, On the possion equation and diffusion approximation 1, Annals of Probability, 29, 1061, (2001) · Zbl 1029.60053
[23] E. Pardoux; A. Yu. Veretennikov, On the possion equation and diffusion approximation 2, Annals of Probability, 31, 1166, (2003) · Zbl 1054.60064
[24] E. Pardoux; A. Yu. Veretennikov, On the possion equation and diffusion approximation 3, Annals of Probability, 33, 1111, (2005) · Zbl 1071.60022
[25] D. Revuz and M. Yor, Continuous martingales and Brownian motion, Grundlehren der Mathematischen Wissenschaften, 293, 3\^{rd} edition, Springer-Verlag, Berlin, 1999, xiv+602 pp. · Zbl 0917.60006
[26] N. G. Van Kampen, The diffusion approximation for markov processes, Reprinted from: Thermodynamics & kinetics of biological processes (eds. I. Lamprecht and A. I. Zotin) Walter de Gruyter & Co., New York (1982). 181-195.
[27] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995. · Zbl 0833.62008
[28] A. Yu. Veretennikov, On polynomial mixing and convergence rate for stochastic difference and differential equations, Theory of Probability and its Applications, 44, 361, (2000) · Zbl 0969.60070
[29] M. Wang; E. X. Fang; H. Liu, Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions, Mathematical Programming, 161, 419, (2016) · Zbl 1356.90099
[30] M. Wang, J. Liu and E. X. Fang, Accelerating stochastic composition optimization, Advances in Neural Information Processing Systems, 2016. arXiv: 1607.07329 · Zbl 1441.62217
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.