On sampling from a log-concave density using kinetic Langevin diffusions. (English) Zbl 07193949

Summary: Langevin diffusion processes and their discretizations are often used for sampling from a target density. The most convenient framework for assessing the quality of such a sampling scheme corresponds to smooth and strongly log-concave densities defined on \(\mathbb{R}^p\). The present work focuses on this framework and studies the behavior of the Monte Carlo algorithm based on discretizations of the kinetic Langevin diffusion. We first prove the geometric mixing property of the kinetic Langevin diffusion with a mixing rate that is optimal in terms of its dependence on the condition number. We then use this result to obtain improved guarantees for sampling with the kinetic Langevin Monte Carlo method when the quality of sampling is measured by the Wasserstein distance. We also consider the situation where the Hessian of the log-density of the target distribution is Lipschitz-continuous. In this case, we introduce a new discretization of the kinetic Langevin diffusion and prove that it leads to a substantial improvement in the upper bound on the sampling error measured in the Wasserstein distance.
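To fix ideas, the following is a minimal Python sketch of this type of sampler: an Euler-type discretization of the kinetic Langevin diffusion \(d\theta_t = v_t\,dt\), \(dv_t = -(\gamma v_t + \nabla f(\theta_t))\,dt + \sqrt{2\gamma}\,dW_t\), whose \(\theta\)-marginal at stationarity is proportional to \(\exp(-f)\). This is only an illustrative scheme under assumed choices (the friction parameter gamma, the step size, the function name kinetic_langevin_euler, and the Gaussian target), not the exact discretization analyzed in the paper.

import numpy as np

def kinetic_langevin_euler(grad_f, theta0, v0, step=0.01, gamma=2.0, n_iter=10_000, rng=None):
    """Euler-type discretization of the kinetic Langevin SDE
        d theta_t = v_t dt,
        d v_t     = -(gamma * v_t + grad_f(theta_t)) dt + sqrt(2 * gamma) dW_t.
    Returns the theta-trajectory; at stationarity theta is approximately
    distributed according to the density proportional to exp(-f)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.asarray(v0, dtype=float).copy()
    samples = np.empty((n_iter, theta.size))
    for k in range(n_iter):
        theta = theta + step * v                      # position update
        noise = rng.standard_normal(theta.size)       # scaled Brownian increment
        v = v - step * (gamma * v + grad_f(theta)) + np.sqrt(2.0 * gamma * step) * noise
        samples[k] = theta
    return samples

# Example: smooth, strongly log-concave target N(0, diag(1, 4)),
# i.e. f(x) = x^T P x / 2 with precision P = diag(1, 1/4).
precision = np.array([1.0, 0.25])
grad_f = lambda x: precision * x
draws = kinetic_langevin_euler(grad_f, theta0=np.zeros(2), v0=np.zeros(2))
print(draws[2000:].var(axis=0))  # roughly (1, 4), up to discretization and Monte Carlo error

The discretizations analyzed in the paper are more refined than this Euler-type update (in particular, the linear velocity part of the dynamics can be integrated in closed form); the sketch above only illustrates the general structure of the sampler.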

MSC:

65Cxx Probabilistic methods, stochastic differential equations
62Rxx Statistics on algebraic and topological structures

References:

[1] Baker, J., Fearnhead, P., Fox, E.B. and Nemeth, C. (2019). Control variates for stochastic gradient MCMC. Stat. Comput. 29 599-615. · Zbl 1430.62265
[2] Bernton, E. (2018). Langevin Monte Carlo and JKO splitting. In Proceedings of the 31st Conference on Learning Theory (S. Bubeck, V. Perchet and P. Rigollet, eds.). Proceedings of Machine Learning Research 75 1777-1798.
[3] Bolley, F., Guillin, A. and Malrieu, F. (2010). Trend to equilibrium and particle approximation for a weakly selfconsistent Vlasov-Fokker-Planck equation. ESAIM Math. Model. Numer. Anal. 44 867-884. · Zbl 1201.82029
[4] Bou-Rabee, N. and Hairer, M. (2013). Nonasymptotic mixing of the MALA algorithm. IMA J. Numer. Anal. 33 80-110. · Zbl 1305.65012
[5] Bou-Rabee, N., Eberle, A. and Zimmer, R. (2018). Coupling and convergence for Hamiltonian Monte Carlo. · Zbl 07325608
[6] Brosse, N., Durmus, A., Moulines, E. and Pereyra, M. (2017). Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo. In Proceedings of the 2017 Conference on Learning Theory (S. Kale and O. Shamir, eds.). Proceedings of Machine Learning Research 65 319-342. Amsterdam, Netherlands.
[7] Bubeck, S., Eldan, R. and Lehec, J. (2018). Sampling from a log-concave distribution with projected Langevin Monte Carlo. Discrete Comput. Geom. 59 757-783. · Zbl 1397.65010
[8] Chatterji, N., Flammarion, N., Ma, Y., Bartlett, P. and Jordan, M. (2018). On the theory of variance reduction for stochastic gradient Monte Carlo. In Proceedings of the 35th International Conference on Machine Learning (J. Dy and A. Krause, eds.). Proceedings of Machine Learning Research 80 764-773. Stockholm, Sweden: Stockholmsmässan.
[9] Chen, Z. and Vempala, S.S. (2019). Optimal convergence rate of Hamiltonian Monte Carlo for strongly logconcave distributions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. LIPIcs. Leibniz Int. Proc. Inform. 145 Art. No. 64, 12. Wadern: Schloss Dagstuhl. Leibniz-Zent. Inform. · Zbl 07650131
[10] Cheng, X. and Bartlett, P. (2018). Convergence of Langevin MCMC in KL-divergence. In Algorithmic Learning Theory 2018. Proc. Mach. Learn. Res. (PMLR) 83 26. · Zbl 1406.60114
[11] Cheng, X., Chatterji, N.S., Abbasi-Yadkori, Y., Bartlett, P.L. and Jordan, M.I. (2018). Sharp convergence rates for Langevin dynamics in the nonconvex setting. ArXiv e-prints.
[12] Cheng, X., Chatterji, N.S., Bartlett, P.L. and Jordan, M.I. (2017). Underdamped Langevin MCMC: A non-asymptotic analysis. ArXiv e-prints.
[13] Collier, O. and Dalalyan, A.S. (2017). Minimax estimation of a p-dimensional linear functional in sparse Gaussian models and robust estimation of the mean. Available at arXiv:1712.05495. · Zbl 1432.62203
[14] Dalalyan, A. (2017). Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Proceedings of the 2017 Conference on Learning Theory (S. Kale and O. Shamir, eds.). Proceedings of Machine Learning Research 65 678-689.
[15] Dalalyan, A.S. (2017). Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 651-676. · Zbl 1411.62030
[16] Dalalyan, A.S. and Karagulyan, A. (2019). User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Process. Appl. 129 5278-5311. · Zbl 1428.62316
[17] Desvillettes, L. and Villani, C. (2001). On the trend to global equilibrium in spatially inhomogeneous entropy-dissipating systems: The linear Fokker-Planck equation. Comm. Pure Appl. Math. 54 1-42. · Zbl 1029.82032
[18] Dieuleveut, A., Durmus, A. and Bach, F. (2017). Bridging the gap between constant step size stochastic gradient descent and Markov chains. ArXiv e-prints. · Zbl 1454.62242
[19] Dolbeault, J., Mouhot, C. and Schmeiser, C. (2015). Hypocoercivity for linear kinetic equations conserving mass. Trans. Amer. Math. Soc. 367 3807-3828. · Zbl 1342.82115
[20] Douc, R., Moulines, E. and Rosenthal, J.S. (2004). Quantitative bounds on convergence of time-inhomogeneous Markov chains. Ann. Appl. Probab. 14 1643-1665. · Zbl 1072.60059
[21] Durmus, A., Majewski, S. and Miasojedow, B. (2019). Analysis of Langevin Monte Carlo via convex optimization. J. Mach. Learn. Res. 20 Paper No. 73, 46. · Zbl 1491.65009
[22] Durmus, A. and Moulines, É. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27 1551-1587. · Zbl 1377.65007
[23] Durmus, A. and Moulines, É. (2019). High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli 25 2854-2882. · Zbl 1428.62111
[24] Dwivedi, R., Chen, Y., Wainwright, M.J. and Yu, B. (2018). Log-concave sampling: Metropolis-Hastings algorithms are fast! In Proceedings of the 31st Conference on Learning Theory. Proceedings of Machine Learning Research 75 793-797. · Zbl 1440.62039
[25] Eberle, A., Guillin, A. and Zimmer, R. (2019). Couplings and quantitative contraction rates for Langevin dynamics. Ann. Probab. 47 1982-2010. · Zbl 1466.60160
[26] Griewank, A. (1993). Some bounds on the complexity of gradients, Jacobians, and Hessians. In Complexity in Numerical Optimization 128-162. River Edge, NJ: World Sci. Publ. · Zbl 0968.90524
[27] Helffer, B. and Nier, F. (2005). Hypoelliptic Estimates and Spectral Theory for Fokker-Planck Operators and Witten Laplacians. Lecture Notes in Math. 1862. Berlin: Springer. · Zbl 1072.35006
[28] Lamberton, D. and Pagès, G. (2002). Recursive computation of the invariant distribution of a diffusion. Bernoulli 8 367-405. · Zbl 1006.60074
[29] Lamberton, D. and Pagès, G. (2003). Recursive computation of the invariant distribution of a diffusion: The case of a weakly mean reverting drift. Stoch. Dyn. 3 435-451. · Zbl 1044.60069
[30] Luu, T.D., Fadili, J. and Chesneau, C. (2017). Sampling from non-smooth distribution through Langevin diffusion. Working paper or Preprint. · Zbl 1477.60009
[31] Mangoubi, O. and Smith, A. (2019). Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions 2: Numerical integrators. In Proceedings of Machine Learning Research (K. Chaudhuri and M. Sugiyama, eds.). Proceedings of Machine Learning Research 89 586-595. PMLR.
[32] Nelson, E. (1967). Dynamical Theories of Brownian Motion. Princeton, N.J.: Princeton Univ. Press. · Zbl 0165.58502
[33] Pavliotis, G.A. (2014). Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations. Texts in Applied Mathematics 60. New York: Springer. · Zbl 1318.60003
[34] Pillai, N.S., Stuart, A.M. and Thiéry, A.H. (2012). Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. Ann. Appl. Probab. 22 2320-2356. · Zbl 1272.60053
[35] Raginsky, M., Rakhlin, A. and Telgarsky, M. (2017). Non-convex learning via stochastic gradient Langevin dynamics: A nonasymptotic analysis. In Proceedings of the 2017 Conference on Learning Theory (S. Kale and O. Shamir, eds.). Proceedings of Machine Learning Research 65 1674-1703. Amsterdam, Netherlands.
[36] Roberts, G.O. and Rosenthal, J.S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 255-268. · Zbl 0913.60060
[37] Roberts, G.O. and Stramer, O. (2002). Langevin diffusions and Metropolis-Hastings algorithms. Methodol. Comput. Appl. Probab. 4 337-357. · Zbl 1033.65003
[38] Roberts, G.O. and Tweedie, R.L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341-363. · Zbl 0870.60027
[39] Stramer, O. and Tweedie, R.L. (1999). Langevin-type models. I. Diffusions with given stationary distributions and their discretizations. Methodol. Comput. Appl. Probab. 1 283-306. · Zbl 0947.60071
[40] Stramer, O. and Tweedie, R.L. (1999). Langevin-type models. II. Self-targeting candidates for MCMC algorithms. Methodol. Comput. Appl. Probab. 1 307-328. · Zbl 0946.60063
[41] Xifara, T., Sherlock, C., Livingstone, S., Byrne, S. and Girolami, M. (2014). Langevin diffusions and the Metropolis-adjusted Langevin algorithm. Statist. Probab. Lett. 91 14-19. · Zbl 1298.60081
[42] Xu, P., Chen, J., Zou, D. and Gu, Q. (2017). Global convergence of Langevin dynamics based algorithms for nonconvex optimization. ArXiv e-prints.
[43] Zhang, Y., Liang, P. and Charikar, M. (2017). A hitting time analysis of stochastic gradient Langevin dynamics. In Proceedings of the 2017 Conference on Learning Theory (S. Kale and O. Shamir, eds.). Proceedings of Machine Learning Research 65 1980-2022. Amsterdam, Netherlands.