Durmus, Alain; Moulines, Éric
High-dimensional Bayesian inference via the unadjusted Langevin algorithm. (English) Zbl 1428.62111
Bernoulli 25, No. 4A, 2854-2882 (2019).

Summary: We consider in this paper the problem of sampling a high-dimensional probability distribution \(\pi\) having a density w.r.t. the Lebesgue measure on \(\mathbb{R}^d\), known up to a normalization constant, \(x\mapsto\pi(x)=\mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d}\mathrm{e}^{-U(y)}\,\mathrm{d}y\). Such a problem naturally occurs, for example, in Bayesian inference and machine learning. Under the assumptions that \(U\) is continuously differentiable, \(\nabla U\) is globally Lipschitz, and \(U\) is strongly convex, we obtain non-asymptotic bounds on the convergence to stationarity, in Wasserstein distance of order 2 and in total variation distance, of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence of these bounds on the dimension of the state space is explicit. The convergence of an appropriately weighted empirical measure is also investigated, and bounds on the mean square error and an exponential deviation inequality are reported for functions which are measurable and bounded. An illustration to Bayesian inference for binary regression is presented to support our claims.

Cited in 83 Documents

MSC:
62F15 Bayesian inference
60J60 Diffusion processes
65C05 Monte Carlo methods
62J02 General nonlinear regression

Keywords: Langevin diffusion; Markov chain Monte Carlo; Metropolis adjusted Langevin algorithm; rate of convergence; total variation distance; regression

Software: reglogit; BayesLogit

Full Text: DOI arXiv Euclid

References:
[1] Albert, J.H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669-679. · Zbl 0774.62031 · doi:10.1080/01621459.1993.10476321
[2] Borodin, A.N. and Salminen, P. (2002).
Handbook of Brownian Motion—Facts and Formulae, 2nd ed. Probability and Its Applications. Basel: Birkhäuser. · Zbl 1012.60003
[3] Bubeck, S., Eldan, R. and Lehec, J. (2015). Finite-time analysis of projected Langevin Monte Carlo. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS'15 1243-1251. Cambridge, MA, USA: MIT Press. · Zbl 1397.65010
[4] Bubley, R., Dyer, M. and Jerrum, M. (1998). An elementary analysis of a procedure for sampling points in a convex body. Random Structures Algorithms 12 213-235. · Zbl 0972.60037 · doi:10.1002/(SICI)1098-2418(199805)12:3<213::AID-RSA1>3.0.CO;2-Y
[5] Chen, M.F. and Li, S.F. (1989). Coupling methods for multidimensional diffusion processes. Ann. Probab. 17 151-177. · Zbl 0686.60083 · doi:10.1214/aop/1176991501
[6] Choi, H.M. and Hobert, J.P. (2013). The Polya-gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electron. J. Stat. 7 2054-2064. · Zbl 1349.60123 · doi:10.1214/13-EJS837
[7] Chopin, N. and Ridgway, J. (2017). Leave Pima Indians alone: Binary regression as a benchmark for Bayesian computation. Statist. Sci. 32 64-87. · Zbl 1442.62007 · doi:10.1214/16-STS581
[8] Dalalyan, A.S. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Proceedings of the 30th Annual Conference on Learning Theory.
[9] Dalalyan, A.S. (2017). Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 651-676. · Zbl 1411.62030 · doi:10.1111/rssb.12183
[10] Durmus, A. and Moulines, É. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27 1551-1587. · Zbl 1377.65007 · doi:10.1214/16-AAP1238
[11] Durmus, A. and Moulines, É. (2019). Supplement to “High-dimensional Bayesian inference via the unadjusted Langevin algorithm.” DOI:10.3150/18-BEJ1073SUPP. · Zbl 1428.62111
[12] Eberle, A. Quantitative contraction rates for Markov chains on continuous state spaces. In preparation. · Zbl 1466.60137 · doi:10.1214/19-EJP287
[13] Eberle, A. (2016). Reflection couplings and contraction rates for diffusions. Probab. Theory Related Fields 166 851-886. · Zbl 1367.60099 · doi:10.1007/s00440-015-0673-1
[14] Eberle, A., Guillin, A. and Zimmer, R. (2018). Quantitative Harris type theorems for diffusions and McKean-Vlasov processes. Trans. Amer. Math. Soc. To appear. · Zbl 1481.60154 · doi:10.1090/tran/7576
[15] Ermak, D.L. (1975). A computer simulation of charged particles in solution. I. Technique and equilibrium properties. J. Chem. Phys. 62 4189-4196.
[16] Faes, C., Ormerod, J.T. and Wand, M.P. (2011). Variational Bayesian inference for parametric and nonparametric regression with missing data. J. Amer. Statist. Assoc. 106 959-971. · Zbl 1229.62028 · doi:10.1198/jasa.2011.tm10301
[17] Frühwirth-Schnatter, S. and Frühwirth, R. (2010). Data augmentation and MCMC for binary and multinomial logit models. In Statistical Modelling and Regression Structures 111-132. Heidelberg: Physica-Verlag/Springer. · Zbl 1431.62373
[18] Gramacy, R.B. and Polson, N.G. (2012). Simulation-based regularized logistic regression. Bayesian Anal. 7 567-589. · Zbl 1330.62301 · doi:10.1214/12-BA719
[19] Grenander, U. (1996). Elements of Pattern Theory. Johns Hopkins Studies in the Mathematical Sciences. Baltimore, MD: Johns Hopkins Univ. Press. · Zbl 0869.68096
[20] Grenander, U. and Miller, M.I. (1994). Representations of knowledge in complex systems. J. Roy. Statist. Soc. Ser. B 56 549-603. With discussion and a reply by the authors. · Zbl 0814.62009 · doi:10.1111/j.2517-6161.1994.tb02000.x
[21] Hanson, T.E., Branscum, A.J. and Johnson, W.O. (2014). Informative \(g\)-priors for logistic regression. Bayesian Anal. 9 597-611. · Zbl 1327.62395
[22] Holmes, C.C. and Held, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal. 1 145-168. · Zbl 1331.62142 · doi:10.1214/06-BA105
[23] Joulin, A. and Ollivier, Y. (2010). Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. 38 2418-2442. · Zbl 1207.65006 · doi:10.1214/10-AOP541
[24] Karatzas, I. and Shreve, S.E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Graduate Texts in Mathematics 113. New York: Springer. · Zbl 0734.60060
[25] Klartag, B. (2007). A central limit theorem for convex sets. Invent. Math. 168 91-131. · Zbl 1144.60021 · doi:10.1007/s00222-006-0028-8
[26] Lamberton, D. and Pagès, G. (2002). Recursive computation of the invariant distribution of a diffusion. Bernoulli 8 367-405. · Zbl 1006.60074
[27] Lamberton, D. and Pagès, G. (2003). Recursive computation of the invariant distribution of a diffusion: The case of a weakly mean reverting drift. Stoch. Dyn. 3 435-451. · Zbl 1044.60069 · doi:10.1142/S0219493703000838
[28] Lindvall, T. and Rogers, L.C.G. (1986). Coupling of multidimensional diffusions by reflection. Ann. Probab. 14 860-872. · Zbl 0593.60076 · doi:10.1214/aop/1176992442
[29] Mattingly, J.C., Stuart, A.M. and Higham, D.J. (2002). Ergodicity for SDEs and approximations: Locally Lipschitz vector fields and degenerate noise. Stochastic Process. Appl. 101 185-232. · Zbl 1075.60072 · doi:10.1016/S0304-4149(02)00150-3
[30] Neal, R.M. (1993). Bayesian learning via stochastic dynamics. In Advances in Neural Information Processing Systems 5 [NIPS Conference] 475-482. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
[31] Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization 87. Boston, MA: Kluwer Academic. · Zbl 1086.90045
[32] Parisi, G. (1981). Correlation functions and computer simulations. Nuclear Phys. B 180 378-384.
[33] Polson, N.G., Scott, J.G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Amer. Statist. Assoc. 108 1339-1349. · Zbl 1283.62055 · doi:10.1080/01621459.2013.829001
[34] Roberts, G.O. and Tweedie, R.L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341-363. · Zbl 0870.60027 · doi:10.2307/3318418
[35] Rossky, P.J., Doll, J.D. and Friedman, H.L. (1978). Brownian dynamics as smart Monte Carlo simulation. J. Chem. Phys. 69 4628-4633.
[36] Sabanés Bové, D. and Held, L. (2011). Hyper-\(g\) priors for generalized linear models. Bayesian Anal. 6 387-410. · Zbl 1330.62058
[37] Talay, D. and Tubaro, L. (1990). Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. 8 483-509. · Zbl 0718.60058 · doi:10.1080/07362999008809220
[38] Villani, C. (2009). Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 338. Berlin: Springer. · Zbl 1156.53003
[39] Welling, M. and Teh, Y.W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) 681-688.
[40] Windle, J., Polson, N.G. and Scott, J.G. (2013). BayesLogit: Bayesian logistic regression. R package version 0.2. Available at http://cran.r-project.org/web/packages/BayesLogit/index.html.

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible, without claiming completeness or perfect matching.
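For readers unfamiliar with the method under review, the following is a minimal sketch (not the authors' code) of the unadjusted Langevin algorithm: the Euler discretization \(X_{k+1} = X_k - \gamma\,\nabla U(X_k) + \sqrt{2\gamma}\,Z_{k+1}\), \(Z_{k+1}\sim N(0,I_d)\), of the Langevin SDE \(\mathrm{d}X_t = -\nabla U(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t\). The target here is an illustrative standard Gaussian (a strongly convex \(U\) with Lipschitz gradient, as in the paper's assumptions); all function and variable names are hypothetical.

```python
import math
import random

def ula(grad_u, x0, step, n_iter, rng):
    """Unadjusted Langevin algorithm with constant step size.

    One iteration performs the Euler step
        X_{k+1} = X_k - step * grad_u(X_k) + sqrt(2 * step) * Z_{k+1},
    where Z_{k+1} is a standard Gaussian vector. Returns the whole chain.
    """
    x = list(x0)
    scale = math.sqrt(2.0 * step)
    chain = []
    for _ in range(n_iter):
        g = grad_u(x)
        x = [xi - step * gi + scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        chain.append(list(x))
    return chain

# Toy strongly convex potential U(x) = ||x||^2 / 2, so pi = N(0, I_2).
# For this quadratic U, ULA is an AR(1) chain whose stationary variance
# per coordinate is 1 / (1 - step/2), close to 1 for small step sizes --
# the discretization bias that the reviewed paper quantifies in general.
rng = random.Random(0)
chain = ula(lambda x: x, x0=[5.0, -5.0], step=0.05, n_iter=20000, rng=rng)
kept = chain[2000:]  # discard burn-in
mean0 = sum(s[0] for s in kept) / len(kept)
var0 = sum((s[0] - mean0) ** 2 for s in kept) / len(kept)
```

With a small step size the empirical mean and variance of the first coordinate land close to the target values 0 and 1; the paper's non-asymptotic bounds make the dependence of this error on the step size and the dimension explicit.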