Monte Carlo gradient estimation in machine learning.

*(English)* Zbl 07255163

Summary: This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated, also known as the problem of sensitivity analysis. In machine learning research, this gradient problem lies at the core of many learning problems, in supervised, unsupervised and reinforcement learning. We will generally seek to rewrite such gradients in a form that allows for Monte Carlo estimation, so that they can be easily and efficiently used and analysed. We explore three strategies – the pathwise, score-function, and measure-valued gradient estimators – examining their historical development, derivation, and underlying assumptions. We describe their use in other fields, show how they are related and can be combined, and expand on their possible generalisations. Wherever Monte Carlo gradient estimators have been derived and deployed in the past, important advances have followed. A deeper and more widely held understanding of this problem will lead to further advances, and it is these advances that we wish to support.
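To make the problem concrete (this sketch is not part of the zbMATH entry): the survey concerns estimating the gradient of an expectation, d/dθ E_{p(x;θ)}[f(x)], when only samples from p are available. A minimal NumPy illustration, under the assumed choices f(x) = x² and x ~ N(μ, σ²), compares two of the surveyed strategies, the score-function estimator and the pathwise (reparameterisation) estimator; here the true gradient with respect to μ is 2μ.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x ** 2  # hypothetical cost; d/dmu E[f(x)] = 2*mu for x ~ N(mu, sigma^2)

mu, sigma, n = 1.5, 1.0, 200_000

# Score-function estimator:
# d/dmu E[f(x)] = E[f(x) * d/dmu log N(x; mu, sigma^2)],
# where the score is (x - mu) / sigma^2. Needs only samples of x and values of f.
x = rng.normal(mu, sigma, n)
score_grad = np.mean(f(x) * (x - mu) / sigma ** 2)

# Pathwise estimator:
# reparameterise x = mu + sigma * eps with eps ~ N(0, 1), then differentiate
# through the path: d/dmu E[f(mu + sigma*eps)] = E[f'(mu + sigma*eps)] = E[2*(mu + sigma*eps)].
eps = rng.normal(0.0, 1.0, n)
path_grad = np.mean(2 * (mu + sigma * eps))

# Both estimators are unbiased for 2*mu = 3.0; in this example the pathwise
# estimate typically has far lower variance, a trade-off the survey analyses.
print(score_grad, path_grad)
```

The measure-valued estimator, the third strategy surveyed, instead expresses the derivative of the measure itself as a weighted difference of two distributions and is not sketched here.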

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

##### Keywords:

gradient estimation; Monte Carlo; sensitivity analysis; score-function estimator; pathwise estimator; measure-valued estimator; variance reduction
*S. Mohamed* et al., J. Mach. Learn. Res. 21, Paper No. 132, 62 p. (2020; Zbl 07255163)

