## A version of the Euler equation in discounted Markov decision processes. (English) Zbl 1272.49045

The paper deals with discrete-time, infinite-horizon optimal control problems, which are treated within the theory of Markov decision processes. The authors propose to solve these problems by dynamic programming: the optimal solution is characterized by a functional equation known as the dynamic programming equation. In simple cases, the value iteration procedure is used to approximate the optimal value function. However, this method does not work for more complicated functional forms. The essential tool used in the paper is the Euler equation, which is established and then solved (in some cases empirically).
The authors present an iterative method for finding the solution of the Euler equation in terms of the value iteration functions. Under certain conditions, the validity of the Euler equation can be guaranteed. Using the convergence of the maximizers to the optimal policy, the optimal control problem is solved. A linear-quadratic problem illustrates the theory.
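
To make the value iteration step concrete, here is a minimal Python sketch for a scalar discounted linear-quadratic problem. It is not the paper's example: the model (dynamics `x' = a*x + b*u`, stage cost `q*x**2 + r*u**2`, discount factor `alpha`), the cost-minimization form, and the function name `lq_value_iteration` are assumptions chosen for illustration. Starting from `V_0 = 0`, every iterate has the form `V_n(x) = p_n * x**2`, so iterating on the value function reduces to iterating on the coefficient `p_n`, and the minimizer of each step is a linear rule `u = -k_n * x` whose gain converges along with `p_n`.

```python
def lq_value_iteration(a, b, q, r, alpha, tol=1e-12, max_iter=10_000):
    """Value iteration for a hypothetical scalar discounted LQ problem.

    Assumed model (illustrative, not taken from the paper):
        dynamics    x' = a*x + b*u
        stage cost  q*x**2 + r*u**2   (q >= 0, r > 0)
        discount    0 < alpha < 1
    Returns the limiting coefficient p, the limiting gain k, and the
    list of gains k_n produced along the iteration.
    """

    def gain(p):
        # First-order condition of  min_u { r*u**2 + alpha*p*(a*x + b*u)**2 }
        # gives u = -k*x with the gain below.
        return alpha * a * b * p / (r + alpha * b * b * p)

    p = 0.0          # V_0(x) = 0
    gains = []
    for _ in range(max_iter):
        k = gain(p)
        gains.append(k)
        # Plug u = -k*x back into the Bellman operator: V_{n+1}(x) = p_next * x**2.
        p_next = q + r * k * k + alpha * p * (a - b * k) ** 2
        if abs(p_next - p) < tol:
            return p_next, gain(p_next), gains
        p = p_next
    raise RuntimeError("value iteration did not converge within max_iter")
```

Calling `lq_value_iteration(1.0, 1.0, 1.0, 1.0, 0.9)` returns a coefficient `p` that satisfies the fixed-point relation `p = q + alpha*a**2*p*r / (r + alpha*b**2*p)` up to the tolerance, together with gains `k_n` that settle to a limit. This is a small instance of the convergence of the step-wise optimizers to the optimal policy that the review mentions.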

### MSC:

- 49L20 Dynamic programming in optimal control and differential games
- 49N10 Linear-quadratic optimal control problems
- 90C39 Dynamic programming
- 90C40 Markov and semi-Markov decision processes
- 90B50 Management decision making, including multiple objectives
