zbMATH — the first resource for mathematics

An optimality system for finite average Markov decision chains under risk-aversion. (English) Zbl 1243.93127
Summary: This work concerns controlled Markov chains with finite state space and compact action sets. The decision maker is risk-averse with constant risk-sensitivity, and the performance of a control policy is measured by the long-run average cost criterion. Under standard continuity-compactness conditions, it is shown that the (possibly non-constant) optimal value function is characterized by a system of optimality equations from which an optimal stationary policy can be obtained. It is also shown that the optimal superior and inferior limit average cost functions coincide.
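
Illustration (not taken from the paper): the risk-sensitive average cost criterion named in the summary can be made concrete in the simplest special case. The sketch below assumes a fixed stationary policy whose transition matrix `P` is strictly positive, so the average cost is constant in the initial state and equals (1/λ) log ρ(Q), where Q[x][y] = exp(λ c(x)) P[x][y] and ρ is the Perron root. The function name, the two-state data, and the use of power iteration are hypothetical choices made here for illustration only.

```python
import math

def risk_sensitive_average_cost(P, cost, lam, iters=2000):
    """Estimate (1/lam) * log(rho(Q)), with Q[x][y] = exp(lam*cost[x]) * P[x][y].

    Assumption: P is a strictly positive (hence primitive) transition matrix,
    so power iteration converges to the Perron root rho(Q).
    """
    n = len(P)
    # Tilt each row of P by the exponential of the scaled one-step cost.
    Q = [[math.exp(lam * cost[x]) * P[x][y] for y in range(n)] for x in range(n)]
    v = [1.0] * n
    rho = 1.0
    for _ in range(iters):
        w = [sum(Q[x][y] * v[y] for y in range(n)) for x in range(n)]
        rho = max(w)                 # growth factor under max-normalization
        v = [wi / rho for wi in w]   # keep the iterate bounded
    return math.log(rho) / lam

# Hypothetical two-state chain: state 0 costs 0, state 1 costs 1.
P = [[0.9, 0.1], [0.5, 0.5]]
J = risk_sensitive_average_cost(P, [0.0, 1.0], lam=1.0)
print(J)
```

For this chain the risk-neutral average cost is 1/6, and the risk-averse value returned above is at least that large, as Jensen's inequality requires. The sketch does not cover the general situation treated in the paper, where the optimal value function may be non-constant and a system of optimality equations is needed.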

MSC:
93E20 Optimal stochastic control
60J05 Discrete-time Markov processes on general state spaces
93C55 Discrete-time control/observation systems
90C40 Markov and semi-Markov decision processes
49K45 Optimality conditions for problems involving randomness
Full Text: EuDML
References:
[1] A. Arapostathis, V. K. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, S. I. Marcus: Discrete-time controlled Markov processes with average cost criteria: a survey. SIAM J. Control Optim. 31 (1993), 282-334. · Zbl 0770.93064 · doi:10.1137/0331018
[2] P. Billingsley: Probability and Measure. Third edition. Wiley, New York 1995. · Zbl 0822.60002
[3] R. Cavazos-Cadena, E. Fernández-Gaucherand: Controlled Markov chains with risk-sensitive criteria: average cost, optimality equations and optimal solutions. Math. Methods Oper. Res. 43 (1999), 121-139. · Zbl 0953.93077
[4] R. Cavazos-Cadena, E. Fernández-Gaucherand: Risk-sensitive control in communicating average Markov decision chains. Modelling Uncertainty: An Examination of Stochastic Theory, Methods and Applications (M. Dror, P. L’Ecuyer and F. Szidarovszky, eds.), Kluwer, Boston 2002, pp. 525-544.
[5] R. Cavazos-Cadena: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Math. Methods Oper. Res. 57 (2003), 263-285. · Zbl 1023.90076 · doi:10.1007/s001860200256
[6] R. Cavazos-Cadena, D. Hernández-Hernández: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. Ann. Appl. Probab. 15 (2005), 175-212. · Zbl 1076.93045 · doi:10.1214/105051604000000585 · arxiv:math/0503478
[7] R. Cavazos-Cadena, D. Hernández-Hernández: A system of Poisson equations for a non-constant Varadhan functional on a finite state space. Appl. Math. Optim. 53 (2006), 101-119. · Zbl 1110.60066 · doi:10.1007/s00245-005-0840-3
[8] R. Cavazos-Cadena, F. Salem-Silva: The discounted method and equivalence of average criteria for risk-sensitive Markov decision processes on Borel spaces. Appl. Math. Optim. 61 (2009), 167-190. · Zbl 1196.60127 · doi:10.1007/s00245-009-9080-2
[9] G. B. Di Masi, L. Stettner: Risk-sensitive control of discrete time Markov processes with infinite horizon. SIAM J. Control Optim. 38 (1999), 61-78. · Zbl 0946.93043 · doi:10.1137/S0363012997320614
[10] G. B. Di Masi, L. Stettner: Infinite horizon risk sensitive control of discrete time Markov processes with small risk. Syst. Control Lett. 40 (2000), 15-20. · Zbl 0977.93083 · doi:10.1016/S0167-6911(99)00118-8
[11] G. B. Di Masi, L. Stettner: Infinite horizon risk sensitive control of discrete time Markov processes under minorization property. SIAM J. Control Optim. 46 (2007), 231-252. · Zbl 1141.93067 · doi:10.1137/040618631
[12] W. H. Fleming, W. M. McEneaney: Risk-sensitive control on an infinite horizon. SIAM J. Control Optim. 33 (1995), 1881-1915. · Zbl 0949.93079 · doi:10.1137/S0363012993258720
[13] F. R. Gantmakher: The Theory of Matrices. Chelsea, London 1959. · Zbl 0050.24804
[14] D. Hernández-Hernández, S. I. Marcus: Risk-sensitive control of Markov processes in countable state space. Syst. Control Lett. 29 (1996), 147-155. · Zbl 0866.93101 · doi:10.1016/S0167-6911(96)00051-5
[15] D. Hernández-Hernández, S. I. Marcus: Existence of risk sensitive optimal stationary policies for controlled Markov processes. Appl. Math. Optim. 40 (1999), 273-285. · Zbl 0937.90115 · doi:10.1007/s002459900126
[16] R. A. Howard, J. E. Matheson: Risk-sensitive Markov decision processes. Management Sci. 18 (1972), 356-369. · Zbl 0238.90007 · doi:10.1287/mnsc.18.7.356
[17] D. H. Jacobson: Optimal stochastic linear systems with exponential performance criteria and their relation to stochastic differential games. IEEE Trans. Automat. Control 18 (1973), 124-131. · Zbl 0274.93067 · doi:10.1109/TAC.1973.1100265
[18] S. C. Jaquette: Markov decision processes with a new optimality criterion: discrete time. Ann. Statist. 1 (1973), 496-505. · Zbl 0259.90054 · doi:10.1214/aos/1176342415
[19] S. C. Jaquette: A utility criterion for Markov decision processes. Management Sci. 23 (1976), 43-49. · Zbl 0337.90053 · doi:10.1287/mnsc.23.1.43
[20] A. Jaśkiewicz: Average optimality for risk sensitive control with general state space. Ann. Appl. Probab. 17 (2007), 654-675. · Zbl 1128.93056 · doi:10.1214/105051606000000790 · arxiv:0704.0394
[21] U. G. Rothblum, P. Whittle: Growth optimality for branching Markov decision chains. Math. Oper. Res. 7 (1982), 582-601. · Zbl 0498.90082 · doi:10.1287/moor.7.4.582
[22] K. Sladký: Successive approximation methods for dynamic programming models. Proc. Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.), Academia, Prague 1979, pp. 171-189. · Zbl 0496.90081
[23] K. Sladký: Bounds on discrete dynamic programming recursions I. Kybernetika 16 (1980), 526-547. · Zbl 0454.90085 · eudml:28460
[24] K. Sladký: Growth rates and average optimality in risk-sensitive Markov decision chains. Kybernetika 44 (2008), 205-226. · Zbl 1154.90612 · www.kybernetika.cz · eudml:33922
[25] K. Sladký, R. Montes-de-Oca: Risk-sensitive average optimality in Markov decision chains. Operations Research Proceedings, Vol. 2007, Part III (2008), pp. 69-74. · Zbl 1209.90348 · doi:10.1007/978-3-540-77903-2_11
[26] P. Whittle: Optimization Over Time-Dynamic Programming and Stochastic Control. Wiley, Chichester 1983. · Zbl 0577.90046
[27] W. H. M. Zijm: Nonnegative Matrices in Dynamic Programming. Mathematical Centre Tract, Amsterdam 1983. · Zbl 0526.90059
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.