×

zbMATH — the first resource for mathematics

Growth rates and average optimality in risk-sensitive Markov decision chains. (English) Zbl 1154.90612
Summary: We focus the attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.

MSC:
90C40 Markov and semi-Markov decision processes
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)
93E20 Optimal stochastic control
PDF BibTeX XML Cite
Full Text: Link EuDML
References:
[1] Bellman R.: On a quasi-linear equation. Canadian J. Math. 8 (1956), 198-202 · Zbl 0071.34902 · doi:10.4153/CJM-1956-023-8
[2] Bellman R.: Dynamic Programming. Princenton Univ. Press, Princenton, N.J. 1957 · Zbl 1205.90002
[3] Berman A., Plemmons R. J.: Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York 1979 · Zbl 0815.15016
[4] Bielecki T. D., Hernández-Hernández, D., Pliska S. R.: Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management. Math. Methods Oper. Res. 50 (1999), 167-188 · Zbl 0959.91029 · doi:10.1007/s001860050094
[5] Cavazos-Cadena R.: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Math. Methods Oper. Res. 57 (2003), 253-285 · Zbl 1023.90076 · doi:10.1007/s001860200256
[6] Cavazos-Cadena R., Montes-de-Oca R.: The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. Math. Oper. Res. 28 (2003), 752-756 · Zbl 1082.90125 · doi:10.1287/moor.28.4.752.20515
[7] Cavazos-Cadena R., Montes-de-Oca R.: Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion. J. Appl. Probab. 42 (2005), 905-918 · Zbl 1105.90101 · doi:10.1239/jap/1134587805
[8] Cavazos-Cadena R., Hernández-Hernández D.: Solution fo the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach. Math. Methods Oper. Res. 56 (2002), 473-479 · Zbl 1040.93070 · doi:10.1007/s001860200229
[9] Cavazos-Cadena R., Hernández-Hernández D.: A characterization exponential functionals in finite Markov chains. Math. Methods Oper. Res. 60 (2004), 399-414 · Zbl 1072.60053 · doi:10.1007/s001860400373
[10] Cavazos-Cadena R., Hernández-Hernández D.: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. Ann. Appl. Probab. 15 (2005), 175-212 · Zbl 1076.93045 · doi:10.1214/105051604000000585 · arxiv:math/0503478
[11] Gantmakher F. R.: The Theory of Matrices. Chelsea, London 1959 · Zbl 0050.24804
[12] Howard R. A., Matheson J.: Risk-sensitive Markov decision processes. Manag. Sci. 23 (1972), 356-369 · Zbl 0238.90007 · doi:10.1287/mnsc.18.7.356
[13] Jaquette S. A.: A utility criterion for Markov decision processes. Manag. Sci. 23 (1976), 43-49 · Zbl 0337.90053 · doi:10.1287/mnsc.23.1.43
[14] Mandl P.: Decomposable non-negative matrices in a dynamic programming problem. Czechoslovak Math. J. 20 (1970), 504-510 · Zbl 0209.22902 · eudml:12547
[15] Rothblum U. G., Whittle P.: Growth optimality for branching Markov decision chains. Math. Oper. Res. 7 (1982), 582-601 · Zbl 0498.90082 · doi:10.1287/moor.7.4.582
[16] Sladký K.: On dynamic programming recursions for multiplicative Markov decision chains. Math. Programming Study 6 (1976), 216-226 · Zbl 0374.60089
[17] Sladký K.: Successive approximation methods for dynamic programming models. Proc. of the Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule. Academia, Prague 1979, pp. 171-189 · Zbl 0496.90081
[18] Sladký K.: Bounds on discrete dynamic programming recursions I. Kybernetika 16 (1980), 526-547 · Zbl 0454.90085 · eudml:28460
[19] Whittle P.: Optimization Over Time - Dynamic Programming and Stochastic Control. Volume II, Chapter 35, Wiley, Chichester 1983 · Zbl 0577.90046
[20] Zijm W. H. M.: Nonnegative Matrices in Dynamic Programming. Mathematical Centre Tract, Amsterdam 1983 · Zbl 0526.90059
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.