zbMATH — the first resource for mathematics

Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion. (English) Zbl 1105.90101
Summary: This work concerns Markov decision chains with finite state spaces and compact action sets. The performance index is the long-run risk-sensitive average cost criterion, and it is assumed that, under each stationary policy, the state space is a communicating class and that the cost function and the transition law depend continuously on the action. These latter data are not directly available to the decision-maker, but convergent approximations are known or are more easily computed. In this context, the nonstationary value iteration algorithm is used to approximate the solution of the optimality equation, and to obtain a nearly optimal stationary policy.
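As a rough illustration of the scheme the summary describes, the following minimal sketch runs a relative value iteration in which the cost and transition data used at step n are only approximations of the true ones. It is not the paper's algorithm as stated there, and it rests on several assumptions: the risk-sensitive optimality equation is taken in the standard form g + h(x) = min_a [ C(x,a) + (1/λ) log Σ_y p_xy(a) exp(λ h(y)) ]; the compact action set is replaced by a finite grid of actions; all transition probabilities in the toy model are strictly positive; and the approximations converge geometrically. The names `backup`, `nonstationary_value_iteration`, `cost_at`, `trans_at`, `TRUE_COST` and `TRUE_TRANS` are hypothetical and do not come from the source.

```python
import math

def backup(h, cost, trans, lam):
    """One risk-sensitive backup: returns (values, greedy actions)."""
    values, actions = [], []
    for x in range(len(h)):
        best_val, best_a = math.inf, None
        for a in range(len(cost[x])):
            # Exponential moment of the next relative value under action a.
            moment = sum(p * math.exp(lam * h[y])
                         for y, p in enumerate(trans[x][a]))
            val = cost[x][a] + math.log(moment) / lam
            if val < best_val:
                best_val, best_a = val, a
        values.append(best_val)
        actions.append(best_a)
    return values, actions

def nonstationary_value_iteration(cost_at, trans_at, lam, n_states, n_iter):
    """Relative value iteration using the n-th approximate data at step n.

    cost_at(n) and trans_at(n) are assumed to return approximations that
    converge to the true cost table and transition law as n grows.
    """
    h = [0.0] * n_states
    g, policy = 0.0, [0] * n_states
    for n in range(n_iter):
        values, policy = backup(h, cost_at(n), trans_at(n), lam)
        g = values[0]                  # gain estimate, state 0 as reference
        h = [v - g for v in values]    # renormalize so that h[0] == 0
    return g, h, policy

# Toy model (assumed for illustration): 2 states, 2 actions per state.
TRUE_COST = [[1.0, 2.0], [3.0, 0.5]]
TRUE_TRANS = [[[0.9, 0.1], [0.5, 0.5]],
              [[0.3, 0.7], [0.6, 0.4]]]

def cost_at(n):
    # Approximate costs: true cost plus an error that shrinks geometrically.
    eps = 0.5 ** n
    return [[c + eps for c in row] for row in TRUE_COST]

def trans_at(n):
    # Approximate law: true law mixed with the uniform law, weight shrinking.
    eps = 0.5 ** (n + 1)
    return [[[(1 - eps) * p + eps / 2 for p in dist] for dist in row]
            for row in TRUE_TRANS]

g, h, policy = nonstationary_value_iteration(
    cost_at, trans_at, lam=0.5, n_states=2, n_iter=300)
```

Here `g` estimates the optimal risk-sensitive average cost, `h` the relative value function, and `policy` the greedy stationary policy from the last backup. Whether such iterates converge in general depends on conditions like those studied in the paper; the sketch makes no such guarantee outside this toy setting.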

MSC:
90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control