# zbMATH — the first resource for mathematics

Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion. (English) Zbl 1105.90101
Summary: This work concerns Markov decision chains with finite state spaces and compact action sets. The performance index is the long-run risk-sensitive average cost criterion, and it is assumed that, under each stationary policy, the state space is a communicating class and that the cost function and the transition law depend continuously on the action. These data are not directly available to the decision-maker, but convergent approximations to them are known or more easily computed. In this context, a nonstationary value iteration algorithm is used to approximate the solution of the optimality equation and to obtain a nearly optimal stationary policy.
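The paper's precise algorithm and hypotheses are not reproduced in this summary, but the idea it describes can be illustrated. The sketch below runs risk-sensitive relative value iteration (with risk parameter 1, so the one-stage update is a log-sum-exp over next states) in which each iteration uses approximations `c_n`, `P_n` converging to the true cost and transition data, rather than the true data themselves. The two-state, two-action model, the perturbation schedule in `approx`, and all numbers are hypothetical, chosen only to make the example self-contained.

```python
import numpy as np

# Hypothetical 2-state, 2-action model (not from the paper):
# true one-stage costs c[x, a] and transition law P[a, x, y].
c_true = np.array([[1.0, 2.0],
                   [0.5, 1.5]])
P_true = np.array([[[0.7, 0.3],
                    [0.4, 0.6]],
                   [[0.2, 0.8],
                    [0.9, 0.1]]])  # P_true[a, x, y], rows sum to 1

def approx(n):
    """Convergent approximations c_n -> c, P_n -> P available at step n."""
    eps = 1.0 / (n + 2)
    c_n = c_true + eps                    # perturbed costs
    P_n = (1 - eps) * P_true + eps * 0.5  # perturbed, rows still stochastic
    return c_n, P_n

def nonstationary_rvi(n_iter=200):
    """Risk-sensitive relative value iteration driven by the approximations:
    h_{n+1}(x) = min_a [ c_n(x,a) + log sum_y P_n(a,x,y) e^{h_n(y)} ],
    normalized by subtracting h_{n+1}(0) so that h(0) = 0."""
    h = np.zeros(2)
    for n in range(n_iter):
        c_n, P_n = approx(n)
        # q[x, a]: one-stage cost plus log-sum-exp over next states
        q = c_n + np.log(np.einsum('axy,y->xa', P_n, np.exp(h)))
        h_new = q.min(axis=1)
        g = h_new[0]   # running estimate of the risk-sensitive average cost
        h = h_new - g  # relative value function
    policy = q.argmin(axis=1)  # greedy stationary policy from the last sweep
    return g, h, policy

g, h, policy = nonstationary_rvi()
print(g, h, policy)
```

Because the approximation error `eps` vanishes, the normalized iterates stabilize and the greedy policy extracted from a late sweep is nearly optimal for the true model, which is the qualitative conclusion the summary attributes to the algorithm.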

##### MSC:
90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control