# zbMATH — the first resource for mathematics

The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. (English) Zbl 1082.90125
Summary: This work concerns discrete-time Markov decision chains with finite state space and bounded costs. The controller has constant risk sensitivity $$\lambda$$, and the performance of a control policy is measured by the corresponding risk-sensitive average cost criterion. Assuming that the optimality equation has a solution, it is shown that the value iteration scheme can be implemented to obtain, in a finite number of steps, (1) an approximation to the optimal $$\lambda$$-sensitive average cost with an error less than a given tolerance, and (2) a stationary policy whose performance index is arbitrarily close to the optimal value. The argument used to establish these results is based on a modification of the original model, which is an extension of a transformation introduced by P. J. Schweitzer [J. Math. Anal. Appl. 34, 495–501 (1971; Zbl 0218.90070)] to analyze the risk-neutral case.
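
As a rough illustration of the kind of scheme the summary describes, the following is a minimal sketch of risk-sensitive value iteration on a finite model. It is not the paper's algorithm: it assumes $$\lambda > 0$$, uses the common exponential-utility recursion $$V_{n+1}(x) = \min_a \{ C(x,a) + \lambda^{-1} \log \sum_y p(y \mid x,a)\, e^{\lambda V_n(y)} \}$$, and omits the Schweitzer-type model modification, so the bracket it returns may fail to shrink for periodic chains. When the optimality equation has a solution, the smallest and largest successive differences $$V_{n+1}(x) - V_n(x)$$ bracket the optimal average cost. The names `risk_sensitive_value_iteration`, `P`, `C`, `lam`, `tol`, and `max_iter` are hypothetical, as is the two-state example.

```python
import math


def risk_sensitive_value_iteration(P, C, lam, tol=1e-8, max_iter=10_000):
    """Illustrative sketch only (hypothetical names, lam > 0 assumed).

    P[a][x][y]: probability of moving from state x to y under action a.
    C[x][a]:    bounded one-step cost of action a in state x.
    Returns (lower, upper, policy, converged), where [lower, upper]
    brackets the average cost and policy[x] is a greedy action.
    """
    n_states, n_actions = len(C), len(C[0])
    V = [0.0] * n_states
    lower, upper, policy = -math.inf, math.inf, [0] * n_states
    for _ in range(max_iter):
        new_V, policy = [], []
        for x in range(n_states):
            best_val, best_a = math.inf, 0
            for a in range(n_actions):
                support = [y for y in range(n_states) if P[a][x][y] > 0.0]
                # Shift by the largest reachable value so exp() cannot overflow.
                m = max(V[y] for y in support)
                expect = sum(P[a][x][y] * math.exp(lam * (V[y] - m))
                             for y in support)
                q = C[x][a] + m + math.log(expect) / lam
                if q < best_val:
                    best_val, best_a = q, a
            new_V.append(best_val)
            policy.append(best_a)
        # Successive differences bracket the average cost.
        diffs = [nv - v for nv, v in zip(new_V, V)]
        lower, upper = min(diffs), max(diffs)
        # Subtract a constant to keep iterates bounded; differences are unaffected.
        V = [nv - new_V[0] for nv in new_V]
        if upper - lower < tol:
            return lower, upper, policy, True
    return lower, upper, policy, False


if __name__ == "__main__":
    # Hypothetical two-state, two-action model with strictly positive transitions.
    P = [[[0.9, 0.1], [0.5, 0.5]],
         [[0.5, 0.5], [0.1, 0.9]]]
    C = [[1.0, 2.0], [3.0, 0.5]]
    print(risk_sensitive_value_iteration(P, C, lam=0.5))
```

The function stops either when the bracket is narrower than `tol` or after `max_iter` sweeps, and reports which of the two happened through the final flag rather than raising an error.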

##### MSC:
90C40 Markov and semi-Markov decision processes; 90C39 Dynamic programming