zbMATH — the first resource for mathematics

The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. (English) Zbl 1082.90125
Summary: This work concerns discrete-time Markov decision chains with finite state space and bounded costs. The controller has constant risk sensitivity \(\lambda\), and the performance of a control policy is measured by the corresponding risk-sensitive average cost criterion. Assuming that the optimality equation has a solution, it is shown that the value iteration scheme can be implemented to obtain, in a finite number of steps, (1) an approximation to the optimal \(\lambda\)-sensitive average cost with an error less than a given tolerance, and (2) a stationary policy whose performance index is arbitrarily close to the optimal value. The argument used to establish these results is based on a modification of the original model that extends a transformation introduced by P. J. Schweitzer [J. Math. Anal. Appl. 34, 495–501 (1971; Zbl 0218.90070)] to analyze the risk-neutral case.
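The Python sketch below illustrates plain value iteration for the risk-sensitive average criterion. It assumes the usual formulation, in which the cost of a policy is \(\limsup_{n\to\infty} \frac{1}{\lambda n}\log E\big[\exp\big(\lambda \sum_{t=0}^{n-1} C(X_t, A_t)\big)\big]\) and the optimality equation reads \(g + h(x) = \min_a \big[C(x,a) + \frac{1}{\lambda}\log \sum_y p_{xy}(a)\, e^{\lambda h(y)}\big]\). Because the operator on the right-hand side is monotone and commutes with adding constants, the minimum and maximum of \(V_n - V_{n-1}\) bracket \(g\) whenever this equation has a solution, so the loop stops once that bracket is narrower than the tolerance. The names `bellman_update` and `risk_sensitive_value_iteration` and the nested-list layout `P[x][a][y]`, `C[x][a]` are hypothetical. The sketch does not reproduce the paper's modified model, so on a periodic chain the bracket may never close, which is why a `max_iter` cap is included.

```python
import math
from typing import List, Tuple

# Hypothetical layout: P[x][a][y] = transition probability, C[x][a] = one-step cost.
Transitions = List[List[List[float]]]
Costs = List[List[float]]


def bellman_update(V: List[float], P: Transitions, C: Costs,
                   lam: float) -> Tuple[List[float], List[int]]:
    """One application of the risk-sensitive dynamic programming operator.

    TV[x] = min_a ( C[x][a] + (1/lam) * log sum_y P[x][a][y] * exp(lam * V[y]) ),
    and greedy[x] is a minimizing action (ties go to the lowest index).
    """
    TV: List[float] = []
    greedy: List[int] = []
    for x in range(len(V)):
        best_val, best_a = math.inf, 0
        for a, probs in enumerate(P[x]):
            support = [(p, V[y]) for y, p in enumerate(probs) if p > 0]
            # Shift by the largest exponent so math.exp cannot overflow
            # (valid for lam > 0 and lam < 0 alike).
            m = max(lam * v for _, v in support)
            s = sum(p * math.exp(lam * v - m) for p, v in support)
            val = C[x][a] + (math.log(s) + m) / lam
            if val < best_val:
                best_val, best_a = val, a
        TV.append(best_val)
        greedy.append(best_a)
    return TV, greedy


def risk_sensitive_value_iteration(P: Transitions, C: Costs, lam: float,
                                   tol: float = 1e-8, max_iter: int = 10_000):
    """Return (g_estimate, lower, upper, greedy_policy).

    Stops when upper - lower < tol, so |g_estimate - g| < tol / 2.
    """
    if lam == 0:
        raise ValueError("lam must be nonzero (lam = 0 is the risk-neutral case)")
    V = [0.0] * len(P)
    for _ in range(max_iter):
        TV, policy = bellman_update(V, P, C, lam)
        diff = [t - v for t, v in zip(TV, V)]
        lower, upper = min(diff), max(diff)  # lower <= g <= upper
        if upper - lower < tol:
            return (lower + upper) / 2, lower, upper, policy
        # Subtracting a constant leaves TV - V unchanged but keeps iterates bounded.
        V = [t - TV[0] for t in TV]
    raise RuntimeError("bounds did not close; the chain may be periodic")


if __name__ == "__main__":
    # Toy model: from either state, every action moves to state 0 or 1 w.p. 1/2.
    P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
    C = [[1.0, 2.0], [4.0, 3.0]]
    g, lower, upper, policy = risk_sensitive_value_iteration(P, C, lam=1.0)
    print(g, policy)
```

In the toy model with \(\lambda = 1\), the estimate equals \(\log(\tfrac12 e + \tfrac12 e^3) \approx 2.434\) with greedy policy `[0, 1]`, above the risk-neutral value 2 as Jensen's inequality predicts for \(\lambda > 0\). On the deterministic cycle 0 → 1 → 0 with costs 1 and 3, by contrast, the differences alternate between (1, 3) and (3, 1): they still bracket \(g = 2\) but never close. Periodicity of this kind is what Schweitzer's transformation is designed to remove in the risk-neutral case.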

90C40 Markov and semi-Markov decision processes
90C39 Dynamic programming