Identification of optimal policies in Markov decision processes. (English) Zbl 1195.93148

Summary: We focus attention on identifying optimal policies and on elimination suboptimal policies minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present unified approach to value iteration algorithms that enables to generate lower and upper bounds on optimal values, as well as on the current policy. Using the modified value iterations it is possible to eliminate suboptimal actions and to identify an optimal policy or nearly optimal policies in a finite number of steps without knowing precise values of the performance function.


93E20 Optimal stochastic control
90C40 Markov and semi-Markov decision processes
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)
Full Text: EuDML Link


[1] Cruz-Suárez, D., Montes-de-Oca, R.: Uniform convergence of the value iteration policies for discounted Markov decision processes. Bol. de la Soc. Mat. Mexicana 12 (2006), 133-148. · Zbl 1136.90042
[2] Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F.: Uniform approximations of discounted Markov decision processes to optimal policies. Proceedings of Prague Stochastics 2006 (M. Hušková and M. Janžura, Matfyzpress, Prague 2006, pp. 278-287. · Zbl 1131.90068
[3] Grinold, J.: Elimination of suboptimal actions in Markov decision problems. Oper. Res. 21 (1973), 848-851. · Zbl 0259.90053
[4] Hastings, N. A. J.: Bounds on the gain of a Markov decision processes. Oper. Res. 19 (1971), 240-243. · Zbl 0216.54804
[5] Hastings, N. A. J., Mello, J.: Tests for suboptimal actions in discounted Markov programming. Manag. Sci. 19 (1971), 1019-1022. · Zbl 0304.90116
[6] Hastings, N. A. J., Mello, J.: Tests for suboptimal actions in undiscounted Markov decision chains. Manag. Sci. 23 (1976), 87-91. · Zbl 0345.90042
[7] MacQueen, J.: A modified dynamic programming method for Markov decision problems. J. Math. Anal. Appl. 14 (1966), 38-43. · Zbl 0141.17203
[8] MacQueen, J.: A test of suboptimal actions in Markovian decision problems. Oper. Res. 15 (1967), 559-561. · Zbl 0171.18401
[9] Odoni, A. R.: On finding the maximal gain for Markov decision processes. Oper. Res. 17 (1969), 857-860. · Zbl 0184.23202
[10] Puterman, M. L., Shin, M. C.: Modified policy iteration algorithms for discounted Markov decision problems. Manag. Sci. 24 (1978), 1127-1137. · Zbl 0391.90093
[11] Puterman, M. L., Shin, M. C.: Action elimination procedures for modified policy iteration algorithm. Oper. Res. 30 (1982), 301-318. · Zbl 0481.90089
[12] Puterman, M. L.: Markov Decision Processes - Discrete Stochastic Dynamic Programming. Wiley, New York 1994. · Zbl 1184.90170
[13] Sladký, K.: O metodě postupných aproximací pro nalezení optimálního řízení markovského řetězce (On successive approximation method for finding optimal control of a Markov chain). Kybernetika 4 (1969), 2, 167-176.
[14] White, D. J.: Dynamic programming, Markov chains and the method of successive approximation. J. Math. Anal. Appl. 6 (1963), 296-306. · Zbl 0124.36404
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.