First passage optimality and variance minimisation of Markov decision processes with varying discount factors. (English) Zbl 1327.90374

Summary: This paper deals with the first passage optimality and variance minimisation problems of discrete-time Markov decision processes (MDPs) with varying discount factors and unbounded rewards/costs. First, under suitable conditions slightly weaker than those in the previous literature on the standard (infinite horizon) discounted MDPs, we establish the existence and characterisation of the first passage expected-optimal stationary policies. Second, to further distinguish the expected-optimal stationary policies, we introduce the variance minimisation problem, prove that it is equivalent to a new first passage optimality problem of MDPs, and, thus, show the existence of a variance-optimal policy that minimises the variance over the set of all first passage expected-optimal stationary policies. Finally, we use a computable example to illustrate our main results and also to show the difference between the first passage optimality here and the standard discount optimality of MDPs in the previous literature.


90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control
60J27 Continuous-time Markov processes on discrete state spaces
Full Text: DOI Euclid


[1] Bäuerle, N. and Rieder, U. (2011). Markov Decision Processes with Applications in Finance. Springer, Heidelberg. · Zbl 1236.90004
[2] Derman, C. (1970). Finite State Markovian Decision Processes (Math. Sci. Eng. 67 ). Academic Press, New York. · Zbl 0262.90001
[3] Feinberg, E. A. and Shwartz, A. (1994). Markov decision models with weighted discounted criteria. Math. Operat. Res. 19, 152-168. · Zbl 0803.90123
[4] González-Hernández, J., López-Martínez, R. R. and Minjárez-Sosa, J. A. (2008). Adaptive policies for stochastic systems under a randomized discounted cost criterion. Bol. Soc. Mat. Mexicana (3) 14, 149-163. · Zbl 1201.93130
[5] Guo, X. and Hernández-Lerma, O. (2009). Continuous-Time Markov Decision Processes. Springer, Berlin. · Zbl 1209.90002
[6] Guo, X. and Song, X. (2009). Mean-variance criteria for finite continuous-time Markov decision processes. IEEE Trans. Automatic Control 54, 2151-2157. · Zbl 1367.90113
[7] Guo, X., Hernández-del-Valle, A. and Hernández-Lerma, O. (2012). First passage problems for nonstationary discrete-time stochastic control systems. Europ. J. Control 18, 528-538. · Zbl 1291.93328
[8] Guo, X., Ye, L. and Yin, G. (2012). A mean-variance optimization problem for discounted Markov decision processes. Europ. J. Operat. Res. 220, 423-429. · Zbl 1253.90214
[9] Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes. Springer, New York. · Zbl 0928.93002
[10] Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer, New York. · Zbl 0928.93002
[11] Hernández-Lerma, O., Vega-Amaya, O. and Carrasco, G. (1999). Sample-path optimality and variance-minimization of average cost Markov control processes. SIAM J. Control Optimization 38, 79-93. · Zbl 0951.93074
[12] Hordijk, A. and Yushkevich, A. A. (1999). Blackwell optimality in the class of all policies in Markov decision chains with a Borel state space and unbounded rewards. Math. Meth. Operat. Res. 50, 421-448. · Zbl 0939.90020
[13] Huang, Y. and Guo, X. (2009). Optimal risk probability for first passage models in semi- Markov decision processes. J. Math. Anal. Appl. 359, 404-420. · Zbl 1176.90625
[14] Huang, Y.-H. and Guo, X.-P. (2011). First passage models for denumerable semi-Markov decision processes with nonnegative discounted costs. Acta. Math. Appl. Sinica (English Ser.) 27, 177-190. · Zbl 1235.90177
[15] Kurano, M. (1987). Markov decision processes with a minimum-variance criterion. J. Math. Anal. Appl. 123, 572-583. · Zbl 0619.90080
[16] Liu, J. and Huang, S. (2001). Markov decision processes with distribution function criterion of first-passage time. Appl. Math. Optimization 43, 187-201. · Zbl 1014.90110
[17] Liu, J. Y. and Liu, K. (1992). Markov decision programming–the first passage model with denumerable state space. Systems Sci. Math. Sci. 5, 340-351. · Zbl 0795.90082
[18] Mamabolo, R. M. and Beichelt, F. E. (2004). Maintenance policies with minimal repair. Econ. Qual. Control 19, 143-166. · Zbl 1061.62165
[19] Prieto-Rumeau, T. and Hernández-Lerma, O. (2009). Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains. Math. Meth. Operat. Res. 70, 527-540. · Zbl 1177.93101
[20] Puterman, M. L. (1994). Markov Decision Processes. John Wiley, New York. · Zbl 0829.90134
[21] Schäl, M. (2005). Control of ruin probabilities by discrete-time investments. Math. Meth. Operat. Res. 62, 141-158. · Zbl 1101.93087
[22] Sobel, M. J. (1982). The variance of discounted Markov decision processes. J. Appl. Prob. 19, 794-802. · Zbl 0503.90091
[23] Wei, Q. and Guo, X. (2011). Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Operat. Res. Lett. 39, 369-374. · Zbl 1235.90178
[24] Yu, S. X., Lin, Y. and Yan, P. (1998). Optimization models for the first arrival target distribution function in discrete time. J. Math. Analysis Appl. 225, 193-223. · Zbl 0924.90133
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.