First passage models for denumerable semi-Markov decision processes with nonnegative discounted costs. (English) Zbl 1235.90177

Summary: This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and there exists an optimal (or \(\varepsilon \)-optimal) stationary policy under suitable conditions by using a minimum nonnegative solution approach. Further we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed and an example is given. Finally, it is showed that our model is an extension of the first passage models for both discrete-time and continuous-time Markov decision processes.


90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control
Full Text: DOI


[1] Derman, C. Finite state Markov decision processes. Academic Press, New York, 1970 · Zbl 0262.90001
[2] Guo, X.P., Hernández-Lerma, O. Continuous-time controlled Markov chains. Ann. Appl. Probab., 13: 363–388 (2003) · Zbl 1049.60067
[3] Guo, X.P., Hernández-Lerma, O. Continuous-time controlled Markov chains with discounted rewards. Acta. Appl. Math., 79: 195–216 (2003) · Zbl 1043.93067
[4] Guo X.P., Hernández-Lerma, O. Continuous-Time Markov Decision Processes: Theory and Applications (Chapter 3). Springer-Verlag, Berlin Heidelberg, 2009 · Zbl 1209.90002
[5] Hernández-Lerma, O., Lasserre, J.B. Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York, 1996 · Zbl 0840.93001
[6] Hernández-Lerma, O., Lasserre, J.B. Futher Topics on Discrete-Time Markov Control Processes. Springer-Verlag, New York, 1999 · Zbl 0928.93002
[7] Limnios, N., Oprisan, J. Semi-Markov Processes and Reliability. Birkhäuser, Boston, 2001
[8] Lin, Y.L. Continuous time first arrival target models (1)-discounted moment optimal models. Acta. Math. Appl. Sinica, 14: 115–124 (1991) (in Chinese) · Zbl 0728.90086
[9] Lin, Y.L., Tomkins, R.J., Wang, C.L. Optimal models for the first arrival time distribution function in continuous time-with a special case. Acta. Math. Appl. Sinica, 10: 194–212 (1994) · Zbl 0831.90121
[10] Lippman, S.A. Semi-Markov decision processes with unbounded rewards. Management Science, 19: 717–731 (1973) · Zbl 0259.60044
[11] Liu, J.Y., Huang, S.M. Markov decision processes with distribution function criterion of first-passage time. Appl. Math. Optim., 43: 187–201 (2001) · Zbl 1014.90110
[12] Liu, J.Y., Liu, K. Markov decision programming-the first passage model with denumerable state space. Systems Sci. Math. Sci., 5: 340–351 (1992) · Zbl 0795.90082
[13] Liu, J.Y., Liu, K. Markov decision programming-the moment optimal problem for the first-passage model. J. Austral. Math. Soc. Ser. B, 38: 542–562 (1997) · Zbl 0893.90172
[14] Ohtsubo, Y. Optimal threshold probability in undiscounted Markov decision processes with a target set. Appl. Math. Anal. Comp., 149: 519–532 (2004) · Zbl 1084.91022
[15] Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons. Inc., New York, 1994 · Zbl 0829.90134
[16] Ross, S.M. Average cost semi-Markov decision processes. J. Appl. Probab., 7: 649–656 (1970) · Zbl 0204.51704
[17] Sennott, L.I. Stochastic Dynamic Programming and the Control of Queuing Systems. John Wiley & Sons. Inc., New York, 1999 · Zbl 0997.93503
[18] Yu, S.X., Lin, Y.L., Yan, P.F. Optimization models for the first arrival target distribution function in discrete time. J. Math. Anal. Appl., 225: 193–223 (1998) · Zbl 0924.90133
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.