
Continuous-time Markov decision processes with state-dependent discount factors. (English) Zbl 1281.90082

Summary: We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion with state-dependent discount factors. All underlying Markov processes are determined by the given transition rates, which are allowed to be unbounded, and the reward rates may be unbounded both from above and from below. Using the dynamic programming approach, we establish the discounted reward optimality equation (DROE) and the existence and uniqueness of its solutions. Under suitable conditions, we also obtain a discounted optimal stationary policy that is optimal in the class of all randomized stationary policies. Moreover, when the transition rates are uniformly bounded, we provide an algorithm to compute (or at least approximate) the discounted reward optimal value function and a discounted optimal stationary policy. Finally, we illustrate our results with an example. In particular, for this example we first derive an explicit and exact solution to the DROE and an explicit expression for a discounted optimal stationary policy.
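The summary does not spell out the algorithm for uniformly bounded transition rates. The following is a minimal sketch of one standard way such a computation can go. It is not claimed to be the paper's algorithm, and it rests on several assumptions: finite state and action spaces, a DROE of the usual form α(i)u(i) = max_a [r(i,a) + Σ_j q(j|i,a)u(j)] with discount rates α(i) > 0, and uniformization with a constant L ≥ max_{i,a} q(i,a), where q(i,a) = -q(i|i,a). Adding L·u(i) to both sides gives u(i) = max_a [r(i,a) + L Σ_j p(j|i,a)u(j)] / (α(i) + L) with p(j|i,a) = δ_ij + q(j|i,a)/L. This is a sup-norm contraction with modulus max_i L/(α(i) + L) < 1. The helper `solve_droe` and its array layout are hypothetical.

```python
import numpy as np


def solve_droe(Q, r, alpha, tol=1e-10, max_iter=100_000):
    """Value iteration on the uniformized DROE (hypothetical helper).

    Q:     (S, A, S) array, Q[i, a, j] = q(j | i, a); each row Q[i, a, :]
           sums to zero and has nonnegative off-diagonal entries.
    r:     (S, A) array of reward rates r(i, a).
    alpha: (S,) array of state-dependent discount rates alpha(i) > 0.
    Returns an approximate value function V and a greedy stationary policy.
    """
    Q, r, alpha = np.asarray(Q, float), np.asarray(r, float), np.asarray(alpha, float)
    if np.any(alpha <= 0):
        raise ValueError("discount rates must be positive")
    S = Q.shape[0]

    # Exit rates q(i, a) = -q(i | i, a); any L >= their maximum is valid.
    # Floor at 1.0 so that a chain with no transitions does not divide by zero.
    exit_rates = -np.einsum("iai->ia", Q)
    L = max(float(exit_rates.max()), 1.0)

    # Uniformized transition probabilities p(j | i, a) = delta_ij + q(j | i, a) / L.
    P = np.eye(S)[:, None, :] + Q / L
    denom = (alpha + L)[:, None]  # per-state factor alpha(i) + L

    V = np.zeros(S)
    for _ in range(max_iter):
        # Q-values: (r(i, a) + L * sum_j p(j | i, a) V(j)) / (alpha(i) + L)
        V_new = ((r + L * (P @ V)) / denom).max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break

    # Greedy (deterministic stationary) policy with respect to the final V.
    policy = ((r + L * (P @ V)) / denom).argmax(axis=1)
    return V, policy


# Usage: two states, two actions, different discount rates per state.
Q = np.array([[[-1.0, 1.0], [-3.0, 3.0]],
              [[2.0, -2.0], [0.0, 0.0]]])
r = np.array([[1.0, 2.0], [0.0, 0.5]])
alpha = np.array([0.5, 1.0])
V, policy = solve_droe(Q, r, alpha)
```

Uniformization is used here because, with uniformly bounded rates, it turns the DROE into a fixed-point problem for a contraction, so plain value iteration converges from any starting point. A result can be checked by plugging V back into α(i)V(i) = max_a [r(i,a) + Σ_j q(j|i,a)V(j)], or by solving the linear system (diag(α) - Q_π)V_π = r_π for the returned policy π and comparing.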

MSC:

90C40 Markov and semi-Markov decision processes
60J27 Continuous-time Markov processes on discrete state spaces
