##
**First passage Markov decision processes with constraints and varying discount factors.**
*(English)*
Zbl 1317.90319

Summary: This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.

### MSC:

90C40 | Markov and semi-Markov decision processes |

60J27 | Continuous-time Markov processes on discrete state spaces |

### Keywords:

discrete-time Markov decision process (DTMDP); constrained optimality; varying discount factor; unbounded cost
PDFBibTeX
XMLCite

\textit{X. Wu} et al., Front. Math. China 10, No. 4, 1005--1023 (2015; Zbl 1317.90319)

Full Text:
DOI

### References:

[1] | Altman E. Denumerable constrained Markov decision processes and finite approximations. Math Methods Oper Res, 1994, 19: 169-191 · Zbl 0807.90122 |

[2] | Altman E. Constrained Markov Decision Processes. Boca Raton: Chapman & Hall/CRC, 1999 · Zbl 0963.90068 |

[3] | Alvarez-Mena J, Hern´andez-Lerma O. Convergence of the optimal values of constrained Markov control processes. Math Methods Oper Res, 2002, 55: 461-484 · Zbl 1031.90058 |

[4] | Borkar V. A convex analytic approach to Markov decision processes. Probab Theory Related Fields, 1988, 78: 583-602 · Zbl 0628.90090 |

[5] | Derman C. Finite State Markovian Decision Processes. Mathematics in Science and Engineering, Vol 67. New York: Academic Press, 1970 · Zbl 0262.90001 |

[6] | Dufour F, Prieto-Rumeau T. Finite linear programming approximations of constrained discounted Markov decision processes. SIAM J Control Optim, 2013, 51: 1298-1324 · Zbl 1282.90228 |

[7] | Gonz´alez-Hern´andez J, Hern´andez-Lerma O. Extreme points of sets of randomized strategies in constrained optimization and control problems. SIAM J Optim, 2005, 15: 1085-1104 · Zbl 1097.90040 |

[8] | Guo X P, Hern´andez-del-Valle A, Hern´andez-Lerma O. First passage problems for nonstationary discrete-time stochastic control systems. Eur J Control, 2012, 18: 528-538 · Zbl 1291.93328 |

[9] | Guo X P, Hern´andez-Lerma O. Continuous-Time Markov Decision Processes: Theory and Applications. Berlin: Springer-Verlag, 2009 · Zbl 1209.90002 |

[10] | Guo X P, Piunovskiy A. Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math Oper Res, 2011, 36(1): 105-132 · Zbl 1218.90209 |

[11] | Guo X P, Song X Y, Zhang Y. First passage criteria for continuous-time Markov decision processes with varying discount factors and history-dependent policies. IEEE Trans Automat Control, 2014, 59: 163-174 · Zbl 1360.90278 |

[12] | Guo X P, Zhang W Z. Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints. Eur J Control, 2014, 238: 486-496 · Zbl 1338.90446 |

[13] | Hern´andez-Lerma O, Gonz´alez-Hern´andez J. Constrained Markov decision processes in Borel spaces: the discounted case. Math Methods Oper Res, 2000, 52: 271-285 · Zbl 1032.90061 |

[14] | Hern´andez-Lerma, O.; Lasserre, J. B., Discrete-Time Markov Control Processes (1996), New York · Zbl 0724.93087 |

[15] | Hern´andez-Lerma O, Lasserre J B. Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag, 1999 · Zbl 0928.93002 |

[16] | Huang Y H, Wei Q D, Guo X P. Constrained Markov decision processes with first passage criteria. Ann Oper Res, 2013, 206: 197-219 · Zbl 1271.90104 |

[17] | Mao X, Piunovskiy A. Strategic measures in optimal control problems for stochastic sequences. Stoch Anal Appl, 2000, 18: 755-776 · Zbl 0973.93060 |

[18] | Piunovskiy A. Optimal Control of Random Sequences in Problems with Constraints. Dordrecht: Kluwer Academic, 1997 · Zbl 0894.93001 |

[19] | Piunovskiy A. Controlled random sequences: the convex analytic approach and constrained problems. Russian Math Surveys, 2000, 53: 1233-1293 · Zbl 0941.93056 |

[20] | Prokhorov Y. Convergence of random processes and limit theorems in probability theory. Theory Probab Appl, 1956, 1: 157-214 |

[21] | Saldi N, Linder T, Y¨uksel S. Asymptotic optimality and rates of convergence of quantized stationary policies in stochastic control. IEEE Trans Automat Control, 2015, 60 (to appear) · Zbl 1360.93412 |

[22] | Sennott L I. Stochastic Dynamic Programming and the Control of Queueing Systems. New York: Wiley, 1999 · Zbl 0997.93503 |

[23] | Wei Q D, Guo X P. Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Oper Res Lett, 2011, 39: 369-374 · Zbl 1235.90178 |

[24] | Wu X, Guo X P. First passage optimality and variance minimization of Markov decision processes with varying discount factors. J Appl Probab, 2015, 52(2) (to appear) · Zbl 1327.90374 |

[25] | Zhang Y. Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors. TOP, 2013, 21: 378-408 · Zbl 1273.90235 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.