Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors. (English) Zbl 1273.90235

Summary: In this paper we develop the convex analytic approach to a discounted discrete-time Markov decision process (DTMDP) in Borel state and action spaces with \(N\) constraints. Unlike the classic discounted models, we allow a non-constant discount factor. After defining and characterizing the corresponding occupation measures, the original constrained DTMDP is written as a convex program in the space of occupation measures, whose compactness and convexity we show. In particular, we prove that every extreme point of the space of occupation measures can be generated by a deterministic stationary policy for the DTMDP. For the resulting convex program, we prove that it admits a solution that can be expressed as a convex combination of \(N + 1\) extreme points of the space of occupation measures. One of its consequences is the existence of a randomized stationary optimal policy for the original constrained DTMDP.


90C40 Markov and semi-Markov decision processes
60J05 Discrete-time Markov processes on general state spaces
Full Text: DOI


[1] Aliprantis C, Border K (2007) Infinite dimensional analysis. Springer, New York · Zbl 1156.46001
[2] Altman E (1999) Constrained Markov decision processes. Chapman & Hall/CRC, Boca Raton · Zbl 0963.90068
[3] Bertsekas D, Shreve S (1978) Stochastic optimal control. Academic Press, New York · Zbl 0471.93002
[4] Borkar V (1988) A convex analytic approach to Markov decision processes. Probab Theory Relat Fields 78:583–602 · Zbl 0628.90090
[5] Chen R, Blankenship G (2004) Dynamic programming equations for discounted constrained stochastic control. IEEE Trans Autom Control 49:699–709 · Zbl 1365.93540
[6] Dynkin E, Yushkevich A (1979) Controlled Markov processes. Springer, New York · Zbl 0073.34801
[7] Frid E (1972) On optimal strategies in control problems with constraints. Theory Probab Appl 17:188–192 · Zbl 0279.93051
[8] González-Hernández J, Hernández-Lerma O (2005) Extreme points of sets of randomized strategies in constrained optimization and control problems. SIAM J Optim 15:1085–1104 · Zbl 1097.90040
[9] González-Hernández J, López-Martińez R, Pérez-Hernández J (2007) Markov control processes with randomized discounted cost. Math Methods Oper Res 65:27–44 · Zbl 1126.90075
[10] González-Hernández J, López-Martińez R, Minjárez-Sosa J (2009) Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion. Kybernetika 45:737–754 · Zbl 1190.93105
[11] Guo X, Piunovskiy A (2011) Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math Oper Res 36:105–132 · Zbl 1218.90209
[12] Hernández-Lerma O, Lasserre J (1996) Discrete-time Markov control processes. Springer, New York · Zbl 0853.93106
[13] Hernández-Lerma O, Lasserre J (1999) Further topics on discrete-time Markov control processes. Springer, New York · Zbl 0928.93002
[14] Hernández-Lerma O, González-Hernández J (2000) Constrained Markov control processes in Borel spaces: the discounted case. Math Methods Oper Res 52:271–285 · Zbl 1032.90061
[15] Mao X, Piunovskiy A (2000) Strategic measures in optimal control problems for stochastic sequences. Stoch Anal Appl 18:755–776 · Zbl 0973.93060
[16] Meyer P (1966) Probability and potentials. Blaisdell, Waltham · Zbl 0138.10401
[17] Piunovskiy A (1997) Optimal control of random sequences in problems with constraints. Kluwer Academic, Dordrecht · Zbl 0894.93001
[18] Piunovskiy A (2000) Controlled random sequences: the convex analytic approach and constrained problems. Russ Math Surv 53:1233–1293 · Zbl 0941.93056
[19] Piunovskiy A (2006) Dynamic programming in constrained Markov decision processes. Control Cybern 35:645–660 · Zbl 1115.90063
[20] Piunovskiy A, Mao X (2000) Constrained Markovian decision processes: the dynamic programming approach. Oper Res Lett 27:119–126 · Zbl 0969.90091
[21] Prokhorov Yu (1956) Convergence of random processes and limit theorems in probability theory. Theory Probab Appl 1:157–214 · Zbl 0075.29001
[22] Rockafellar R (1974) Conjugate duality and optimization. SIAM, Philadelphia · Zbl 0296.90036
[23] Schäll M (1975) Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z Wahrscheinlichkeitstheor Verw Geb 32:179–196 · Zbl 0316.90080
[24] Varadarajan V (1958) Weak convergence of measures on separable metric spaces. Sankhya 19:15–22 · Zbl 0082.26505
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.