×

Constrained discounted Markov decision processes with Borel state spaces. (English) Zbl 1434.90212

Summary: We study discrete-time discounted constrained Markov decision processes (CMDPs) with Borel state and action spaces. These CMDPs satisfy either weak (W) continuity conditions, that is, the transition probability is weakly continuous and the reward function is upper semicontinuous in state-action pairs, or setwise (S) continuity conditions, that is, the transition probability is setwise continuous and the reward function is upper semicontinuous in actions. Our main goal is to study models with unbounded reward functions, which are often encountered in applications, e.g., in consumption/investment problems. We provide some general assumptions under which the optimization problems in CMDPs are solvable in the class of randomized stationary policies and in the class of chattering policies introduced in this paper. If the initial distribution and transition probabilities are atomless, then using a general “purification result” of E. A. Feinberg and A. Piunovskiy [SIAM J. Control Optim. 57, No. 1, 163–191 (2019; Zbl 1411.90351)] we show the existence of a deterministic (stationary) optimal policy. Our main results are illustrated by examples.

MSC:

90C40 Markov and semi-Markov decision processes
90C15 Stochastic programming
90C39 Dynamic programming

Citations:

Zbl 1411.90351
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Altman, E., Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program, Mathematical Methods of Operations Research, 48, 387-417 (1998) · Zbl 0930.90081
[2] Altman, E., Constrained Markov decision processes (1999), Chapman & Hall/CRC: Chapman & Hall/CRC Boca Raton, FL · Zbl 0963.90068
[3] Balder, E. J., On compactness of the space of policies in stochastic dynamic programming, Stochastic Processes and their Applications, 32, 141-150 (1989) · Zbl 0675.90088
[4] Berge, C., Topological spaces (1963), Macmillan: Macmillan New York · Zbl 0114.38602
[5] Bertsekas, D. P.; Shreve, S. E., Stochastic optimal control: The discrete-time case (1978), Academic Press: Academic Press New York · Zbl 0471.93002
[6] Borkar, V. S., A convex analytic approach to Markov decision processes, Probability Theory and Related Fields, 78, 583-602 (1988) · Zbl 0628.90090
[7] Borkar, V. S., Ergodic control of Markov chains with constraints - the general case, SIAM Journal on Control and Optimization, 32, 176-186 (1994) · Zbl 0795.93100
[8] Brown, L. D.; Purves, R., Measurable selections of extrema, The Annals of Statistics, 1, 902-912 (1973) · Zbl 0265.28003
[9] Chen, R. C.; Blankenship, G. L., Dynamic programming equations for discounted constrained stochastic control, IEEE Transactions on Automatic Control, 49, 699-709 (2004) · Zbl 1365.93540
[10] Chen, R. C.; Feinberg, E. A., Non-randomized policies in constrained Markov decision processes, Mathematical Methods of Operations Research, 66, 165-179 (2007) · Zbl 1126.90074
[11] Dufour, F.; Genadot, A., On the expected total reward with unbounded returns for Markov decision processes, Applied Mathematics and Optimization (2019), (in press) · Zbl 1441.90176
[12] Dufour, F.; Prieto-Rumeau, T., Conditions for the solvability of the linear programming formulation for constrained discounted Markov decision processes, Applied Mathematics and Optimization, 74, 27-51 (2016) · Zbl 1346.90801
[13] Dynkin, E. B.; Yushkevich, A. A., Controlled Markov processes and their applications (1979), Springer-Verlag: Springer-Verlag New York · Zbl 0426.60063
[14] Feinberg, E. A.; Hu, J.; Yuan, E., A stochastic search algorithm for voltage and reactive power control with switching costs and ZIP load model, Electric Power Systems Research, 133, 328-337 (2016)
[15] Feinberg, E. A.; Kasyanov, P. O.; Liang, Y., Fatou’s lemma for weakly converging measures under the uniform integrability condition, Theory of Probability and its Applications (2018), (in press). https://arxiv.org/abs/180707931 · Zbl 1434.28004
[16] Feinberg, E. A.; Piunovskiy, A. B., Multiple objective nonatomic Markov decision processes with total reward criteria, Journal of Mathematical Analysis and Applications, 247, 45-66 (2000) · Zbl 1056.90135
[17] Feinberg, E. A.; Piunovskiy, A. B., Nonatomic total rewards Markov decision processes with multiple criteria, Journal of Mathematical Analysis and Applications, 273, 93-111 (2002) · Zbl 1012.90069
[18] Feinberg, E. A.; Piunovskiy, A. B., Sufficiency of deterministic policies for atomless discounted and uniformly absorbing MDPs with multiple criteria, SIAM Journal on Control and Optimization, 57, 163-191 (2019) · Zbl 1411.90351
[19] Feinberg, E. A.; Rothblum, U. G., Splitting randomized stationary policies in total reward Markov decision processes, Mathematics of Operations Research, 37, 129-153 (2012) · Zbl 1243.90233
[20] Feinberg, E. A.; Shwartz, A., Constrained Markov decision models with weighted discounted rewards, Mathematics of Operations Research, 20, 302-320 (1995) · Zbl 0837.90120
[21] Feinberg, E. A.; Shwartz, A., Constrained discounted dynamic programming, Mathematics of Operations Research, 21, 922-945 (1996) · Zbl 0867.90123
[22] Feinberg, E. A.; Shwartz, A., Constrained dynamic programming with two discount factors: applications and an algorithm, IEEE Transactions on Automatic Control, 42, 628-631 (1999) · Zbl 0957.90127
[23] Feinberg, E. A.; Sonin, I. M., Notes on equivalent stationary policies in Markov decision processes with total rewards, Mathematical Methods of Operations Research, 44, 205-221 (1996) · Zbl 0860.90124
[24] Frid, E., On optimal strategies in control problems with constraints, Theory Probability and Applications, 17, 188-192 (1972) · Zbl 0279.93051
[25] Hernández-Lerma, O.; González-Hernández, J., Constrained Markov control processes in Borel spaces: the discounted case, Mathematical Methods of Operations Research, 52, 271-285 (2000) · Zbl 1032.90061
[26] Hernández-Lerma, O.; Lasserre, J. B., Discrete-time Markov control processes: Basic optimality criteria (1996), Springer: Springer New York
[27] Hernández-Lerma, O.; Lasserre, J. B., Further topics on Markov control processes (1999), Springer: Springer New York · Zbl 0928.93002
[28] Hordijk, A.; Spieksma, F., Constrained admission control to a queueing system, Advances in Applied Probability, 21, 409-431 (1989) · Zbl 0674.60086
[29] Hu, J., & Chang, H. S. (2010). An approximate stochastic annealing algorithm for finite horizon Markov decision processes. In 49th IEEE Conference on Decision and Control (pp. 5338-5343).
[30] Jaśkiewicz, A.; Nowak, A. S., Discounted dynamic programming with unbounded returns: application to economic models, Journal of Mathematical Analysis and Applications, 378, 450-462 (2011) · Zbl 1254.90292
[31] Kallenberg, L. C. M. (1983). Mathematical centre tracts, Vol. 148: Linear programming and finite Markov control problems. Amsterdam. · Zbl 0503.90061
[32] Kechris, A. S., Classical descriptive set theory (1995), Springer: Springer New York · Zbl 0819.04002
[33] Kucia, A., Scorza-dragoni type theorems, Fundamenta Mathematicae, 138, 197-203 (1991) · Zbl 0744.28011
[34] Lazar, A., Optimal flow control of a class of queueing networks in equilibrium, IEEE Transactions on Automatic Control, 28, 1001-1007 (1983) · Zbl 0526.90041
[35] Mao, X.; Piunovskiy, A., Strategic measures in optimal control problrms in stochastic sequences, Stochastic Analysis and Applications, 18, 755-766 (2000) · Zbl 0973.93060
[36] Neveu, J., Mathematical foundations of the calculus of probability (1965), Holden Day: Holden Day San Francisco · Zbl 0137.11301
[37] Nowak, A. S., On the weak topology on a space of probability measures induced by policies, Bulletin of the Polish Academy of Sciences, Series Mathematics, 36, 181-186 (1988) · Zbl 0676.90095
[38] Piunovskiy, A. B., Optimal control of random sequences in problems with constraints (1997), Kluwer Academic Publishers: Kluwer Academic Publishers Dordrecht · Zbl 0894.93001
[39] Piunovskiy, A. B.; Mao, X., Constrained Markov decision processes: the dynamic programming approach, Operations Research Letters, 27, 119-126 (2000) · Zbl 0969.90091
[40] Roubic̆ek, T., Relaxation in Optimization Theory and Variational Calculus (1997), de Gruyter: de Gruyter Berlin · Zbl 0880.49002
[41] Royden, H. L., Real analysis (1988), Prentice Hall: Prentice Hall Upper Saddle River, New Jersey · Zbl 0704.26006
[42] Saldi, N.; Linder, T.; Yüksel, S., Finite approximations in discrete-time stochastic control: quantized models and asymptotic optimality (2018), Springer: Springer Cham · Zbl 1471.93005
[43] Schäl, M., On dynamic programming: compactness of the space of policies, Stochastic Processes and their Applications, 3, 345-364 (1975) · Zbl 0317.60025
[44] Schäl, M., On dynamic programming and statistical decision theory, The Annals of Statistics, 7, 432-445 (1979) · Zbl 0417.62002
[45] Sennott, L. I., Constrained Markov decision chains, Probability in the Engineering and Information Sciences, 5, 463-475 (1991) · Zbl 1134.90531
[46] Stachurski, J., Economic dynamics: Theory and computation (2009), MIT Press: MIT Press Cambridge · Zbl 1163.91300
[47] Vakil, F.; Lazar, A., Flow control protocols for integrated networks with partially observed voice traffic, IEEE Transactions on Automatic Control, 32, 2-14 (1987)
[48] van der Wal, J. (1981). Mathematical centre tracts, Vol. 139: Stochastic dynamic programming. Amsterdam. · Zbl 0462.90055
[49] Wessels, J., Markov programming by successive approximations with respect to weighted supremum norms, Journal of Mathematical Analysis and Applications, 58, 326-335 (1977) · Zbl 0354.90087
[50] Zapała, A. M., Unbounded mappings and weak convergence of measures, Statistics & Probability Letters, 78, 698-706 (2008) · Zbl 1137.60005
[51] Zhang, Y., Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors, TOP, 21, 378-408 (2013) · Zbl 1273.90235
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.