
Existence of average optimal policies in Markov control processes with strictly unbounded costs. (English) Zbl 0792.93120

Summary: This paper deals with discrete-time Markov control processes on Borel spaces with strictly unbounded one-stage costs, i.e., costs that grow without bound on the complement of compact sets. Under mild assumptions, the existence of a minimum pair for the average cost problem is established, as is the existence of stable optimal and pathwise-optimal control policies. It is shown that the existence of a minimum pair is equivalent to the existence of a solution to an “optimality inequality”, a weaker version of the dynamic programming (or optimality) equation.
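In the standard notation of average cost dynamic programming (assumed here; the record itself gives no formulas), with state space \(X\), admissible actions \(A(x)\), one-stage cost \(c\), and transition kernel \(Q\), the optimality inequality relaxes the optimality equation by replacing equality with “\(\ge\)”:

```latex
% Average cost optimality equation (ACOE): a constant \rho^{*} and a
% measurable function h on X satisfy, for all x in X,
\rho^{*} + h(x) \;=\; \min_{a \in A(x)}
  \Big[\, c(x,a) + \int_X h(y)\, Q(dy \mid x,a) \Big].

% The optimality inequality (ACOI) discussed in the summary only requires
\rho^{*} + h(x) \;\ge\; \min_{a \in A(x)}
  \Big[\, c(x,a) + \int_X h(y)\, Q(dy \mid x,a) \Big],
\qquad x \in X.

% "Strictly unbounded" cost means: there are compact sets K_n \uparrow X with
\inf_{x \notin K_n} \; \min_{a \in A(x)} c(x,a) \;\longrightarrow\; \infty
\quad \text{as } n \to \infty.
```

A policy attaining the minimum in the ACOI, together with the constant \(\rho^{*}\), yields the optimal average cost; the inequality is easier to verify than the full equation.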

MSC:

93E20 Optimal stochastic control

References:

[1] D. P. Bertsekas: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs, N. J. 1987. · Zbl 0649.93001
[2] D. P. Bertsekas, S. E. Shreve: Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York 1978. · Zbl 0471.93002
[3] P. Billingsley: Convergence of Probability Measures. Wiley, New York 1968. · Zbl 0172.21201
[4] D. Blackwell: Memoryless strategies in finite-stage dynamic programming. Ann. Math. Statist. 35 (1964), 863-865. · Zbl 0127.36406 · doi:10.1214/aoms/1177703586
[5] D. Blackwell: Discounted dynamic programming. Ann. Math. Statist. 36 (1965), 226-235. · Zbl 0133.42805 · doi:10.1214/aoms/1177700285
[6] V. S. Borkar: Control of Markov chains with long-run average cost criterion: the dynamic programming equations. SIAM J. Control Optim. 27 (1989), 642-657. · Zbl 0668.60059 · doi:10.1137/0327034
[7] R. Cavazos-Cadena: Solution to the optimality equation in a class of average Markov decision chains with unbounded costs. Kybernetika 27 (1991), 23-37. · Zbl 0734.90112
[8] J. Diebolt, D. Guegan: Probabilistic properties of the general nonlinear Markovian process of order one and applications to time series modelling. Rapport Technique No. 125, Laboratoire de Statistique Théorique et Appliquée, CNR-URA 1321, Université Paris VI, 1990.
[9] J. L. Doob: Stochastic Processes. Wiley, New York 1953. · Zbl 0053.26802
[10] M. Duflo: Méthodes Récursives Aléatoires. Masson, Paris 1990. · Zbl 0703.62084
[11] E. B. Dynkin, A. A. Yushkevich: Controlled Markov Processes. Springer-Verlag, Berlin 1979.
[12] R. Hartley: Dynamic programming and an undiscounted, infinite horizon, convex stochastic control problem. Recent Developments in Markov Decision Processes (R. Hartley, L. C. Thomas and D. J. White, eds.). Academic Press, London 1980, pp. 277-300.
[13] O. Hernandez-Lerma: Lyapunov criteria for stability of differential equations with Markov parameters. Boletin Soc. Mat. Mexicana 24 (1979), 27-48. · Zbl 0486.60051
[14] O. Hernandez-Lerma: Adaptive Markov Control Processes. Springer-Verlag, New York 1989. · Zbl 0698.90053 · doi:10.1007/978-1-4419-8714-3
[15] O. Hernandez-Lerma: Average optimality in dynamic programming on Borel spaces - unbounded costs and controls. Syst. Control Lett. 17 (1991), 237-242. · Zbl 0771.90098 · doi:10.1016/0167-6911(91)90069-Q
[16] O. Hernandez-Lerma, J. B. Lasserre: Average cost optimal policies for Markov control processes with Borel state space and unbounded costs. Syst. Control Lett. 15 (1990), 349-356. · Zbl 0723.93080 · doi:10.1016/0167-6911(90)90108-7
[17] O. Hernandez-Lerma, J. B. Lasserre: Linear programming and average optimality of Markov control processes on Borel spaces - unbounded costs. Rapport LAAS, LAAS-CNRS, Toulouse 1992. To appear in SIAM J. Control Optim.
[18] O. Hernandez-Lerma, R. Montes de Oca, R. Cavazos-Cadena: Recurrence conditions for Markov decision processes with Borel state space: a survey. Ann. Oper. Res. 28 (1991), 29-46. · Zbl 0717.90087
[19] K. Hinderer: Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. Springer-Verlag, Berlin 1970. · Zbl 0202.18401
[20] M. Yu. Kitayev: Semi-Markov and jump Markov control models: average cost criterion. Theory Probab. Appl. 30 (1985), 272-288. · Zbl 0586.90093
[21] M. Kurano: The existence of a minimum pair of state and policy for Markov decision processes under the hypothesis of Doeblin. SIAM J. Control Optim. 27 (1989), 296-307. · Zbl 0677.90085 · doi:10.1137/0327016
[22] H. J. Kushner: Introduction to Stochastic Control. Holt, Rinehart and Winston, New York 1971. · Zbl 0293.93018
[23] A. Leizarowitz: Optimal controls for diffusions in \(R^n\). J. Math. Anal. Appl. 149 (1990), 180-209. · Zbl 0699.49020
[24] S. P. Meyn: Ergodic theorems for discrete time stochastic systems using a stochastic Lyapunov function. SIAM J. Control Optim. 27 (1989), 1409-1439. · Zbl 0681.60067 · doi:10.1137/0327073
[25] A. Mokkadem: Sur un modèle autorégressif non linéaire. Ergodicité et ergodicité géométrique. J. Time Series Anal. 8 (1987), 195-205.
[26] D. Revuz: Markov Chains. Second edition. North-Holland, Amsterdam 1984. · Zbl 0539.60073
[27] U. Rieder: Measurable selection theorems for optimization problems. Manuscripta Math. 24 (1978), 507-518. · Zbl 0385.28005 · doi:10.1007/BF01168566
[28] V. I. Rotar, T. A. Konyuhova: Two papers on asymptotic optimality in probability and almost surely. Preprint, Central Economic Mathematical Institute (CEMI), Moscow 1991.
[29] R. H. Stockbridge: Time-average control of martingale problems: a linear programming formulation. Ann. Probab. 18 (1990), 206-217. · Zbl 0699.49019 · doi:10.1214/aop/1176990945
[30] J. Wijngaard: Existence of average optimal strategies in Markovian decision problems with strictly unbounded costs. Dynamic Programming and Its Applications (M. L. Puterman, ed.). Academic Press, New York 1978, pp. 369-386. · Zbl 0458.90081
[31] K. Yosida: Functional Analysis. Fifth edition. Springer-Verlag, Berlin 1978. · Zbl 0365.46001