
Denumerable controlled Markov chains with average reward criterion: Sample path optimality. (English) Zbl 0835.90116

Summary: We consider discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains with denumerable state space and compact action space. We study the corresponding stochastic control problem of maximizing the long-run average reward. Departing from the most common approach, which optimizes the expected value of the rewards, we focus on a sample path analysis of the stream of states and rewards. Under a Lyapunov function condition, we show that stationary policies obtained from the average reward optimality equation are not only average reward optimal, but in fact sample path average reward optimal, for almost all sample paths.
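In the notation standard in this literature (the summary itself gives no formulas, so the symbols below — reward $r$, transition law $p$, optimal gain $\rho^*$, bias $h$ — are assumed, not taken from the paper), the average reward optimality equation and the sample path optimality property it refers to can be sketched as:

```latex
% Average reward optimality equation (ACOE) on a denumerable state space S,
% with compact action sets A(x), one-stage reward r(x,a), transition law
% p(y|x,a), optimal gain rho^*, and relative value (bias) function h:
\rho^* + h(x) \;=\; \max_{a \in A(x)}
  \Big[\, r(x,a) + \sum_{y \in S} p(y \mid x, a)\, h(y) \Big],
  \qquad x \in S.

% Sample path average reward optimality of a stationary policy f^* selecting
% maximizers above: for every admissible policy, almost surely,
\limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(x_t, a_t) \;\le\; \rho^*,
\qquad
\lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1}
  r\big(x_t, f^*(x_t)\big) \;=\; \rho^* \quad \text{a.s.}
```

The point of the sample path criterion is that the second limit holds pathwise, almost surely, rather than merely in expectation.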

MSC:

90C40 Markov and semi-Markov decision processes
