zbMATH — the first resource for mathematics

Zero-sum continuous-time Markov games with unbounded transition and discounted payoff rates. (English) Zbl 1125.91016
Summary: This paper is concerned with two-person zero-sum games for continuous-time Markov chains, with possibly unbounded payoff and transition rate functions, under the discounted payoff criterion. Using the optimality (or Shapley) equation, we give conditions that ensure the existence of the value of the game and of a pair of optimal stationary strategies. We prove that the value iteration scheme converges to the game’s value and to a pair of optimal stationary strategies; moreover, when the transition rates are bounded, we show that this convergence is exponential. Our results are illustrated with a controlled queueing system with unbounded transition and reward rates.
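The value-iteration scheme the summary describes can be sketched concretely. The toy below is a *finite* discrete-time discounted zero-sum Markov game (when the transition rates of a continuous-time chain are bounded, uniformization yields such a model, which is the bounded-rate case where the paper proves exponential convergence). All data here (states, actions, rewards, transitions) are made up for illustration and are not from the paper; the rewards are chosen separable, R[s][a][b] = f(s, a) − g(s, b), so each one-shot matrix game has a pure-strategy saddle point — the general case requires mixed strategies, i.e. a linear program at each state.

```python
# Hypothetical finite-state sketch of value iteration for a discounted
# zero-sum Markov game; data are illustrative only, not from the paper.

BETA = 0.9            # discount factor
STATES = (0, 1)
ACTIONS = (0, 1)

# R[s][a][b]: one-stage payoff to the maximizing player.  Chosen separable,
# so every induced matrix game below has a pure-strategy saddle point.
R = [
    [[1, -1], [3, 1]],     # state 0
    [[-1, -3], [1, -1]],   # state 1
]

# P[s][a][b]: next-state distribution (here independent of the actions,
# which preserves the pure saddle points across iterations).
P0 = {0: 0.5, 1: 0.5}
P1 = {0: 0.2, 1: 0.8}
P = [[[P0, P0], [P0, P0]],
     [[P1, P1], [P1, P1]]]

def matrix_value(M):
    """Value of a matrix game that is assumed to have a pure saddle point."""
    lower = max(min(row) for row in M)        # maximizer's guarantee
    upper = min(max(col) for col in zip(*M))  # minimizer's guarantee
    assert abs(lower - upper) < 1e-12, "no pure-strategy saddle point"
    return lower

def shapley_operator(V):
    """One application of the Shapley (optimality) operator T."""
    return {
        s: matrix_value(
            [[R[s][a][b] + BETA * sum(p * V[t] for t, p in P[s][a][b].items())
              for b in ACTIONS]
             for a in ACTIONS])
        for s in STATES
    }

def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate V_{n+1} = T V_n; T is a BETA-contraction in the sup norm."""
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        W = shapley_operator(V)
        if max(abs(W[s] - V[s]) for s in STATES) < tol:
            return W
        V = W
    return V
```

At the fixed point, V satisfies the Shapley equation V = T V. For this toy data the fixed point solves the linear system V(0) = 1 + 0.9(0.5 V(0) + 0.5 V(1)) and V(1) = −1 + 0.9(0.2 V(0) + 0.8 V(1)), i.e. V(0) ≈ −2.329 and V(1) ≈ −5.068, and the iteration's sup-norm error contracts by a factor BETA per step, which is the exponential rate in the bounded-rate case.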

MSC:
91A15 Stochastic games, stochastic differential games
60J27 Continuous-time Markov processes on discrete state spaces
60K25 Queueing theory (aspects of probability theory)
91A05 2-person games
Full Text: DOI Euclid
References:
[1] Altman, E. (2005) Applications of dynamic games in queues. In A.S. Nowak and K. Szajowski (eds), Advances in Dynamic Games. Boston: Birkhäuser. · Zbl 1098.90019
[2] Anderson, W.J. (1991) Continuous-Time Markov Chains. New York: Springer-Verlag. · Zbl 0731.60067
[3] Ardanuy, R. and Alcalá, A. (1992) Weak infinitesimal operators and stochastic differential games. Stochastica, 13, 5-12. · Zbl 0769.90086
[4] Basar, T. and Olsder, G.J. (1999) Dynamic Noncooperative Game Theory, 2nd edition. Philadelphia: Society for Industrial and Applied Mathematics. · Zbl 0479.90085
[5] Chung, K.L. (1960) Markov Chains with Stationary Transition Probabilities. Berlin: Springer-Verlag. · Zbl 0092.34304
[6] Fan, K. (1953) Minimax theorems. Proc. Natl. Acad. Sci. USA, 39, 42-47. · Zbl 0050.06501
[7] Fainberg, E.A. (2004) Continuous-time discounted jump Markov decision processes: a discrete-event approach. Math. Oper. Res., 29, 492-524. · Zbl 1082.90126
[8] Feller, W. (1940) On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc., 48, 488-515. · Zbl 0025.34704
[9] Filar, J.A. and Vrieze, K. (1997) Competitive Markov Decision Processes. New York: Springer-Verlag. · Zbl 0934.91002
[10] Guo, X.P. and Hernández-Lerma, O. (2003a) Continuous-time controlled Markov chains. Ann. Appl. Probab., 13, 363-388. · Zbl 1049.60067
[11] Guo, X.P. and Hernández-Lerma, O. (2003b) Drift and monotonicity conditions for continuous-time Markov control processes with an average criterion. IEEE Trans. Automat. Control, 48, 236-245. · Zbl 1364.90346
[12] Guo, X.P. and Liu, K. (2001) A note on optimality conditions for continuous-time Markov decision processes. IEEE Trans. Automat. Control, 46, 1984-1989. · Zbl 1017.90120
[13] Guo, X.P. and Zhu, W.P. (2002) Denumerable state continuous time Markov decision processes with unbounded transition and reward rates under discounted criterion. J. Appl. Probab., 39, 233-250. · Zbl 1028.90078
[14] Hamadène, S. (1999) Nonzero sum linear-quadratic stochastic differential games and backward-forward equations. Stochastic Anal. Appl., 17, 117-130. · Zbl 0922.60050
[15] Hernández-Lerma, O. (1994) Lectures on Continuous-Time Markov Control Processes. Mexico City: Sociedad Matemática Mexicana. · Zbl 0866.93102
[16] Hernández-Lerma, O. and Lasserre, J.B. (1999) Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag. · Zbl 0928.93002
[17] Hernández-Lerma, O. and Lasserre, J.B. (2001) Zero-sum stochastic games in Borel spaces: average payoff criterion. SIAM J. Control Optim., 39, 1520-1539. · Zbl 1140.91319
[18] Hou, Z.T. (1994) The Q-Matrix Problems on Markov Processes. Changsha: Science and Technology Press of Hunan. (In Chinese.)
[19] Hou, Z.T. and Guo, X.P. (1998) Markov Decision Processes. Changsha: Science and Technology Press of Hunan. (In Chinese.)
[20] Lai, H.C. and Tanaka, K. (1984) On an N-person noncooperative Markov game with a metric state space. J. Math. Anal. Appl., 101, 78-96. · Zbl 0615.90101
[21] Lal, A.K. and Sinha, S. (1992) Zero-sum two-person semi-Markov games. J. Appl. Probab., 29, 56-72. · Zbl 0761.90111
[22] Lippman, S.A. (1973) Semi-Markov decision processes with unbounded rewards. Management Sci., 19, 717-731. · Zbl 0259.60044
[23] Lippman, S.A. (1975) On dynamic programming with unbounded rewards. Management Sci., 21, 1225-1233. · Zbl 0309.90017
[24] Puterman, M.L. (1994) Markov Decision Processes. New York: Wiley. · Zbl 0829.90134
[25] Ramachandran, K.M. (1999) A convergence method for stochastic differential games with a small parameter. Stochastic Anal. Appl., 17, 219-252. · Zbl 0931.91003
[26] Sennott, L.I. (1994) Zero-sum stochastic games with unbounded cost: discounted and average cost cases. Z. Oper. Res., 39, 209-225. · Zbl 0833.90142
[27] Sennott, L.I. (1999) Stochastic Dynamic Programming and the Control of Queueing Systems. New York: Wiley. · Zbl 0997.93503
[28] Shapley, L. (1953) Stochastic games. Proc. Natl. Acad. Sci. USA, 39, 1095-1100. · Zbl 0051.35805
[29] Tanaka, K. and Homma, H. (1978) Continuous time non-cooperative n-person Markov games. Bull. Math. Statist., 15, 93-105. · Zbl 0393.90108
[30] Tanaka, K. and Wakuta, K. (1978) On continuous time Markov games with countable state space. J. Oper. Res. Soc. Japan, 21, 17-27. · Zbl 0382.90098
[31] Tijms, H.C. (1994) Stochastic Models: An Algorithmic Approach. Chichester: Wiley. · Zbl 0838.60075
[32] Van Nunen, J.A.E.E. and Wessels, J. (1978) A note on dynamic programming with unbounded rewards. Management Sci., 24, 576-580. · Zbl 0374.49015
[33] Yushkevich, A.A. and Fainberg, E.A. (1979) On homogeneous Markov models with continuous time and finite or countable state space. Theory Probab. Appl., 24, 156-161. · Zbl 0437.93033
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.