
Average cost Markov decision processes: Optimality conditions. (English) Zbl 0739.90072

The authors consider discrete-time Markov decision processes with the long-run expected average cost criterion. The state space \(X\) and the action set \(A\) are Borel spaces (i.e. Borel subsets of complete separable metric spaces), and for each state \(x\in X\) there is a nonempty, compact, measurable subset \(A(x)\) of \(A\), the set of actions admissible in state \(x\). The transition law \(q(\cdot\mid\cdot,\cdot)\) is a stochastic kernel on \(X\) given \(X\times A\) such that \(\int_X v(y)\,q(dy\mid x,a)\) is lower semi-continuous in \(a\in A(x)\) for each \(x\in X\) and every bounded measurable function \(v\) on \(X\). The one-stage cost function \(c\) is bounded and measurable on \(X\times A\) and lower semi-continuous in \(a\in A(x)\) for each \(x\in X\). The authors give ergodicity conditions on the transition law \(q\) under which a duality theorem holds: an optimal solution to the primal problem (equivalently, a solution to the average cost optimality equation for the Markov decision model) yields an optimal solution to the dual problem, i.e. the deterministic version, and conversely; moreover, the optimal values of the two problems coincide. This result extends those of K. Yamada [J. Math. Anal. Appl. 50, 579-595 (1975; Zbl 0323.90053)] and J. A. Filar and T. A. Schultz [Oper. Res. Lett. 7, 303-307 (1988; Zbl 0659.90095)] to models with general Borel spaces. Also, using the concept of opportunity cost introduced by J. Flynn [J. Math. Anal. Appl. 76, 202-208 (1980; Zbl 0438.90100); ibid. 144, 586-594 (1989; Zbl 0679.90084)], they show that a stationary policy determined from the optimality equation is strong average optimal.
Reviewer: Y.Ohtsubo (Tobata)
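The average cost optimality equation discussed in the review, \(\rho + h(x) = \min_{a\in A(x)}[c(x,a) + \int_X h(y)\,q(dy\mid x,a)]\), can be illustrated numerically in the finite-state special case of the general Borel model. The sketch below runs relative value iteration on a made-up 2-state, 2-action MDP (the costs, transition probabilities, and convergence tolerance are illustrative assumptions, not data from the paper); under the kind of ergodicity conditions the review mentions, the iterates converge to the optimal average cost \(\rho\), a relative value function \(h\), and a greedy stationary policy.

```python
# Relative value iteration for the average-cost optimality equation
#   rho + h(x) = min_a [ c(x,a) + sum_y q(y|x,a) h(y) ]
# on a toy 2-state, 2-action MDP. All numbers below are invented
# purely for illustration; they are not from the reviewed paper.

c = [[1.0, 2.0],   # one-stage costs c(x, a)
     [3.0, 0.5]]
P = [  # transition law q(y | x, a); each row sums to 1
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.5, 0.5], [0.2, 0.8]],
]
states, actions = range(2), range(2)

h = [0.0, 0.0]   # relative value function, normalized so h(0) = 0
rho = 0.0        # running estimate of the optimal average cost
for _ in range(2000):
    T = [min(c[x][a] + sum(P[x][a][y] * h[y] for y in states)
             for a in actions)
         for x in states]
    rho = T[0] - h[0]                   # gain estimate at the reference state 0
    h = [T[x] - T[0] for x in states]   # renormalize to keep iterates bounded

# Stationary policy obtained greedily from the optimality equation
policy = [min(actions,
              key=lambda a: c[x][a] + sum(P[x][a][y] * h[y] for y in states))
          for x in states]
print(rho, h, policy)
```

At convergence the triple `(rho, h, policy)` satisfies the optimality equation up to the iteration tolerance, mirroring (in the finite case) the paper's result that a stationary policy determined from the optimality equation is average optimal.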

MSC:

90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control

References:

[1] Filar, J. A.; Schultz, T. A., Communicating MDPs: equivalence and LP properties, Oper. Res. Lett., 7, 303-307 (1988) · Zbl 0659.90095
[2] Flynn, J., On optimality criteria for dynamic programs with long finite horizons, J. Math. Anal. Appl., 76, 202-208 (1980) · Zbl 0438.90100
[3] Flynn, J., Optimal steady states, excessive functions, and deterministic dynamic programs, J. Math. Anal. Appl., 144, 586-594 (1989) · Zbl 0697.90084
[4] Georgin, J. P., Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré, 14, 255-277 (1978) · Zbl 0391.60066
[5] Hernández-Lerma, O., Adaptive Markov Control Processes (1989), Springer-Verlag: Springer-Verlag New York · Zbl 0698.90053
[6] Hernández-Lerma, O.; Lasserre, J. B., A forecast horizon and a stopping rule for general Markov decision processes, J. Math. Anal. Appl., 132, 388-400 (1988) · Zbl 0646.90090
[7] Hernández-Lerma, O.; Montes-de-Oca, R.; Cavazos-Cadena, R., Recurrence conditions for Markov decision processes with Borel state space, Ann. Oper. Res. (1991), to appear · Zbl 0717.90087
[8] Hinderer, K., Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter, (Lecture Notes Oper. Res., Vol. 33 (1970), Springer-Verlag: Springer-Verlag New York) · Zbl 0202.18401
[9] Kurano, M., The existence of a minimum pair of state and policy for Markov decision processes under the hypothesis of Doeblin, SIAM J. Control Optim., 27, 296-307 (1989) · Zbl 0677.90085
[10] Tweedie, R. L., Criteria for rates of convergence of Markov chains, with applications to queueing and storage theory, (Kingman, J. F.C; Reuter, G. E.H, Papers in Probability Statistics and Analysis (1983), Cambridge Univ. Press: Cambridge Univ. Press Cambridge), 260-276 · Zbl 0501.60072
[11] Yakowitz, S., Dynamic programming applications in water resources, Water Resources Res., 18, 673-696 (1982)
[12] Yamada, K., Duality theorem in Markovian decision problems, J. Math. Anal. Appl., 50, 579-595 (1975) · Zbl 0323.90053