The existence of a minimum pair of state and policy for Markov decision processes under the hypothesis of Doeblin. (English) Zbl 0677.90085

The paper studies average-cost Markov decision processes under compactness and (semi-)continuity conditions, without assuming irreducibility. In general, therefore, one can only find a pair consisting of a starting distribution and a (randomized, non-Markov) policy that attains the minimal average expected cost. Under a Doeblin condition, the existence of a stationary, uniformly optimal policy can be shown only on a subset of states (closed under this policy). Under additional reachability and continuity conditions, this result extends to all states. A related result is shown under a different, simple continuity condition.
Reviewer: G.Hübner
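
For orientation, the notion of a minimum pair can be written out as follows. This is only a sketch in standard average-cost notation and is not taken from the review: the symbols $c$, $x_t$, $a_t$, $J$, $\mu$, $\pi$ and the use of $\limsup$ are assumptions, and the paper's own definitions may differ.

```latex
% Assumed notation (not from the review):
%   c          one-stage cost function
%   (x_t,a_t)  state-action process
%   \mu        starting distribution
%   \pi        policy (possibly randomized and history-dependent)
\[
  J(\mu,\pi) \;=\; \limsup_{n\to\infty} \frac{1}{n}\,
  \mathbb{E}_{\mu}^{\pi}\!\left[\,\sum_{t=0}^{n-1} c(x_t,a_t)\right],
  \qquad
  J(\mu^{*},\pi^{*}) \;=\; \inf_{\mu,\pi} J(\mu,\pi).
\]
```

In this notation a pair $(\mu^{*},\pi^{*})$ attaining the infimum is a minimum pair. Without irreducibility, $J(\mu,\pi)$ can depend on the starting distribution $\mu$, which is why optimality is stated for a pair rather than for every starting state.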


90C40 Markov and semi-Markov decision processes
90C39 Dynamic programming