Stationary policies in dynamic programming models under compactness assumptions. (English) Zbl 0533.90093

Summary: The present work deals with the usual stationary decision model of dynamic programming. The imposed convergence condition on the expected total rewards is so general that both the negative (unbounded) case and the positive (unbounded) case are included. However, the gambling model studied by Dubins and Savage is not covered by the present model. In addition to the convergence condition, a continuity and compactness condition is imposed. The main result states that the supremum of the expected total rewards under all stationary policies is equal to the supremum under all (possibly randomized and non-Markovian) policies.
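
In symbols, the main result can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: \(S\) is the state space, \(V_\pi(s)\) the expected total reward of policy \(\pi\) from initial state \(s\), \(\Pi\) the set of all (possibly randomized and non-Markovian) policies, \(F\) the set of decision rules, and \(f^\infty\) the stationary policy that applies \(f \in F\) at every stage.
\[
  \sup_{f \in F} V_{f^\infty}(s) \;=\; \sup_{\pi \in \Pi} V_\pi(s), \qquad s \in S.
\]
Reading the equality pointwise in the initial state \(s\) is likewise an assumption of this sketch.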


90C39 Dynamic programming
90C40 Markov and semi-Markov decision processes