Average-optimal adaptive policies in semi-Markov decision processes including an unknown parameter. (English) Zbl 0579.90098

Summary: We consider the problem of minimizing the long-run average (expected) cost per unit time in a semi-Markov decision process with an unknown parameter. For general state and action spaces and a compact parameter space, we construct an adaptive policy that has good properties under identifiability conditions weaker than those required for strong consistency of the estimator. As an example, we treat the age replacement problem with an unknown failure distribution.
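
To make the age replacement example concrete, the following is a minimal sketch of the textbook age replacement model with a known failure distribution. It is not the paper's adaptive construction. It assumes a Weibull lifetime with hypothetical parameters `shape` and `scale`, a planned replacement cost `c_planned`, and a failure replacement cost `c_failure`. None of these names come from the summary. By the renewal reward argument, the long-run average cost per unit time of replacing at age T is g(T) = (c_planned * R(T) + c_failure * (1 - R(T))) / (integral of R(t) from 0 to T), where R is the survival function.

```python
import math


def weibull_survival(t, shape, scale):
    """Survival function R(t) = exp(-(t/scale)^shape) of a Weibull lifetime."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(T, shape, scale, c_planned, c_failure, steps=2000):
    """Long-run average cost per unit time when replacing at age T or at failure.

    Assumed model (not taken from the paper): each cycle ends at
    min(lifetime, T); the expected cycle cost is divided by the
    expected cycle length, which equals the integral of R over [0, T].
    """
    h = T / steps
    # Trapezoidal rule for the expected cycle length; R(0) = 1.
    area = 0.5 * (1.0 + weibull_survival(T, shape, scale))
    for i in range(1, steps):
        area += weibull_survival(i * h, shape, scale)
    expected_length = area * h

    survive_to_T = weibull_survival(T, shape, scale)
    expected_cost = c_planned * survive_to_T + c_failure * (1.0 - survive_to_T)
    return expected_cost / expected_length


def best_replacement_age(shape, scale, c_planned, c_failure, grid):
    """Return the replacement age on the candidate grid with the lowest cost rate."""
    return min(grid, key=lambda T: cost_rate(T, shape, scale, c_planned, c_failure))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    ages = [0.05 * k for k in range(1, 61)]
    print(best_replacement_age(2.0, 1.0, 1.0, 10.0, ages))
```

In this sketch the distribution parameters are treated as known. In the setting of the summary they are unknown, so an adaptive policy would have to re-estimate them from the observed cycles and update the replacement age accordingly. How that is done, and under which identifiability conditions it works, is the subject of the paper and is not reproduced here.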


90C40 Markov and semi-Markov decision processes
90B25 Reliability, availability, maintenance, inspection in operations research
60K20 Applications of Markov renewal processes (reliability, queueing networks, etc.)
62N05 Reliability and life testing
90C90 Applications of mathematical programming
