Transient phenomena and self-optimizing control of Markov chains. (English) Zbl 0618.90096

The paper considers Markov decision processes whose law of motion depends on an unknown statistical parameter \(\alpha\). It extends a well-known paper of the first author [Advances Appl. Probab. 6, 40-60 (1974; Zbl 0281.60070)] to the case where \(\alpha\) may depend on the time \(n\). The cases \(\alpha_n=\alpha_0+b_0/n^g\) with \(0<g<1/2\) and with \(g=1/2\) are studied in detail, where \((\alpha_0,b_0)\) is unknown.
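For orientation, the time-dependent parameter model can be written out as below. The limit is an elementary consequence of \(g>0\) and is added here as a reading aid; it is not quoted from the paper.
\[
\alpha_n=\alpha_0+\frac{b_0}{n^{g}},\qquad \alpha_n-\alpha_0=b_0\,n^{-g}\longrightarrow 0\quad (n\to\infty),\qquad 0<g\le \tfrac12 .
\]
The parameter thus settles to \(\alpha_0\), with a transient perturbation that decays more slowly for smaller \(g\). In the boundary case \(g=1/2\) the perturbation is of order \(n^{-1/2}\), the scale of the usual central limit normalization; this last remark is an interpretation and not a statement taken from the paper.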
The main topics are the applicability of the certainty equivalence principle and a central limit theorem for the rewards under an optimal adaptive policy. The continuity statement at the top of page 39 follows from a paper by M. Kolonko [Math. Operationsforsch. Stat., Ser. Optimization 13, 567-591 (1982; Zbl 0518.90092)].
Reviewer: M. Schäl


90C40 Markov and semi-Markov decision processes
62M09 Non-Markovian processes: estimation