Mandl, Petr; Hübner, Gerhard
Transient phenomena and self-optimizing control of Markov chains. (English) Zbl 0618.90096
Acta Univ. Carol., Math. Phys. 26, No. 1, 35-51 (1985).

The paper considers Markov decision processes in which the underlying law of motion depends on an unknown statistical parameter \(\alpha\). It extends a well-known paper of the first author [Advances Appl. Probab. 6, 40-60 (1974; Zbl 0281.60070)] to the case where \(\alpha\) may depend on the time \(n\). The cases \(\alpha_n=\alpha_0+b_0/n^g\) with \(0<g<1/2\) and with \(g=1/2\) are studied in detail, where \((\alpha_0,b_0)\) is unknown. The main topics are the applicability of the certainty equivalence principle and a central limit theorem for the rewards under an optimal adaptive policy. The continuity statement at the top of page 39 follows from a paper by M. Kolonko [Math. Operationsforsch. Stat., Ser. Optimization 13, 567-591 (1982; Zbl 0518.90092)].

Reviewer: M. Schäl

MSC:
90C40 Markov and semi-Markov decision processes
62M09 Non-Markovian processes: estimation

Keywords: unknown statistical parameter; certainty equivalence principle; central limit theorem; optimal adaptive policy

Citations: Zbl 0281.60070; Zbl 0518.90092