## Transient phenomena and self-optimizing control of Markov chains. (English) Zbl 0618.90096

The paper considers Markov decision processes in which the underlying law of motion depends on an unknown statistical parameter $$\alpha$$. It extends a well-known paper of the first author [Advances Appl. Probab. 6, 40-60 (1974; Zbl 0281.60070)] to the case where $$\alpha$$ may depend on the time $$n$$. The cases $$\alpha_n=\alpha_0+b_0/n^g$$ with $$0<g<1/2$$ and with $$g=1/2$$ are studied in detail, where $$(\alpha_0,b_0)$$ is unknown.
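To make the time-dependent parameter concrete, the following is a minimal Python sketch of the sequence $$\alpha_n=\alpha_0+b_0/n^g$$. The function name `alpha` and the numerical values of $$\alpha_0$$ and $$b_0$$ are assumptions chosen only for illustration; in the paper $$(\alpha_0,b_0)$$ is unknown, and nothing here reproduces the authors' estimation or control scheme.

```python
def alpha(n, alpha0, b0, g):
    """Return alpha_n = alpha0 + b0 / n**g for a time index n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return alpha0 + b0 / n ** g


# Hypothetical values, used only for illustration (unknown in the paper).
ALPHA0, B0 = 0.3, 1.0

# Compare the two regimes named in the review: 0 < g < 1/2 and g = 1/2.
for g in (0.25, 0.5):
    gaps = [alpha(n, ALPHA0, B0, g) - ALPHA0 for n in (1, 100, 10_000)]
    print(f"g={g}: alpha_n - alpha_0 at n=1, 100, 10000 ->", gaps)
```

In both regimes the gap $$\alpha_n-\alpha_0=b_0/n^g$$ shrinks to zero as $$n$$ grows, and it shrinks more slowly for smaller $$g$$.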
The main topics are the applicability of the certainty equivalence principle and a central limit theorem for the rewards under an optimal adaptive policy. The continuity statement at the top of page 39 follows from a paper by M. Kolonko [Math. Operationsforsch. Stat., Ser. Optimization 13, 567-591 (1982; Zbl 0518.90092)].
Reviewer: M. Schäl

### MSC:

90C40 Markov and semi-Markov decision processes

62M09 Non-Markovian processes: estimation

### Citations:

Zbl 0281.60070; Zbl 0518.90092