# zbMATH — the first resource for mathematics

Stability estimates in the problem of average optimal switching of a Markov chain. (English) Zbl 1116.90401
Summary: We consider a switching model for a Markov chain $$x_t$$ with a transition probability $$p(x|B)$$. The goal of a controller is to maximize the average gain by selecting a sequence of stopping times, in which the controller gets rewards and pays costs (depending on $$x_t$$) in an alternating order. We suppose that the exact transition probability function of the original “real” chain$$x_t$$ is not available to the controller, and he/she is forced to rely on a given approximation $$\widetilde p$$ to the unknown $$p$$. The controller finds a switching policy $$\bar\pi$$ optimal for the Markov chain with the transition probability $$\widetilde p$$, with a view to apply to the original Markov chain $$x_t$$. Under certain restrictions on $$p$$ we give an upper bound for the difference between the maximal gain attainable in switching of $$x_t$$, and the gain made under the policy in the original model. The bound is expressed in terms of the total variation distance $$\sup_x\text{Var}\;(p(x|\cdot)\widetilde p(x|\cdot))$$.

##### MSC:
 90C40 Markov and semi-Markov decision processes 60J05 Discrete-time Markov processes on general state spaces
Full Text: