Stability estimates in the problem of average optimal switching of a Markov chain. (English) Zbl 1116.90401

Summary: We consider a switching model for a Markov chain \(x_t\) with a transition probability \(p(x|B)\). The goal of a controller is to maximize the average gain by selecting a sequence of stopping times, in which the controller gets rewards and pays costs (depending on \(x_t\)) in an alternating order. We suppose that the exact transition probability function of the original “real” chain\(x_t\) is not available to the controller, and he/she is forced to rely on a given approximation \(\widetilde p\) to the unknown \(p\). The controller finds a switching policy \(\bar\pi\) optimal for the Markov chain with the transition probability \(\widetilde p\), with a view to apply to the original Markov chain \(x_t\). Under certain restrictions on \(p\) we give an upper bound for the difference between the maximal gain attainable in switching of \(x_t\), and the gain made under the policy in the original model. The bound is expressed in terms of the total variation distance \(\sup_x\text{Var}\;(p(x|\cdot)\widetilde p(x|\cdot))\).


90C40 Markov and semi-Markov decision processes
60J05 Discrete-time Markov processes on general state spaces
Full Text: DOI