## Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains
*(English)*
Zbl 1249.90313

Summary: This paper deals with Markov decision processes (MDPs) whose real state space attains its minimum (i.e., has a minimal state) and which are upper-bounded by (uncontrolled) stochastically ordered (SO) Markov chains. We consider MDPs with (possibly) unbounded costs and evaluate the quality of each policy by the average cost criterion. For this objective function, we consider two Markov control models \(\mathbb{P}\) and \(\mathbb{P}_1\), which have the same components except for the transition laws: the transition law \(q\) of \(\mathbb{P}\) is taken as unknown, and the transition law \(q_1\) of \(\mathbb{P}_1\) as a known approximation of \(q\). Under certain irreducibility, recurrence and ergodicity conditions imposed on the bounding SO Markov chain (these conditions give the rate at which the \(t\)-step transition probabilities converge to the invariant measure, \(t=1,2,\ldots\)), we estimate the difference between the optimal cost of driving \(\mathbb{P}\) and the cost incurred when \(\mathbb{P}\) is driven by the optimal policy of \(\mathbb{P}_1\). This difference is called the index of perturbations, and upper bounds for it are provided. An example illustrating the theory is included.
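To make the index of perturbations concrete, the Python sketch below computes it for a hypothetical two-state, two-action model with bounded costs. This is far simpler than the setting of the paper (real state space, possibly unbounded costs, SO bounding chains) and does not reproduce its upper bounds. Writing \(J(f)\) for the long-run average cost \(\limsup_{n\to\infty} \frac{1}{n} E_x^{f}\sum_{t=0}^{n-1} c(x_t,f(x_t))\) of a stationary policy \(f\) under \(q\) (independent of \(x\) in this ergodic toy model), \(f_*\) for an optimal policy of \(\mathbb{P}\) and \(f_*^1\) for an optimal policy of \(\mathbb{P}_1\), the sketch returns \(J(f_*^1) - J(f_*) \ge 0\). The notation, the toy data `q`, `q1`, `c`, and the helper names `rvi_policy`, `average_cost`, `perturbation_index` are illustrative assumptions, not taken from the paper. Optimal policies are found by relative value iteration, a standard average-cost method swapped in here, and every stationary policy is assumed to induce an irreducible, aperiodic chain so that the iterations converge.

```python
from typing import List

# Hypothetical finite model (not the paper's setting):
# q[s][a][y] = probability of moving from state s to y under action a,
# c[s][a]    = one-step cost of action a in state s.
Kernel = List[List[List[float]]]
Costs = List[List[float]]


def rvi_policy(q: Kernel, c: Costs, iters: int = 5000, tol: float = 1e-12) -> List[int]:
    """Relative value iteration; returns a greedy average-cost optimal stationary policy.

    Assumes every stationary policy induces an irreducible, aperiodic chain.
    """
    n, m = len(c), len(c[0])

    def q_value(s: int, a: int, h: List[float]) -> float:
        # c(s, a) + sum_y q(y | s, a) h(y)
        return c[s][a] + sum(q[s][a][y] * h[y] for y in range(n))

    h = [0.0] * n
    for _ in range(iters):
        th = [min(q_value(s, a, h) for a in range(m)) for s in range(n)]
        new_h = [v - th[0] for v in th]  # normalise so that h(0) = 0
        converged = max(abs(u - v) for u, v in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    # Greedy policy with respect to the (approximate) relative value function h.
    return [min(range(m), key=lambda a: q_value(s, a, h)) for s in range(n)]


def average_cost(q: Kernel, c: Costs, f: List[int], steps: int = 10_000) -> float:
    """Long-run average cost of the stationary policy f under the kernel q.

    Approximates the invariant distribution of P_f(s, y) = q[s][f[s]][y]
    by power iteration (assumes P_f is irreducible and aperiodic).
    """
    n = len(c)
    mu = [1.0 / n] * n
    for _ in range(steps):
        mu = [sum(mu[s] * q[s][f[s]][y] for s in range(n)) for y in range(n)]
    return sum(mu[s] * c[s][f[s]] for s in range(n))


def perturbation_index(q: Kernel, q1: Kernel, c: Costs) -> float:
    """Average cost under q of the policy optimal for q1, minus the optimal average cost under q."""
    f1 = rvi_policy(q1, c)     # optimal policy of the approximating model P_1
    f_star = rvi_policy(q, c)  # optimal policy of the "true" model P
    return average_cost(q, c, f1) - average_cost(q, c, f_star)


# Toy data: action 1 costs more but pushes the chain towards the cheap state 0.
c = [[0.0, 0.5], [2.0, 2.5]]
q = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.8, 0.2]]]
# Approximation q1: pushing out of state 1 is modelled as less effective than under q.
q1 = [[[0.5, 0.5], [0.9, 0.1]],
      [[0.2, 0.8], [0.25, 0.75]]]

index = perturbation_index(q, q1, c)  # 5/18, about 0.2778
```

For this data the policy optimal under `q1` is `[1, 0]` (average cost \(1\) under `q`), whereas the optimal policy under `q` is `[1, 1]` (average cost \(13/18\)), so the index equals \(5/18\). With `q1 = q` the two policies coincide and the index is exactly zero.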

*R. Montes-de-Oca* and *F. Salem-Silva*, Kybernetika 41, No. 6, 757–772 (2005; Zbl 1249.90313)
