Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains. (English) Zbl 1177.93101

Summary: This paper deals with denumerable-state continuous-time controlled Markov chains with possibly unbounded transition and reward rates. It concerns optimality criteria that refine the usual expected average reward criterion. First, we show the existence of average reward optimal policies with minimal average variance. Then we compare the variance minimization criterion with overtaking optimality. We present an example showing that they are opposite criteria, so that they cannot, in general, be optimized simultaneously. This leads to a multiobjective problem for which we identify the set of Pareto optimal policies (also known as nondominated policies).
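For context, the two competing criteria can be sketched as follows; the notation below is ours, chosen to match standard usage for continuous-time controlled Markov chains, and is not taken from the paper itself:

```latex
% Long-run expected average reward of policy $\pi$ starting from state $i$,
% with reward rate $r$ along the controlled chain $(x(t),a(t))$:
\[
  J(i,\pi) \;=\; \liminf_{T\to\infty} \frac{1}{T}\,
  E_i^{\pi}\!\left[\int_0^T r\bigl(x(t),a(t)\bigr)\,dt\right].
\]

% Limiting average variance of a policy $\pi$ (the quantity minimized,
% within the class of average reward optimal policies):
\[
  V(i,\pi) \;=\; \limsup_{T\to\infty} \frac{1}{T}\,
  E_i^{\pi}\!\left[\left(\int_0^T r\bigl(x(t),a(t)\bigr)\,dt
    \;-\; T\,J(i,\pi)\right)^{\!2}\right].
\]

% Overtaking optimality instead compares finite-horizon total rewards:
% $\pi^*$ overtakes $\pi$ if its expected total reward is eventually
% (asymptotically) no worse, i.e.
\[
  \liminf_{T\to\infty}
  \left( E_i^{\pi^*}\!\left[\int_0^T r\,dt\right]
   \;-\; E_i^{\pi}\!\left[\int_0^T r\,dt\right] \right) \;\ge\; 0 .
\]
```

The tension the paper exhibits is between the second-order quantity $V$ (fluctuations around the average) and the finite-horizon comparison underlying overtaking optimality, which motivates the Pareto (nondominated) formulation.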


93E20 Optimal stochastic control
90C40 Markov and semi-Markov decision processes
60J25 Continuous-time Markov processes on general state spaces

