Semi-Markov decision processes with variance minimization criterion. (English) Zbl 1310.93087

Summary: We consider a variance minimization problem for semi-Markov decision processes with state-dependent discount factors in Borel spaces. The reward function may be unbounded from below and from above. Under suitable conditions, we first prove that the discount variance minimization criterion can be transformed into an equivalent expected discount criterion, and then show the existence of a discount variance minimal policy over the class of expected discount optimal stationary policies. Furthermore, we also give a value iteration algorithm for calculating the expected discount optimal value function. Finally, two examples are used to illustrate our results.
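The summary names a value iteration algorithm without describing it. The following is only a minimal sketch of value iteration with a state-dependent discount rate on a toy finite model, not the algorithm of the paper. It assumes finitely many states and actions, and exponentially distributed sojourn times with rate `lam[x][a]` that are independent of the next state, so the expected discount over one sojourn is `lam[x][a] / (lam[x][a] + alpha[x])`. It also takes `r[x][a]` to be the expected discounted reward collected during one sojourn. All names and numbers are hypothetical.

```python
# Toy value iteration for a finite semi-Markov decision model with a
# state-dependent discount rate alpha[x]. Illustrative assumptions only:
# exponential sojourn times with rate lam[x][a], independent of the next state.

def q_value(x, a, u, r, P, lam, alpha):
    """One-stage reward plus the discounted expected continuation value."""
    # E[exp(-alpha[x] * T)] for T ~ Exp(lam[x][a]); always strictly below 1.
    beta = lam[x][a] / (lam[x][a] + alpha[x])
    return r[x][a] + beta * sum(p * u[y] for y, p in enumerate(P[x][a]))


def value_iteration(r, P, lam, alpha, tol=1e-10, max_iter=10_000):
    """Iterate u <- T u from u = 0 until successive iterates differ by < tol."""
    n_states = len(r)
    u = [0.0] * n_states
    for _ in range(max_iter):
        u_next = [
            max(q_value(x, a, u, r, P, lam, alpha) for a in range(len(r[x])))
            for x in range(n_states)
        ]
        gap = max(abs(v - w) for v, w in zip(u_next, u))
        u = u_next
        if gap < tol:
            break
    # Greedy stationary policy with respect to the final iterate.
    policy = [
        max(range(len(r[x])), key=lambda a: q_value(x, a, u, r, P, lam, alpha))
        for x in range(n_states)
    ]
    return u, policy


# Hypothetical two-state, two-action model.
r = [[1.0, 2.0], [0.5, 0.0]]                # r[x][a]
P = [[[0.9, 0.1], [0.2, 0.8]],              # P[x][a][y], embedded chain
     [[0.5, 0.5], [1.0, 0.0]]]
lam = [[1.0, 2.0], [1.5, 1.0]]              # sojourn rates
alpha = [0.1, 0.3]                          # state-dependent discount rates

u, policy = value_iteration(r, P, lam, alpha)
print(u, policy)
```

Because every one-sojourn discount factor in this toy model is strictly below 1, the update is a contraction in the sup norm and the iterates converge to its unique fixed point. The unbounded-reward, Borel-space setting of the paper needs weighted-norm conditions that this sketch does not attempt to model.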


93E20 Optimal stochastic control
90C40 Markov and semi-Markov decision processes

