Mean-variance problems for finite horizon semi-Markov decision processes. (English) Zbl 1343.93100

Summary: This paper deals with a mean-variance problem for finite horizon semi-Markov decision processes. The state and action spaces are Borel spaces, while the reward function may be unbounded. The goal is to seek an optimal policy with minimal finite horizon reward variance over the set of policies with a given mean. Using the theory of \(N\)-step contraction, we give a characterization of policies with a given mean and, under suitable conditions, convert the second-order moment of the finite horizon reward to a mean of an infinite horizon reward/cost generated by a discrete-time Markov decision process (MDP) with a two-dimensional state space and a new one-step reward/cost. We then establish the optimality equation and the existence of mean-variance optimal policies by employing the existing results for discrete-time MDPs. We also provide a value iteration and a policy improvement algorithm for computing the value function and mean-variance optimal policies, respectively. In addition, a linear program and its dual program are developed for solving the mean-variance problem.
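
As a reading aid, the problem described in the summary can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: \(R_T\) denotes the reward accumulated up to the horizon \(T\), \(E^\pi_x\) and \(\operatorname{Var}^\pi_x\) denote expectation and variance under policy \(\pi\) from initial state \(x\), \(m\) is the given mean, and \(\Pi_m\) is the set of policies attaining it.

\[
\min_{\pi \in \Pi_m} \operatorname{Var}^\pi_x(R_T), \qquad \Pi_m := \{\pi : E^\pi_x[R_T] = m\}.
\]

Since every \(\pi \in \Pi_m\) has the same mean, the standard identity

\[
\operatorname{Var}^\pi_x(R_T) = E^\pi_x[R_T^2] - \bigl(E^\pi_x[R_T]\bigr)^2 = E^\pi_x[R_T^2] - m^2
\]

shows that minimizing the variance over \(\Pi_m\) amounts to minimizing the second moment \(E^\pi_x[R_T^2]\). This is presumably why the summary focuses on rewriting the second-order moment as the mean of a reward/cost for an auxiliary discrete-time MDP.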


93E20 Optimal stochastic control
93C55 Discrete-time control/observation systems
49J55 Existence of optimal solutions to problems involving randomness
49K45 Optimality conditions for problems involving randomness
90C40 Markov and semi-Markov decision processes
49L20 Dynamic programming in optimal control and differential games
90C39 Dynamic programming
90C05 Linear programming

