A mean-variance optimization problem for discounted Markov decision processes. (English) Zbl 1253.90214

Summary: We consider a mean-variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Unlike the usual formulation for MDPs, we seek a mean-variance optimal policy, that is, one that minimizes the variance over the set of all policies with a given expected reward. For continuous-time MDPs with the discounted criterion and finite state and action spaces, we use the conditional expectation and the Markov property to prove that the mean-variance optimization problem can be transformed into an equivalent discounted optimization problem. We then show that a mean-variance optimal policy and the efficient frontier can be obtained by policy iteration in a finite number of iterations. We also address related issues, such as a mutual fund theorem, and illustrate our results with an example.
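
The summary says that, once the problem has been transformed, it can be solved by policy iteration for a discounted criterion. As a rough illustration only, the following sketch runs generic policy iteration on a finite continuous-time discounted MDP. It is not the paper's transformed problem. The names `q` (transition rates), `r` (reward rates) and `alpha` (discount rate), and the two-state data, are assumptions made up for this example.

```python
# Generic policy iteration for a finite continuous-time discounted MDP.
# Assumed (hypothetical) data layout: q[i][a][j] is the transition rate from
# state i to state j under action a (each row sums to zero), r[i][a] is the
# reward rate, and alpha > 0 is the discount rate.

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(col + 1, n):
            factor = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def evaluate(policy, q, r, alpha):
    """Policy evaluation: solve (alpha*I - Q_f) V = r_f for the policy f."""
    n = len(policy)
    A = [[(alpha if i == j else 0.0) - q[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [r[i][policy[i]] for i in range(n)]
    return solve_linear(A, b)

def policy_iteration(q, r, alpha, tol=1e-12):
    """Alternate evaluation and improvement until no action improves."""
    n = len(q)
    policy = [0] * n
    while True:
        V = evaluate(policy, q, r, alpha)
        improved = False
        for i in range(n):
            def score(a):
                # r(i,a) + sum_j q(j|i,a) V(j)
                return r[i][a] + sum(q[i][a][j] * V[j] for j in range(n))
            best = max(range(len(q[i])), key=score)
            if score(best) > score(policy[i]) + tol:
                policy[i] = best
                improved = True
        if not improved:
            return policy, V

# Made-up two-state, two-action example.
q = [[[-1.0, 1.0], [-2.0, 2.0]],
     [[1.0, -1.0], [3.0, -3.0]]]
r = [[1.0, 0.0],
     [3.0, 4.0]]
alpha = 0.5
policy, V = policy_iteration(q, r, alpha)
print(policy, V)
```

Because there are finitely many deterministic stationary policies and each improvement step is strict, the loop stops after finitely many iterations, which matches the finiteness claim in the summary. At termination the returned `V` satisfies the discounted optimality equation up to the tolerance `tol`.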


90C40 Markov and semi-Markov decision processes

