Finite-horizon variance penalised Markov decision processes. (English) Zbl 0894.90161

Summary: We consider a finite-horizon Markov decision process with only terminal rewards. We describe a finite algorithm for computing a Markov deterministic policy that maximizes the variance-penalized reward, and we outline a vertex elimination algorithm that can reduce the computation involved.
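
To make the objective concrete, the following is a minimal illustrative sketch, not the algorithm of the paper. It assumes the common form of the variance-penalized reward, mean minus theta times variance of the terminal reward, and it assumes stationary transition matrices. It finds a best Markov deterministic policy by exhaustive enumeration, which is only feasible for very small problems. All names (`P`, `mu`, `r`, `theta`, and the function names) are hypothetical and chosen for illustration.

```python
from itertools import product


def final_distribution(mu, P, policy):
    """Distribution of the final state under a Markov deterministic policy.

    mu        -- initial state distribution (list of probabilities)
    P         -- P[a][s][s2] = probability of moving s -> s2 under action a
                 (assumed not to depend on the stage)
    policy    -- policy[t][s] = action taken in state s at stage t
    """
    dist = list(mu)
    n = len(mu)
    for rule in policy:
        nxt = [0.0] * n
        for s, mass in enumerate(dist):
            row = P[rule[s]][s]
            for s2 in range(n):
                nxt[s2] += mass * row[s2]
        dist = nxt
    return dist


def penalised_value(dist, r, theta):
    """Mean of the terminal reward minus theta times its variance (assumed form)."""
    mean = sum(p * x for p, x in zip(dist, r))
    var = sum(p * (x - mean) ** 2 for p, x in zip(dist, r))
    return mean - theta * var


def best_policy_by_enumeration(mu, P, r, theta, horizon):
    """Brute-force search over all Markov deterministic policies.

    The number of policies is (actions ** states) ** horizon, so this is a
    toy baseline only, not the finite algorithm described in the summary.
    """
    n_states, n_actions = len(mu), len(P)
    rules = list(product(range(n_actions), repeat=n_states))
    best, best_val = None, float("-inf")
    for policy in product(rules, repeat=horizon):
        val = penalised_value(final_distribution(mu, P, policy), r, theta)
        if val > best_val:
            best, best_val = policy, val
    return best, best_val


if __name__ == "__main__":
    # Two states, two actions, one stage. Action 0 stays put; action 1 is a
    # fair gamble from state 0 between terminal rewards 1 and 3.
    P = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.0, 1.0]],
    ]
    mu, r = [1.0, 0.0], [1.0, 3.0]
    for theta in (0.0, 2.0):
        print(theta, best_policy_by_enumeration(mu, P, r, theta, horizon=1))
```

In this toy instance the gamble is preferred when there is no penalty, while a sufficiently large penalty weight favours the safe action, which is the trade-off the variance penalty is meant to capture.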


90C40 Markov and semi-Markov decision processes

