## Markov decision processes with a minimum-variance criterion

*(English)* Zbl 0619.90080

The paper considers optimizing the variance of the sum of costs, alongside the average expected cost, in Markov decision processes with unbounded costs. For general state and action spaces, a stationary policy is found that minimizes the average variance within the class of policies that are \(\epsilon\)-optimal with respect to the average expected cost.
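To make the criterion concrete, the following is a minimal numerical sketch (not taken from the paper; the two-state MDP, its costs, and the value of \(\epsilon\) are invented for illustration). Among deterministic stationary policies whose long-run average cost is within \(\epsilon\) of the optimum, it selects the one with the smallest asymptotic variance rate of the cost sum, computed via the stationary distribution and the Poisson equation:

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers invented for illustration).
# P[a] is the transition matrix under action a; c[a] the one-step cost vector.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.6, 0.4]])}
c = {0: np.array([1.0, 3.0]),
     1: np.array([2.0, 2.0])}

def evaluate(policy):
    """Average cost g and asymptotic variance rate s2 of a deterministic
    stationary policy, given as a tuple of actions (one per state)."""
    n = len(policy)
    Pp = np.vstack([P[policy[s]][s] for s in range(n)])  # policy's chain
    cp = np.array([c[policy[s]][s] for s in range(n)])   # policy's costs
    # Stationary distribution: pi (I - Pp) = 0 with sum(pi) = 1.
    A = np.vstack([(np.eye(n) - Pp).T, np.ones(n)])
    pi = np.linalg.lstsq(A, np.r_[np.zeros(n), 1.0], rcond=None)[0]
    g = pi @ cp
    # Poisson equation (I - Pp) h = c - g, normalized by pi @ h = 0.
    B = np.vstack([np.eye(n) - Pp, pi])
    h = np.linalg.lstsq(B, np.r_[cp - g, 0.0], rcond=None)[0]
    # Markov-chain CLT variance rate of the cost sum.
    s2 = pi @ h**2 - pi @ (Pp @ h)**2
    return g, s2

eps = 0.9
results = {pol: evaluate(pol) for pol in itertools.product([0, 1], repeat=2)}
g_min = min(g for g, _ in results.values())
# Minimum-variance policy among the eps-optimal stationary policies.
best = min((pol for pol, (g, _) in results.items() if g <= g_min + eps),
           key=lambda pol: results[pol][1])
print(best, results[best])
```

In this toy instance the variance-minimizing \(\epsilon\)-optimal policy accepts a slightly larger average cost in exchange for a constant one-step cost, i.e. zero variance rate, which is exactly the trade-off the criterion formalizes.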

Reviewer: H. Weiner

### MSC:

90C40 Markov and semi-Markov decision processes

### Keywords:

variance of the sum of costs; average expected cost; unbounded cost; general state and action space; stationary policy; \(\epsilon\)-optimal
Full Text: DOI

### References:

[1] Bertsekas, D. P.; Shreve, S. E., Stochastic Optimal Control: The Discrete Time Case, (1978), Academic Press, New York · Zbl 0471.93002

[2] Dekker, R.; Hordijk, A., Blackwell optimality in denumerable Markov decision chains, (), 484

[3] Doob, J. L., Stochastic Processes, (1953), Wiley, New York · Zbl 0053.26802

[4] Federgruen, A.; Hordijk, A.; Tijms, H. C., Denumerable state semi-Markov decision processes with unbounded costs, average cost criterion, Stochastic Process. Appl., 9, 223-235, (1979) · Zbl 0422.90084

[5] Hordijk, A., Dynamic programming and Markov potential theory, () · Zbl 0284.49012

[6] Jaquette, S. C., Markov decision processes with a new optimality criterion: discrete time, Ann. Statist., 1, 496-505, (1973) · Zbl 0259.90054

[7] Kurano, M., Markov decision processes with a Borel measurable cost function: the average case, Math. Oper. Res., 11, 309-320, (1986) · Zbl 0607.90087

[8] Kurano, M., Semi-Markov decision processes with a reachable state-subclass, () · Zbl 0677.90087

[9] Loève, M., Probability Theory, (1960), Van Nostrand, New York · Zbl 0108.14202

[10] Mandl, P., On the variance of controlled Markov chains, Kybernetika, 7, 1-12, (1971) · Zbl 0215.25902

[11] Mandl, P., Estimation and control in Markov chains, Adv. Appl. Probab., 6, 40-60, (1974) · Zbl 0281.60070

[12] Van Nunen, J.; Wessels, J., Markov decision processes with unbounded rewards, in "Markov Decision Theory" (H. Tijms and J. Wessels, Eds.), pp. 1-24, Math. Centre Tract No. 93, Mathematisch Centrum, Amsterdam · Zbl 0304.90118

[13] Shreve, S. E.; Bertsekas, D. P., Alternative theoretical frameworks for finite horizon discrete-time stochastic optimal control, SIAM J. Control Optim., 16, 953-978, (1978) · Zbl 0405.93044

[14] Sobel, M. J., The variance of discounted Markov decision processes, J. Appl. Probab., 19, 794-802, (1982) · Zbl 0503.90091

[15] Veinott, A. F., Discrete dynamic programming with sensitive discount optimality criteria, Ann. Math. Statist., 40, 1635-1660, (1969) · Zbl 0183.49102
