zbMATH — the first resource for mathematics

Estimation and adaptive control on span-contracting Markov decision processes. (English) Zbl 0744.90099
The author studies an adaptive control problem for a discrete time Markov decision process with a finite state space. He presents a successive approximation method under a multi-step span-contraction assumption. This assumption is weaker than a one-step span-contraction assumption considered earlier.
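The review does not reproduce the algorithm itself. As a rough illustration only, the following sketch shows successive approximation with a span-based stopping rule for a finite average-reward model whose rewards and transition probabilities are known; it is not the adaptive scheme of the paper, and the names `span`, `bellman`, `successive_approximation`, `rewards`, and `transitions` are hypothetical. The span seminorm used here is sp(v) = max v - min v.

```python
def span(v):
    """Span seminorm: largest component minus smallest component."""
    return max(v) - min(v)


def bellman(v, rewards, transitions):
    """One step of the undiscounted dynamic programming operator.

    rewards[s][a] is the one-step reward for action a in state s;
    transitions[s][a][t] is the probability of moving from s to t under a.
    (Both data layouts are assumptions made for this sketch.)
    """
    return [
        max(
            rewards[s][a]
            + sum(p * v[t] for t, p in enumerate(transitions[s][a]))
            for a in range(len(rewards[s]))
        )
        for s in range(len(rewards))
    ]


def successive_approximation(rewards, transitions, tol=1e-8, max_iter=10_000):
    """Iterate v <- T v until sp(T v - v) < tol.

    Returns an estimate of the optimal average reward and a relative
    value vector normalised so that its first component is zero.
    """
    v = [0.0] * len(rewards)
    for _ in range(max_iter):
        tv = bellman(v, rewards, transitions)
        diff = [a - b for a, b in zip(tv, v)]
        # Subtracting a constant keeps the iterates bounded and does not
        # change the span of later differences.
        relative = [x - tv[0] for x in tv]
        if span(diff) < tol:
            # The optimal gain lies between min(diff) and max(diff).
            gain = 0.5 * (max(diff) + min(diff))
            return gain, relative
        v = relative
    raise RuntimeError("span of successive differences did not fall below tol")
```

Under a span-contraction assumption (one-step or multi-step), the span of successive differences shrinks geometrically, which is what makes a stopping rule of this kind terminate; without such an assumption the loop above may hit `max_iter`.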
MSC:
90C40 Markov and semi-Markov decision processes
Full Text: EuDML
References:
[1] R. S. Acosta-Abreu, O. Hernandez-Lerma: Iterative adaptive control of denumerable state average-cost Markov systems. Control Cybernet. 14 (1985), 313-322. · Zbl 0606.90130
[2] V. V. Baranov: Recursive algorithms of adaptive control in stochastic systems. Cybernetics 17 (1981), 815-824. · Zbl 0531.93067
[3] A. Federgruen: Markovian Control Problems. Math. Centre Tracts 97, Amsterdam 1983. · Zbl 0541.90068
[4] A. Federgruen, P. J. Schweitzer: Nonstationary Markov decision problems with converging parameters. J. Optim. Theory Appl. 34 (1981), 207-241. · Zbl 0426.90091
[5] A. Federgruen, P. J. Schweitzer, H. C. Tijms: Contraction mappings underlying undiscounted Markov decision problems. J. Math. Anal. Appl. 65 (1978), 711-730. · Zbl 0388.90084
[6] A. Federgruen, H. C. Tijms: The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms. J. Appl. Probab. 15 (1978), 356-373. · Zbl 0386.90060
[7] O. Hernandez-Lerma: Adaptive Control Processes. Springer-Verlag, Berlin-Heidelberg-New York 1989. · Zbl 0698.90053
[8] K. Hinderer: On approximate solutions of finite-stage dynamic programs. Dynamic Programming and its Applications (M. L. Puterman, ed.), Academic Press, New York 1978, pp. 289-317. · Zbl 0461.90075
[9] G. Hübner: Contraction properties of Markov decision models with applications to the elimination of non-optimal actions. Dynamische Optimierung, Bonner Math. Schriften 98 (1977), 57-65.
[10] G. Hübner: A unified approach to adaptive control of average reward Markov decision processes. OR Spektrum 10 (1988), 161-166.
[11] M. Kurano: Discrete-time Markovian decision processes with an unknown parameter - average return criterion. J. Oper. Res. Soc. Japan 15 (1972), 67-76. · Zbl 0238.90006
[12] M. Kurano: Adaptive policies in Markov decision processes with uncertain matrices. J. Inf. Optim. 4 (1983), 21-40. · Zbl 0516.90077
[13] M. Kurano: Learning algorithms for Markov decision processes. J. Appl. Probab. 24 (1987), 270-276. · Zbl 0631.90085
[14] P. Mandl: Estimation and control of Markov chains. Adv. in Appl. Probab. 6 (1974), 40-60. · Zbl 0281.60070
[15] P. Mandl: On the adaptive control of countable Markov chains. Probability Theory, Banach Centre Publications, Warsaw 1979, pp. 159-173. · Zbl 0439.60069
[16] W. Whitt: Approximations of dynamic programs. Math. Oper. Res. 3 (1978), 231-243. · Zbl 0393.90094
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.