
Approximation and adaptive control of Markov processes: Average reward criterion. (English) Zbl 0633.90091

Several procedures to approximate the optimal value of average-reward controlled Markov processes with Borel state and control spaces are introduced. The procedures are then used to obtain (i) optimal policies and (ii) optimal adaptive policies for control processes depending on unknown parameters. The latter include the well-known “method of substituting the estimates into optimal stationary controls”. The approximation procedures are based on a nonstationary version of the value-iteration scheme.
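The review describes the approximation scheme only at a high level; the paper's actual construction is a nonstationary value-iteration scheme on Borel state and control spaces. As a purely illustrative, minimal sketch of the underlying idea in the simplest possible setting, the Python fragment below runs relative value iteration on a finite-state, finite-action average-reward model. The function name, the toy transition and reward data, and the tolerances are hypothetical and are not taken from the paper.

import numpy as np

def relative_value_iteration(P, r, tol=1e-8, max_iter=10_000):
    """Approximate the optimal average reward (gain) of a finite MDP.

    P[a, s, s2] : probability of moving from state s to s2 under action a
    r[s, a]     : one-step reward
    Returns an estimate of the optimal gain and a greedy stationary policy.
    """
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)                    # relative value function, v[0] = 0
    gain = 0.0
    for _ in range(max_iter):
        # One value-iteration step: q[s, a] = r(s, a) + sum_s2 P(s2 | s, a) v(s2)
        q = r + np.einsum("asx,x->sa", P, v)
        v_next = q.max(axis=1)
        gain = v_next[0]                      # gain estimate (since v[0] == 0)
        v_next = v_next - v_next[0]           # renormalize at reference state 0
        if np.max(np.abs(v_next - v)) < tol:  # consecutive iterates agree
            v = v_next
            break
        v = v_next
    policy = q.argmax(axis=1)                 # greedy stationary policy
    return gain, policy

if __name__ == "__main__":
    # Toy 2-state, 2-action example; all numbers are made up for illustration.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
                  [[0.5, 0.5], [0.6, 0.4]]])  # transitions under action 1
    r = np.array([[1.0, 0.5],                 # r[s, a]
                  [0.0, 2.0]])
    gain, policy = relative_value_iteration(P, r)
    print("approximate optimal average reward:", gain)
    print("greedy stationary policy:", policy)

In the adaptive situation mentioned above, such an approximation would be rerun at each stage with estimated transition and reward parameters substituted for the unknown ones, which is the spirit of the “method of substituting the estimates into optimal stationary controls”.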

MSC:

90C40 Markov and semi-Markov decision processes
Full Text: EuDML

References:

[1] R. S. Acosta Abreu: Control of Markov chains with unknown parameters and metric state space. Submitted for publication. In Spanish. · Zbl 0765.93080
[2] R. S. Acosta Abreu, O. Hernández-Lerma: Iterative adaptive control of denumerable state average-cost Markov systems. Control Cybernet. 14 (1985), 313-322. · Zbl 0606.90130
[3] V. V. Baranov: Recursive algorithms of adaptive control in stochastic systems. Cybernetics 17 (1981), 815-824. · Zbl 0531.93067
[4] V. V. Baranov: A recursive algorithm in Markovian decision processes. Cybernetics 18 (1982), 499-506. · Zbl 0517.90089 · doi:10.1007/BF01078680
[5] D. P. Bertsekas, S. E. Shreve: Stochastic Optimal Control- The Discrete Time Case. Academic Press, New York 1978. · Zbl 0471.93002
[6] A. Federgruen, P. J. Schweitzer: Nonstationary Markov decision problems with converging parameters. J. Optim. Theory Appl. 34 (1981), 207-241. · Zbl 0426.90091 · doi:10.1007/BF00935474
[7] A. Federgruen, H. C. Tijms: The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms. J. Appl. Probab. 15 (1978), 356-373. · Zbl 0386.90060 · doi:10.2307/3213407
[8] J. P. Georgin: Contrôle de chaînes de Markov sur des espaces arbitraires. Ann. Inst. H. Poincaré B 14 (1978), 255-277. · Zbl 0391.60066
[9] J. P. Georgin: Estimation et contrôle de chaînes de Markov sur des espaces arbitraires. Lecture Notes in Mathematics 636. Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1978, pp. 71-113.
[10] E. I. Gordienko: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504-518. · Zbl 0577.93067 · doi:10.1137/1129064
[11] L. G. Gubenko, E. S. Statland: On controlled, discrete-time Markov decision processes. Theory Probab. Math. Statist. 7 (1975), 47-61. · Zbl 0359.93048
[12] O. Hernández-Lerma: Approximation and adaptive policies in discounted dynamic programming. Bol. Soc. Mat. Mexicana 30 (1985). In press. · Zbl 0641.90087
[13] O. Hernández-Lerma: Nonstationary value-iteration and adaptive control of discounted semi-Markov processes. J. Math. Anal. Appl. 112 (1985), 435-445. · Zbl 0581.90096 · doi:10.1016/0022-247X(85)90253-7
[14] O. Hernández-Lerma, S. I. Marcus: Adaptive control of service in queueing systems. Syst. Control Lett. 3 (1983), 283-289. · Zbl 0534.90037 · doi:10.1016/0167-6911(83)90027-0
[15] O. Hernández-Lerma, S. I. Marcus: Optimal adaptive control of priority assignment in queueing systems. Syst. Control Lett. 4 (1984), 65-75. · Zbl 0529.90045
[16] O. Hernández-Lerma, S. I. Marcus: Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution. Submitted for publication, 1986. · Zbl 0637.93075 · doi:10.1016/0167-6911(87)90055-7
[17] O. Hernández-Lerma, S. I. Marcus: Nonparametric adaptive control of discrete-time partially observable stochastic systems. Submitted for publication, 1986. · Zbl 0675.93055 · doi:10.1016/0022-247X(89)90248-5
[18] C. J. Himmelberg, T. Parthasarathy, F. S. Van Vleck: Optimal plans for dynamic programming problems. Math. Oper. Res. 1 (1976), 390-394. · Zbl 0368.90134 · doi:10.1287/moor.1.4.390
[19] K. Hinderer: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer-Verlag, Berlin-Heidelberg-New York 1970. · Zbl 0202.18401
[20] A. Hordijk, P. J. Schweitzer, H. Tijms: The asymptotic behaviour of the minimal total expected cost for the denumerable state Markov decision model. J. Appl. Probab. 12 (1975), 298-305. · Zbl 0306.90078 · doi:10.2307/3212443
[21] P. R. Kumar: A survey of some results in stochastic adaptive control. SIAM J. Control Optim. 23 (1985), 329-380. · Zbl 0571.93038 · doi:10.1137/0323023
[22] M. Kurano: Discrete-time Markovian decision processes with an unknown parameter - average return criterion. J. Oper. Res. Soc. Japan 15 (1972), 67-76. · Zbl 0238.90006
[23] M. Kurano: Average-optimal adaptive policies in semi-Markov decision processes including an unknown parameter. J. Oper. Res. Soc. Japan 28 (1985), 252-366. · Zbl 0579.90098
[24] P. Mandl: Estimation and control in Markov chains. Adv. Appl. Probab. 6 (1974), 40-60. · Zbl 0281.60070 · doi:10.2307/1426206
[25] P. Mandl: On the adaptive control of countable Markov chains. Probability Theory, Banach Centre Publications 5, PWN - Polish Scientific Publishers, Warsaw 1979, pp. 159-173. · Zbl 0439.60069
[26] H. L. Royden: Real Analysis. Macmillan, New York 1968. · Zbl 0704.26006
[27] M. Schäl: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrsch. verw. Gebiete 32 (1975), 179-196. · Zbl 0316.90080 · doi:10.1007/BF00532612
[28] M. Schäl: Estimation and control in discounted stochastic dynamic programming. Preprint No. 428, Institute for Applied Math., University of Bonn, Bonn 1981.
[29] H. C. Tijms: On dynamic programming with arbitrary state space, compact action space and the average reward as criterion. Report BW 55/75, Mathematisch Centrum, Amsterdam 1975.
[30] T. Ueno: Some limit theorems for temporally discrete Markov processes. J. Fac. Science, University of Tokyo 7 (1957), 449-462. · Zbl 0077.33201
[31] D. J. White: Dynamic programming, Markov chains, and the method of successive approximations. J. Math. Anal. Appl. 6 (1963), 373-376. · Zbl 0124.36404 · doi:10.1016/0022-247X(63)90017-9
[32] P. Mandl, G. Hübner: Transient phenomena and self-optimizing control of Markov chains. Acta Universitatis Carolinae - Math. et Phys. 26 (1985), 1, 35-51. · Zbl 0618.90096
[33] A. Hordijk, H. Tijms: A modified form of the iterative method of dynamic programming. Ann. Statist. 3 (1975), 1, 203-208. · Zbl 0304.90115 · doi:10.1214/aos/1176343008