Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution. (English) Zbl 0637.93075

Discrete-time stochastic control systems are considered. The disturbance or driving process is a sequence of independent and identically distributed (i.i.d.) random elements whose common distribution is unknown. The state process is completely observable, as are the realizations of the driving process. Adaptive control policies for the problem of maximizing the discounted reward criterion are introduced and studied. These policies are asymptotically optimal, and each of them yields uniform approximations of the optimal reward function.
Reviewer: V. Zhukovin
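The adaptive policies described in the abstract follow the certainty-equivalence idea familiar from the Schäl and Hernández-Lerma references: estimate the unknown disturbance law from the observed realizations and control as if the estimate were exact. A minimal sketch of this idea for a finite model, assuming a hypothetical inventory-style system (the dynamics `F`, reward `r`, and all numerical parameters below are illustrative, not taken from the paper):

```python
import numpy as np

def value_iteration(F, r, support, probs, states, actions, beta=0.9, tol=1e-6):
    """Solve the discounted problem when the disturbance law `probs` is known."""
    V = np.zeros(len(states))
    while True:
        # Q(x, a) = E[ r(x, a, w) + beta * V(F(x, a, w)) ] under `probs`
        Q = np.array([[sum(p * (r(x, a, w) + beta * V[F(x, a, w)])
                           for w, p in zip(support, probs))
                       for a in actions] for x in states])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

def adaptive_run(F, r, support, true_probs, states, actions, T=300,
                 beta=0.9, seed=0):
    """Certainty-equivalence adaptive policy: replace the unknown disturbance
    law by the empirical distribution of the observed realizations."""
    rng = np.random.default_rng(seed)
    counts = np.ones(len(support))  # pseudo-counts so the first estimate is defined
    x = states[0]
    for _ in range(T):
        emp = counts / counts.sum()          # current empirical estimate
        _, policy = value_iteration(F, r, support, emp, states, actions, beta)
        a = actions[policy[x]]               # greedy action under the estimate
        w_idx = rng.choice(len(support), p=true_probs)
        counts[w_idx] += 1                   # disturbance realization is observed
        x = F(x, a, support[w_idx])
    return counts / counts.sum()

# Hypothetical example: bounded inventory with unknown demand distribution.
support = [0, 1, 2]           # possible disturbance (demand) values
true_probs = [0.5, 0.3, 0.2]  # unknown to the controller
states = list(range(5))
actions = [0, 1, 2]
F = lambda x, a, w: min(max(x + a - w, 0), 4)  # next state, kept in [0, 4]
r = lambda x, a, w: -abs(x - 2) - 0.1 * a      # reward to be maximized
emp = adaptive_run(F, r, support, true_probs, states, actions, T=300)
```

Under the paper's assumptions the empirical distribution converges to the true one, which is what drives the asymptotic optimality of such policies; the sketch re-solves the value iteration at every step purely for clarity, whereas practical schemes would update incrementally.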


93E20 Optimal stochastic control
93C40 Adaptive control/observation systems
93C55 Discrete-time control/observation systems
60E99 Distribution theory
93C10 Nonlinear systems in control theory
93E10 Estimation and detection in stochastic control theory
Full Text: DOI


[1] Billingsley, P., Weak convergence of probability measures, (1968), Wiley New York · Zbl 0172.21201
[2] Billingsley, P.; Topsøe, F., Uniformity in weak convergence, Z. wahrsch. verw. geb., 7, 1-16, (1967) · Zbl 0147.15701
[3] Dynkin, E.B.; Yushkevich, A.A., Controlled Markov processes, (1979), Springer New York · Zbl 0073.34801
[4] Federgruen, A.; Schweitzer, P.J., Nonstationary Markov decision problems with converging parameters, J. optim. theory appl., 34, 207-241, (1981) · Zbl 0426.90091
[5] Gaenssler, P.; Stute, W., Empirical processes: a survey for i.i.d. random variables, Ann. probab., 7, 193-243, (1979) · Zbl 0402.60031
[6] Georgin, J.P., Estimation et contrôle des chaînes de Markov sur des espaces arbitraires, Lecture notes math. no. 636, 71-113, (1978) · Zbl 0372.60094
[7] Gordienko, E.I., Adaptive strategies for certain classes of controlled Markov processes, Theory probab. appl., 29, 504-518, (1985) · Zbl 0577.93067
[8] Hernández-Lerma, O., Approximation and adaptive policies in discounted dynamic programming, Bol. soc. mat. mexicana, 30, 25-35, (1985) · Zbl 0641.90087
[9] Hernández-Lerma, O.; Cavazos-Cadena, R., Continuous dependence of stochastic control models on the noise distribution, Appl. math. optim., 16, (1987)
[10] Hernández-Lerma, O.; Marcus, S.I., Optimal adaptive control of priority assignment in queueing systems, Systems control lett., 4, 65-72, (1984) · Zbl 0529.90045
[11] Hernández-Lerma, O.; Marcus, S.I., Adaptive control of discounted Markov decision chains, J. optim. theory appl., 46, 227-235, (1985) · Zbl 0543.90093
[12] Himmelberg, C.J.; Parthasarathy, T.; Van Vleck, F.S., Optimal plans for dynamic programming problems, Math. oper. res., 1, 390-394, (1976) · Zbl 0368.90134
[13] Hinderer, K., Foundations of nonstationary dynamic programming with discrete time parameter, () · Zbl 0202.18401
[14] Royden, H.L., Real analysis, (1968), Macmillan New York · Zbl 0197.03501
[15] Schäl, M., Estimation and control in discounted stochastic dynamic programming, Stochastics, 20, 51-71, (1987) · Zbl 0621.90092