×

Convergence rate of linear two-time-scale stochastic approximation. (English) Zbl 1094.62103

Summary: We study the rate of convergence of linear two-time-scale stochastic approximation methods. We consider two-time-scale linear iterations driven by i.i.d. noise, prove some results on their asymptotic covariance and establish asymptotic normality. The well-known result [B. T. Polyak, Autom. Remote Control 51, No. 7, 937–946 (1990; Zbl 0737.93080); D. Ruppert, Tech. Rep. 781, School Oper. Res. Indust. Eng., Cornell Univ. (1988)] on the optimality of Polyak-Ruppert averaging techniques specialized to linear stochastic approximation is established as a consequence of the general results in this paper.

MSC:

62L20 Stochastic approximation
60F05 Central limit and other weak theorems

Citations:

Zbl 0737.93080
PDF BibTeX XML Cite
Full Text: DOI arXiv

References:

[1] Baras, J. S. and Borkar, V. S. (2000). A learning algorithm for Markov decision processes with adaptive state aggregation. In Proc. 39th IEEE Conference on Decision and Control . IEEE, New York.
[2] Benveniste, A., Metivier, M. and Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations . Springer, Berlin. · Zbl 0752.93073
[3] Bhatnagar, S., Fu, M. C., Marcus, S. I. and Bhatnagar, S. (2001). Two timescale algorithms for simulation optimization of hidden Markov models. IIE Transactions 3 245–258.
[4] Bhatnagar, S., Fu, M. C., Marcus, S. I. and Fard, P. J. (2001). Optimal structured feedback policies for ABR flow control using two timescale SPSA. IEEE/ACM Transactions on Networking 9 479–491.
[5] Borkar, V. S. (1997). Stochastic approximation with two time scales. Systems Control Lett. 29 291–294. · Zbl 0895.62085
[6] Duflo, M. (1997). Random Iterative Models . Springer, Berlin. · Zbl 0868.62069
[7] Kokotovic, P. V. (1984). Applications of singular perturbation techniques to control problems. SIAM Rev. 26 501–550. · Zbl 0548.93001
[8] Konda, V. R. (2002). Actor-critic algorithms. Ph.D. dissertation, Dept. Electrical Engineering and Computer Science, MIT. · Zbl 1049.93095
[9] Konda, V. R. and Borkar, V. S. (1999). Actor-critic like learning algorithms for Markov decision processes. SIAM J. Control Optim. 38 94–123. · Zbl 0938.93069
[10] Konda, V. R. and Tsitsiklis, J. N. (2003). On actor-critic algorithms. SIAM J. Control Optim. 42 1143–1166. · Zbl 1049.93095
[11] Kushner, H. J. and Clark, D. S. (1978). Stochastic Approximation for Constrained and Unconstrained Systems . Springer, New York. · Zbl 0381.60004
[12] Kushner, H. J. and Yang, J. (1993). Stochastic approximation with averaging of the iterates: Optimal asymptotic rates of convergence for general processes. SIAM J. Control Optim. 31 1045–1062. · Zbl 0788.62078
[13] Kushner, H. J. and Yin, G. G. (1997). Stochastic Approximation Algorithms and Applications . Springer, New York. · Zbl 0914.60006
[14] Nevel’son, M. B. and Has’minskii, R. Z. (1973). Stochastic Approximation and Recursive Estimation . Amer. Math. Soc., Providence, RI. · Zbl 0355.62075
[15] Polyak, B. T. (1976). Convergence and convergence rate of iterative stochastic algorithms I. Automat. Remote Control 12 1858–1868. · Zbl 0363.93054
[16] Polyak, B. T. (1990). New method of stochastic approximation type. Automat. Remote Control 51 937–946. · Zbl 0737.93080
[17] Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30 838–855. · Zbl 0762.62022
[18] Ruppert, D. (1988). Efficient estimators from a slowly convergent Robbins–Monro procedure. Technical Report 781, School of Operations Research and Industrial Engineering, Cornell Univ.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.