×

Nonparametric bandit methods. (English) Zbl 0729.90089

The authors consider an infinite-horizon bandit problem within a nonparametric setting. Supposing K arms are available, each satisfying a probability bound, the sample plans proposed are shown to be asymptotically optimal and converge at guaranteed rates. In the bounded- arm case, the rate is optimal. Finally, the theory is extended to the case in which the bandit population is infinite.

MSC:

90C40 Markov and semi-Markov decision processes
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] J. Bather, Randomised allocation of treatments in sequential trials, Adv. Appl. Prob. 12 (1980) 174–182. · Zbl 0437.90065 · doi:10.2307/1426500
[2] R. Bellman,Adaptive Control Processes (Princeton University Press, Princeton, NJ, 1961). · Zbl 0103.12901
[3] D.A. Berry and B. Fristedt,Bandit Problems: Sequential Allocation of Experiments (Chapman-Hall, New York, 1985). · Zbl 0659.62086
[4] H. Chernoff,Sequential Analysis and Optimal Design (SIAM, Philadelphia, 1972). · Zbl 0265.62024
[5] Y.S. Chow and T.L. Lai, Some one-sided theorems on the tail distribution of sample sums with applications to the last time and largest excess of boundary crossings, Trans. AMS 208 (1975) 51–72. · Zbl 0335.60021 · doi:10.1090/S0002-9947-1975-0380973-6
[6] M.K. Clayton and D.A. Berry, Bayesian nonparametric bandits, Ann. Statist. 13 (1985) 1523–1534. · Zbl 0587.62151 · doi:10.1214/aos/1176349753
[7] L.P. Devroye, The uniform convergence of nearest neighbor regression function estimators and their application in optimization, IEEE Trans. Info. Theory IT-24 (2) (1978) 142–151. · Zbl 0375.62083 · doi:10.1109/TIT.1978.1055865
[8] D.K. Fuk and S.V. Nagaev, Probability inequalities for sums of independent random variables, Theory Probab. Appl. 16 (1971) 643–660. · Zbl 0259.60024 · doi:10.1137/1116071
[9] O. Hernández-Lerma,Adaptive Markov Control Processes (Springer, New York, 1989).
[10] J.D. Holland, Genetic algorithms and the optimal allocation of trials, SIAM J. Comput. 2 (2) (1973) 88–105. · Zbl 0259.90031 · doi:10.1137/0202009
[11] T.L. Lai, Adaptive treatment allocation and the multi-armed bandit problem, Ann. Statist. 15 (1987) 1091–1114. · Zbl 0643.62054 · doi:10.1214/aos/1176350495
[12] T.L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Adv. Appl. Math. 6 (1985) 4–22. · Zbl 0568.62074 · doi:10.1016/0196-8858(85)90002-8
[13] T.L. Lai, H. Robbins and D. Siegmund, Sequential design of comparamtive clinical trials, in:Recent Advances in Statistics, eds. M. Rizvi, J. Rustagi, and D. Siegmund (Academic Press, New York, 1983). · Zbl 0538.62071
[14] A. Renyi,Probability Theory (Elsevier, New York, 1970).
[15] H. Robbins, Some aspects of the sequential design of experiments, Bull. AMS 58 (1952) 527–535. · Zbl 0049.37009 · doi:10.1090/S0002-9904-1952-09620-8
[16] S. Yakowitz, A statistical foundation for machine learning, with application to go-moku, Comput. Math. Appl. 17 (1989) 1085–1102. · Zbl 0688.68080 · doi:10.1016/0898-1221(89)90039-4
[17] S. Yakowitz,Mathematics of Adaptive Control Processes (Elsevier, New York, 1969). · Zbl 0195.17801
[18] S. Yakowitz and E. Lugosi, Random search in the presence of noise, with application to machine learning, SIAM J. Sci. Statist. Comput. (1990). · Zbl 0742.65053
[19] S. Yakowitz, T. Jawayardena and S. Li, Theory for automatic learning under Markov-dependent noise, with applications, submitted for publication.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.