Bandit problems with infinitely many arms. (English) Zbl 0881.62083

Summary: We consider a bandit problem consisting of a sequence of \(n\) choices from an infinite number of Bernoulli arms, with \(n\to\infty\). The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution \(F\). We first assume \(F\) to be the uniform distribution on (0,1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between \(\sqrt 2/ \sqrt n\) and \(2/ \sqrt n\) and we exhibit classes of strategies that achieve the latter.


62L05 Sequential statistical design
62C25 Compound decision problems in statistical decision theory
60F99 Limit theorems in probability theory
Full Text: DOI