A penalized bandit algorithm. (English) Zbl 1206.62139

Summary: We study a two-armed bandit recursive algorithm with penalty. We show that the algorithm converges towards its “target” although it always has a noiseless “trap”. Then, we elucidate the rate of convergence. For some choices of the parameters, we obtain a central limit theorem in which the limit distribution is characterized as the unique stationary distribution of a Markov process with jumps.

MSC:

62L05 Sequential statistical design
62L20 Stochastic approximation
60F05 Central limit and other weak theorems
65C60 Computational problems in statistics (MSC2010)