**Some monotonicity properties of parametric and nonparametric Bayesian bandits.**
*(English)*
Zbl 1383.62027

The paper studies properties of sequential decision procedures in the Bayesian framework for parametric and nonparametric two-armed bandit problems. At each of \(n\) stages, one of two independent stochastic processes (arms) is selected, and the selection may depend on the past observations and on the prior information. The objective is to maximize the expected future-discounted sum of the \(n\) observations. The author examines structural properties of the classical bandit problem, for example how the maximum expected payoff and the optimal strategy vary with the priors, in two cases: (a) observations from each arm have an exponential family distribution, and the arms are assigned conjugate priors; (b) observations from each arm have a nonparametric distribution, and the arms are assigned independent Dirichlet process priors. The main results are: (i) for a given arm with fixed prior weight, the maximum expected payoff increases as the prior mean yield increases; (ii) for a fixed prior mean yield, the maximum expected payoff increases as the prior weight decreases. Some special cases and the resulting properties are also noted. These results generalize the work of J. Gittins and Y.-G. Wang [Ann. Stat. 20, No. 3, 1625–1636 (1992; Zbl 0760.62080)] and M. K. Clayton and D. A. Berry [ibid. 13, 1523–1534 (1985; Zbl 0587.62151)].
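Results (i) and (ii) can be checked numerically in the simplest parametric instance. The sketch below is an illustration, not the paper's construction: it assumes Bernoulli arms with Beta priors (one exponential family with a conjugate prior), geometric discounting with a hypothetical factor `discount`, and the parametrization "prior mean" \(= a/(a+b)\), "prior weight" \(= a+b\) for a Beta\((a,b)\) prior. The function name `max_expected_payoff` is likewise an assumption made for this example.

```python
from functools import lru_cache


def max_expected_payoff(prior1, prior2, n, discount=0.9):
    """Maximum expected discounted payoff of an n-stage two-armed Bernoulli bandit.

    Each prior is a (mean, weight) pair describing a Beta(a, b) prior with
    a = mean * weight and b = (1 - mean) * weight (assumed parametrization).
    """
    params = [(m * w, (1.0 - m) * w) for (m, w) in (prior1, prior2)]

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        # State: successes and failures observed so far on each arm.
        if s1 + f1 + s2 + f2 == n:
            return 0.0
        counts = ((s1, f1), (s2, f2))
        best = float("-inf")
        for arm in (0, 1):
            a, b = params[arm]
            s, f = counts[arm]
            # Posterior mean of the arm's success probability.
            p = (a + s) / (a + b + s + f)
            if arm == 0:
                win = value(s1 + 1, f1, s2, f2)
                lose = value(s1, f1 + 1, s2, f2)
            else:
                win = value(s1, f1, s2 + 1, f2)
                lose = value(s1, f1, s2, f2 + 1)
            # Immediate expected reward plus discounted continuation value.
            q = p * (1.0 + discount * win) + (1.0 - p) * discount * lose
            best = max(best, q)
        return best

    return value(0, 0, 0, 0)


if __name__ == "__main__":
    other = (0.5, 2.0)  # prior (mean, weight) of the second arm, held fixed
    # (i) fixed weight, increasing prior mean of the first arm
    for mean in (0.3, 0.5, 0.7):
        print("mean", mean, max_expected_payoff((mean, 2.0), other, n=6))
    # (ii) fixed mean, decreasing prior weight of the first arm
    for weight in (8.0, 2.0, 0.5):
        print("weight", weight, max_expected_payoff((0.5, weight), other, n=6))
```

Under these assumptions the printed payoffs should be nondecreasing down each list, in line with (i) and (ii). Backward induction over success and failure counts keeps the state space small enough for exact computation at short horizons.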

Reviewer: Rasul A. Khan (Forest Glen)

### MSC:

| Code | Description |
| --- | --- |
| 62C10 | Bayesian problems; characterization of Bayes procedures |
| 62L10 | Sequential statistical analysis |
| 62L15 | Optimal stopping in statistics |
| 62G05 | Nonparametric estimation |
| 60G40 | Stopping times; optimal stopping problems; gambling theory |