Index-based policies for discounted multi-armed bandits on parallel machines. (English) Zbl 1073.90568

Summary: We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Niño-Mora to design index-based policies for discounted multi-armed bandits on parallel machines. The policies analyzed have expected rewards which come within an \(O(\alpha)\) quantity of optimality, where \(\alpha > 0\) is a discount rate. In the main, the policies make an initial once for all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take careful account of the index structure of the bandits. The corresponding limit policies are average-overtaking optimal.


90C40 Markov and semi-Markov decision processes
90B36 Stochastic scheduling theory in operations research
90C47 Minimax problems in mathematical programming
Full Text: DOI


[1] Bertsimas, D. and Ni no-Mora J. (1996). Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 21 257-306. JSTOR: · Zbl 0857.90055
[2] Blackwell D. (1962). Discrete dynamic programming. Ann. Math. Stat. 33 719-726. · Zbl 0133.12906
[3] Coffman, E. and Mitrani, I. (1980). A characterization of waiting time performance realizable by single server queues. Oper. Res. 28 810-821. JSTOR: · Zbl 0451.90059
[4] Denardo, E. V. and Miller, B. L. (1968). An optimality criterion for discrete dynamic programming with no discounting, Ann. Math. Stat. 39 1220-1227. · Zbl 0167.18402
[5] Gittins, J. C. and Jones, D. M. (1974). A dynamicallocation index for the sequential design of experiments. In Progress in Statistics: European Meeting of Statisticians, Budapest, 1972 (J. Gani, K. Sarkadi and I. Vince, eds.) 241-266. North-Holland, Amsterdam. Glazebrook, K. D. (1976) Stochastic scheduling. Ph.D. thesis, Cambridge Univ. · Zbl 0303.62064
[6] Glazebrook, K. D. and Garbe, R. (1999). Almost optimal policies for stochastic systems which almost satisfy conservation laws. Ann. Oper. Res. 92 19-43. · Zbl 0970.90021
[7] Shanthikumar, J. G. and Yao, D. D. (1992). Multi-class queueing systems: polymatroidal structure and optimal scheduling control. Oper. Res. 40 293-299. · Zbl 0764.90036
[8] Veinott, Jr, A. F. (1966). On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Stat. 37 1284-1294. Weber, R. R. (1982) Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flowtime. J. Appl. Probab. 19 167-182. Weber, R. R., Varaiya, P. and Walrand, J. (1986) Scheduling jobs with stochastically ordered processing times on parallel machines to minimize expected flowtime. J. Appl. Probab. 23 841-847. Weiss, G. (1990) Approximation results in parallel machines stochastic scheduling. Ann. Oper. Res. Special Volume on Production Planning and Scheduling (M. Queyranne, ed.) 26 195-242. Weiss, G. (1992) Turnpike optimality of Smith’s rule in parallel machines stochastic scheduling. Math. Oper. Res. 17 255-270. Weiss, G. (1995) On almost optimal priority rules for preemptive scheduling of stochastic jobs on parallel machines. Adv. Appl. Probab. 27 821-839. · Zbl 0149.16301
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.