Monotone policies and indexability for bidirectional restless bandits. (English) Zbl 1274.90473

Authors’ abstract: Motivated by a wide range of applications, we consider a development of Whittle’s restless bandit model in which project activation requires a state-dependent amount of a key resource, which is assumed to be available at a constant rate. As many projects may be activated at each decision epoch as resource availability allows. We seek a policy for project activation within resource constraints which minimises an aggregate cost rate for the system. Project indices derived from a Lagrangian relaxation of the original problem exist provided the structural requirement of indexability is met. Verification of this property and derivation of the related indices is greatly simplified when the solution of the Lagrangian relaxation has a state monotone structure for each constituent project. We demonstrate that this is indeed the case for a wide range of bidirectional projects in which the project state tends to move in a different direction when it is activated from that in which it moves when passive. This is natural in many application domains in which activation of a project ameliorates its condition, which otherwise tends to deteriorate or deplete. In some cases the state monotonicity required is related to the structure of state transitions, while in others it is also related to the nature of costs. Two numerical studies demonstrate the value of the ideas for the construction of policies for dynamic resource allocation, most especially in contexts which involve a large number of projects.


90C40 Markov and semi-Markov decision processes
49L20 Dynamic programming in optimal control and differential games
90C39 Dynamic programming
49M20 Numerical methods of relaxation type
Full Text: DOI Euclid


[1] Ansell, P., Glazebrook, K. D., Niño-Mora, J. and O’Keeffe, M. (2003). Whittle’s index policy for a multi-class queueing system with convex holding costs. Math. Meth. Operat. Res. 57 , 21-39. · Zbl 1023.90010
[2] Archibald, T. W., Black, D. P. and Glazebrook, K. D. (2009). Indexability and index heuristics for a simple class of inventory routing problems. Operat. Res. 57 , 314-326. · Zbl 1181.90004
[3] Dacre, M., Glazebrook, K. and Niño-Mora, J. (1999). The achievable region approach to the optimal control of stochastic systems (with discussion). J. R. Statist. Soc. B 61 , 747-791. · Zbl 0934.93071
[4] Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41 , 148-177. · Zbl 0411.62055
[5] Gittins, J. C. (1989). Multi-Armed Bandit Allocation Indices . John Wiley, Chichester. · Zbl 0699.90068
[6] Glazebrook, K. D. and Minty, R. (2009). A generalized Gittins index for a class of multiarmed bandits with general resource requirements. Math. Operat. Res. 34 , 26-44. · Zbl 1218.90208
[7] Glazebrook, K. D., Kirkbride, C. and Ruiz-Hernandez, D. (2006). Spinning plates and squad systems: policies for bidirectional restless bandits. Adv. Appl. Prob. 38 , 95-115. · Zbl 1105.90103
[8] Glazebrook, K. D., Niño-Mora, J. and Ansell, P. S. (2002). Index policies for a class of discounted restless bandits. Adv. Appl. Prob. 34 , 754-774. · Zbl 1053.90048
[9] Jacko, P. (2009). Marginal productivity index policies for dynamic priority allocation in restless bandit models. Doctoral thesis, Universidad Carlos III de Madrid.
[10] Le Ny, J., Dahleh, M. and Feron, E. (2008). Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. In 2008 American Control Conference, pp. 4220-4225.
[11] Liu, K. and Zhao, Q. (2010). Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inf. Theory 56 , 5547-5567. · Zbl 1366.94390
[12] Niño-Mora, J. (2001). Restless bandits, partial conservation laws and indexability. Adv. Appl. Prob. 33 , 76-98. · Zbl 1039.90019
[13] Niño-Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15 , 161-198. · Zbl 1142.90015
[14] Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming . John Wiley, New York. · Zbl 0829.90134
[15] Weber, R. R. and Weiss, G. (1990). On an index policy for restless bandits. J. Appl. Prob. 27 , 637-648. · Zbl 0735.90072
[16] Weber, R. R. and Weiss, G. (1991). Addendum to ‘On an index policy for restless bandits’. Adv. Appl. Prob. 23 , 429-430. · Zbl 0727.90090
[17] Whittle, P. (1988). Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability ( J. Appl. Prob. Spec. Vol. 25A ), ed. J. Gani, Applied Probability Trust, Sheffield, pp. 287-298. · Zbl 0664.90043
[18] Whittle, P. (1996). Optimal Control . John Wiley, Chichester. · Zbl 0880.49001
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.