Optimal strategies for a class of sequential control problems with precedence relations. (English) Zbl 1209.62178

Summary: Consider the following multi-phase project management problem. Each project is divided into several phases. All projects enter the next phase at the same point chosen by the decision maker based on observations up to that point. Within each phase, one can pursue the projects in any order. When pursuing the project with one unit of resource, the project state changes according to a Markov chain. The probability distribution of the Markov chain is known up to an unknown parameter. When pursued, the project generates a random reward depending on the phase and the state of the project and the unknown parameter. The decision maker faces two problems: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward is as large as possible. In this paper we formulate the preceding problem as a stochastic scheduling problem and propose asymptotic optimal strategies, which minimize the shortfall from perfect information payoff. Concrete examples are given to illustrate our method.


62L05 Sequential statistical design
93E20 Optimal stochastic control
62L99 Sequential statistical methods
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
90B36 Stochastic scheduling theory in operations research
Full Text: DOI arXiv


[1] Agrawal, R., Teneketzis, D. and Anantharam, V. (1989). Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space. IEEE Trans. Automat. Control 34 258–267. · Zbl 0666.93077 · doi:10.1109/9.16415
[2] Agrawal, R., Teneketzis, D. and Anantharam, V. (1989). Asymptotically efficient adaptive allocation schemes for controlled Markov chains: Finite parameter space. IEEE Trans. Automat. Control 34 1249–1259. · Zbl 0689.93039 · doi:10.1109/9.40770
[3] Anantharam, V., Varaiya, P. and Walrand, J. (1987). Asymptotically efficient allocation rules for the multi-armed bandit problem with multiple plays. I. I.I.D. rewards. II. Markovian rewards. IEEE Trans. Automat. Control 32 968–976, 977–982., Mathematical Reviews (MathSciNet): · Zbl 0632.93067 · doi:10.1109/TAC.1987.1104491
[4] Berry, D. A. and Fristedt, B. (1985). Bandit Problems . Sequentral Allocation of Experiments . Chapman and Hall, London. · Zbl 0659.62086
[5] Chan, H. P., Fuh, C. D. and Hu, I. (2006). Multi-armed bandit problem with precendence relations. In Times Series and Related Topics. In Memory of Ching-Zong Wei (H.-C. Ho, C.-K. Ing and T. L. Lai, eds.) 223–235. IMS, Beachwood, OH. · Zbl 1268.62092 · doi:10.1214/074921706000001067
[6] Feldman, D. (1962). Contributions to the “two-armed bandit” problem. Ann. Math. Statist. 33 847–856. · Zbl 0113.14801 · doi:10.1214/aoms/1177704454
[7] Fuh, C.-D. and Hu, I. (2000). Asymptotically efficient strategies for a stochastic scheduling problem with order constraints. Ann. Statist. 28 1670–1695. · Zbl 1105.62365 · doi:10.1214/aos/1015957475
[8] Gittins, J. C. (1989). Multi-Armed Bandit Allocation Indices. Wiley, Chichester. · Zbl 0699.90068
[9] Glazebrook, K. D. (1991). Strategy evaluation for stochastic scheduling problems with order constraints. Adv. in Appl. Probab. 23 86–104. JSTOR: · Zbl 0717.90091 · doi:10.2307/1427513
[10] Glazebrook, K. D. (1996). On the undiscounted tax problem with precedence constraints. Adv. in Appl. Probab. 28 1123–1144. JSTOR: · Zbl 0867.90124 · doi:10.2307/1428167
[11] Graves, T. and Lai, T. L. (1997). Asymptotically efficient adaptive choice of control laws in controlled Markov chains. SIAM J. Control Optim. 35 715–743. · Zbl 0876.93053 · doi:10.1137/S0363012994275440
[12] Hu, I. and Lee, C.-W. J. (2003). Bayesian adaptive stochastic process termination. Math. Oper. Res. 28 361–381. JSTOR: · Zbl 1082.90578 · doi:10.1287/moor.28.2.361.14481
[13] Hu, I. and Wei, C. Z. (1989). Irreversible adaptive allocation rules. Ann. Statist. 17 801–823. · Zbl 0689.62061 · doi:10.1214/aos/1176347144
[14] Kadane, J. B. and Simon, H. A. (1977). Optimal strategies for a class of constrained sequential problems. Ann. Statist. 5 237–255. · Zbl 0363.90063 · doi:10.1214/aos/1176343791
[15] Lai., T. L. (1987). Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15 1091–1114. · Zbl 0643.62054 · doi:10.1214/aos/1176350495
[16] Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4–22. · Zbl 0568.62074 · doi:10.1016/0196-8858(85)90002-8
[17] Mandelbaum, A. and Vanderbei, R. J. (1981). Optimal stopping and supermartingales over partially ordered sets. Z. Wachrsch. Verw. Gebiete 57 253–264. · Zbl 0445.60036 · doi:10.1007/BF00535493
[18] Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer, London. · Zbl 0925.60001
[19] Ney, P. and Nummelin, E. (1987). Markov additive processes. I. Eigenvalue properties and limit theorems. Ann. Probab. 15 561–592. · Zbl 0625.60027 · doi:10.1214/aop/1176992159
[20] Presman, È. L. and Sonin, I. N. (1990). Sequential Control with Incomplete Information. Academic Press, San Diego. · Zbl 0721.93003
[21] Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527–535. · Zbl 0049.37009 · doi:10.1090/S0002-9904-1952-09620-8
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.