Dynamic treatment regimes: technical challenges and applications. (English) Zbl 1298.62189

Summary: Dynamic treatment regimes are of growing interest across the clinical sciences because they provide one way to operationalize, and thus inform, sequential personalized clinical decision making. Formally, a dynamic treatment regime is a sequence of decision rules, one per stage of clinical intervention. Each decision rule maps up-to-date patient information to a recommended treatment. We briefly review a variety of approaches for using data to construct the decision rules. We then review a critical inferential challenge that results from nonregularity, which often arises in this area. In particular, nonregularity arises in inference for parameters in the optimal dynamic treatment regime: the asymptotic (limiting) distributions of the estimators are sensitive to local perturbations. We propose and evaluate a locally consistent Adaptive Confidence Interval (ACI) for the parameters of the optimal dynamic treatment regime. We use data from the Adaptive Pharmacological and Behavioral Treatments for Children with ADHD Trial as an illustrative example. We conclude by highlighting and discussing emerging theoretical problems in this area.
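To make "a sequence of decision rules, one per stage" concrete, here is a minimal sketch of two-stage Q-learning, one common approach for constructing such rules from data. It is not the authors' ACI procedure or the qLearn package. Everything in it is an assumption made for illustration: linear working models, treatments coded -1/+1, simulated data, and the helper names `fit_ols`, `design`, `q_learning_two_stage` and `decision_rule`. The stage-1 pseudo-outcome takes a maximum over stage-2 treatments, which becomes an absolute value of the estimated contrast. That value is not differentiable where the contrast is zero, which is one way the nonregularity described above can arise.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def with_intercept(h):
    return np.hstack([np.ones((h.shape[0], 1)), h])

def design(h, a):
    """Hypothetical linear working model with columns [1, h, a, a*h]; a is coded -1/+1."""
    return np.hstack([with_intercept(h), a[:, None], a[:, None] * h])

def split_coef(beta, p):
    """Split coefficients into the main-effect part and the treatment-contrast part psi."""
    return beta[:1 + p], beta[1 + p:]

def q_learning_two_stage(h1, a1, h2, a2, y):
    # Stage 2: regress the final outcome on stage-2 history and treatment.
    main2, psi2 = split_coef(fit_ols(design(h2, a2), y), h2.shape[1])
    h2i = with_intercept(h2)
    # Pseudo-outcome: predicted outcome under the best stage-2 treatment.
    # With a2 in {-1, +1}, max over a2 of a2 * c equals |c|. This nonsmooth
    # step is what makes stage-1 inference nonregular when c is near 0.
    pseudo = h2i @ main2 + np.abs(h2i @ psi2)
    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    _, psi1 = split_coef(fit_ols(design(h1, a1), pseudo), h1.shape[1])
    return psi1, psi2

def decision_rule(h, psi):
    """Map patient history to a treatment: +1 if the estimated contrast is positive, else -1."""
    return np.where(with_intercept(h) @ psi > 0, 1, -1)

# Simulated two-stage randomized data (entirely hypothetical).
rng = np.random.default_rng(0)
n = 500
h1 = rng.normal(size=(n, 1))                          # baseline covariate
a1 = rng.choice([-1, 1], size=n)                      # randomized stage-1 treatment
x2 = 0.5 * h1[:, 0] + 0.4 * a1 + rng.normal(size=n)   # intermediate covariate
h2 = np.column_stack([h1, a1, x2])                    # stage-2 history includes the past
a2 = rng.choice([-1, 1], size=n)                      # randomized stage-2 treatment
y = x2 + 0.3 * a1 + a2 * (0.5 - x2) + rng.normal(size=n)

psi1, psi2 = q_learning_two_stage(h1, a1, h2, a2, y)
print(decision_rule(h2[:5], psi2))  # recommended stage-2 treatments for five patients
```

The estimated regime is the pair of rules `decision_rule(·, psi1)` and `decision_rule(·, psi2)`. Each rule maps the history available at its stage to a recommended treatment. Because `psi1` depends on `psi2` through the absolute value, standard Wald or bootstrap intervals for `psi1` can behave poorly when many patients have stage-2 contrasts near zero.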


MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62F40 Bootstrap, jackknife and other resampling methods


Software: bootlib; Approxrl; qLearn
