Zhang, Baqun; Tsiatis, Anastasios A.; Laber, Eric B.; Davidian, Marie
Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. (English)
Zbl 1284.62508
Biometrika 100, No. 3, 681-694 (2013).

Summary: A dynamic treatment regime is a list of sequential decision rules for assigning treatments based on a patient's history. Q-learning and A-learning are the two main approaches to estimating the optimal regime, i.e., the one yielding the most beneficial outcome in the patient population, using data from a clinical trial or observational study. Q-learning requires postulated regression models for the outcome, whereas A-learning requires models only for the part of the outcome regression representing treatment contrasts, together with models for treatment assignment. We propose an alternative to Q- and A-learning that maximizes a doubly robust augmented inverse probability weighted estimator of the population mean outcome over a restricted class of regimes. Simulations demonstrate the method's performance and its robustness to model misspecification, a key concern in practice.

Cited in 19 Documents

MSC:
62L10 Sequential statistical analysis
62C99 Statistical decision theory
62P10 Applications of statistics to biology and medical sciences; meta analysis
68T05 Learning and adaptive systems in artificial intelligence
62G35 Nonparametric robustness
65C60 Computational problems in statistics (MSC2010)

Keywords: A-learning; double robustness; outcome regression; propensity score; Q-learning

Software: qLearn

Cite: B. Zhang et al., Biometrika 100, No. 3, 681-694 (2013; Zbl 1284.62508)
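To illustrate the kind of estimator the summary describes, the following is a minimal sketch of a doubly robust augmented inverse probability weighted (AIPW) estimate of the mean outcome under a fixed regime, restricted to the single-stage, binary-treatment case. The function name and interface are illustrative only and are not taken from the paper or the qLearn software.

```python
import numpy as np

def aipw_value(y, a, d_x, propensity, m_hat):
    """Sketch of a single-stage AIPW estimate of E[Y] under regime d.

    y          : observed outcomes
    a          : treatments actually received (0/1)
    d_x        : treatments the regime d would assign, d(X_i) (0/1)
    propensity : estimated P(A = 1 | X_i)
    m_hat      : estimated E[Y | X_i, A = d(X_i)] (outcome regression)
    """
    # Indicator that the received treatment is consistent with the regime
    c = (a == d_x).astype(float)
    # Probability of receiving the regime-consistent treatment
    pi_c = np.where(d_x == 1, propensity, 1.0 - propensity)
    # IPW term plus augmentation term; the augmentation restores
    # consistency when either model (but not both) is misspecified
    return np.mean(c * y / pi_c - (c - pi_c) / pi_c * m_hat)
```

A regime search, as in the paper, would maximize such a value estimate over a restricted class of rules d indexed by a low-dimensional parameter; that outer optimization is omitted here.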