Posterior predictive \(p\)-values with Fisher randomization tests in noncompliance settings: test statistics vs discrepancy measures. (English) Zbl 1407.62075

Summary: In randomized experiments with noncompliance, one might wish to focus on compliers rather than on the overall sample. In this vein, D. B. Rubin [“More powerful randomization-based \(p\)-values in double-blind trials with non-compliance”, Stat. Med. 17, No. 3, 371–385 (1998; doi:10.1002/(sici)1097-0258(19980215)17:3<371::aid-sim768>3.0.co;2-o)] argued that testing for the complier average causal effect and averaging permutation-based \(p\)-values over the posterior distribution of the compliance types could increase power as compared to general intent-to-treat tests. The general scheme is a repeated two-step process: impute missing compliance types and conduct a permutation test with the completed data. In this paper, we explore this idea further, comparing the use of discrepancy measures – which depend on unknown but imputed parameters – to classical test statistics and contrasting different approaches for imputing the unknown compliance types. We also examine consequences of model misspecification in the imputation step, and discuss to what extent this additional modeling undercuts the advantage of permutation tests being model independent. We find that, especially for discrepancy measures, modeling choices can impact both power and validity. In particular, imputing missing compliance types under the null can radically reduce power, but not doing so can jeopardize validity. Fortunately, using covariates predictive of compliance type in the imputation can mitigate these results. We also compare this overall approach to Bayesian model-based tests, that is, tests that are directly derived from posterior credible intervals, under both correct and incorrect model specification.
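The repeated two-step scheme described in the summary (impute the missing compliance types, then run a permutation test on the completed data) can be sketched roughly as follows. This is only an illustration under simplifying assumptions that are not taken from the paper: one-sided noncompliance (control units cannot take the treatment, so compliance type is observed in the treatment arm only), a Beta(1, 1) prior on the complier proportion, imputation that ignores outcomes, and a plain difference-in-means test statistic among imputed compliers. The function names `complier_diff`, `impute_types` and `ppp_frt` are hypothetical.

```python
import numpy as np


def complier_diff(y, z, g):
    """Difference in mean outcomes between arms among (imputed) compliers."""
    treated = (z == 1) & (g == 1)
    control = (z == 0) & (g == 1)
    if treated.sum() == 0 or control.sum() == 0:
        return 0.0  # statistic undefined for this split; treat as no difference
    return y[treated].mean() - y[control].mean()


def impute_types(z, d, rng):
    """Impute compliance types (1 = complier, 0 = never-taker).

    Assumption: one-sided noncompliance, so the type equals the treatment
    received d in the treatment arm and is missing in the control arm.
    Control-arm types are drawn from a Bernoulli model whose complier
    proportion is sampled from its Beta posterior (Beta(1, 1) prior).
    """
    g = np.where(z == 1, d, 0).astype(int)
    n_treated = int((z == 1).sum())
    n_compliers = int(d[z == 1].sum())
    pi = rng.beta(1 + n_compliers, 1 + n_treated - n_compliers)
    control = z == 0
    g[control] = rng.binomial(1, pi, size=int(control.sum()))
    return g


def ppp_frt(y, z, d, n_imputations=50, n_permutations=200, seed=0):
    """Average one-sided permutation p-values over imputed compliance types."""
    rng = np.random.default_rng(seed)
    p_values = []
    for _ in range(n_imputations):
        g = impute_types(z, d, rng)           # step 1: complete the data
        observed = complier_diff(y, z, g)
        exceed = 0
        for _ in range(n_permutations):       # step 2: permutation test
            z_perm = rng.permutation(z)       # re-draw assignment under the sharp null
            if complier_diff(y, z_perm, g) >= observed:
                exceed += 1
        p_values.append((1 + exceed) / (1 + n_permutations))
    return float(np.mean(p_values))
```

Here `y` holds outcomes, `z` the randomized assignment and `d` the treatment actually received, all as NumPy arrays of equal length. Because the statistic depends only on observed and imputed data, it plays the role of a classical test statistic; a discrepancy measure in the sense of the summary would additionally depend on imputed parameters.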


62F15 Bayesian inference
62A01 Foundations and philosophical topics in statistics
62G10 Nonparametric hypothesis testing


[1] Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with discussion). Journal of the American Statistical Association, 91, 444–472. · Zbl 0897.62130
[2] Fisher, R. A. (1925). Statistical Methods for Research Workers. 1st ed. Oliver and Boyd, Edinburgh. · JFM 51.0414.08
[3] Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 33, 503–513.
[4] Fisher, R. A. (1935). The Design of Experiments. Oliver and Boyd, Edinburgh.
[5] Frangakis, C. E. & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58, 21–29. · Zbl 1209.62288
[6] Gelman, A., Meng, X. L., & Stern, H. S. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica, 6, 733–807. · Zbl 0859.62028
[7] Gelman, A. (2013). Two simple examples for understanding posterior \(p\)-values whose distributions are far from uniform. Electronic Journal of Statistics, 7, 2595–2602. · Zbl 1294.62049
[8] Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society B, 29(1), 83–100. · Zbl 0158.37305
[9] Imbens, G. W. & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62, 467–476. · Zbl 0800.90648
[10] Meng, X. L. (1994a). Posterior predictive \(p\)-values. Annals of Statistics, 22, 1142–1160. · Zbl 0820.62027
[11] Meng, X. L. (1994b). Multiple-imputation inferences with uncongenial sources of input. Statistical Science, 9(4), 538–573.
[12] Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Roczniki Nauk Rolniczych Tom X [in Polish]; translated in Statistical Science, 5, 465–480.
[13] Neyman, J. (1934). On two different aspects of the representative method: The method of stratified sampling and the method of purposive selection with discussion. Journal of the Royal Statistical Society, 97, 558–625. · Zbl 0010.07201
[14] Robins, J. M., van der Vaart, A., & Ventura, V. (2000). Asymptotic distribution of \(p\) values in composite null models. Journal of the American Statistical Association, 95, 1143–1156. · Zbl 1072.62522
[15] Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.
[16] Rubin, D. B. (1978). Bayesian inference for causal effects. Annals of Statistics, 6, 34–58. · Zbl 0383.62021
[17] Rubin, D. B. (1980). Comment on “Randomization Analysis of Experimental Data: The Fisher Randomization Test” by D. Basu. Journal of the American Statistical Association, 75, 591–593.
[18] Rubin, D. B. (1981). Estimation in parallel randomized experiments. Journal of Educational Statistics, 6(4), 377–401.
[19] Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12(4), 1151–1172. · Zbl 0555.62010
[20] Rubin, D. B. (1996a). Discussion of “Posterior predictive \(p\)-values?” by Gelman, A., Meng, X. L. and Stern, H. Statistica Sinica, 6, 787–792.
[21] Rubin, D. B. (1996b). Multiple imputation after 18+ years (with discussion). Journal of the American Statistical Association, 91, 473–520. · Zbl 0869.62014
[22] Rubin, D. B. (1998). More powerful randomization-based \(p\)-values in double-blind trials with non-compliance. Statistics in Medicine, 17(3), 371–385.
[23] Tanner, M. A. & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussions). Journal of the American Statistical Association, 82, 528–550. · Zbl 0619.62029