A paradox from randomization-based causal inference. (English) Zbl 1442.62014

Summary: Under the potential outcomes framework, causal effects are defined as comparisons between potential outcomes under treatment and control. To infer causal effects from randomized experiments, J. Splawa-Neyman [Stat. Sci. 5, No. 4, 465–472 (1990; Zbl 0955.01560)] proposed testing the null hypothesis of zero average causal effect (Neyman’s null), and Fisher proposed testing the null hypothesis of zero individual causal effect (Fisher’s null). Although the subtle difference between Neyman’s null and Fisher’s null has caused considerable controversy and confusion among both theoretical and practical statisticians, a careful comparison of the two approaches has been lacking in the literature for more than eighty years. We fill this historical gap by making a theoretical comparison between them and highlighting an intriguing paradox that has not been recognized by previous researchers. Logically, Fisher’s null implies Neyman’s null. It is therefore surprising that, in actual completely randomized experiments, rejection of Neyman’s null does not imply rejection of Fisher’s null in many realistic situations, including the case of a constant causal effect. Furthermore, we show that this paradox also exists in other commonly used experiments, such as stratified experiments, matched-pair experiments and factorial experiments. Asymptotic analyses, numerical examples and real data examples all support this surprising phenomenon. Besides its historical and theoretical importance, this paradox also has useful practical implications for modern researchers.
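
As a rough illustration of the two tests contrasted in the summary, the following Python sketch computes a large-sample p-value for Neyman's null and a Monte Carlo randomization p-value for Fisher's null on data from a completely randomized experiment. The function names, the difference-in-means statistic, the normal approximation, and the number of random draws are assumptions made for illustration; they are not taken from the paper.

```python
from math import erfc, sqrt

import numpy as np


def neyman_p_value(y, z):
    """Two-sided normal-approximation p-value for zero AVERAGE effect."""
    y, z = np.asarray(y, dtype=float), np.asarray(z)
    y1, y0 = y[z == 1], y[z == 0]
    tau_hat = y1.mean() - y0.mean()
    # Unpooled variance estimator: each arm contributes its own sample variance.
    se = sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    # 2 * (1 - Phi(|t|)) written with the complementary error function.
    return erfc(abs(tau_hat) / se / sqrt(2.0))


def fisher_p_value(y, z, draws=2000, seed=0):
    """Monte Carlo randomization p-value for zero INDIVIDUAL effects."""
    y, z = np.asarray(y, dtype=float), np.asarray(z)
    rng = np.random.default_rng(seed)

    def abs_diff(assign):
        return abs(y[assign == 1].mean() - y[assign == 0].mean())

    observed = abs_diff(z)
    # Under the sharp null the outcomes are fixed, so only the labels are re-drawn.
    hits = sum(abs_diff(rng.permutation(z)) >= observed - 1e-12
               for _ in range(draws))
    return (1 + hits) / (1 + draws)


if __name__ == "__main__":
    # Hypothetical data: constant additive effect of 0.5 on 100 units.
    rng = np.random.default_rng(1)
    z = rng.permutation(np.repeat([1, 0], 50))
    y = rng.normal(size=100) + 0.5 * z
    print("Neyman p-value:", neyman_p_value(y, z))
    print("Fisher p-value:", fisher_p_value(y, z))
```

Running both functions over many simulated assignments and comparing rejection rates at a fixed level is one way to explore the behaviour the summary describes; a single data set, as in the demo above, shows only how the two p-values are computed.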


62A01 Foundations and philosophical topics in statistics
62G10 Nonparametric hypothesis testing
62G20 Asymptotic properties of nonparametric inference
62F03 Parametric hypothesis testing
62E20 Asymptotic distribution theory in statistics
62K15 Factorial statistical designs




[1] Agresti, A. and Min, Y. (2004). Effects and non-effects of paired identical observations in comparing proportions with binary matched-pairs data. Stat. Med. 23 65-75.
[2] Angrist, J. D. and Pischke, J. S. (2008). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton Univ. Press, Princeton, NJ. · Zbl 1159.62090
[3] Anscombe, F. J. (1948). The validity of comparative experiments. J. Roy. Statist. Soc. Ser. A 111 181-200; discussion, 200-211.
[4] Aronow, P. M., Green, D. P. and Lee, D. K. K. (2014). Sharp bounds on the variance in randomized experiments. Ann. Statist. 42 850-871. · Zbl 1305.62024
[5] Barnard, G. A. (1947). Significance tests for \(2\times 2\) tables. Biometrika 34 123-138. · Zbl 0029.15603
[6] Box, G. E. P. (1992). Teaching engineers experimental design with a paper helicopter. Qual. Eng. 4 453-459.
[7] Chung, E. and Romano, J. P. (2013). Exact and asymptotically robust permutation tests. Ann. Statist. 41 484-507. · Zbl 1267.62064
[8] Cox, D. R. (1958). The interpretation of the effects of non-additivity in the Latin square. Biometrika 45 69-73. · Zbl 0087.14902
[9] Cox, D. R. (1970). The Analysis of Binary Data. Methuen & Co., Ltd., London. · Zbl 0199.53301
[10] Cox, D. R. (1992). Planning of Experiments. Wiley, New York. Reprint of the 1958 original. · Zbl 1064.62546
[11] Cox, D. R. (2012). Statistical causality: Some historical remarks. In Causality: Statistical Perspectives and Applications (C. Berzuini, P. Dawid and L. Bernardinelli, eds.) 1-5. Wiley, New York.
[12] Dasgupta, T., Pillai, N. S. and Rubin, D. B. (2015). Causal inference from \(2^K\) factorial designs by using potential outcomes. J. R. Stat. Soc. Ser. B. Stat. Methodol. 77 727-753. · Zbl 1414.62337 · doi:10.1111/rssb.12085
[13] Ding, P. (2017). Supplement to “A paradox from randomization-based causal inference.” DOI:10.1214/16-STS571SUPP. · Zbl 1442.62014
[14] Ding, P. and Dasgupta, T. (2016). A potential tale of two-by-two tables from completely randomized experiments. J. Amer. Statist. Assoc. 111 157-168.
[15] Ding, P., Feller, A. and Miratrix, L. W. (2016). Randomization inference for treatment effect variation. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 655-671. · Zbl 1414.62146
[16] Eberhardt, K. R. and Fligner, M. A. (1977). Comparison of two tests for equality of two proportions. Amer. Statist. 31 151-155.
[17] Eden, T. and Yates, F. (1933). On the validity of Fisher’s \(z\)-test when applied to an actual example of non-normal data. J. Agric. Sci. 23 6-17.
[18] Edgington, E. S. and Onghena, P. (2007). Randomization Tests, 4th ed. Chapman & Hall/CRC, Boca Raton, FL. With 1 CD-ROM (Windows). · Zbl 1291.62009
[19] Fienberg, S. E. and Tanur, J. M. (1996). Reconsidering the fundamental contributions of Fisher and Neyman on experimentation and sampling. Int. Stat. Rev. 64 237-253. · Zbl 0899.62003
[20] Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain 33 503-513.
[21] Fisher, R. A. (1935a). The Design of Experiments, 1st ed. Oliver and Boyd, Edinburgh.
[22] Fisher, R. A. (1935b). Comment on “Statistical problems in agricultural experimentation”. Suppl. J. R. Stat. Soc. 2 154-157. 173.
[23] Freedman, D. A. (2008). On regression adjustments to experimental data. Adv. in Appl. Math. 40 180-193. · Zbl 1130.62003
[24] Gail, M. H., Mark, S. D., Carroll, R. J., Green, S. B. and Pee, D. (1996). On design considerations and randomization-based inference for community intervention trials. Stat. Med. 15 1069-1092.
[25] Greenland, S. (1991). On the logical justification of conditional tests for two-by-two contingency tables. Amer. Statist. 45 248-251.
[26] Hájek, J. (1960). Limiting distributions in simple random sampling from a finite population. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 361-374. · Zbl 0102.15001
[27] Hinkelmann, K. and Kempthorne, O. (2008). Design and Analysis of Experiments, Vol. 1: Introduction to Experimental Design, 2nd ed. Wiley, New York. · Zbl 1146.62054
[28] Hodges, J. L. Jr. and Lehmann, E. L. (1964). Basic Concepts of Probability and Statistics. Holden-Day, Inc., San Francisco, CA-London-Amsterdam. · Zbl 0131.34706
[29] Hoeffding, W. (1952). The large-sample power of tests based on permutations of observations. Ann. Math. Stat. 23 169-192. · Zbl 0046.36403
[30] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics 221-233. Univ. California Press, Berkeley, CA. · Zbl 0212.21504
[31] Imai, K. (2008). Variance identification and efficiency analysis in randomized experiments under the matched-pair design. Stat. Med. 27 4857-4873.
[32] Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge Univ. Press, New York. · Zbl 1355.62002
[33] Janssen, A. (1997). Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens-Fisher problem. Statist. Probab. Lett. 36 9-21. · Zbl 1064.62526
[34] Kempthorne, O. (1952). The Design and Analysis of Experiments. Wiley, New York; Chapman & Hall, London. · Zbl 0049.09901
[35] Kempthorne, O. (1955). The randomization theory of experimental inference. J. Amer. Statist. Assoc. 50 946-967.
[36] Lang, J. B. (2015). A closer look at testing the “no-treatment-effect” hypothesis in a comparative experiment. Statist. Sci. 30 352-371. · Zbl 1332.62065
[37] Lehmann, E. L. (1999). Elements of Large-Sample Theory. Springer, New York. · Zbl 0914.62001
[38] Li, X. and Ding, P. (2016). Exact confidence intervals for the average causal effect on a binary outcome. Stat. Med. 35 957-960.
[39] Li, X. and Ding, P. (2017). General forms of finite population central limit theorems with applications to causal inference. J. Amer. Statist. Assoc. To appear.
[40] Lin, W. (2013). Agnostic notes on regression adjustments to experimental data: Reexamining Freedman’s critique. Ann. Appl. Stat. 7 295-318. · Zbl 1454.62217
[41] Lin, W., Halpern, S. D., Prasad Kerlin, M. and Small, D. S. (2017). A “placement of death” approach for studies of treatment effects on ICU length of stay. Stat. Methods Med. Res. 26 292-311. · doi:10.1177/0962280214545121
[42] Nelsen, R. B. (2006). An Introduction to Copulas, 2nd ed. Springer, New York. · Zbl 1152.62030
[43] Neuhaus, G. (1993). Conditional rank tests for the two-sample problem under random censorship. Ann. Statist. 21 1760-1779. · Zbl 0793.62027
[44] Neyman, J. (1935). Statistical problems in agricultural experimentation (with discussion). Suppl. J. R. Stat. Soc. 2 107-180. · JFM 63.1103.02
[45] Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Sci. 236 333-380. · Zbl 0017.12403
[46] Neyman, J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statist. Sci. 5 465-472. Translated from the 1923 Polish original and edited by D. M. Dabrowska and T. P. Speed. · Zbl 0955.01560
[47] Pauly, M., Brunner, E. and Konietschke, F. (2015). Asymptotic permutation tests in general factorial designs. J. R. Stat. Soc. Ser. B. Stat. Methodol. 77 461-473. · Zbl 1414.62339
[48] Pitman, E. J. G. (1937). Significance tests which may be applied to samples from any populations. Suppl. J. R. Stat. Soc. 4 119-130. · Zbl 0019.03502
[49] Pitman, E. J. G. (1938). Significance tests which can be applied to samples from any populations. III. The analysis of variance test. Biometrika 29 322-335. · Zbl 0018.22601
[50] Reid, C. (1982). Neyman—From Life. Springer, New York. · Zbl 0517.01034
[51] Rigdon, J. and Hudgens, M. G. (2015). Randomization inference for treatment effects on a binary outcome. Stat. Med. 34 924-935.
[52] Robbins, H. (1977). A fundamental question of practical statistics. Amer. Statist. 31 97.
[53] Robins, J. M. (1988). Confidence intervals for causal parameters. Stat. Med. 7 773-785.
[54] Rosenbaum, P. R. (2002). Observational Studies, 2nd ed. Springer, New York. · Zbl 0985.62091
[55] Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66 688-701.
[56] Rubin, D. B. (1980). Comment on “Randomization analysis of experimental data: The Fisher randomization test” by D. Basu. J. Amer. Statist. Assoc. 75 591-593.
[57] Rubin, D. B. (1990). Comment on J. Neyman and causal inference in experiments and observational studies: “On the application of probability theory to agricultural experiments. Essay on principles. Section 9” [Ann. Agric. Sci. 10 (1923), 1-51]. Statist. Sci. 5 472-480. · Zbl 0955.01559
[58] Rubin, D. B. (2004). Teaching statistical inference for causal effects in experiments and observational studies. J. Educ. Behav. Stat. 29 343-367.
[59] Sabbaghi, A. and Rubin, D. B. (2014). Comments on the Neyman-Fisher controversy and its consequences. Statist. Sci. 29 267-284. · Zbl 1332.62007
[60] Samii, C. and Aronow, P. M. (2012). On equivalencies between design-based and regression-based variance estimators for randomized experiments. Statist. Probab. Lett. 82 365-370. · Zbl 1237.62007 · doi:10.1016/j.spl.2011.10.024
[61] Scheffé, H. (1959). The Analysis of Variance. Wiley, New York; Chapman & Hall, London. · Zbl 0086.34603
[62] Schochet, P. Z. (2010). Is regression adjustment supported by the Neyman model for causal inference? J. Statist. Plann. Inference 140 246-259. · Zbl 1178.62079
[63] Welch, B. L. (1937). On the \(z\)-test in randomized blocks and Latin squares. Biometrika 29 21-52. · Zbl 0017.12602
[64] White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48 817-838. · Zbl 0459.62051
[65] Wilk, M. B. and Kempthorne, O. (1957). Non-additivities in a Latin square design. J. Amer. Statist. Assoc.52 218-236. · Zbl 0084.36103
[66] Wu, C. F. J. and Hamada, M. S. (2009). Experiments: Planning, Analysis, and Optimization, 2nd ed. Wiley, Hoboken, NJ. · Zbl 1229.62100
[67] Yates, F.