
Could Fisher, Jeffreys and Neyman have agreed on testing? (With comments and a rejoinder). (English) Zbl 1048.62006
Summary: Ronald Fisher advocated testing using p-values, Harold Jeffreys proposed use of objective posterior probabilities of hypotheses, and Jerzy Neyman recommended testing with fixed error probabilities. Each was quite critical of the other approaches. Most troubling for statistics and science is that the three approaches can lead to quite different practical conclusions.
This article focuses on discussion of the conditional frequentist approach to testing, which is argued to provide the basis for a methodological unification of the approaches of Fisher, Jeffreys and Neyman. The idea is to follow Fisher in using p-values to define the “strength of evidence” in data and to follow his approach of conditioning on strength of evidence; then follow Neyman by computing Type I and Type II error probabilities, but do so conditional on the strength of evidence in the data. The resulting conditional frequentist error probabilities equal the objective posterior probabilities of the hypotheses advocated by Jeffreys.
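The sketch below illustrates the last two sentences in the simplest setting: a simple null hypothesis against a simple alternative. It is written in Python and rests on several assumptions that do not come from the summary. The data are taken to be n i.i.d. N(μ, 1) observations, the hypotheses are H0: μ = 0 versus H1: μ = μ1 with μ1 > 0, and the two hypotheses have equal prior probabilities. The function names `normal_cdf` and `conditional_test` are hypothetical.

Let B(x) = f0(x)/f1(x). Strength of evidence is measured by two p-values, p0 = P0(B(X) ≤ B(x)) and p1 = P1(B(X) ≥ B(x)), and the test conditions on S = max(p0, p1). When p0 ≤ p1, the test rejects H0 and reports the conditional Type I error B/(1 + B). Otherwise it accepts H0 and reports the conditional Type II error 1/(1 + B). With equal priors, these values are exactly the posterior probabilities Pr(H0 | x) and Pr(H1 | x).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def conditional_test(xbar: float, n: int, mu1: float) -> tuple[str, float]:
    """Sketch of a conditional frequentist test of H0: mu = 0 vs H1: mu = mu1.

    Assumes n i.i.d. N(mu, 1) observations summarized by their mean xbar,
    and mu1 > 0. Returns the decision together with the conditional error
    probability that would be reported alongside it.
    """
    if mu1 <= 0:
        raise ValueError("this sketch assumes mu1 > 0")
    # Likelihood ratio of H0 to H1: B(x) = f0(x) / f1(x).
    b = math.exp(-n * mu1 * xbar + n * mu1 ** 2 / 2.0)
    # B decreases in xbar (since mu1 > 0), so:
    #   p0 = P0(B(X) <= B(x)) = P0(Xbar >= xbar), with Xbar ~ N(0, 1/n)
    #   p1 = P1(B(X) >= B(x)) = P1(Xbar <= xbar), with Xbar ~ N(mu1, 1/n)
    root_n = math.sqrt(n)
    p0 = normal_cdf(-root_n * xbar)
    p1 = normal_cdf(root_n * (xbar - mu1))
    if p0 <= p1:
        # Evidence favors H1: reject H0, report conditional Type I error.
        return "reject H0", b / (1.0 + b)
    # Evidence favors H0: accept H0, report conditional Type II error.
    return "accept H0", 1.0 / (1.0 + b)


if __name__ == "__main__":
    for xbar in (0.1, 0.9):
        print(xbar, conditional_test(xbar, n=4, mu1=1.0))
```

Take n = 4 and μ1 = 1 as an example. A sample mean of 0.9 gives B = e^(-1.6), so H0 is rejected with a reported conditional Type I error of e^(-1.6)/(1 + e^(-1.6)), about 0.17. A sample mean of 0.1 leads to acceptance of H0. For very extreme data, working with log B rather than B would avoid floating-point overflow.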

##### MSC:
- 62A01 Foundations and philosophical topics in statistics
- 62-03 History of statistics
- 62F03 Parametric hypothesis testing
- 01A60 History of mathematics in the 20th century
##### Keywords:
conditional testing