# zbMATH — the first resource for mathematics

Could Fisher, Jeffreys and Neyman have agreed on testing? (With comments and a rejoinder). (English) Zbl 1048.62006
Summary: Ronald Fisher advocated testing using p-values, Harold Jeffreys proposed use of objective posterior probabilities of hypotheses, and Jerzy Neyman recommended testing with fixed error probabilities. Each was quite critical of the other approaches. Most troubling for statistics and science is that the three approaches can lead to quite different practical conclusions.
This article focuses on discussion of the conditional frequentist approach to testing, which is argued to provide the basis for a methodological unification of the approaches of Fisher, Jeffreys and Neyman. The idea is to follow Fisher in using p-values to define the “strength of evidence” in data and to follow his approach of conditioning on strength of evidence; then follow Neyman by computing Type I and Type II error probabilities, but do so conditional on the strength of evidence in the data. The resulting conditional frequentist error probabilities equal the objective posterior probabilities of the hypotheses advocated by Jeffreys.
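
As a rough illustration of why the choice of approach matters, the following sketch compares a p-value with a posterior probability of the null hypothesis in the simplest setting: two simple hypotheses about a normal mean with known standard deviation. Everything in it is an assumption made for illustration and is not taken from the article: the function names, the data, the hypotheses, and the equal prior probabilities of 1/2. It computes only the Bayes factor B (the likelihood ratio of H0 to H1) and the resulting posterior probability B/(1+B). It does not construct the conditioning statistic that the conditional frequentist approach relies on.

```python
import math


def normal_log_pdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def bayes_factor(xs, mu0, mu1, sd):
    """Likelihood ratio of H0: mean = mu0 to H1: mean = mu1 for i.i.d. normal data."""
    log_b = sum(normal_log_pdf(x, mu0, sd) - normal_log_pdf(x, mu1, sd) for x in xs)
    return math.exp(log_b)


def posterior_prob_h0(b):
    """Posterior probability of H0, assuming prior probability 1/2 for each hypothesis."""
    return b / (1.0 + b)


def one_sided_p_value(xs, mu0, sd):
    """P-value of the z-test of H0: mean = mu0 against larger means."""
    n = len(xs)
    z = (sum(xs) / n - mu0) * math.sqrt(n) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal


# Hypothetical data, chosen only for illustration.
data = [0.9, 1.4, 0.3, 1.1]
b = bayes_factor(data, mu0=0.0, mu1=1.0, sd=1.0)
print("p-value:", one_sided_p_value(data, mu0=0.0, sd=1.0))
print("Bayes factor:", b)
print("posterior probability of H0:", posterior_prob_h0(b))
```

With these made-up numbers the p-value is a little above 0.03 while the posterior probability of the null is roughly 0.15, a small instance of the practical disagreement the summary describes.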

##### MSC:
62A01 Foundations and philosophical topics in statistics
62-03 History of statistics
62F03 Parametric hypothesis testing
01A60 History of mathematics in the 20th century
##### Keywords:
conditional testing