Replication, statistical consistency, and publication bias. (English) Zbl 1285.91108

The paper begins by exposing recurrent flaws in psychological science that have led some critics to announce a scientific crisis. Some researchers, whether through fraud or ignorance, behave in ways that invalidate their conclusions. The most common practices include withholding null results and reporting only conclusive experiments, choosing the sample size while the experiment is running, and multiple testing. The pressure to publish (or perish) and the need for “sexy papers” to convince leading editors encourage these practices. The author then details how one can test a set of experimental results a posteriori for consistency within the framework of null hypothesis significance testing (NHST). The basic idea is that even if a conclusion is true, one should expect some null findings within a set of experiments. The number of null findings to expect in a series of NHSTs depends on the power of the tests used, which may be estimated from the data. The author describes the main characteristics of the resulting consistency test and reminds the reader that many papers published in leading journals fail it. Finally, he replies to a series of criticisms he faced after publishing evidence that many often-cited papers fail the consistency test, including a claimed proof of precognition by Bem, the experimental study by Piff et al. associating wealth with poor moral behavior, and a few more.
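The logic of the consistency test can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the normal approximation to power, the independence assumption, and the example numbers (five experiments, effect size 0.5, 30 participants per group) are all hypothetical choices made here for illustration.

```python
from math import prod, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Assumption: a normal approximation with a standardized effect size
    and equal group sizes, not an exact noncentral-t calculation.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def prob_all_significant(powers):
    """Probability that every experiment rejects the null.

    Assumption: the experiments are independent, so the powers multiply.
    """
    return prod(powers)


def expected_null_findings(powers):
    """Expected number of non-significant results among the experiments."""
    return sum(1 - p for p in powers)


if __name__ == "__main__":
    # Hypothetical set of five experiments with the same design.
    powers = [approx_power(0.5, 30) for _ in range(5)]
    print("estimated power per experiment:", round(powers[0], 3))
    print("P(all five significant):", round(prob_all_significant(powers), 4))
    print("expected null findings:", round(expected_null_findings(powers), 2))
```

If the probability of obtaining only significant results is very small, a reported set of experiments with no null findings looks too good to be true under this reasoning. The cutoff at which that probability counts as "too small" is a separate choice that the sketch leaves open.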


91E99 Mathematical psychology
62P15 Applications of statistics to psychology


pwr; R; BUGS


[1] Anscombe, F. J., Fixed-sample-size analysis of sequential observations, Biometrics, 10, 1, 89-100, (1954) · Zbl 0058.12902
[2] Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC, Physics Letters B, 716, 1, 1-29, (2012), arXiv:1207.7214
[3] Balcetis, E.; Dunning, D., Wishful seeing: more desired objects are seen as closer, Psychological Science, 21, 1, 147-152, (2010)
[4] Balcetis, E.; Dunning, D., A false-positive error in search in selective reporting: a refutation of Francis, \(i\)-Perception, 3, (2012), Author response
[5] Begg, C. B.; Mazumdar, M., Operating characteristics of a rank correlation test for publication bias, Biometrics, 50, 1088-1101, (1994) · Zbl 0825.62457
[6] Bem, D. J., Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect, Journal of Personality and Social Psychology, 100, 407-425, (2011)
[7] Bones, A. K., We knew the future all along, Perspectives on Psychological Science, 7, 3, 307-309, (2012)
[8] Borenstein, M.; Hedges, L. V.; Higgins, J. P.T.; Rothstein, H. R., Introduction to meta-analysis, (2009), Wiley Chichester, UK · Zbl 1178.62001
[9] Champely, S. (2009). pwr: Basic functions for power analysis. R package version 1.1.1. http://CRAN.R-project.org/package=pwr.
[10] Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC, Physics Letters B, 716, 1, 30-61, (2012), arXiv:1207.7235
[11] Cohen, J., Statistical power analysis for the behavioral sciences, (1988), Erlbaum Hillsdale, NJ · Zbl 0747.62110
[12] Cumming, G., Replication and \(p\) intervals: \(p\) values predict the future only vaguely, but confidence intervals do much better, Perspectives on Psychological Science, 3, 286-300, (2008)
[13] Cumming, G., Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis, (2012), Routledge New York
[14] Duval, S.; Tweedie, R., Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis, Biometrics, 56, 455-463, (2000) · Zbl 1060.62600
[15] Enserink, M., Dutch university sacks social psychologist over faked data, Science, (2011)
[16] Fanelli, D., Positive results increase down the hierarchy of the sciences, PLoS ONE, 5, 4, e10068, (2010)
[17] Francis, G., Too good to be true: publication bias in two prominent studies from experimental psychology, Psychonomic Bulletin & Review, 19, 151-156, (2012)
[18] Francis, G., The same old new look: publication bias in a study of wishful seeing, \(i\)-Perception, 3, 176-178, (2012)
[19] Francis, G., Response to author: some clarity about publication bias and wishful seeing, \(i\)-Perception, 3, (2012)
[20] Francis, G., Evidence that publication bias contaminated studies relating social class and unethical behavior, Proceedings of the National Academy of Sciences, 109, E1587, (2012)
[21] Francis, G. (2012e). Checking the counterarguments confirms that publication bias contaminated studies relating social class and unethical behavior. Downloaded from http://www1.psych.purdue.edu/ gfrancis/Publications/FrancisRebuttal2012.pdf.
[22] Francis, G., Publication bias and the failure of replication in experimental psychology, Psychonomic Bulletin & Review, (2012)
[23] Francis, G., The psychology of replication and replication in psychology, Perspectives on Psychological Science, 7, 6, 580-589, (2012)
[24] Francis, G., Publication bias in red, rank, and romance in women viewing men by Elliot et al. (2010), Journal of Experimental Psychology: General, 142, 1, 292-296, (2013)
[25] Galak, J.; LeBoeuf, R. A.; Nelson, L. D.; Simmons, J. P., Correcting the past: failures to replicate psi, Journal of Personality and Social Psychology, 103, 6, 933-948, (2012)
[26] Galak, J.; Meyvis, T., The pain was greater if it will happen again: the effect of anticipated continuation on retrospective discomfort, Journal of Experimental Psychology: General, 140, 63-75, (2011)
[27] Galak, J.; Meyvis, T., You could have just asked: reply to Francis (2012), Perspectives on Psychological Science, 7, 6, 595-596, (2012)
[28] Gillett, R., Post hoc power analysis, Journal of Applied Psychology, 79, 783-785, (1994)
[29] Hedges, L. V., Distribution theory for Glass’s estimator of effect size and related estimators, Journal of Educational Statistics, 6, 107-128, (1981)
[30] Hedges, L. V.; Olkin, I., Statistical methods for meta-analysis, (1985), Academic Press New York · Zbl 0666.62002
[31] Hoenig, J. M.; Heisey, D. M., The abuse of power: the pervasive fallacy of power calculations for data analysis, American Statistician, 55, 19-24, (2001)
[32] Ioannidis, J. P.A., Why most discovered true associations are inflated, Epidemiology, 19, 640-648, (2008)
[33] Ioannidis, J. P.A.; Trikalinos, T. A., An exploratory test for an excess of significant findings, Clinical Trials, 4, 245-253, (2007)
[34] John, L. K.; Loewenstein, G.; Prelec, D., Measuring the prevalence of questionable research practices with incentives for truth-telling, Psychological Science, 23, 524-532, (2012)
[35] Johnson, C. S.; Smeesters, D.; Wheeler, S. C., Retraction of Johnson, Smeesters, and Wheeler (2012), Journal of Personality and Social Psychology, 103, 4, 605, (2012)
[36] Johnson, V.; Yuan, Y., Comments on an exploratory test for an excess of significant findings by JPA Ioannidis and TA Trikalinos, Clinical Trials, 4, 254-255, (2007)
[37] Kelley, K., Confidence intervals for standardized effect sizes: theory, application, and implementation, Journal of Statistical Software, 20, (2007)
[38] Kerr, N. L., HARKing: hypothesizing after the results are known, Personality and Social Psychology Review, 2, 196-217, (1998)
[39] Kim, J.; Francis, G., Color selection, color capture, and afterimage filling-in, Journal of Vision, 11, 3, 22, (2011)
[40] Kline, R. B., Beyond significance testing: reforming data analysis methods in behavioral research, (2004), American Psychological Association Washington, DC
[41] Kruschke, J. K., Bayesian data analysis, Wiley Interdisciplinary Reviews: Cognitive Science, 1, 5, 658-676, (2010)
[42] Kruschke, J. K., Doing Bayesian data analysis: a tutorial with R and BUGS, (2010), Academic Press/Elsevier Science
[43] Lane, D. M.; Dunlap, W. P., Estimating effect size: bias resulting from the significance criterion in editorial decisions, British Journal of Mathematical and Statistical Psychology, 31, 107-112, (1978)
[44] Meehl, P. E., Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology, Journal of Consulting and Clinical Psychology, 46, 806-834, (1978)
[45] Miller, J., What is the probability of replicating a statistically significant effect?, Psychonomic Bulletin & Review, 16, 617-640, (2009)
[46] Pashler, H.; Wagenmakers, E.-J., Editors’ introduction to the special section on replicability in psychological science: a crisis of confidence?, Perspectives on Psychological Science, 7, 528-530, (2012)
[47] Peters, J. L.; Sutton, A. J.; Jones, D. R.; Abrams, K. R.; Rushton, L., Performance of the trim and fill method in the presence of publication bias and between-study heterogeneity, Statistics in Medicine, 26, 4544-4562, (2007)
[48] Phillips, C. V., Publication bias in situ, BMC Medical Research Methodology, 4, 20, (2004)
[49] Piff, P. K.; Stancato, D. M.; Côté, S.; Mendoza-Denton, R.; Keltner, D., Higher social class predicts increased unethical behavior, Proceedings of the National Academy of Sciences USA, 109, 4086-4091, (2012)
[50] Piff, P. K.; Stancato, D. M.; Côté, S.; Mendoza-Denton, R.; Keltner, D., Reply to Francis: cumulative power calculations are faulty when based on observed power and a small sample of studies, Proceedings of the National Academy of Sciences USA, (2012)
[51] R Development Core Team. (2011) \(R\): a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/.
[52] Ritchie, S. J.; Wiseman, R.; French, C. C., Failing the future: three unsuccessful attempts to replicate Bem’s ‘retroactive facilitation of recall’ effect, PLoS ONE, 7, 3, e33423, (2012)
[53] Roediger, H. L., Psychology’s woes and a partial cure: the value of replication, APS Observer, 25, 2, (2012)
[54] Rosenthal, R., Applied social research methods series, vol. 6, meta-analytic procedures for social research, (1984), Sage Publications Newbury Park, CA
[55] Sagan, C., The demon-haunted world: science as a candle in the dark, (1997), Ballantine Books New York
[56] Scargle, J. D., Publication bias: the file-drawer problem in scientific inference, Journal of Scientific Exploration, 14, 1, 91-106, (2000)
[57] Schimmack, U., The ironic effect of significant results on the credibility of multiple study articles, Psychological Methods, (2012), Advance online publication
[58] Shea, C. (2011). Fraud scandal fuels debate over practices of social psychology. In The Chronicle of Higher Education. Downloaded from http://chronicle.com/article/As-Dutch-Research-Scandal/129746/.
[59] Simmons, J. P.; Nelson, L. D.; Simonsohn, U., False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant, Psychological Science, 22, 1359-1366, (2011)
[60] Simonsohn, U., Weather to go to college, The Economic Journal, 120, 543, 270-280, (2010)
[61] Simonsohn, U., It does not follow: evaluating the one-off publication bias critiques by Francis (2012a, b, c, d, e, f), Perspectives on Psychological Science, 7, 6, 597-599, (2012)
[62] Sterling, T. D., Publication decisions and the possible effects on inferences drawn from test of significance—or vice versa, Journal of the American Statistical Association, 54, 30-34, (1959)
[63] Sterne, J. A.; Gavaghan, D.; Egger, M., Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature, Journal of Clinical Epidemiology, 53, 1119-1129, (2000)
[64] Thompson, B., Foundations of behavioral statistics: an insight-based approach, (2006), Guilford New York
[65] Topolinski, S.; Sparenberg, P., Turning the hands of time: clockwise movements increase preference for novelty, Social Psychological and Personality Science, 3, 308-314, (2012)
[66] Wagenmakers, E.-J., A practical solution to the pervasive problems of \(p\) values, Psychonomic Bulletin & Review, 14, 779-804, (2007)
[67] Wagenmakers, E.-J.; Wetzels, R.; Borsboom, D.; van der Maas, H. L.J., Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011), Journal of Personality and Social Psychology, 100, 426-432, (2011)
[68] Yong, E., Bad copy, Nature, 485, 298-300, (2012)
[69] Yong, E., The data detective, Nature, 487, 18-19, (2012)
[70] Yuan, K. H.; Maxwell, S., On the post hoc power in testing mean differences, Journal of Educational and Behavioral Statistics, 30, 141-167, (2005)