Summary: Extending work of *D. B. Rubin* [ibid. 12, 1151-1172 (1984; Zbl 0555.62010)], this paper explores a Bayesian counterpart of the classical $p$-value, namely, a tail-area probability of a “test statistic” under a null hypothesis. The Bayesian formulation, using posterior predictive replications of the data, allows a “test statistic” to depend on both data and unknown (nuisance) parameters and thus permits a direct measure of the discrepancy between sample and population quantities. The tail-area probability for a “test statistic” is then found under the joint posterior distribution of replicate data and the (nuisance) parameters, both conditional on the null hypothesis. This posterior predictive $p$-value can also be viewed as the posterior mean of a classical $p$-value, averaging over the posterior distribution of (nuisance) parameters under the null hypothesis, and thus it provides one general method for dealing with nuisance parameters.
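The construction above can be sketched by Monte Carlo: draw the parameter from its posterior under the null model, draw replicate data given that parameter, and record how often the discrepancy of the replicate meets or exceeds that of the observed data. The sketch below is illustrative only and is not from the paper; the model (a $N(\theta,1)$ null with a flat prior on $\theta$), the synthetic data, and the variance-checking discrepancy are all assumptions chosen for a minimal runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: drawn with inflated standard deviation 2.0,
# so the N(theta, 1) null model is deliberately misspecified.
y = rng.normal(0.0, 2.0, size=50)
n = len(y)

def discrepancy(data, theta):
    # A "test statistic" depending on both data and the parameter:
    # sum of squared deviations from theta, which probes the unit-variance
    # assumption of the null model.
    return np.sum((data - theta) ** 2)

# Under a flat prior, the posterior for the N(theta, 1) model is
# theta | y ~ N(ybar, 1/n).
n_sims = 5000
theta_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=n_sims)

exceed = 0
for theta in theta_draws:
    # Replicate data drawn from the null model given the posterior draw.
    y_rep = rng.normal(theta, 1.0, size=n)
    if discrepancy(y_rep, theta) >= discrepancy(y, theta):
        exceed += 1

# Tail-area probability under the joint posterior of (y_rep, theta):
# the posterior predictive p-value.
ppp = exceed / n_sims
print(ppp)
```

Because each posterior draw of $\theta$ yields one replicate comparison, the loop is simultaneously averaging a classical tail area over the posterior of the nuisance parameter, which is exactly the "posterior mean of a classical $p$-value" reading described above; here the inflated variance of the synthetic data should drive the $p$-value toward zero.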

Two classical examples, including the Behrens-Fisher problem, are used to illustrate the posterior predictive $p$-value and some of its interesting properties, which also reveal a new (Bayesian) interpretation for some classical $p$-values. An application to multiple-imputation inference is also presented. A frequency evaluation shows that, in general, if the replication is defined by new (nuisance) parameters and new data, then the Type I frequentist error of an $\alpha$-level posterior predictive test is often close to but less than $\alpha$ and will never exceed $2\alpha$.

##### MSC:

| 62F15 | Bayesian inference |
| 62F03 | Parametric hypothesis testing |
| 62A01 | Foundations and philosophical topics in statistics |