Interval estimation for a binomial proportion. (With comments and a rejoinder).

*(English)* Zbl 1059.62533

Summary: We revisit the problem of interval estimation of a binomial proportion. The erratic behavior of the coverage probability of the standard Wald confidence interval has previously been remarked on in the literature (Blyth and Still, Agresti and Coull, Santner, and others). We begin by showing that the chaotic coverage properties of the Wald interval are far more persistent than is appreciated. Furthermore, common textbook prescriptions regarding its safety are misleading and defective in several respects and cannot be trusted.

This leads us to consideration of alternative intervals. A number of natural alternatives are presented, each with its motivation and context. Each interval is examined for its coverage probability and its length. Based on this analysis, we recommend the Wilson interval or the equal-tailed Jeffreys prior interval for small n and the interval suggested by Agresti and Coull for larger n. We also provide an additional frequentist justification for use of the Jeffreys interval.
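
As a rough illustration of the comparison the summary describes, the sketch below computes the Wald, Wilson and Agresti-Coull intervals from their standard textbook formulas and evaluates exact coverage by summing binomial probabilities. It is not the authors' code: the function names (`wald`, `wilson`, `agresti_coull`, `coverage`) are hypothetical, and the Jeffreys interval is left out because it needs Beta quantiles, which the Python standard library does not provide.

```python
from math import comb, sqrt
from statistics import NormalDist


def z_value(conf=0.95):
    """Two-sided standard normal critical value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def wald(x, n, conf=0.95):
    """Standard Wald interval: estimate plus and minus z times the standard error."""
    z = z_value(conf)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson(x, n, conf=0.95):
    """Wilson (score) interval, obtained by inverting the score test."""
    z = z_value(conf)
    p_hat = x / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def agresti_coull(x, n, conf=0.95):
    """Agresti-Coull interval: Wald formula applied after adding z^2 pseudo-observations."""
    z = z_value(conf)
    n_tilde = n + z * z
    p_tilde = (x + z * z / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half


def coverage(interval, n, p, conf=0.95):
    """Exact coverage probability at p: total binomial mass of outcomes whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, conf)
        if lo <= p <= hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


if __name__ == "__main__":
    # Compare exact coverage of the three intervals at a few (n, p) pairs.
    for n, p in [(20, 0.05), (20, 0.5), (100, 0.1)]:
        row = [coverage(f, n, p) for f in (wald, wilson, agresti_coull)]
        print(n, p, [round(c, 4) for c in row])
```

For instance, comparing `coverage(wald, 20, 0.05)` with `coverage(wilson, 20, 0.05)` shows the Wald interval falling well short of the nominal 95% level at that point while the Wilson interval stays much closer to it. Scanning `p` over a fine grid for fixed `n` is one way to see the oscillation in coverage that the summary calls erratic.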

##### MSC:

62F25 | Parametric tolerance and confidence regions |

##### Keywords:

Bayes; binomial distribution; confidence intervals; coverage probability; Edgeworth expansion; expected length; Jeffreys prior; normal approximation; posterior

\textit{L. D. Brown} et al., Stat. Sci. 16, No. 2, 101--133 (2001; Zbl 1059.62533)

##### References:

[1] | Abramowitz, M. and Stegun, I. A. (1970). Handbook of Mathematical Functions. Dover, New York. |

[2] | Agresti, A. and Coull, B. A. (1998). Approximate is better than "exact" for interval estimation of binomial proportions. Amer. Statist. 52 119-126. · Zbl 04546791 · doi:10.2307/2685469 |

[3] | Anscombe, F. J. (1948). The transformation of Poisson, binomial and negative binomial data. Biometrika 35 246-254. · Zbl 0032.03702 · doi:10.1093/biomet/35.3-4.246 |

[4] | Anscombe, F. J. (1956). On estimating binomial response relations. Biometrika 43 461-464. · Zbl 0074.14703 · doi:10.1093/biomet/43.3-4.461 |

[5] | Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York. · Zbl 0572.62008 |

[6] | Berry, D. A. (1996). Statistics: A Bayesian Perspective. Wadsworth, Belmont, CA. |

[7] | Bickel, P. and Doksum, K. (1977). Mathematical Statistics. Prentice-Hall, Englewood Cliffs, NJ. |

[8] | Blyth, C. R. and Still, H. A. (1983). Binomial confidence intervals. J. Amer. Statist. Assoc. 78 108-116. · Zbl 0503.62028 · doi:10.2307/2287116 |

[9] | Brown, L. D., Cai, T. and DasGupta, A. (1999). Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Statist |

[10] | Brown, L. D., Cai, T. and DasGupta, A. (2000). Interval estimation in discrete exponential family. Technical report, Dept. Statistics. Univ. Pennsylvania. |

[11] | Casella, G. (1986). Refining binomial confidence intervals. Canad. J. Statist. 14 113-129. · Zbl 0592.62029 · doi:10.2307/3314658 |

[12] | Casella, G. and Berger, R. L. (1990). Statistical Inference. Wadsworth & Brooks/Cole, Belmont, CA. · Zbl 0699.62001 |

[13] | Clopper, C. J. and Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26 404-413. · JFM 60.1175.02 |

[14] | Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data, 2nd ed. Chapman and Hall, London. · Zbl 0729.62004 |

[15] | Cressie, N. (1980). A finely tuned continuity correction. Ann. Inst. Statist. Math. 30 435-442. · Zbl 0445.62033 · doi:10.1007/BF02480234 |

[16] | Ghosh, B. K. (1979). A comparison of some approximate confidence intervals for the binomial parameter. J. Amer. Statist. Assoc. 74 894-900. · Zbl 0429.62025 · doi:10.2307/2286420 |

[17] | Hall, P. (1982). Improving the normal approximation when constructing one-sided confidence intervals for binomial or Poisson parameters. Biometrika 69 647-652. · Zbl 0493.62036 · doi:10.1093/biomet/69.3.647 |

[18] | Lehmann, E. L. (1999). Elements of Large-Sample Theory. Springer, New York. · Zbl 0914.62001 |

[19] | Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion; comparison of several methods. Statistics in Medicine 17 857-872. |

[20] | Rao, C. R. (1973). Linear Statistical Inference and Its Applications. Wiley, New York. · Zbl 0256.62002 |

[21] | Samuels, M. L. and Witmer, J. W. (1999). Statistics for the Life Sciences, 2nd ed. Prentice Hall, Englewood Cliffs, NJ. |

[22] | Santner, T. J. (1998). A note on teaching binomial confidence intervals. Teaching Statistics 20 20-23. |

[23] | Santner, T. J. and Duffy, D. E. (1989). The Statistical Analysis of Discrete Data. Springer, Berlin. · Zbl 0702.62005 |

[24] | Stone, C. J. (1995). A Course in Probability and Statistics. Duxbury, Belmont, CA. |

[25] | Strawderman, R. L. and Wells, M. T. (1998). Approximately exact inference for the common odds ratio in several 2 × 2 tables (with discussion). J. Amer. Statist. Assoc. 93 1294-1320. · Zbl 1064.62533 · doi:10.2307/2670044 |

[26] | Tamhane, A. C. and Dunlop, D. D. (2000). Statistics and Data Analysis from Elementary to Intermediate. Prentice Hall, Englewood Cliffs, NJ. |

[27] | Vollset, S. E. (1993). Confidence intervals for a binomial proportion. Statistics in Medicine 12 809-824. |

[28] | Wasserman, L. (1991). An inferential interpretation of default priors. Technical report, Carnegie-Mellon Univ. |

[29] | Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. J. Amer. Statist. Assoc. 22 209-212. |

[30] | Oliver (2000). But this does not account for the effects of discreteness, and as BCD point out, guidelines in terms of p are not verifiable. For elementary course teaching there is no obvious alternative (such as t methods) for smaller n, so we think it is sensible to teach a single method that behaves reasonably well for all n, as do the Wilson, Jeffreys and Agresti-Coull intervals. |

[31] | Presumably some other boundary modification will result in a happy medium. In a letter to the editor about Agresti and Coull (1998), Rindskopf (2000) argued in favor of the logit interval partly because of its connection with logit modeling. We have not used this method for teaching in elementary courses, since logit intervals do not extend to intervals for the difference of proportions and (like $CI_W$ and $CI_J$) they are rather complex for that level. For practical use and for teaching in more advanced courses, some statisticians may prefer the likelihood ratio interval, since conceptually it is simple and the method also applies in a general model-building framework. An advantage compared to the Wald approach is its invariance to the choice of scale, resulting, for instance, both from the original scale and the logit. BCD do not say much about this interval, since it is harder to compute. However, it is easy to obtain with standard statistical software (e.g., in SAS, using the LRCI option in PROC GENMOD for a model containing only an intercept term and assuming a binomial response with logit or identity link function). Graphs in Vollset (1993) suggest that the boundary-modified likelihood ratio interval also behaves reasonably well, although conservative for p near 0 and 1. For elementary course teaching, a disadvantage of all such intervals using boundary modifications is that making exceptions from a general, simple recipe distracts students from the simple concept of taking the estimate plus and minus a normal score multiple of a standard error. (Of course, this concept is not sufficient for serious statistical work, but some oversimplification and compromise is necessary at that level.) Even with $CI_{AC}$, instructors may find it preferable to give a recipe with the same number of added pseudo observations for all |

[32] | , instead of $z_{\alpha/2}^2$. Reasonably good performance seems to result, especially for small $\alpha$, from the value $4 \approx z_{0.025}^2$ used in the 95% $CI_{AC}$ interval (i.e., the "add two successes and two failures" interval). Agresti and Caffo (2000) discussed this and showed that adding four pseudo observations also dramatically improves the Wald two-sample interval for comparing proportions, although again at the cost of rather severe conservativeness when both parameters are near 0 or near 1. |

[33] | ; Blyth and Still, 1983). Finally, we are curious about the implications of the BCD results in a more general setting. How much does their message about the effects of discreteness and basing interval estimation on the Jeffreys prior or the score test rather than the Wald test extend to parameters in other discrete distributions and to two-sample comparisons? We have seen that interval estimation of the Poisson parameter benefits from inverting the score test rather than the Wald test on the count scale (Agresti and Coull, 1998). One would not think there could be anything new to say about the Wald confidence interval for a proportion, an inferential method that must be one of the most frequently used since Laplace (1812, page 283). Likewise, the confidence interval for a proportion based on the Jeffreys prior has received attention in various forms for some time. For instance, R. A. Fisher (1956, pages 63-70) showed the similarity of a Bayesian analysis with Jeffreys prior to his fiducial approach, in a discussion that was generally critical of the confidence interval method but grudgingly admitted of limits obtained by a test inversion such as the Clopper-Pearson method, "though they fall short in logical content of the limits found by the fiducial argument, and with which they have often been confused, they do fulfil some of the desiderata of statistical inferences." Congratulations to the authors for brilliantly casting new light on the performance of these old and established methods. |

[34] | lished in Datta and Ghosh (1996). Thus a uniform prior for $\arcsin(\theta^{1/2})$, where $\theta$ is the binomial proportion, leads to Jeffreys' Beta(1/2, 1/2) prior for $\theta$. When $\theta$ is the Poisson parameter, the uniform prior for $\theta^{1/2}$ leads to Jeffreys' prior $\theta^{-1/2}$ for $\theta$. In a more formal set-up, let $X_1, \ldots, X_n$ be iid conditional on some real-valued $\theta$. Let $\theta_\pi^{1-\alpha}(X_1, \ldots, X_n)$ denote a posterior $(1-\alpha)$th quantile for $\theta$ under the prior $\pi$. Then $\pi$ is said to be a first-order probability matching prior if $P_\theta\bigl(\theta \le \theta_\pi^{1-\alpha}(X_1, \ldots, X_n)\bigr) = 1 - \alpha + o(n^{-1/2})$ (1) |

[35] | as derived in Tibshirani (1989). Here $h(\cdot)$ is an arbitrary function differentiable in its arguments. In general, matching priors have a long success story in providing frequentist confidence intervals, especially in complex problems, for example, the Behrens-Fisher or the common mean estimation problems where frequentist methods run into difficulty. Though asymptotic, the matching property seems to hold for small and moderate sample sizes as well for many important statistical problems. One such example is Garvan and Ghosh (1997) where such priors were found for general dispersion models as given in Jorgensen (1997). It may be worthwhile developing these priors in the presence of nuisance parameters for other discrete cases as well, for example when the parameter of interest is the difference of two binomial proportions, or the log-odds ratio in a 2 × 2 contingency table. Having argued so strongly in favor of matching priors, I wonder, though, whether there is any special need for such priors in this particular problem of binomial proportions. It appears that any Beta(a, a) prior will do well in this case. As noted in this paper, by shrinking the MLE X/n toward the prior mean 1/2, one achieves a better centering for the construction of confidence intervals. The two diametrically opposite priors Beta(2, 2) (symmetric concave with maximum at 1/2 which provides the Agresti-Coull interval) and Jeffreys prior Beta(1/2, 1/2) (symmetric convex with minimum at 1/2) seem to be equally good for recentering. Indeed, I wonder whether any Beta($\alpha$, $\beta$) prior which shrinks the MLE toward the prior mean $\alpha/(\alpha+\beta)$ becomes appropriate for recentering. The problem of construction of confidence intervals for binomial proportions occurs in first courses in statistics as well as in day-to-day consulting. While I am strongly in favor of replacing Wald intervals by the new ones for the latter, I am not quite sure how easy it will be to motivate these new intervals for the former.
The notion of shrinking can be explained adequately only to a few strong students in introductory statistics courses. One possible solution for the classroom may be to bring in the notion of continuity correction and somewhat heuristically ask students to work with $X + \tfrac{1}{2}$, $n - X + \tfrac{1}{2}$ instead of $X$, $n - X$. In this way, one centers around $(X + \tfrac{1}{2})/(n + 1)$ à la Jeffreys prior. |

[36] | interval (e.g., Theorem 1 of Ghosh, 1979). My first set of comments concern the specific binomial problem that the authors address and then the implications of their work for other important discrete data confidence interval problems. The results in Ghosh (1979) complement the calculations of Brown, Cai and DasGupta (BCD) by pointing out that the Wald interval is "too long" in addition to being centered at the "wrong" value (the MLE as opposed to a Bayesian point estimate such as is used by the Agresti-Coull interval). His Table 3 lists the probability that the Wald interval is longer than the Wilson interval for a central set of p values (from 0.20 to 0.80) and a range of sample sizes n from 20 to 200. Perhaps surprisingly, in view of its inferior coverage characteristics, the Wald interval tends to be longer than the Wilson interval with very high probability. Hence the Wald interval is both too long and centered at the wrong place. This is a dramatic effect of the skewness that BCD mention. When discussing any system of intervals, one is concerned with the consistency of the answers given by the interval across multiple uses by a single researcher or by groups of users. Formally, this is the reason why various symmetry properties are required of confidence intervals. For example, in the present case, requiring that the $p$ interval $(L(X), U(X))$ satisfy the symmetry property $(L(x), U(x)) = (1 - L(n - x), 1 - U(n - x))$ (1) for $x \in \{0, \ldots, n\}$ shows that investigators who reverse their definitions of success and failure will Pearson (1934). For the binomial problem, Blyth and Still (1983) constructed a set of confidence intervals by selecting among size $\alpha$ acceptance regions those that possessed additional symmetry properties and were "small" (leading to short confidence intervals). For example, they desired that the interval should "move to the right" as x increases when n is fixed and should "move to the left" as n increases when x is fixed.
They also asked that their system of intervals increase monotonically in the coverage probability for fixed x and n in the sense that the higher nominal coverage interval contain the lower nominal coverage interval. In addition to being less intuitive to unsophisticated statistical consumers, systems of confidence intervals formed by inversion of acceptance regions also have two other handicaps that have hindered their rise in popularity. First, they typically require that the confidence interval (essentially) be constructed for all possible outcomes, rather than merely the response of interest. Second, their rather brute force character means that a specialized computer program must be written to produce the acceptance sets and their inversion (the intervals). |

[37] | Gupta (2001). This article also discusses a number of additional issues and presents further analytical calculations, including a Pearson tilting similar to the chi-square tilts advised in Hall (1983). Corcoran and Mehta's Figure 2 compares average length of three of our proposals with Blyth-Still and with their likelihood ratio procedure. We note first that their LR procedure is not the same as ours. Theirs is based on numerically computed exact percentiles of the fixed sample likelihood ratio statistic. We suspect this is roughly equivalent to adjustment of the chi-squared percentile by a Bartlett correction. Ours is based on the traditional asymptotic chi-squared formula for the distribution of the likelihood ratio statistic. Consequently, their procedure has conservative coverage, whereas ours has coverage fluctuating around the nominal value. They assert that the difference in expected length is "negligible." How much difference qualifies as negligible is an arguable, subjective evaluation. But we note that in their plot their intervals can be on average about 8% or 10% longer than Jeffreys or Wilson intervals, respectively. This seems to us a nonnegligible difference. Actually, we suspect their preference for their LR and BSC intervals rests primarily on their overriding preference for conservativity in coverage whereas, as we have discussed above, our intervals are designed to attain approximately the desired nominal value. D. Santner proposes an interesting variant of the original Blyth-Still proposal. As we understand it, he suggests producing nominal |

[38] | Agresti, A. and Caffo, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Amer. Statist. 54. To appear. · Zbl 1250.62016 · doi:10.2307/2685779 |

[39] | Aitkin, M., Anderson, D., Francis, B. and Hinde, J. (1989). Statistical Modelling in GLIM. Oxford Univ. Press. · Zbl 0676.62001 |

[40] | Boos, D. D. and Hughes-Oliver, J. M. (2000). How large does n have to be for Z and t intervals? Amer. Statist. 54 121-128. Brown, L. D., Cai, T. and DasGupta, A. (2000a). Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Statist. To appear. Brown, L. D., Cai, T. and DasGupta, A. (2000b). Interval estimation in exponential families. Technical report, Dept. Statistics, Univ. Pennsylvania. · Zbl 04547435 · doi:10.2307/2685927 |

[41] | Brown, L. D. and Li, X. (2001). Confidence intervals for the difference of two binomial proportions. Unpublished manuscript. |

[42] | Cai, T. (2001). One-sided confidence intervals and hypothesis testing in discrete distributions. |

[43] | Coe, P. R. and Tamhane, A. C. (1993). Exact repeated confidence intervals for Bernoulli parameters in a group sequential clinical trial. Controlled Clinical Trials 14 19-29. |

[44] | Cox, D. R. and Reid, N. (1987). Orthogonal parameters and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B 49 113-147. · Zbl 0616.62006 |

[45] | DasGupta, A. (2001). Some further results in the binomial interval estimation problem. |

[46] | Datta, G. S. and Ghosh, M. (1996). On the invariance of noninformative priors. Ann. Statist. 24 141-159. · Zbl 0906.62024 · doi:10.1214/aos/1033066203 |

[47] | Duffy, D. and Santner, T. J. (1987). Confidence intervals for a binomial parameter based on multistage tests. Biometrics 43 81-94. · Zbl 0657.62091 · doi:10.2307/2531951 |

[48] | Fisher, R. A. (1956). Statistical Methods for Scientific Inference. Oliver and Boyd, Edinburgh. · Zbl 0070.36903 |

[49] | Gart, J. J. (1966). Alternative analyses of contingency tables. J. Roy. Statist. Soc. Ser. B 28 164-179. · Zbl 0138.13502 |

[50] | Garvan, C. W. and Ghosh, M. (1997). Noninformative priors for dispersion models. Biometrika 84 976-982. · Zbl 0892.62010 · doi:10.1093/biomet/84.4.976 |

[51] | Ghosh, J. K. (1994). Higher Order Asymptotics. IMS, Hayward, CA. · Zbl 1163.62305 |

[52] | Hall, P. (1983). Chi-squared approximations to the distribution of a sum of independent random variables. Ann. Statist. 11 1028-1036. · Zbl 0525.60028 · doi:10.1214/aop/1176993451 |

[53] | Jennison, C. and Turnbull, B. W. (1983). Confidence intervals for a binomial parameter following a multistage test with application to MIL-STD 105D and medical trials. Technometrics 25 49-58. · Zbl 0504.62088 · doi:10.2307/1267726 |

[54] | Jorgensen, B. (1997). The Theory of Dispersion Models. CRC Chapman and Hall, London. |

[55] | Laplace, P. S. (1812). Théorie Analytique des Probabilités. Courcier, Paris. |

[56] | Larson, H. J. (1974). Introduction to Probability Theory and Statistical Inference, 2nd ed. Wiley, New York. · Zbl 0288.62002 |

[57] | Pratt, J. W. (1961). Length of confidence intervals. J. Amer. Statist. Assoc. 56 549-567. · Zbl 0099.14002 · doi:10.2307/2282079 |

[58] | Rindskopf, D. (2000). Letter to the editor. Amer. Statist. 54 88. |

[59] | Rubin, D. B. and Schenker, N. (1987). Logit-based interval estimation for binomial data using the Jeffreys prior. Sociological Methodology 17 131-144. |

[60] | Sterne, T. E. (1954). Some remarks on confidence or fiducial limits. Biometrika 41 275-278. · Zbl 0055.12807 |

[61] | Tibshirani, R. (1989). Noninformative priors for one parameter of many. Biometrika 76 604-608. · Zbl 0678.62010 · doi:10.1093/biomet/76.3.604 |

[62] | Welch, B. L. and Peers, H. W. (1963). On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. Ser. B 25 318-329. · Zbl 0117.14205 |

[63] | Yamagami, S. and Santner, T. J. (1993). Invariant small sample confidence intervals for the difference of two success probabilities. Comm. Statist. Simul. Comput. 22 33-59. · Zbl 0825.62420 · doi:10.1080/03610919308813080 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.