Bootstrap confidence intervals. With comments and a rejoinder by the authors. (English) Zbl 0955.62574

Summary: This article surveys bootstrap methods for producing good approximate confidence intervals. The goal is to improve by an order of magnitude upon the accuracy of the standard intervals \(\hat\theta\pm z^{(\alpha)}\hat\sigma\), in a way that allows routine application even to very complicated problems. Both theory and examples are used to show how this is done. The first seven sections provide a heuristic overview of four bootstrap confidence interval procedures: \(BC_a\), bootstrap-\(t\), ABC and calibration. Sections 8 and 9 describe the theory behind these methods, and their close connection with the likelihood-based confidence interval theory developed by Barndorff-Nielsen, Cox and Reid and others.
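To make the comparison in the summary concrete, the following is a minimal sketch that contrasts the standard interval \(\hat\theta\pm z^{(\alpha)}\hat\sigma\) with a bootstrap-\(t\) interval for a sample mean. It is an illustration under stated assumptions, not the authors' implementation: the function names, the data values, the number of resamples and the simple order-statistic quantile rule are all hypothetical choices made here, and the \(BC_a\), ABC and calibration procedures are not shown.

```python
import random
import statistics
from statistics import NormalDist


def standard_interval(data, alpha=0.05):
    """Standard interval theta_hat +/- z * sigma_hat for the sample mean."""
    n = len(data)
    theta_hat = statistics.fmean(data)
    sigma_hat = statistics.stdev(data) / n ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat


def bootstrap_t_interval(data, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap-t interval for the mean (nonparametric resampling)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = statistics.fmean(data)
    sigma_hat = statistics.stdev(data) / n ** 0.5
    t_stats = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=n)  # resample with replacement
        s = statistics.stdev(sample)
        if s == 0.0:
            continue  # skip degenerate resamples (all values equal)
        t_stats.append((statistics.fmean(sample) - theta_hat) / (s / n ** 0.5))
    t_stats.sort()
    m = len(t_stats)
    # Simple order-statistic quantiles (an assumption; other rules exist).
    t_lo = t_stats[int((alpha / 2) * m)]
    t_hi = t_stats[min(m - 1, int((1 - alpha / 2) * m))]
    # The upper t quantile sets the lower endpoint, and vice versa.
    return theta_hat - t_hi * sigma_hat, theta_hat - t_lo * sigma_hat


# Hypothetical right-skewed sample, chosen only for illustration.
data = [0.3, 0.5, 0.9, 1.1, 1.4, 2.0, 2.7, 3.9, 6.2, 11.5]
print("standard:   ", standard_interval(data))
print("bootstrap-t:", bootstrap_t_interval(data))
```

The standard interval is symmetric about \(\hat\theta\) by construction, whereas the bootstrap-\(t\) endpoints follow the resampled distribution of the studentized statistic and so need not be symmetric.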


62G09 Nonparametric statistical resampling methods


bootstrap; iASA


[1] Babu, G. J. and Singh, K. (1983). Inference on means using the bootstrap. Ann. Statist. 11 999-1003. · Zbl 0539.62043
[2] Barndorff-Nielsen, O. E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70 343-365. JSTOR: · Zbl 0532.62006
[3] Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73 307-322. JSTOR: · Zbl 0605.62020
[4] Barndorff-Nielsen, O. E. (1994). Adjusted versions of profile likelihood and likelihood roots, and extended likelihood. J. Roy. Statist. Soc. Ser. B 56 125-140. JSTOR: · Zbl 0798.62005
[5] Barndorff-Nielsen, O. E. and Chamberlin, S. R. (1994). Stable and invariant adjusted directed likelihoods. Biometrika 81 485-499. JSTOR: · Zbl 0812.62030
[6] Beran, R. (1987). Prepivoting to reduce level error of confidence sets. Biometrika 74 457-468. JSTOR: · Zbl 0663.62045
[7] Bickel, P. J. (1987). Comment on "Better bootstrap confidence intervals" by B. Efron. J. Amer. Statist. Assoc. 82 191. JSTOR: · Zbl 0622.62039
[8] Bickel, P. J. (1988). Discussion of "Theoretical comparison of bootstrap confidence intervals" by P. Hall. Ann. Statist. 16 959-961. · Zbl 0663.62046
[9] Cox, D. R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference. J. Roy. Statist. Soc. Ser. B 49 1-39. JSTOR: · Zbl 0616.62006
[10] Cox, D. R. and Reid, N. (1993). A note on the calculation of adjusted profile likelihood. J. Roy. Statist. Soc. Ser. B 55 467-472. JSTOR: · Zbl 0797.62015
[11] DiCiccio, T. J. (1984). On parameter transformations and interval estimation. Biometrika 71 477-485. JSTOR: · Zbl 0566.62022
[12] DiCiccio, T. J. and Efron, B. (1992). More accurate confidence intervals in exponential families. Biometrika 79 231-245. JSTOR: · Zbl 0752.62027
[13] DiCiccio, T. J. and Martin, M. (1993). Simple modifications for signed roots of likelihood ratio statistics. J. Roy. Statist. Soc. Ser. B 55 305-316. JSTOR: · Zbl 0794.62014
[14] DiCiccio, T. J. and Romano, J. P. (1995). On bootstrap procedures for second-order accurate confidence limits in parametric models. Statist. Sinica 5 141-160. · Zbl 0829.62035
[15] Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1-26. · Zbl 0406.62024
[16] Efron, B. (1981). Nonparametric estimates of standard error: the jackknife, the bootstrap, and other methods. Biometrika 68 589-599. JSTOR: · Zbl 0487.62031
[17] Efron, B. (1987). Better bootstrap confidence intervals (with discussion). J. Amer. Statist. Assoc. 82 171-200. JSTOR: · Zbl 0622.62039
[18] Efron, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika 80 3-26. JSTOR: · Zbl 0773.62021
[19] Efron, B. (1994). Missing data, imputation, and the bootstrap (with comment and rejoinder). J. Amer. Statist. Assoc. 89 463-478. JSTOR: · Zbl 0806.62033
[20] Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York. · Zbl 0835.62038
[21] Hall, P. (1986). On the bootstrap and confidence intervals. Ann. Statist. 14 1431-1452. · Zbl 0611.62047
[22] Hall, P. (1988). Theoretical comparison of bootstrap confidence intervals (with discussion). Ann. Statist. 16 927-985. · Zbl 0663.62046
[23] Hall, P. and Martin, M. A. (1988). On bootstrap resampling and iteration. Biometrika 75 661-671. JSTOR: · Zbl 0659.62053
[24] Hougaard, P. (1982). Parametrizations of non-linear models. J. Roy. Statist. Soc. Ser. B 44 244-252. JSTOR: · Zbl 0494.62032
[25] Lawley, D. N. (1956). A general method for approximating to the distribution of the likelihood ratio criteria. Biometrika 43 295-303. JSTOR: · Zbl 0073.13602
[26] Loh, W.-Y. (1987). Calibrating confidence coefficients. J. Amer. Statist. Assoc. 82 155-162. JSTOR: · Zbl 0608.62057
[27] McCullagh, P. (1984). Local sufficiency. Biometrika 71 233-244. JSTOR: · Zbl 0573.62026
[28] McCullagh, P. (1987). Tensor Methods in Statistics. Chapman and Hall, London. · Zbl 0732.62003
[29] McCullagh, P. and Tibshirani, R. (1990). A simple method for the adjustment of profile likelihoods. J. Roy. Statist. Soc. Ser. B 52 325-344. JSTOR: · Zbl 0716.62039
[30] Peers, H. W. (1965). On confidence points and Bayesian probability points in the case of several parameters. J. Roy. Statist. Soc. Ser. B 27 9-16. JSTOR: · Zbl 0144.41403
[31] Pierce, D. and Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families (with discussion). J. Roy. Statist. Soc. Ser. B 54 701-725. JSTOR:
[32] Sprott, D. A. (1980). Maximum likelihood in small samples: estimation in the presence of nuisance parameters. Biometrika 67 515-523. JSTOR: · Zbl 0447.62040
[33] Tibshirani, R. (1988). Variance stabilization and the bootstrap. Biometrika 75 433-444. JSTOR: · Zbl 0651.62039
[34] Tibshirani, R. (1989). Noninformative priors for one parameter of many. Biometrika 76 604-608. JSTOR: · Zbl 0678.62010
[35] such as those described by Hall (1988). But why not replace them altogether with more informative tools? The bootstrap affords a unique opportunity for obtaining a large amount of information very simply. The process of setting confidence intervals merely picks two points off a bootstrap histogram, ignoring much relevant information about shape and other important features. "Confidence pictures" (e.g., Hall, 1992, Appendix III), essentially smoothed and transformed bootstrap histograms, are one alternative to confidence intervals. Graphics such as these provide a simple but powerful way to convey information lost in numerical summaries. The opportunities offered by dynamic graphics are also attractive, particularly when confidence information needs to be passed to a lay audience. (Consider, e.g., the need to provide information about the errors associated with predictions from opinion polls.) Bootstrap methods and new graphical ways of presenting information offer, together, exciting prospects for conveying information about uncertainty.
[36] jackknife-after-bootstrap plot for \(t\) (Efron, 1992). The ingenious idea that underlies this is that we can get the effect of bootstrapping the reduced data set \(y_1,\dots,y_{j-1},y_{j+1},\dots,y_n\) by considering only those bootstrap samples in which \(y_j\) did not appear. The horizontal dotted lines are quantiles of \(t^{*}-t\) for all 999 bootstrap replicates, while the solid lines join the corresponding quantiles for the subsets of bootstrap replicates in which each of the 20 observations did not appear. The x-axis shows empirical influence values \(l_j\), which measure the effect on \(t\) of putting more mass on each of the observations separately. If \(F\) represents the empirical distribution function of the data, which puts mass \(n^{-1}\) on each of the observations \(y_1,\dots,y_n\), and \(t(F)\) is the corresponding statistic, we can write […] of Davison and Hinkley (1996). We have developed a library of bootstrap functions in S-PLUS which facilitates this type of analysis. The library may be obtained by anonymous ftp to markov.stats.ox.ac.uk and retrieving the file pub/canty/bootlib.sh.Z.
[37] ABC, use bootstrap calibration directly on the crude percentile-based procedures these methods refine, and which seem currently favored in published applications of the bootstrap, as any literature search confirms. In doing so, we retain the desirable properties of these basic procedures (stability of length and endpoints, invariance under parametrization, etc.) yet improve their coverage accuracy. The price is one of great computational expense, although, as is demonstrated by Lee and Young (1995), there are approximations which can bring such bootstrap iteration within the reach of even a modest computational budget. An advantage of this solution lies in its simplicity: there is no need to explain the mechanics of the method, in the way that is done for the \(BC_a\) and ABC methods in Sections 2-4 of DiCiccio and Efron's paper. Which solution is best? To answer this requires a careful analysis of what we believe the bootstrap methodology to be. Our view is that willingness to use extensive computation to extract information from a data sample, by simulation or resampling, is quite fundamental. In other words, in comparing different methods, computational expense should not be a factor. All things being equal, we naturally look for computational efficiency, but things are hardly ever equal. How do the two solutions, that provided by DiCiccio and Efron and that involving the iterated percentile bootstrap, compare? There are two concerns here, theoretical performance and empirical performance, and the two might conflict. We demonstrate by considering the simple problem of constructing a two-sided nonparametric bootstrap confidence interval for a scalar population mean.
[38] and Beran (1987). The calibration method of Loh (1987) corresponds to the method of Beran (1987) when applied to a bootstrap confidence interval. For the confidence interval problem the method of Hall (1986) amounts to making an additive adjustment, estimated by the bootstrap, to the endpoints of the confidence interval, while the method of Beran (1987) amounts to making an additive adjustment, again estimated by bootstrapping, to the nominal coverage level of the bootstrap interval. The method of calibration described by DiCiccio and Efron in Section 7 of their paper is a subtle variation on the latter procedure, and one which should be used with care. DiCiccio and Efron use a method in which the bootstrap is used to calibrate separately the nominal levels of the lower and upper limits of the interval, rather than the overall nominal level. Theoretical and empirical evidence which we shall present elsewhere leads to the conclusion that, all things being taken into consideration, preference should be shown to methods which adjust nominal coverage, rather than the interval endpoints. We shall therefore focus on the question of how to calibrate the nominal coverage of a bootstrap confidence interval. The major difference between the two approaches to adjusting nominal coverage is that the method as illustrated by DiCiccio and Efron is only effective in reducing coverage error of the two-sided interval to order \(n^{-2}\) when the one-sided coverage-corrected interval achieves a coverage error of order \(n^{-3/2}\), as is the case with the ABC interval, but not the percentile interval. The effect of bootstrap calibration on the coverage error of one-sided intervals is discussed by Hall and Martin (1988) and by Martin (1990), who show that bootstrap coverage correction produces improvements in coverage accuracy of order \(n^{-1/2}\), therefore reducing coverage error from order \(n^{-1/2}\) to order \(n^{-1}\) for percentile intervals, but from order \(n^{-1}\) to order \(n^{-3/2}\) for the ABC interval.
If the one-sided corrected interval has coverage error of order \(n^{-3/2}\), then separate correction of the upper and lower limits gives a two-sided interval with coverage error of order \(n^{-2}\), due to the fact that the order \(n^{-3/2}\) term involves an even polynomial. With the percentile interval, the coverage error, of order \(n^{-1}\), of the coverage-corrected one-sided interval typically involves an odd polynomial, and terms of that order will not cancel when determining the coverage error of the two-sided interval, which remains of order \(n^{-1}\). On the face of it, therefore, we should be wary of the calibration method described by DiCiccio and Efron, although the problems with it do not arise with the ABC interval.
[39] cedure: see also Martin (1990). Application of these methods to the intervals under consideration here allows closer examination of coverage error. Table 1 gives information on the theoretical leading terms in expansions of the coverage error of the percentile interval (denoted IP), iterated percentile interval (denoted IPITa and IPITb), ABC interval (denoted IABC) and iterated ABC interval (denoted by IABCITa and IABCITb). Figures refer to two-sided intervals of nominal coverage 90
[40] Daniels, H. E. and Young, G. A. (1991). Saddlepoint approximation for the Studentized mean, with an application to the bootstrap. Biometrika 78 169-179. JSTOR:
[41] Davison, A. C. and Hinkley, D. V. (1988). Saddlepoint approximations in resampling methods. Biometrika 75 417-431. JSTOR: · Zbl 0651.62018
[42] Davison, A. C. and Hinkley, D. V. (1996). Bootstrap Methods and Their Application. Cambridge Univ. Press. · Zbl 0886.62001
[43] Davison, A. C., Hinkley, D. V. and Worton, B. J. (1992). Bootstrap likelihoods. Biometrika 79 113-130. JSTOR: · Zbl 0753.62026
[44] DiCiccio, T. J., Martin, M. A. and Young, G. A. (1992). Fast and accurate approximate double bootstrap confidence intervals. Biometrika 79 285-295. JSTOR: · Zbl 0751.62013
[45] DiCiccio, T. J., Martin, M. A. and Young, G. A. (1993). Analytical approximations for iterated bootstrap confidence intervals. Statistics and Computing 2 161-171.
[46] Efron, B. (1992). Jackknife-after-bootstrap standard errors and influence functions (with discussion). J. Roy. Statist. Soc. Ser. B 54 83-127. JSTOR: · Zbl 0782.62051
[47] Efron, B. and LePage, R. (1992). Introduction to bootstrap. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.) 3-10. Wiley, New York. · Zbl 0835.62038
[48] Gleser, L. J. and Hwang, J. T. (1987). The nonexistence of \(100(1-\alpha)\) percent confidence sets of finite expected diameter in errors-in-variables and related models. Ann. Statist. 15 1351-1362. · Zbl 0638.62035
[49] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York. · Zbl 0744.62026
[50] Hall, P. and Jing, B.-Y. (1995). Uniform coverage bounds for confidence intervals and Berry-Esseen theorems for Edgeworth expansion. Ann. Statist. 23 363-375. · Zbl 0824.62043
[51] Jensen, J. L. (1986). Similar tests and the standardized log likelihood ratio statistic. Biometrika 73 567-572. JSTOR: · Zbl 0608.62021
[52] Lee, S. M. S. and Young, G. A. (1995). Asymptotic iterated bootstrap confidence intervals. Ann. Statist. 23 1301-1330. Lee, S. M. S. and Young, G. A. (1996a). Sequential iterated bootstrap confidence intervals. J. Roy. Statist. Soc. Ser. B 58 235-252. Lee, S. M. S. and Young, G. A. (1996b). Asymptotics and resampling methods. Computing Science and Statistics. To appear. Lu, K. L. and Berger, J. O. (1989a). Estimation of normal means: frequentist estimation of loss. Ann. Statist. 17 890-906. Lu, K. L. and Berger, J. O. (1989b). Estimated confidence for multivariate normal mean confidence set. J. Statist. Plann. Inference 23 1-20. · Zbl 0838.62034
[53] Martin, M. A. (1990). On bootstrap iteration for coverage correction in confidence intervals. J. Amer. Statist. Assoc. 85 1105-1118. JSTOR: · Zbl 0736.62040
[54] Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75 237-249. JSTOR: · Zbl 0641.62032
[55] Owen, A. B. (1990). Empirical likelihood ratio confidence regions. Ann. Statist. 18 90-120. · Zbl 0712.62040
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.