×

Computation of exact bootstrap confidence intervals: complexity and deterministic algorithms. (English) Zbl 1445.90025

Summary: The bootstrap is a nonparametric approach for calculating quantities, such as confidence intervals, directly from data. Since calculating exact bootstrap quantities is believed to be intractable, randomized resampling algorithms are traditionally used. In this paper, we present a new perspective on the bootstrap method through the lens of counting integer points in polyhedra. Through this new perspective, we make several advances for the bootstrap method, both theoretically and algorithmically. First, we establish several computational complexity results for the exact bootstrap method in the case of the sample mean. Second, we present the first efficient deterministic approximation algorithm (fully polynomial time approximation scheme) for producing exact bootstrap confidence intervals which, unlike traditional methods, has guaranteed bounds on the approximation error. Third, we develop a simple exact algorithm for exact bootstrap confidence intervals based on polynomial multiplication. We provide empirical evidence on real and synthetic data sets with several hundreds of data points that the proposed deterministic algorithms can quickly produce confidence intervals that are substantially more accurate than those from randomized methods, and thus are practical alternatives in applications such as clinical trials.

MSC:

90B25 Reliability, availability, maintenance, inspection in operations research
62M20 Inference from stochastic processes and prediction
90C59 Approximation methods and heuristics in mathematical programming

Software:

LattE; MPFR; gmp
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Arora S, Barak B (2009) Computational Complexity: A Modern Approach, 1st ed. (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar · Zbl 1193.68112 · doi:10.1017/CBO9780511804090
[2] Barvinok AI (1994) A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed. Math. Oper. Res. 19(4):769-779.Link, Google Scholar · Zbl 0821.90085
[3] Bertsimas D, Weismantel R (2005) Optimization Over Integers (Dynamic Ideas, Nashville, TN).Google Scholar
[4] Bertsimas D, Allison K, Pulleyblank WR (2016) The Analytics Edge (Dynamic Ideas, Nashville, TN).Google Scholar
[5] Bertsimas D, Johnson M, Kallus N (2015) The power of optimization over randomization in designing experiments involving small samples. Oper. Res. 63(4):868-876.Link, Google Scholar · Zbl 1329.90078
[6] Chan KY, Lee SM (2001) An exact iterated bootstrap algorithm for small-sample bias reduction. Comput. Statist. Data Anal. 36(1):1-13.Crossref, Google Scholar · Zbl 1080.62516 · doi:10.1016/S0167-9473(00)00029-3
[7] Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to Algorithms, vol. 3 (MIT Press, Cambridge, MA).Google Scholar · Zbl 1187.68679
[8] De Loera JA, Hemmecke R, Tauzer J, Yoshida R (2004) Effective lattice point counting in rational convex polytopes. J. Symbolic Comput. 38(4):1273-1302.Crossref, Google Scholar · Zbl 1137.52303 · doi:10.1016/j.jsc.2003.04.003
[9] DiCiccio T, Efron B (1992) More accurate confidence limits in exponential families. Biometrika 79(2):231-245.Crossref, Google Scholar · Zbl 0752.62027 · doi:10.1093/biomet/79.2.231
[10] DiCiccio T, Efron B (1996) Bootstrap confidence intervals. Statist. Sci. 11(3):189-212.Crossref, Google Scholar · Zbl 0955.62574 · doi:10.1214/ss/1032280214
[11] Durrett R (2010) Probability: Theory and Examples (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar · Zbl 1202.60001 · doi:10.1017/CBO9780511779398
[12] Dyer M (2003) Approximate counting by dynamic programming. Proc. 35th Annual ACM Sympos. Theory Comput. (ACM, New York), 693-699.Google Scholar · Zbl 1192.90242
[13] Dyer M, Stougie L (2006) Computational complexity of stochastic programming problems. Math. Programming 106(3):423-432.Crossref, Google Scholar · Zbl 1134.90027 · doi:10.1007/s10107-005-0597-0
[14] Dyer M, Kannan R, Mount J (1997) Sampling contingency tables. Random Structures Algorithms 10(4):487-506.Crossref, Google Scholar · Zbl 0884.62065 · doi:10.1002/(SICI)1098-2418(199707)10:4<487::AID-RSA4>3.0.CO;2-Q
[15] Dyer M, Frieze A, Kannan R, Kapoor A, Perkovic L, Vazirani U (1993) A mildly exponential time algorithm for approximating the number of solutions to a multidimensional knapsack problem. Combinatorics Probab. Comput. 2(3):271-284.Crossref, Google Scholar · Zbl 0819.90094 · doi:10.1017/S0963548300000675
[16] Dyer ME, Frieze AM (1988) On the complexity of computing the volume of a polyhedron. SIAM J. Comput. 17(5):967-974.Crossref, Google Scholar · Zbl 0668.68049 · doi:10.1137/0217060
[17] Efron B (1979) Bootstrap methods: Another look at the jackknife. Ann. Statist. 7(1):1-26.Crossref, Google Scholar · Zbl 0406.62024 · doi:10.1214/aos/1176344552
[18] Efron B (1982) The Jackknife, the Bootstrap and Other Resampling Plans (SIAM, Philadelphia).Crossref, Google Scholar · Zbl 0496.62036 · doi:10.1137/1.9781611970319
[19] Efron B (1987) Better bootstrap confidence intervals. J. Amer. Statist. Assoc. 82(397):171-185.Crossref, Google Scholar · Zbl 0622.62039 · doi:10.1080/01621459.1987.10478410
[20] Efron B, Tibshirani RJ (1994) An Introduction to the Bootstrap (CRC Press, New York).Crossref, Google Scholar · doi:10.1201/9780429246593
[21] Evans DL, Leemis LM, Drew JH (2006) The distribution of order statistics for discrete random variables with applications to bootstrapping. INFORMS J. Comput. 18(1):19-30.Link, Google Scholar · Zbl 1241.62076
[22] Fisher NI, Hall P (1991) Boostrap algorithms for small samples. J. Statist. Planning Inference 27(2):157-169.Crossref, Google Scholar · doi:10.1016/0378-3758(91)90013-5
[23] Fousse L, Hanrot G, Lefèvre V, Pélissier P, Zimmermann P (2007) MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Software 33(2):13-15.Crossref, Google Scholar · Zbl 1365.65302 · doi:10.1145/1236463.1236468
[24] Fürer M (2009) Faster integer multiplication. SIAM J. Comput. 39(3):979-1005.Crossref, Google Scholar · Zbl 1192.68926 · doi:10.1137/070711761
[25] Garey MR, Johnson DS (1990) Computers and Intractability: A Guide to the Theory of NP-Completeness (W. H. Freeman & Co., New York).Google Scholar
[26] Gathen JVZ, Gerhard J (2013) Modern Computer Algebra, 3rd ed. (Cambridge University Press, New York).Crossref, Google Scholar · Zbl 1277.68002 · doi:10.1017/CBO9781139856065
[27] Gopalan P, Klivans A, Meka R, Stefankovic D, Vempala S, Vigoda E (2011) An FPTAS for #knapsack and related counting problems. Proc. IEEE 52nd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 817-826.Google Scholar · Zbl 1292.68167
[28] Granlund T (2017) The GNU multiple precision arithmetic library. Accessed August 1, 2017, http://gmplib.org.Google Scholar
[29] Hall P (1988) Theoretical comparison of bootstrap confidence intervals. Ann. Statist. 16(3):927-953.Crossref, Google Scholar · Zbl 0663.62046 · doi:10.1214/aos/1176350933
[30] Halman N (2016) A deterministic fully polynomial time approximation scheme for counting integer knapsack solutions made easy. Theoret. Comput. Sci. 645:41-47.Crossref, Google Scholar · Zbl 1348.68295 · doi:10.1016/j.tcs.2016.06.015
[31] Halman N, Klabjan D, Mostagir M, Orlin J, Simchi-Levi D (2009) A fully polynomial-time approximation scheme for single-item stochastic inventory control with discrete demand. Math. Oper. Res. 34(3):674-685.Link, Google Scholar · Zbl 1231.90030
[32] Hanasusanto GA, Kuhn D, Wiesemann W (2016) A comment on “Computational complexity of stochastic programming problems.” Math. Programming 159(1-2):557-569.Crossref, Google Scholar · Zbl 1345.90063 · doi:10.1007/s10107-015-0958-2
[33] Harris JM, Hirst JL, Mossinghoff MJ (2008) Combinatorics and Graph Theory, 2nd ed. (Springer-Verlag, New York).Crossref, Google Scholar · Zbl 1170.05001 · doi:10.1007/978-0-387-79711-3
[34] Harvey D (2009) Faster polynomial multiplication via multipoint Kronecker substitution. J. Symbolic Comput. 44(10):1502-1510.Crossref, Google Scholar · Zbl 1194.68265 · doi:10.1016/j.jsc.2009.05.004
[35] Huang J (1991) Efficient computation of the performance of bootstrap and jackknife estimators of the variance of L-statistics. J. Statist. Comput. Simulation 38(1-4):45-56.Crossref, Google Scholar · Zbl 0800.62232 · doi:10.1080/00949659108811318
[36] Hutson AD, Ernst MD (2000) The exact bootstrap mean and variance of an L-estimator. J. Royal Statist. Soc. B Statist. Methodology 62(1):89-94.Crossref, Google Scholar · Zbl 0941.62048 · doi:10.1111/1467-9868.00221
[37] Jerrum M, Sinclair A (1996) The Markov chain Monte Carlo method: An approach to approximate counting and integration. Hochbaum DS, ed., Approximation Algorithms for NP-Hard Problems (PWS Publishing, Boston), 482-520.Google Scholar
[38] Kisielinska J (2013) The exact bootstrap method shown on the example of the mean and variance estimation. Comput. Statist. 28(3):1061-1077.Crossref, Google Scholar · Zbl 1305.65050 · doi:10.1007/s00180-012-0350-0
[39] Kleinberg J, Rabani Y, Tardos É (1997) Allocating bandwidth for bursty connections. SIAM J. Comput. 30(1):191-215.Crossref, Google Scholar · Zbl 0979.05098 · doi:10.1137/S0097539797329142
[40] Kronecker L (1882) Grundzuge einer arithmetischen Theorie der algebraischen Grössen. J. Reine Angewandte Math. 1(92):1-122.Google Scholar · JFM 14.0038.02
[41] Lasserre JB (2009) Linear and Integer Programming vs Linear Integration and Counting (Springer-Verlag, New York).Crossref, Google Scholar · Zbl 1188.90170 · doi:10.1007/978-0-387-09414-4
[42] Li J, Shi T (2014) A fully polynomial-time approximation scheme for approximating a sum of random variables. Oper. Res. Lett. 42(3):197-202.Crossref, Google Scholar · Zbl 1408.90259 · doi:10.1016/j.orl.2014.02.004
[43] Linial N (1986) Hard enumeration problems in geometry and combinatorics. SIAM J. Algebraic Discrete Methods 7(2):331-335.Crossref, Google Scholar · Zbl 0596.68041 · doi:10.1137/0607036
[44] Nesterov Y (2004) Fast Fourier transform and its applications to integer knapsack problems. Discussion Paper 2004/64, CORE, Louvain-la-Neuve, Belgium.Google Scholar
[45] Schönhage A, Strassen V (1971) Schnelle multiplikation großer zahlen. Comput. 7(3-4):281-292.Crossref, Google Scholar · Zbl 0223.68007 · doi:10.1007/BF02242355
[46] Štefankovic D, Vempala S, Vigoda E (2012) A deterministic polynomial-time approximation scheme for counting knapsack solutions. SIAM J. Comput. 41(2):356-366.Crossref, Google Scholar · Zbl 1253.68182 · doi:10.1137/11083976X
[47] Valiant LG (1979a) The complexity of computing the permanent. Theoret. Comput. Sci. 8(2):189-201.Crossref, Google Scholar · Zbl 0415.68008 · doi:10.1016/0304-3975(79)90044-6
[48] Valiant LG (1979b) The complexity of enumeration and reliability problems. SIAM J. Comput. 8(3):410-421.Crossref, · Zbl 0419.68082 · doi:10.1137/0208032
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.