Gap bootstrap methods for massive data sets with an application to transportation engineering. (English) Zbl 1257.62051

Summary: We describe two bootstrap methods for massive data sets. Naive applications of common resampling methodology are often impractical for massive data sets due to the computational burden and due to complex patterns of inhomogeneity. In contrast, the proposed methods exploit certain structural properties of a large class of massive data sets to break up the original problem into a set of simpler subproblems, solve each subproblem separately where the data exhibit approximate uniformity and where computational complexity can be reduced to a manageable level, and then combine the results through certain analytical considerations. The validity of the proposed methods is proved and their finite sample properties are studied through a moderately large simulation study. The methodology is illustrated with a real data example from transportation engineering, which motivated the development of the proposed methods.
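The divide, solve, and combine idea in the summary can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the data have already been split into blocks that are each approximately homogeneous (i.i.d. within a block) and approximately independent of one another, and it estimates the variance of the overall sample mean. The function name `blockwise_bootstrap_variance` and its parameters are hypothetical and introduced only for illustration.

```python
import numpy as np


def blockwise_bootstrap_variance(blocks, n_boot=500, seed=0):
    """Bootstrap variance of the overall sample mean, computed block by block.

    Assumptions (illustrative only, not taken from the paper):
      * observations within each block are i.i.d.;
      * different blocks are independent of one another.
    """
    rng = np.random.default_rng(seed)
    n_total = sum(len(b) for b in blocks)
    var_total = 0.0
    for b in blocks:
        b = np.asarray(b, dtype=float)
        n = len(b)
        # Subproblem: ordinary (Efron) bootstrap of the mean inside one block.
        idx = rng.integers(0, n, size=(n_boot, n))
        boot_means = b[idx].mean(axis=1)
        # Combine: the overall mean is a weighted sum of block means, so under
        # independence its variance is the weighted sum of block variances.
        weight = n / n_total
        var_total += weight ** 2 * boot_means.var(ddof=1)
    return var_total


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    # Two synthetic blocks with different location and scale (inhomogeneity).
    demo_blocks = [demo_rng.normal(0.0, 1.0, 200), demo_rng.normal(5.0, 3.0, 300)]
    print(blockwise_bootstrap_variance(demo_blocks))
```

Each block is resampled on its own, so no step ever has to hold or resample the full data set, and the blocks could be processed in parallel. The combination step is only as good as the independence assumption between blocks.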


62G09 Nonparametric statistical resampling methods
62P30 Applications of statistics in engineering and industry; control charts
90B06 Transportation, logistics and supply chain management
62H12 Estimation in multivariate analysis
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
65C60 Computational problems in statistics (MSC2010)

