
Can we trust the bootstrap in high-dimensions? The case of linear models. (English) Zbl 1444.62039

Summary: We consider the performance of the bootstrap in high dimensions in the setting of linear regression, where \(p < n\) but \(p/n\) is not close to zero. We consider ordinary least squares as well as robust regression methods and adopt a minimalist performance requirement: can the bootstrap give us good confidence intervals for a single coordinate of \(\beta\) (where \(\beta\) is the true regression vector)?
We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. Both of the most commonly used bootstrap methods for regression, the residual bootstrap and the pairs bootstrap, give very poor inference on \(\beta\) as the ratio \(p/n\) grows. We find that the residual bootstrap tends to give anti-conservative estimates (inflated Type I error), while the pairs bootstrap gives very conservative estimates (severe loss of power) as the ratio \(p/n\) grows. We also show that the jackknife resampling technique for estimating the variance of \(\hat{\beta}\) severely overestimates the variance in high dimensions.
Based on our theoretical results, we contribute alternative procedures that yield dimensionality-adaptive and robust bootstrap methods.
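To make the resampling schemes under discussion concrete, the following is a minimal simulation sketch (not code from the paper): it forms percentile confidence intervals for a single coordinate of \(\beta\) using the residual and pairs bootstraps, and computes the jackknife variance estimate of that coordinate, for a Gaussian design with \(p/n = 0.5\). The sample size, number of bootstrap replicates, and the Gaussian design are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, B, alpha = 200, 100, 500, 0.05          # p/n = 0.5, i.e. not close to zero
X = rng.standard_normal((n, p))               # illustrative Gaussian design
beta = np.zeros(p)                            # true regression vector
y = X @ beta + rng.standard_normal(n)

def ols(X_, y_):
    # Ordinary least-squares fit.
    return np.linalg.lstsq(X_, y_, rcond=None)[0]

beta_hat = ols(X, y)
resid = y - X @ beta_hat

def residual_draw():
    # Residual bootstrap: keep the design fixed, resample residuals with replacement.
    y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
    return ols(X, y_star)[0]

def pairs_draw():
    # Pairs bootstrap: resample (x_i, y_i) rows with replacement.
    idx = rng.integers(0, n, size=n)
    return ols(X[idx], y[idx])[0]

for name, draw in [("residual", residual_draw), ("pairs", pairs_draw)]:
    reps = np.array([draw() for _ in range(B)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    print(f"{name:8s} bootstrap 95% CI for the first coordinate: [{lo:.3f}, {hi:.3f}]")

# Jackknife variance estimate for the first coordinate of beta_hat
# (leave-one-out refits).
loo = np.array([ols(np.delete(X, i, axis=0), np.delete(y, i))[0] for i in range(n)])
jack_var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
print(f"jackknife variance estimate for the first coordinate: {jack_var:.4f}")
```

Under the paper's findings, one would expect the residual-bootstrap interval to be too short and the pairs-bootstrap interval too long relative to the sampling distribution of \(\hat{\beta}_1\) when \(p/n\) is this large.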

MSC:

62F40 Bootstrap, jackknife and other resampling methods
62F25 Parametric tolerance and confidence regions
62J05 Linear regression; mixed models
60B20 Random matrices (probabilistic aspects)
