Can we trust the bootstrap in high-dimensions? The case of linear models. (English) Zbl 1444.62039

Summary: We consider the performance of the bootstrap in high dimensions for the setting of linear regression, where $$p< n$$ but $$p/n$$ is not close to zero. We consider ordinary least-squares as well as robust regression methods and adopt a minimalist performance requirement: can the bootstrap give us good confidence intervals for a single coordinate of $$\beta$$ (where $$\beta$$ is the true regression vector)?
We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. Both of the most commonly used methods of bootstrapping for regression – residual bootstrap and pairs bootstrap – give very poor inference on $$\beta$$ as the ratio $$p/n$$ grows. We find that the residual bootstrap tends to give anti-conservative estimates (inflated Type I error), while the pairs bootstrap gives very conservative estimates (severe loss of power) as the ratio $$p/n$$ grows. We also show that the jackknife resampling technique for estimating the variance of $$\hat{\beta}$$ severely overestimates the variance in high dimensions.
Building on our theoretical results, we contribute alternative procedures that yield dimensionality-adaptive and robust bootstrap methods.
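For orientation, the three standard resampling schemes named in the summary (residual bootstrap, pairs bootstrap, jackknife) can be sketched for ordinary least-squares as below. This is a minimal illustration of the textbook procedures, not the authors' code and not their corrected methods. The function names, the centering of residuals, the use of percentile intervals, and the simulated Gaussian design with $$n=100$$, $$p=30$$ are all assumptions made for the example.

```python
import numpy as np


def ols(X, y):
    # Least-squares coefficients; lstsq also copes with a rank-deficient resample.
    return np.linalg.lstsq(X, y, rcond=None)[0]


def residual_bootstrap(X, y, n_boot, rng):
    # Keep the design fixed; resample centered residuals with replacement.
    beta_hat = ols(X, y)
    resid = y - X @ beta_hat
    resid = resid - resid.mean()
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        e_star = rng.choice(resid, size=n, replace=True)
        draws[b] = ols(X, X @ beta_hat + e_star)
    return draws


def pairs_bootstrap(X, y, n_boot, rng):
    # Resample (x_i, y_i) rows jointly with replacement.
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = ols(X[idx], y[idx])
    return draws


def jackknife_variance(X, y):
    # Classical leave-one-out jackknife variance estimate, per coordinate.
    n = X.shape[0]
    loo = np.array(
        [ols(np.delete(X, i, axis=0), np.delete(y, i)) for i in range(n)]
    )
    return (n - 1) / n * np.sum((loo - loo.mean(axis=0)) ** 2, axis=0)


def percentile_ci(draws, coord, alpha=0.05):
    # Percentile interval for a single coordinate of beta.
    lo, hi = np.quantile(draws[:, coord], [alpha / 2, 1 - alpha / 2])
    return lo, hi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 30  # p/n = 0.3, i.e. not close to zero (assumed toy setting)
    X = rng.standard_normal((n, p))
    y = X @ np.zeros(p) + rng.standard_normal(n)

    print("residual CI:", percentile_ci(residual_bootstrap(X, y, 500, rng), 0))
    print("pairs CI:   ", percentile_ci(pairs_bootstrap(X, y, 500, rng), 0))
    print("jackknife variance of coordinate 0:", jackknife_variance(X, y)[0])
```

Rerunning such a sketch over many simulated data sets while varying $$p/n$$ is one way to examine the coverage behaviour the summary describes. A single run like the one above is not evidence either way.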

62F40 Bootstrap, jackknife and other resampling methods
62F25 Parametric tolerance and confidence regions
62J05 Linear regression; mixed models
60B20 Random matrices (probabilistic aspects)
