
High-dimensional inference in misspecified linear models. (English) Zbl 1327.62420

Summary: We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our results contribute to robustness considerations with respect to model misspecification.


62J07 Ridge regression; shrinkage estimators (Lasso)
62F25 Parametric tolerance and confidence regions


hdi; covTest; PDCO

