# zbMATH — the first resource for mathematics

Models as approximations. I. Consequences illustrated with linear regression. (English) Zbl 1440.62020
Summary: In the early 1980s, Halbert White inaugurated a “model-robust” form of statistical inference based on the “sandwich estimator” of standard error. This estimator is known to be “heteroskedasticity-consistent”, but it is less well known to be “nonlinearity-consistent” as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence cannot be treated as fixed. The consequences are deep: (1) population slopes need to be reinterpreted as statistical functionals obtained from OLS fits to largely arbitrary joint $$x$$-$$y$$ distributions; (2) the meaning of slope parameters needs to be rethought; (3) the regressor distribution affects the slope parameters; (4) randomness of the regressors becomes a source of sampling variability in slope estimates of order $$1/\sqrt{N}$$; (5) inference needs to be based on model-robust standard errors, including sandwich estimators or the $$x$$-$$y$$ bootstrap. In theory, model-robust and model-trusting standard errors can deviate by arbitrary magnitudes either way. In practice, significant deviations between them can be detected with a diagnostic test.
For Part II, see [Zbl 1440.62021].
Reviewer: Reviewer (Berlin)
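
To make points (4) and (5) of the summary concrete, the following is a minimal sketch, not the authors' code, that compares three standard errors for an OLS slope fitted to data whose true regression function is nonlinear. The simulated data, the function names (`slope_and_ses`, `xy_bootstrap_se`, `simulate`), and the choice of the HC0 variant of the sandwich estimator are all assumptions made for illustration. In this setup the model-trusting standard error should come out noticeably smaller than the sandwich and $$x$$-$$y$$ bootstrap standard errors, which should roughly agree with each other.

```python
import math
import random


def slope_and_ses(x, y):
    """Simple-regression slope with model-trusting and sandwich (HC0) standard errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Model-trusting: assumes linearity and constant error variance.
    sigma2 = sum(r * r for r in resid) / (n - 2)
    se_trusting = math.sqrt(sigma2 / sxx)
    # Sandwich (HC0): weights each squared residual by its squared centered regressor.
    se_sandwich = math.sqrt(sum(d * d * r * r for d, r in zip(dx, resid))) / sxx
    return slope, se_trusting, se_sandwich


def xy_bootstrap_se(x, y, n_boot=400, seed=0):
    """x-y (pairs) bootstrap: resample (x_i, y_i) jointly and refit the slope."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        slopes.append(slope_and_ses(xb, yb)[0])
    mean = sum(slopes) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in slopes) / (n_boot - 1))


def simulate(n=2000, seed=1):
    """Hypothetical data: quadratic truth, random regressor, homoskedastic noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi ** 2 + rng.gauss(0.0, 0.5) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    slope, se_t, se_s = slope_and_ses(x, y)
    se_b = xy_bootstrap_se(x, y)
    print(f"slope={slope:.4f}  trusting={se_t:.4f}  sandwich={se_s:.4f}  bootstrap={se_b:.4f}")
```

The noise here is homoskedastic, so any gap between the model-trusting and model-robust standard errors in this sketch comes from the nonlinearity combined with the randomness of the regressor, not from heteroskedasticity.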

##### MSC:
- 62A01 Foundations and philosophical topics in statistics
- 62J05 Linear regression; mixed models
- 62P20 Applications of statistics to economics
- 62F40 Bootstrap, jackknife and other resampling methods
- 62F35 Robustness and adaptive procedures (parametric inference)
bootstrap; R