Perturbation and scaled Cook’s distance. (English) Zbl 1273.62180

Summary: R. D. Cook’s distance [Technometrics 19, 15–18 (1977; Zbl 0371.62096)] is one of the most important diagnostic tools for detecting influential individual observations or subsets of observations in linear regression for cross-sectional data. However, for many complex data structures (e.g., longitudinal data), no rigorous approach has been developed to address a fundamental issue: deleting subsets with different numbers of observations introduces different degrees of perturbation to the current model fitted to the data, and the magnitude of Cook’s distance is associated with the degree of the perturbation. The aim of this paper is to address this issue in general parametric models with complex data structures. We propose a new quantity for measuring the degree of the perturbation introduced by deleting a subset. We use stochastic ordering to quantify the stochastic relationship between the degree of the perturbation and the magnitude of Cook’s distance. We develop several scaled Cook’s distances to resolve the comparison of Cook’s distance for different subset deletions. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of these scaled Cook’s distances in a formal influence analysis.
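
The issue raised in the summary can be illustrated with a minimal sketch of the classical (unscaled) Cook's distance for subset deletion in ordinary least squares. This is not the scaled distance proposed in the paper. The function names `cooks_distance_subset` and `mean_distance`, the simulated data, and the subset sizes are all assumptions made for illustration.

```python
import numpy as np


def cooks_distance_subset(X, y, subset):
    """Classical Cook's distance for deleting the rows in `subset` from an OLS fit.

    Illustrative helper (hypothetical name): compares the full-data estimate
    with the estimate obtained after removing the given rows.
    """
    n, p = X.shape
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_full
    sigma2 = resid @ resid / (n - p)  # residual variance from the full fit

    keep = np.ones(n, dtype=bool)
    keep[np.asarray(subset, dtype=int)] = False
    beta_del, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)

    diff = beta_full - beta_del
    return float(diff @ (X.T @ X) @ diff / (p * sigma2))


# Simulated data (an assumption): a simple linear model with no planted outliers.
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)


def mean_distance(size, reps=200):
    """Average Cook's distance over `reps` random subsets of the given size."""
    return float(np.mean([
        cooks_distance_subset(X, y, rng.choice(n, size=size, replace=False))
        for _ in range(reps)
    ]))


d1 = mean_distance(1)  # deleting one observation at a time
d5 = mean_distance(5)  # deleting five observations at a time
print(f"mean Cook's distance, subset size 1: {d1:.4f}")
print(f"mean Cook's distance, subset size 5: {d5:.4f}")
```

Because the data contain no planted outliers, any systematic gap between the two averages would reflect the size of the deleted subset rather than unusual observations, which is why raw distances for subsets of different sizes are hard to compare directly.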


62J20 Diagnostics, and linear inference and regression
62J12 Generalized linear models (logistic models)




[1] Andersen, E. B. (1992). Diagnostics in categorical data analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 54 781-791.
[2] Andrews, D. W. K. (1999). Estimation when a parameter is on a boundary. Econometrica 67 1341-1383. · Zbl 1056.62507
[3] Banerjee, M. (1998). Cook’s distance in linear longitudinal models. Comm. Statist. Theory Methods 27 2973-2983. · Zbl 0956.62054
[4] Banerjee, M. and Frees, E. W. (1997). Influence diagnostics for linear longitudinal models. J. Amer. Statist. Assoc. 92 999-1005. · Zbl 0889.62063
[5] Beckman, R. J. and Cook, R. D. (1983). Outlier..........s. Technometrics 25 119-163. · Zbl 0514.62041
[6] Chatterjee, S. and Hadi, A. S. (1988). Sensitivity Analysis in Linear Regression. Wiley, New York. · Zbl 0648.62066
[7] Christensen, R., Pearson, L. M. and Johnson, W. (1992). Case-deletion diagnostics for mixed models. Technometrics 34 38-45. · Zbl 0761.62098
[8] Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics 19 15-18. · Zbl 0371.62096
[9] Cook, R. D. (1986). Assessment of local influence. J. Roy. Statist. Soc. Ser. B 48 133-169. · Zbl 0608.62041
[10] Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman & Hall, London. · Zbl 0564.62054
[11] Critchley, F., Atkinson, R. A., Lu, G. and Biazi, E. (2001). Influence analysis based on the case sensitivity function. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 307-323. · Zbl 0979.62050
[12] Davison, A. C. and Tsai, C. L. (1992). Regression model diagnostics. International Statistical Review 60 337-353. · Zbl 0775.62201
[13] Eaton, M. L. and Tyler, D. E. (1991). On Wielandt’s inequality and its application to the asymptotic distribution of the eigenvalues of a random symmetric matrix. Ann. Statist. 19 260-271. · Zbl 0742.62015
[14] Fung, W.-K., Zhu, Z.-Y., Wei, B.-C. and He, X. (2002). Influence diagnostics and outlier tests for semiparametric mixed models. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 565-579. · Zbl 1090.62039
[15] Haslett, J. (1999). A simple derivation of deletion diagnostic results for the general linear model with correlated errors. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 603-609. · Zbl 0924.62076
[16] Huber, P. J. (1981). Robust Statistics. Wiley, New York. · Zbl 0536.62025
[17] Lin, D. Y., Wei, L. J. and Ying, Z. (2002). Model-checking techniques based on cumulative residuals. Biometrics 58 1-12. · Zbl 1209.62168
[18] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman & Hall/CRC, Boca Raton. · Zbl 0744.62098
[19] Preisser, J. S. and Qaqish, B. F. (1996). Deletion diagnostics for generalised estimating equations. Biometrika 83 551-562. · Zbl 0866.62041
[20] Shaked, M. and Shanthikumar, G. J. (2006). Stochastic Orders. Springer, New York. · Zbl 0806.62009
[21] Stier, D. M., Leventhal, J. M., Berg, A. T., Johnson, L. and Mezger, J. (1993). Are children born to young mothers at increased risk of maltreatment? Pediatrics 91 642-648.
[22] Wasserman, D. R. and Leventhal, J. M. (1993). Maltreatment of children born to cocaine-dependent mothers. Am. J. Dis. Child. 147 1324-1328.
[23] Wei, B.-C. (1998). Exponential Family Nonlinear Models. Lecture Notes in Statist. 130. Springer, Singapore. · Zbl 0904.62076
[24] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1-25. · Zbl 0478.62088
[25] White, H. (1994). Estimation, Inference and Specification Analysis. Econometric Society Monographs 22. Cambridge Univ. Press, Cambridge. · Zbl 0860.62100
[26] Zhang, H. (1999). Analysis of infant growth curves using multivariate adaptive splines. Biometrics 55 452-459. · Zbl 1059.62740
[27] Zhu, H. and Ibrahim, J. G. (2011). Supplement to “Perturbation and scaled Cook’s distance.”
[28] Zhu, H., Ibrahim, J. G., Lee, S. and Zhang, H. (2007). Perturbation selection and influence measures in local influence analysis. Ann. Statist. 35 2565-2588. · Zbl 1129.62068
[29] Zhu, H., Lee, S.-Y., Wei, B.-C. and Zhou, J. (2001). Case-deletion measures for models with incomplete data. Biometrika 88 727-737. · Zbl 1006.62021
[30] Zhu, H. and Zhang, H. (2006). Asymptotics for estimation and testing procedures under loss of identifiability. J. Multivariate Anal. 97 19-45. · Zbl 1078.62063
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.