
A comparative analysis of multiple outlier detection procedures in the linear regression model. (English) Zbl 1038.62062

We evaluate several published techniques for detecting multiple outliers in linear regression using an extensive Monte Carlo simulation. These procedures include both direct methods based on detection algorithms and indirect methods based on robust regression estimators. We evaluate how outlier density and geometry, regressor variable dimension, and outlying distance in both leverage and residuals affect detection capability and false alarm (swamping) probability. The simulation scenarios focus on outlier configurations likely to be encountered in practice and use a designed experiment approach. The results for each scenario provide insight into the performance and limitations of each technique. Finally, we summarize each procedure’s performance and make recommendations.
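The paper's specific procedures and simulation design are not reproduced here. As a minimal sketch, assuming a single regressor, residual-direction outliers only (no leverage outliers), and a simple baseline detector, the Python below estimates the two quantities the abstract names. Detection capability is taken here as the fraction of planted outliers that are flagged, and swamping probability as the fraction of clean points that are flagged; these definitions are assumptions for illustration. The detector (internally studentized OLS residuals with a hypothetical cutoff of 2.5) is a single-case diagnostic swapped in for illustration and is not one of the multiple-outlier procedures the paper evaluates. All function names are hypothetical.

```python
import math
import random


def simulate_sample(n, outlier_frac, shift, rng):
    """Hypothetical data generator: y = 1 + 2x + N(0, 1) noise.

    The first round(outlier_frac * n) responses are shifted by `shift`,
    producing residual (y-direction) outliers only.
    Returns (xs, ys, is_outlier).
    """
    n_out = int(round(outlier_frac * n))
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    is_outlier = [i < n_out for i in range(n)]
    for i in range(n_out):
        ys[i] += shift
    return xs, ys, is_outlier


def flag_studentized(xs, ys, cutoff=2.5):
    """Flag points whose internally studentized OLS residual exceeds `cutoff`.

    Baseline single-regressor detector, not a multiple-outlier procedure.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # residual std. error
    flags = []
    for x, e in zip(xs, resid):
        h = 1.0 / n + (x - xbar) ** 2 / sxx  # leverage (hat value) of point i
        flags.append(abs(e) / (s * math.sqrt(1.0 - h)) > cutoff)
    return flags


def monte_carlo(n=50, outlier_frac=0.1, shift=8.0, reps=500, seed=1):
    """Estimate (detection rate, swamping rate), pooled over all replications.

    detection rate = flagged planted outliers / all planted outliers
    swamping rate  = flagged clean points / all clean points
    """
    rng = random.Random(seed)
    detected = planted = swamped = clean = 0
    for _ in range(reps):
        xs, ys, truth = simulate_sample(n, outlier_frac, shift, rng)
        for flagged, is_out in zip(flag_studentized(xs, ys), truth):
            if is_out:
                planted += 1
                detected += flagged
            else:
                clean += 1
                swamped += flagged
    detection = detected / planted if planted else float("nan")
    swamping = swamped / clean if clean else float("nan")
    return detection, swamping


if __name__ == "__main__":
    for frac in (0.1, 0.3):
        det, swamp = monte_carlo(outlier_frac=frac)
        print(f"outlier fraction {frac:.1f}: detection {det:.3f}, swamping {swamp:.3f}")
```

Under these assumed settings, raising the outlier fraction from 0.1 to 0.3 should sharply lower the baseline detector's detection rate. The added outliers inflate the residual scale and pull the fit toward themselves (masking), which is why dedicated multiple-outlier procedures of the kind compared in the paper are needed.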

MSC:

62J20 Diagnostics, and linear inference and regression
62J05 Linear regression; mixed models
65C05 Monte Carlo methods

Software:

ROBETH

References:

[1] Atkinson, A. C.; Riani, M.: Bivariate boxplots, multiple outliers, multivariate transformations and discriminant analysis: the 1997 Hunter lecture. Environmetrics 8, 583-602 (1997)
[2] Barnett, V.; Lewis, T.: Outliers in statistical data. (1994) · Zbl 0801.62001
[3] Coakley, C. W.; Hettmansperger, T. P.: A bounded influence, high breakdown, efficient regression estimator. J. amer. Statist. assoc. 88, 872-880 (1993) · Zbl 0783.62024
[4] Hadi, A. S.: Identifying multiple outliers in multivariate data. J. roy. Statist. soc. Ser. B 54, 761-777 (1992)
[5] Hadi, A. S.: A modification of a method for the detection of outliers in multivariate samples. J. roy. Statist. soc. Ser. B 56, 393-396 (1994) · Zbl 0800.62347
[6] Hadi, A. S.; Simonoff, J. S.: Procedures for the identification of multiple outliers in linear models. J. amer. Statist. assoc. 88, 1264-1272 (1993)
[7] Hadi, A. S.; Simonoff, J. S.: A more robust outlier identifier for regression data. Bull. internat. Statist. inst. 281-282 (1997)
[8] Huber, P. J.: Robust regression: asymptotics, conjectures and Monte Carlo. Ann. statist. 1, 799-821 (1973) · Zbl 0289.62033
[9] Kianifard, F.; Swallow, W.: A Monte Carlo comparison of five procedures for identifying outliers in linear regression. Commun. statist. Part A theory methods 19, 1913-1938 (1990)
[10] Marazzi, A.: Algorithms, routines, and S functions for robust statistics. (1993) · Zbl 0777.62004
[11] Rocke, D. M.; Woodruff, D. L.: Identification of outliers in multivariate data. J. amer. Statist. assoc. 91, 1047-1061 (1996) · Zbl 0882.62049
[12] Rousseeuw, P. J.: Multivariate estimation with high breakdown point. Mathematical statistics and applications, Vol. B, 283-297 (1985) · Zbl 0609.62054
[13] Rousseeuw, P. J.: Least median of squares regression. J. amer. Statist. assoc. 79, 871-881 (1984) · Zbl 0547.62046
[14] Rousseeuw, P. J.; Van Zomeren, B. C.: Unmasking multivariate outliers and leverage points. J. amer. Statist. assoc. 85, 633-639 (1990)
[15] Sebert, D. M.: Identifying multiple outliers and influential subsets: a clustering approach. Unpublished dissertation, Arizona State University, AZ (1996)
[16] Sebert, D. M.; Montgomery, D. C.; Rollier, D.: A clustering algorithm for identifying multiple outliers in linear regression. Comput. statist. Data anal. 27, 461-484 (1998) · Zbl 1042.62575
[17] Simpson, J. R.; Montgomery, D. C.: A compound estimator for robust regression. Naval res. Logist. 45, 125-134 (1998)
[18] Swallow, W.; Kianifard, F.: Using robust scale estimates in detecting multiple outliers in linear regression. Biometrics 52, 545-556 (1996) · Zbl 0875.62283
[19] Wilcox, R. R.: Introduction to robust estimation and hypothesis testing. (1997) · Zbl 0991.62508
[20] Yohai, V. J.: High breakdown-point and high efficiency robust estimates for regression. Ann. statist. 15, 642-656 (1987) · Zbl 0624.62037