Robust regression and outlier detection. (English) Zbl 0711.62030
Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. New York etc.: John Wiley & Sons. xvi, 329 p. $ 36.95 (1987).
This is the third book devoted to robustness. The other two are P. J. Huber, Robust statistics. (1981; Zbl 0536.62025), and F. R. Hampel, E. Ronchetti, P. J. Rousseeuw and W. A. Stahel, Robust statistics. The approach based on influence functions. (1986; Zbl 0593.62027). The new book is much more oriented toward data analysis than the former two, which are basically theoretical. It deals mainly with robust estimation in linear regression; however, there is also a chapter devoted to univariate location, a section on multivariate location and covariance matrices, and a brief introduction to robust methods in time series. According to the authors, and the reviewer agrees, the basic property of a robust estimator is that it remains largely insensitive to the presence of a sizable fraction of outliers of any type in the sample. A key measure for assessing robustness in this approach is the breakdown point, and the authors recommend using estimates with a 50% breakdown point. Other measures of robustness, such as efficiency robustness, central in Huber's approach, or those based on Hampel's infinitesimal approach, are de-emphasized, and the corresponding optimality theory is not covered. The authors advocate the use of least median of squares (LMS) followed by one step of reweighted least squares (RLS) as a robust estimate of regression.
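For orientation, the two-stage estimate can be written down in a few lines. The notation below (residuals \(r_i(\theta)\), sample size \(n\), number of coefficients \(p\), preliminary scale \(s^0\)) is introduced here for illustration and does not appear in the review. The cutoff 2.5 and the finite-sample factor are the values usually associated with this procedure and should be read as assumptions rather than as a quotation from the book.

```latex
% Stage 1: least median of squares
\hat\theta_{\mathrm{LMS}} = \arg\min_{\theta}\ \operatorname{med}_{i}\, r_i^2(\theta),
\qquad r_i(\theta) = y_i - x_i^{\top}\theta .

% Preliminary scale estimate from the minimized criterion
s^0 = 1.4826\,\Bigl(1 + \frac{5}{n-p}\Bigr)\,
      \sqrt{\operatorname{med}_{i}\, r_i^2(\hat\theta_{\mathrm{LMS}})} .

% Stage 2: hard-rejection weights, then one weighted least squares fit
w_i = \begin{cases} 1, & |r_i(\hat\theta_{\mathrm{LMS}})/s^0| \le 2.5,\\ 0, & \text{otherwise,} \end{cases}
\qquad
\hat\theta_{\mathrm{RLS}} = \arg\min_{\theta}\ \sum_{i=1}^{n} w_i\, r_i^2(\theta).
```

The first stage supplies the high breakdown point; the second recovers efficiency by running ordinary least squares on the observations that were not flagged.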
The authors discuss many examples of regression using data sets containing outliers, comparing the solutions given by the least squares (LS), LMS and RLS estimates. These examples show how sensitive the LS estimate is and how successfully the LMS and the RLS estimates cope with different types of outliers, even when there is a large fraction of them. All the examples are computed with the program PROGRESS, developed by the authors, which is available to those who apply for it. Since the authors introduce the concepts by emphasizing their intuitive aspects and illustrate them with many examples, the book is appropriate not only for professional statisticians, but also for potential users of robust methods coming from other disciplines. The book is very well written and does not assume any prior knowledge of robustness. However, prior knowledge of least squares estimation and of the standard diagnostic methods in regression is recommended.
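The kind of comparison described above can be imitated on synthetic data. The following is a minimal sketch, not the PROGRESS program: it approximates the LMS line in simple regression by random two-point subsets, then applies one reweighting step. The function names `lms_fit` and `rls_step`, the data, the number of trials, the cutoff 2.5 and the scale factor are all assumptions made for this illustration.

```python
import numpy as np

def lms_fit(x, y, n_trials=500, seed=0):
    """Approximate the LMS line by searching random two-point subsets."""
    rng = np.random.default_rng(seed)
    n = len(x)
    best = (np.inf, 0.0, 0.0)  # (median squared residual, intercept, slope)
    for _ in range(n_trials):
        i, j = rng.choice(n, size=2, replace=False)
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = np.median((y - intercept - slope * x) ** 2)
        if crit < best[0]:
            best = (crit, intercept, slope)
    return best[1], best[2], best[0]

def rls_step(x, y, intercept, slope, med_sq, cutoff=2.5):
    """One reweighted LS step: drop points far from the LMS line, refit by LS."""
    n, p = len(x), 2
    # Preliminary scale with an assumed consistency and finite-sample factor.
    scale = 1.4826 * (1 + 5 / (n - p)) * np.sqrt(med_sq)
    resid = y - intercept - slope * x
    keep = np.abs(resid) <= cutoff * scale  # weight 1 if kept, 0 otherwise
    X = np.column_stack([np.ones(keep.sum()), x[keep]])
    coef, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return coef[0], coef[1], keep

# Synthetic data: true line y = 1 + 2x, with 9 of 30 responses replaced by 0.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=x.size)
y[-9:] = 0.0  # 30% outliers at the high end of x

# Ordinary least squares on all points.
X_all = np.column_stack([np.ones_like(x), x])
ls_intercept, ls_slope = np.linalg.lstsq(X_all, y, rcond=None)[0]

# LMS followed by one RLS step.
lms_intercept, lms_slope, med_sq = lms_fit(x, y)
rls_intercept, rls_slope, keep = rls_step(x, y, lms_intercept, lms_slope, med_sq)

print("LS  slope:", ls_slope)
print("LMS slope:", lms_slope)
print("RLS slope:", rls_slope)
print("points flagged as outliers:", np.flatnonzero(~keep))
```

With this contamination the LS line is pulled toward the outlying responses, whereas the LMS and RLS slopes stay close to the slope of the uncontaminated line. Random subset search only approximates the LMS minimizer; more trials tighten the approximation at the cost of run time.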
Chapter 1 introduces outliers and the concept of robust regression. Chapter 2 deals with the LMS and RLS estimators in simple regression. Chapter 3 deals with the use of these estimators in multiple regression. It also describes in some detail other high breakdown point estimates such as the least trimmed squares (LTS) estimator and scale (S) estimators. Other approaches to robustness, such as M-estimators and GM-estimators, are only sketched. Chapter 4 deals with one-dimensional location, which is treated as a special case of regression. Chapter 5 is devoted to algorithms for computing the LMS. Chapter 6 deals with outlier diagnostics, reviewing the procedures based on the LS estimate and proposing robust outlier detection procedures based on the LMS. Chapter 7 deals with the robust estimation of multivariate location and covariance matrices; procedures with a 50% breakdown point, analogous to the LMS, are proposed there. This chapter also contains a brief introduction to robust estimation in time series.

62F35 Robustness and adaptive procedures (parametric inference)
62J05 Linear regression; mixed models
62-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62-02 Research exposition (monographs, survey articles) pertaining to statistics