
An adjusted boxplot for skewed distributions. (English) Zbl 1452.62074

Summary: The boxplot is a very popular graphical tool for visualizing the distribution of continuous unimodal data. It shows information about the location, spread, skewness as well as the tails of the data. However, when the data are skewed, usually many points exceed the whiskers and are often erroneously declared as outliers. An adjustment of the boxplot is presented that includes a robust measure of skewness in the determination of the whiskers. This results in a more accurate representation of the data and of possible outliers. Consequently, this adjusted boxplot can also be used as a fast and automatic outlier detection tool without making any parametric assumption about the distribution of the bulk of the data. Several examples and simulation results show the advantages of this new procedure.
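The summary describes the adjustment only in words. Below is a minimal Python sketch under stated assumptions. It takes the robust skewness measure to be the medcouple (MC) of Brys, Hubert and Struyf [4], and it uses the exponential fences usually quoted for this paper: [Q1 - 1.5 e^(-4 MC) IQR, Q3 + 1.5 e^(3 MC) IQR] when MC >= 0, and [Q1 - 1.5 e^(-3 MC) IQR, Q3 + 1.5 e^(4 MC) IQR] when MC < 0. For MC = 0 these reduce to Tukey's fences Q1 - 1.5 IQR and Q3 + 1.5 IQR. Several parts are illustrative choices and not the paper's or LIBRA's implementation: the function names `medcouple`, `adjusted_fences` and `adjusted_outliers` are hypothetical, the quartile convention (`statistics.quantiles` with `method="inclusive"`) is assumed, and the medcouple is computed naively in O(n^2).

```python
import math
import statistics


def medcouple(data):
    """Naive O(n^2) medcouple, a robust skewness measure in [-1, 1].

    Illustrative sketch only; faster O(n log n) algorithms exist.
    """
    med = statistics.median(data)
    z = sorted(v - med for v in data)      # centre the sample at its median
    lower = [v for v in z if v <= 0]       # ascending; zeros (median ties) at the end
    upper = [v for v in z if v >= 0]       # ascending; zeros (median ties) at the start
    k = sum(1 for v in z if v == 0)        # number of observations equal to the median
    offset = len(lower) - k                # index of the first zero in `lower`
    kernel = []
    for i, zu in enumerate(upper):
        for j, zl in enumerate(lower):
            if zu == 0 and zl == 0:
                # Both points equal the median: special tie kernel,
                # sign(a + b - (k - 1)) over the k x k block of ties.
                s = i + (j - offset) - (k - 1)
                kernel.append((s > 0) - (s < 0))
            else:
                # Here zu - zl > 0, so the ratio is well defined.
                kernel.append((zu + zl) / (zu - zl))
    return statistics.median(kernel)


def adjusted_fences(data, c=1.5):
    """Skewness-adjusted fences (assumed form):

    MC >= 0: [Q1 - c*exp(-4*MC)*IQR, Q3 + c*exp(3*MC)*IQR]
    MC <  0: [Q1 - c*exp(-3*MC)*IQR, Q3 + c*exp(4*MC)*IQR]
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(data)
    if mc >= 0:
        low = q1 - c * math.exp(-4 * mc) * iqr
        high = q3 + c * math.exp(3 * mc) * iqr
    else:
        low = q1 - c * math.exp(-3 * mc) * iqr
        high = q3 + c * math.exp(4 * mc) * iqr
    return low, high


def adjusted_outliers(data, c=1.5):
    """Return the observations that fall outside the adjusted fences."""
    low, high = adjusted_fences(data, c)
    return [v for v in data if v < low or v > high]
```

For symmetric data MC is 0, so the rule coincides with the standard boxplot. For a right-skewed sample such as [1, 2, 3, 4, 5, 10, 20, 40, 80], MC is positive. The upper fence is therefore stretched well beyond Tukey's Q3 + 1.5 IQR, and the largest value is no longer flagged.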

MSC:

62-08 Computational methods for problems pertaining to statistics
62A09 Graphical methods in statistics

Software:

ROBPCA; LIBRA

References:

[1] Aucremanne, L.; Brys, G.; Hubert, M.; Rousseeuw, P. J.; Struyf, A., A study of Belgian inflation, relative prices and nominal rigidities using new robust measures of skewness and tail weight, (Hubert, M.; Pison, G.; Struyf, A.; Van Aelst, S., Theory and Applications of Recent Robust Methods, Series: Statistics for Industry and Technology (2004), Birkhäuser, Basel), 13-25 · Zbl 1088.62135
[2] Bowley, A. L., Elements of Statistics (1920), Charles Scribner's Sons, New York · JFM 48.0616.06
[3] Brys, G.; Hubert, M.; Rousseeuw, P. J., A robustification of independent component analysis, Journal of Chemometrics, 19, 364-375 (2005)
[4] Brys, G.; Hubert, M.; Struyf, A., A robust measure of skewness, Journal of Computational and Graphical Statistics, 13, 996-1017 (2004)
[5] Brys, G.; Hubert, M.; Struyf, A., Robust measures of tail weight, Computational Statistics and Data Analysis, 50, 733-759 (2006) · Zbl 1431.62047
[6] Carling, K., Resistant outlier rules and the non-Gaussian case, Computational Statistics and Data Analysis, 33, 249-258 (2000) · Zbl 1061.62502
[7] Chambers, J. M.; Hastie, T. J., Statistical Models in S (1992), Wadsworth and Brooks, Pacific Grove, pp. 348-351
[8] Goegebeur, Y.; Planchon, V.; Beirlant, J.; Oger, R., Quality assessment of pedochemical data using extreme value methodology, Journal of Applied Science, 5, 1092-1102 (2005)
[9] Hoaglin, D. C.; Mosteller, F.; Tukey, J. W., Understanding Robust and Exploratory Data Analysis (1983), Wiley, New York, pp. 58-77
[10] Hoaglin, D. C.; Mosteller, F.; Tukey, J. W., Exploring Data Tables, Trends and Shapes (1985), Wiley, New York, pp. 463-478 · Zbl 0659.62002
[11] Hubert, M.; Rousseeuw, P. J.; Vanden Branden, K., ROBPCA: A new approach to robust principal components analysis, Technometrics, 47, 64-79 (2005)
[12] Hubert, M.; Van der Veeken, S., Outlier detection for skewed data, Journal of Chemometrics (2007), in press · Zbl 1284.62378
[13] Hubert, M.; Rousseeuw, P. J.; Verdonck, T., Robust PCA for skewed data (2007), submitted for publication
[14] Jarrett, R. G., A note on the intervals between coal-mining disasters, Biometrika, 66, 191-193 (1979)
[15] Kimber, A. C., Exploratory data analysis for possibly censored data from skewed distributions, Applied Statistics, 39, 21-30 (1990) · Zbl 0707.62004
[16] Moors, J. J.A.; Wagemakers, R. Th. A.; Coenen, V. M.J.; Heuts, R. M.J.; Janssens, M. J.B. T., Characterizing systems of distributions by quantile measures, Statistica Neerlandica, 50, 417-430 (1996) · Zbl 0899.62011
[17] Rousseeuw, P. J.; Ruts, I.; Tukey, J. W., The Bagplot: A bivariate boxplot, The American Statistician, 53, 382-387 (1999)
[18] Ruffieux, C.; Paccaud, F.; Marazzi, A., Comparing rules for truncating hospital length of stay, Casemix Quarterly, 2, 1 (2000)
[19] Schwertman, N. C.; Owens, M. A.; Adnan, R., A simple more general boxplot method for identifying outliers, Computational Statistics and Data Analysis, 47, 165-174 (2004) · Zbl 1430.62019
[20] Schwertman, N. C.; de Silva, R., Identifying outliers with sequential fences, Computational Statistics and Data Analysis, 51, 3800-3810 (2007) · Zbl 1161.62303
[21] Tukey, J. W., Exploratory Data Analysis (1977), Addison-Wesley, Reading, Massachusetts, pp. 39-49
[22] Vandewalle, B.; Beirlant, J.; Christmann, A.; Hubert, M., A robust estimator for the tail index of Pareto-type distributions, Computational Statistics and Data Analysis, 51, 6252-6268 (2007) · Zbl 1445.62102
[23] Verboven, S.; Hubert, M., LIBRA: A MATLAB library for robust analysis, Chemometrics and Intelligent Laboratory Systems, 75, 127-136 (2005)