
Fast computation of Tukey trimmed regions and median in dimension \(p > 2\). (English) Zbl 07499086

Summary: Given data in \(\mathbb{R}^p\), a Tukey \(\kappa\)-trimmed region is the set of all points whose Tukey depth with respect to the data is at least \(\kappa\). Because they are visual, affine equivariant and robust, Tukey regions are useful tools in nonparametric multivariate analysis. While these regions are easily defined and interpreted, their practical use in applications has so far been impeded by the lack of efficient computational procedures in dimension \(p > 2\). We construct two novel algorithms to compute a Tukey \(\kappa\)-trimmed region: a naïve one and a more sophisticated one that is much faster than known algorithms. Further, a strict bound on the number of facets of a Tukey region is derived. In a large simulation study, the novel fast algorithm is compared with the naïve one, which is slower but exact by construction; in every case both yield the same correct results. Finally, the approach is extended to an algorithm that calculates the innermost Tukey region and its barycenter, the Tukey median. Supplementary materials for this article are available online.
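The definition of a Tukey \(\kappa\)-trimmed region can be illustrated with a short sketch. It is not the algorithm of the paper: instead of constructing the facets of the region, it tests membership point by point, and it approximates the Tukey depth by minimising over a finite set of random directions (a swapped-in random-projection technique). Such an approximation can only overestimate the exact depth. The function names (`approx_tukey_depth`, `in_trimmed_region`) and the convention of counting both the depth and \(\kappa\) in data points (rather than as a fraction of \(n\)) are assumptions made for illustration.

```python
from __future__ import annotations

import math
import random
from typing import Sequence

Point = Sequence[float]


def random_unit_vector(p: int, rng: random.Random) -> list[float]:
    """Draw a direction uniformly on the unit sphere in R^p (normalised Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # guard against a (practically impossible) zero vector
            return [c / norm for c in v]


def dot(u: Point, v: Point) -> float:
    """Euclidean inner product of two points of equal dimension."""
    return sum(a * b for a, b in zip(u, v))


def approx_tukey_depth(x: Point, data: Sequence[Point],
                       n_dirs: int = 2000, seed: int = 0) -> int:
    """Random-direction upper bound on the Tukey depth of x, counted in data points.

    For a direction u, the closed halfspace {y : <u, y> >= <u, x>} has x on its
    boundary; we count the data points inside it. The exact Tukey depth is the
    minimum of this count over all directions, so minimising over finitely many
    random directions can only overestimate (never underestimate) it.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    p = len(x)
    best = len(data)
    for _ in range(n_dirs):
        u = random_unit_vector(p, rng)
        ux = dot(u, x)
        count = sum(1 for pt in data if dot(u, pt) >= ux)
        best = min(best, count)
    return best


def in_trimmed_region(x: Point, data: Sequence[Point], kappa: int, **kw) -> bool:
    """Approximate test of whether x lies in the Tukey kappa-trimmed region,
    i.e. whether its (approximate) depth is at least kappa data points."""
    return approx_tukey_depth(x, data, **kw) >= kappa


if __name__ == "__main__":
    # Hypothetical example data: the 8 vertices of the unit cube in R^3.
    cube = [(a, b, c) for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (0.0, 1.0)]
    print(approx_tukey_depth((0.5, 0.5, 0.5), cube))  # centre of the cube
    print(approx_tukey_depth((2.0, 2.0, 2.0), cube))  # point outside the hull
```

For the cube vertices, every generic halfspace through the centre contains exactly one vertex of each antipodal pair, so the centre has depth 4; a vertex has depth 1 and a point outside the convex hull has depth 0. Larger `n_dirs` tightens the bound, but no finite number of random directions guarantees the exact value, which is one reason exact algorithms for \(p > 2\) are of interest.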

MSC:

62-XX Statistics
