zbMATH — the first resource for mathematics

\(k\)-means clustering of extremes. (English) Zbl 1439.62121
Summary: The \(k\)-means clustering algorithm and its variant, the spherical \(k\)-means clustering, are among the most important and popular methods in unsupervised learning and pattern detection. In this paper, we explore how the spherical \(k\)-means algorithm can be applied in the analysis of only the extremal observations from a data set. By making use of multivariate extreme value analysis, we show how it can be adapted to find “prototypes” of extremal dependence and derive a consistency result for our suggested estimator. In the special case of max-linear models, we furthermore show that our procedure provides an alternative way of statistical inference for this class of models. Finally, we provide data examples which show that our method is able to find relevant patterns in extremal observations and allows us to classify extremal events.
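
The idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the summary does not state: the data are nonnegative with margins already standardized to a common heavy-tailed scale, "extremal" means the `n_extreme` observations with the largest Euclidean norm, and clusters are formed by maximizing cosine similarity between the angular parts and unit-norm centers. The function name `spherical_kmeans_extremes` and all parameter names are hypothetical.

```python
import numpy as np


def spherical_kmeans_extremes(X, k, n_extreme, n_iter=100, seed=0):
    """Illustrative sketch: spherical k-means on the angular parts of
    the n_extreme observations with the largest Euclidean norm.

    Assumes the rows of X are nonnegative and on a common marginal
    scale (an assumption of this sketch, not taken from the summary).
    """
    rng = np.random.default_rng(seed)

    # Keep only the observations with the largest norm.
    norms = np.linalg.norm(X, axis=1)
    idx = np.argsort(norms)[-n_extreme:]

    # Project the retained observations onto the unit sphere.
    angles = X[idx] / norms[idx, None]

    # Initialize the centers with k distinct angular observations.
    centers = angles[rng.choice(n_extreme, size=k, replace=False)]

    for _ in range(n_iter):
        # Assign each angle to the center with the largest cosine similarity.
        labels = np.argmax(angles @ centers.T, axis=1)

        # Update each center as the normalized sum of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = angles[labels == j]
            if len(members) > 0:
                total = members.sum(axis=0)
                length = np.linalg.norm(total)
                if length > 0:
                    new_centers[j] = total / length

        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment against the returned centers.
    labels = np.argmax(angles @ centers.T, axis=1)
    return centers, labels, idx
```

In this sketch the returned unit vectors in `centers` play the role of the "prototypes" of extremal dependence, and `labels` classifies each retained extremal observation by its nearest prototype. The number of retained observations and the number of clusters are tuning choices that the sketch leaves to the user.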

MSC:
62G32 Statistics of extreme values; tail inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)
60G70 Extreme value theory; extremal stochastic processes
62M15 Inference from stochastic processes and spectral analysis