A Bayesian mixture model for clustering circular data. (English) Zbl 07135555
Summary: Clustering complex circular phenomena is a common problem in different scientific disciplines. Examples include the clustering of directions of animal movement in the wild to identify migration patterns, and the classification of angular positions of meteorological events to investigate seasonality fluctuations. The main goal is to develop a novel methodology for clustering and classification of circular data, under a Bayesian mixture modeling framework. The mixture model is defined assuming that the number of components is finite, but unknown, and that each component follows a projected normal distribution. Model selection is performed by jointly making inferences about the parameters of the mixture model and the number of components, choosing the model with the highest posterior probability. A deterministic relabeling strategy is used to recover identifiability for the components in the chosen model. Estimates of both the posterior classification probabilities and the scaled densities are approximated via the relabeled MCMC output. The proposed methods are illustrated using both simulated and real datasets, and performance comparisons with existing strategies are also given. The results suggest that the new approach is an appealing alternative for the clustering and classification of circular data.
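As a rough illustration of the classification step described in the summary, the following Python sketch computes posterior classification probabilities for a finite mixture of projected normal components. It is not the authors' implementation. It assumes an identity covariance matrix in every component (the general projected normal allows an arbitrary covariance), it treats the weights and mean vectors as known rather than averaging over relabeled MCMC draws, and the function names `projected_normal_pdf` and `classification_probs` are hypothetical.

```python
import math


def projected_normal_pdf(theta, mu):
    """Density at angle theta (radians) of the projected normal PN(mu, I).

    Assumption: identity covariance. With u = (cos theta, sin theta) and
    t = u . mu, the density is
        exp(-|mu|^2 / 2) / (2 pi) * (1 + t * Phi(t) / phi(t)),
    where Phi and phi are the standard normal CDF and PDF.
    """
    t = mu[0] * math.cos(theta) + mu[1] * math.sin(theta)
    norm_sq = mu[0] ** 2 + mu[1] ** 2
    # Standard normal CDF via erfc, which stays accurate for negative t.
    big_phi = 0.5 * math.erfc(-t / math.sqrt(2.0))
    # Exponents are merged so that exp() never overflows (t^2 <= |mu|^2).
    first = math.exp(-0.5 * norm_sq)
    second = t * big_phi * math.sqrt(2.0 * math.pi) * math.exp(0.5 * (t * t - norm_sq))
    return (first + second) / (2.0 * math.pi)


def classification_probs(theta, weights, means):
    """Probability that the angle theta belongs to each mixture component."""
    scaled = [w * projected_normal_pdf(theta, m) for w, m in zip(weights, means)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical two-component mixture with modes near 0 and pi radians.
weights = [0.6, 0.4]
means = [(2.0, 0.0), (-2.0, 0.0)]
probs = classification_probs(0.1, weights, means)
print(probs)
```

In a fully Bayesian analysis these probabilities would be averaged over the relabeled posterior draws of the weights and means instead of being evaluated at fixed values.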
62 Statistics
[1] Ackermann, H., A note on circular nonparametrical classification, Biom. J., 39, 5, 577-587 (1997) · Zbl 0882.62052
[2] Burkard, R. E.; Dell’Amico, M.; Martello, S., Assignment Problems (2009), SIAM: Philadelphia
[3] Cappé, O.; Robert, C. P.; Rydén, T., Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers, J. R. Stat. Soc. Ser. B Stat. Methodol., 65, 3, 679-700 (2003) · Zbl 1063.62133
[4] Chang, F.; Qiu, W.; Zamar, R. H.; Lazarus, R.; Wang, X., Clues: An R package for nonparametric clustering based on local shrinking, J. Stat. Softw., 33, 4, 1-16 (2010)
[5] Chang-Chien, S.-J.; Hung, W.-L.; Yang, M.-S., On mean shift-based clustering for circular data, Soft Comput., 16, 6, 1043-1060 (2012)
[6] Cressie, N., On some properties of the scan statistic on the circle and the line, J. Appl. Probab., 14, 2, 272-283 (1977) · Zbl 0364.60073
[7] Diebolt, J.; Robert, C. P., Estimation of finite mixtures distributions through Bayesian sampling, J. R. Stat. Soc. Ser. B Stat. Methodol., 56, 2, 363-375 (1994) · Zbl 0796.62028
[8] Escobar, M. D.; West, M., Bayesian density estimation and inference using mixtures, J. Amer. Statist. Assoc., 90, 430, 577-588 (1995) · Zbl 0826.62021
[9] Forgy, E. W., Cluster analysis of multivariate data: efficiency vs interpretability of classifications, Biometrics, 21, 3, 768-769 (1965)
[10] Frühwirth-Schnatter, S.; Celeux, G.; Robert, C. P., Handbook of Mixture Analysis, Chapman & Hall/CRC Handbooks of Modern Statistical Methods (2019), CRC Press · Zbl 1419.62001
[11] Ghosh, K.; Jammalamadaka, R.; Tiwari, R., Semiparametric Bayesian techniques for problems in circular data, J. Appl. Stat., 30, 2, 145-161 (2003) · Zbl 1121.62376
[12] Godsill, S. J., On the relationship between Markov chain Monte Carlo methods for model uncertainty, J. Comput. Graph. Statist., 10, 2, 230-240 (2001)
[14] Green, P. J., Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, 82, 4, 711-732 (1995) · Zbl 0861.62023
[15] Hernandez Stumpfhauser, D.; Breidt, F. J.; van der Woerd, M. J., The general projected normal distribution of arbitrary dimension: modeling and Bayesian inference, Bayesian Anal., 12, 1, 113-133 (2017) · Zbl 1384.62176
[17] Hubert, L.; Arabie, P., Comparing partitions, J. Classification, 2, 1, 193-218 (1985)
[18] Jammalamadaka, S. R.; SenGupta, A., Topics in Circular Statistics, Series on Multivariate Analysis (2001), World Scientific
[20] Kaufman, L.; Rousseeuw, P. J., Finding Groups in Data (2008), John Wiley & Sons, Inc.
[21] Kendall, D. G.; Harding, E. F., Stochastic Geometry: A Tribute to the Memory of Rollo Davidson (1974), John Wiley & Sons: London, New York · Zbl 0267.00016
[22] Lloyd, S., Least squares quantization in PCM, IEEE Trans. Inform. Theory, 28, 2, 129-137 (1982) · Zbl 0504.94015
[23] Lund, U., Least circular distance regression for directional data, J. Appl. Stat., 26, 6, 723-733 (1999) · Zbl 0939.62049
[24] Malsiner-Walli, G.; Frühwirth-Schnatter, S.; Grün, B., Model-based clustering based on sparse finite Gaussian mixtures, Stat. Comput., 26, 1, 303-324 (2016) · Zbl 1342.62109
[25] Mardia, K., Statistics of Directional Data, Probability and Mathematical Statistics: A Series of Monographs and Textbooks (1972), Academic Press · Zbl 0244.62005
[26] Mardia, K. V.; Jupp, P. E., Circular Data (2000), John Wiley and Sons, Inc.
[27] McVinish, R.; Mengersen, K., Semiparametric Bayesian circular statistics, Comput. Statist. Data Anal., 52, 10, 4722-4730 (2008) · Zbl 1452.62228
[28] Nobile, A.; Fearnside, A. T., Bayesian finite mixtures with an unknown number of components: the allocation sampler, Stat. Comput., 17, 2, 147-162 (2007)
[29] Núñez-Antonio, G.; Gutiérrez-Peña, E., A Bayesian analysis of directional data using the projected normal distribution, J. Appl. Stat., 32, 10, 995-1001 (2005) · Zbl 1121.62453
[30] Núñez-Antonio, G.; Mendoza, M.; Contreras-Cristán, A.; Gutiérrez-Peña, E.; Mendoza, E., Bayesian nonparametric inference for the overlap of daily animal activity patterns, Environ. Ecol. Stat., 25, 4, 471-494 (2018)
[33] Rand, W. M., Objective criteria for the evaluation of clustering methods, J. Amer. Statist. Assoc., 66, 336, 846-850 (1971)
[34] Rastelli, R.; Friel, N., Optimal Bayesian estimators for latent variable cluster models, Stat. Comput., 28, 6, 1169-1186 (2018) · Zbl 1430.62140
[35] Richardson, S.; Green, P. J., On Bayesian analysis of mixtures with an unknown number of components (with discussion), J. R. Stat. Soc. Ser. B Stat. Methodol., 59, 4, 731-792 (1997) · Zbl 0891.62020
[36] Richardson, S.; Green, P. J., Corrigendum: On Bayesian analysis of mixtures with an unknown number of components, J. R. Stat. Soc. Ser. B Stat. Methodol., 60, 3, 661 (1998)
[37] Rodríguez, C. E.; Walker, S. G., Label switching in Bayesian mixture models: deterministic relabeling strategies, J. Comput. Graph. Statist., 23, 1, 25-45 (2014)
[38] Rodríguez, C. E.; Walker, S. G., Univariate Bayesian nonparametric mixture modeling with unimodal kernels, Stat. Comput., 24, 1, 35-49 (2014) · Zbl 1325.62016
[39] Scrucca, L.; Fop, M.; Murphy, T. B.; Raftery, A. E., Mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, R J., 8, 1, 205-233 (2016)
[42] Stephens, M., Bayesian analysis of mixture models with an unknown number of components - an alternative to reversible jump methods, Ann. Statist., 28, 1, 40-74 (2000) · Zbl 1106.62316
[43] Stephens, M., Dealing with label switching in mixture models, J. R. Stat. Soc. Ser. B Stat. Methodol., 62, 4, 795-809 (2000) · Zbl 0957.62020
[44] Tierney, L., Markov chains for exploring posterior distributions (with discussion), Ann. Statist., 22, 4, 1701-1762 (1994) · Zbl 0829.62080
[46] Wang, F.; Gelfand, A. E., Directional data analysis under the general projected normal distribution, Stat. Methodol., 10, 1, 113-127 (2013) · Zbl 1365.62195
[47] Whitfield, P. H., Clustering of seasonal events: A simulation study using circular methods, Comm. Statist. Simulation Comput., 1-23 (2017)
[48] Xu, R.; Wunsch, D., Survey of clustering algorithms, IEEE Trans. Neural Netw., 16, 3, 645-678 (2005)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.