zbMATH — the first resource for mathematics

Hierarchical normalized completely random measures to cluster grouped data. (English) Zbl 1437.62224
Summary: In this article, we propose a Bayesian nonparametric model for clustering grouped data. We adopt a hierarchical approach: at the highest level, each group of data is modeled according to a mixture, where the mixing distributions are conditionally independent normalized completely random measures (NormCRMs) centered on the same base measure, which is itself a NormCRM. The discreteness of the shared base measure implies that the processes at the data level share the same atoms. This desirable feature makes it possible to cluster together observations from different groups. We obtain a representation of the hierarchical clustering model by marginalizing with respect to the infinite-dimensional NormCRMs. We investigate the properties of the clustering structure induced by the proposed model and provide theoretical results concerning the distribution of the number of clusters, within and between groups. Furthermore, we offer an interpretation in terms of a generalized Chinese restaurant franchise process, which allows for posterior inference under both conjugate and nonconjugate models. We develop algorithms for fully Bayesian inference and assess their performance by means of a simulation study and a real-data illustration.
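The summary's point that a discrete shared base measure lets groups share atoms can be illustrated with a minimal sketch of the classical Chinese restaurant franchise, which corresponds to the Dirichlet process special case only, not to the generalized NormCRM process of the article. The function name `sample_crf` and the parameters `alpha` and `gamma` are hypothetical and chosen for illustration.

```python
import random


def sample_crf(group_sizes, alpha, gamma, seed=0):
    """Simulate cluster labels from a Chinese restaurant franchise.

    Assumption: Dirichlet process special case, with hypothetical
    concentration parameters alpha (group level) and gamma (top level).
    Returns one list of cluster ("dish") labels per group.
    """
    rng = random.Random(seed)
    dish_tables = []  # dish_tables[k] = tables, over all groups, serving dish k
    labels = []
    for n in group_sizes:
        table_counts = []  # customers seated at each table of this group
        table_dish = []    # dish served at each table of this group
        group_labels = []
        for _ in range(n):
            # Join an existing table with weight equal to its size,
            # or open a new table with weight alpha.
            weights = table_counts + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):
                # A new table picks a dish shared across groups: an existing
                # dish with weight equal to its table count, or a new dish
                # with weight gamma.
                dish_weights = dish_tables + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            group_labels.append(table_dish[t])
        labels.append(group_labels)
    return labels


if __name__ == "__main__":
    for j, group in enumerate(sample_crf([10, 10, 10], alpha=1.0, gamma=1.0)):
        print(f"group {j}: {group}")
```

Because dishes are drawn from one top-level pool, the same label can appear in several groups, which is the between-group clustering the summary refers to.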

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G05 Nonparametric estimation
62F15 Bayesian inference