Mixture models with a prior on the number of components. (English) Zbl 1398.62066

Summary: A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with symmetric Dirichlet weights, and put a prior on the number of components – that is, to use a mixture of finite mixtures (MFM). The most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces. Meanwhile, there are samplers for Dirichlet process mixture (DPM) models that are relatively simple and are easily adapted to new applications. It turns out that, in fact, many of the essential properties of DPMs are also exhibited by MFMs – an exchangeable partition distribution, restaurant process, random measure representation, and stick-breaking representation – and crucially, the MFM analogues are simple enough that they can be used much like the corresponding DPM properties. Consequently, many of the powerful methods developed for inference in DPMs can be directly applied to MFMs as well; this simplifies the implementation of MFMs and can substantially improve mixing. We illustrate with real and simulated data, including high-dimensional gene expression data used to discriminate cancer subtypes.
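The model described in the summary can be read as a generative recipe: draw the number of components, draw symmetric Dirichlet weights, then draw the data from the resulting finite mixture. A minimal sketch of that recipe follows. It is not the authors' code. The Poisson prior on K - 1, the N(0, 5^2) base distribution for component means, the unit-variance Gaussian components, and the names sample_poisson and sample_mfm_prior are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_mfm_prior(n, rng, rate=2.0, gamma=1.0):
    """Generate n observations from an illustrative MFM prior (hypothetical helper)."""
    # Number of components: K - 1 ~ Poisson(rate). This prior on K is an assumption.
    k = 1 + sample_poisson(rate, rng)
    # Symmetric Dirichlet(gamma, ..., gamma) weights, built from normalised gamma draws.
    raw = [rng.gammavariate(gamma, 1.0) for _ in range(k)]
    total = sum(raw)
    weights = [g / total for g in raw]
    # Component means from an assumed N(0, 5^2) base distribution.
    means = [rng.gauss(0.0, 5.0) for _ in range(k)]
    # Assign each observation to a component, then draw it from N(mean, 1).
    labels = rng.choices(range(k), weights=weights, k=n)
    data = [rng.gauss(means[z], 1.0) for z in labels]
    return k, weights, labels, data


rng = random.Random(0)
k, weights, labels, data = sample_mfm_prior(50, rng)
occupied = len(Counter(labels))
print(f"components K = {k}, occupied clusters = {occupied}")
```

The number of occupied clusters can be smaller than K, since some components may receive no observations. The partition-based samplers mentioned in the summary work with the induced clustering of the data rather than with K directly.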


62F15 Bayesian inference
60G57 Random measures
62G05 Nonparametric estimation
62G07 Density estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis



