zbMATH — the first resource for mathematics

Improved model-based clustering performance using Bayesian initialization averaging. (English) Zbl 1417.62177
Summary: The expectation-maximization (EM) algorithm is a commonly used method for finding the maximum likelihood estimates of the parameters in a mixture model via coordinate ascent. A serious pitfall with the algorithm is that in the case of multimodal likelihood functions, it can get trapped at a local maximum. This problem often occurs when sub-optimal starting values are used to initialize the algorithm. Bayesian initialization averaging (BIA) is proposed as an ensemble method to generate high quality starting values for the EM algorithm. Competing sets of trial starting values are combined as a weighted average, which is then used as the starting position for a full EM run. The method can also be extended to variational Bayes methods, a class of algorithm similar to EM that is based on an approximation of the model posterior. The BIA method is demonstrated on real continuous, categorical and network data sets, and the convergent log-likelihoods and associated clustering solutions presented. These compare favorably with the output produced using competing initialization methods such as random starts, hierarchical clustering and deterministic annealing, with the highest available maximum likelihood estimates obtained in a higher percentage of cases, at reasonable computational cost. For the Stochastic Block Model for network data promising results are demonstrated even when the likelihood is unavailable. The implications of the different clustering solutions obtained by local maxima are also discussed.
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F10 Point estimation
62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
65C60 Computational problems in statistics (MSC2010)
Full Text: DOI
[1] Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, London · Zbl 1018.62002
[2] Aitkin, M.; Aitkin, I., A hybrid EM/Gauss-Newton algorithm for maximum likelihood in mixture distributions, Stat Comput, 6, 127-130, (1996)
[3] Andrews, JL; McNicholas, PD, Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions, Stat Comput, 22, 1021-1029, (2011) · Zbl 1252.62062
[4] Baudry, JP; Celeux, G., EM for mixtures, Stat Comput, 25, 713-726, (2015) · Zbl 1331.62301
[5] Baudry, JP; Cardoso, M.; Celeux, G.; Amorim, MJ; Ferreira, AS, Enhancing the selection of a model-based clustering with external categorical variables, Adv Data Anal Classif, 9, 177-196, (2015)
[6] Besag, J., On the statistical analysis of dirty pictures, J R Stat Soc Ser B Methodol, 48, 259-302, (1986) · Zbl 0609.62150
[7] Biernacki, C.; Celeux, G.; Govaert, G., Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Comput Stat Data Anal, 41, 561-575, (2003) · Zbl 1429.62235
[8] Byrd, R.; Lu, P.; Nocedal, J.; Zhu, C., A limited memory algorithm for bound constrained optimization, ACM Trans Math Softw, 16, 1190-1208, (1995) · Zbl 0836.65080
[9] Carpaneto, G.; Toth, P., Algorithm 548: solution of the assignment problem, ACM Trans Math Softw, 6, 104-111, (1980)
[10] Celeux, G.; Govaert, G., A classification EM algorithm for clustering and two-stochastic versions, Comput Stat Data Anal, 14, 315-332, (1992) · Zbl 0937.62605
[11] Cook R, Weisberg S (1994) An introduction to regression graphics. Wiley, New York · Zbl 0925.62287
[12] Csardi, G.; Nepusz, T., The igraph software package for complex network research, Int J Complex Syst, 1695, 1-9, (2006)
[13] Daudin, JJ; Picard, F.; Robin, S., A mixture model for random graphs, Stat Comput, 18, 173-183, (2008)
[14] Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood from incomplete data via the EM algorithm, J R Stat Soc Ser B Methodol, 39, 1-38, (1977) · Zbl 0364.62022
[15] Fraley, C.; Raftery, AE, Mclust: software for model-based clustering, J Classif, 16, 297-306, (1999) · Zbl 0951.91500
[16] Fruchterman, TMJ; Reingold, EM, Graph drawing by force-directed placement, Softw Pract Exp, 21, 1129-1164, (1991)
[17] Goodman, LA, Exploratory latent structure analysis using both identifiable and unidentifiable models, Biometrika, 61, 215-231, (1974) · Zbl 0281.62057
[18] Hand, DJ; Yu, K., Idiot’s Bayes: not so stupid after all?, Int Stat Rev, 69, 385-398, (2001) · Zbl 1213.62010
[19] Hoeting, A.; Madigan, D.; Raftery, A.; Volinsky, C., Bayesian model averaging: a tutorial, Stat Sci, 14, 382-401, (1999) · Zbl 1059.62525
[20] Holland, PW; Laskey, KB; Leinhardt, S., Stochastic blockmodels: first steps, Soc Netw, 5, 109-137, (1983)
[21] Karlis, D.; Xekalaki, E., Choosing initial values for the EM algorithm for finite mixtures, Comput Stat Data Anal, 41, 577-590, (2003) · Zbl 1429.62082
[22] Keribin, C., Consistent estimation of the order of mixture models. Sankhy?, Indian J Stat Ser A (1961-2002), 62, 49-66, (2000) · Zbl 1081.62516
[23] Lee, S.; McLachlan, GJ, Finite mixtures of multivariate skew t-distributions: some recent and new results, Stat Comput, 24, 181-202, (2012) · Zbl 1325.62107
[24] Linzer, DA; Lewis, JB, poLCA: an R package for polytomous variable latent class analysis, J Stat Softw, 42, 1-29, (2011)
[25] McGrory C, Ahfock D (2014) Transdimensional sequential Monte Carlo for hidden Markov models using variational Bayes-SMCVB. In: Proceedings of the 2014 federated conference on computer science and information systems, vol 3. pp 61-66
[26] McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York · Zbl 0882.62012
[27] McLachlan GJ, Peel D (1998) Advances in pattern recognition: joint IAPR international workshops on structual and syntactic pattern recognition (SSPR) and statistical pattern recognition (SPR) Sydney, Australia, August 11-13, 1998 Proceedings, Springer, Berlin, chap Robust cluster analysis via mixtures of multivariate t-distributions, pp 658-666
[28] McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York · Zbl 0963.62061
[29] Meng XL, Rubin DB (1992) Recent extensions of the EM algorithm (with discussion). In: Bayesian statistics 4. Oxford University Press, Oxford, pp 307-320
[30] Meng, XL; Rubin, DB, Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80, 267-278, (1993) · Zbl 0778.62022
[31] Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2012) e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.6-1. http://CRAN.R-project.org/package=e1071
[32] Moran, M.; Walsh, C.; Lynch, A.; Coen, RF; Coakley, D.; Lawlor, BA, Syndromes of behavioural and psychological symptoms in mild Alzheimer’s disease, Int J Geriatr Psychiatry, 19, 359-364, (2004)
[33] Murphy, M.; Wang, D., Do previous birth interval and mother’s education influence infant survival? A Bayesian model averaging analysis of Chinese data, Popul Stud, 55, 37-47, (2001)
[34] Neal, RM; Hinton, GE; Jordan, MI (ed.), A view of the EM algorithm that justifies incremental, sparse, and other variants, 355-368, (1999), Cambridge · Zbl 0916.62019
[35] Nobile, A.; Fearnside, AT, Bayesian finite mixtures with an unknown number of components: the allocation sampler, Stat Comput, 17, 147-162, (2007)
[36] O’Hagan, A.; Murphy, T.; Gormley, I., Computational aspects of fitting mixture models via the expectation-maximisation algorithm, Comput Stat Data Anal, 56, 3843-3864, (2012) · Zbl 1255.62180
[37] Raftery, AE; Balabdaoui, F.; Gneiting, T.; Polakowski, M., Using Bayesian model averaging to calibrate forecast ensembles, Mon Weather Rev, 133, 1155-1174, (2005)
[38] Redner, R.; Walker, H., Mixture densities, maximum likelihood, and the EM algorithm, Soc Ind Appl Math Rev, 26, 195-329, (1984) · Zbl 0536.62021
[39] Rokach L, Maimon O (2010) Clustering methods. In: Data mining and knowledge discovery handbook. Springer, Berlin, pp 321-352 · Zbl 1213.68237
[40] Schwarz, G., Estimating the dimension of a model, Ann Stat, 6, 461-464, (1978) · Zbl 0379.62005
[41] Slonim, N.; Atwal, GS; Tkacik, G.; Bialek, W.; Mumford, D., Information-based clustering, Proc Natl Acad Sci USA, 102, 18297-18302, (2005) · Zbl 1135.62054
[42] Snijders, TAB; Nowicki, K., Estimation and prediction for stochastic blockmodels for graphs with latent block structure, J Classif, 14, pp.75-100, (1997) · Zbl 0896.62063
[43] Ueda, N., Deterministic annealing EM algorithm, Neural Netw, 11, 271-282, (1998)
[44] Volant, S.; Martin Magniette, ML; Robin, S., Variational Bayes approach for model aggregation in unsupervised classification with Markovian dependency, Comput Stat Data Anal, 56, 2375-2387, (2012) · Zbl 1252.62067
[45] Volinsky, CT; Madigan, D.; Raftery, AE; Kronmal, RA, Bayesian model averaging in proportional hazard models: assessing the risk of a stroke, J R Stat Soc Ser C Appl Stat, 46, 433-448, (1997) · Zbl 0903.62093
[46] Walsh, C., Latent class analysis identification of syndromes in Alzheimer’s disease: a Bayesian approach, Metodološki Zvezki-Adv Methodol Stat, 3, 147-162, (2006)
[47] Ward, JH, Hierarchical grouping to optimize an objective function, J Am Stat Assoc, 58, 236-244, (1963)
[48] Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge · Zbl 0926.91066
[49] White, A.; Murphy, TB, BayesLCA: an R package for Bayesian latent class analysis, J Stat Softw, 61, 1-28, (2014)
[50] Wintle, BA; McCarthy, MA; Volinsky, CT; Kavanagh, RP, The use of Bayesian model averaging to better represent uncertainty in ecological models, Conserv Biol, 17, 1579-1590, (2003)
[51] Zachary, WW, An information flow model for conflict and fission in small groups, J Anthropol Res, 33, 452-473, (1977)
[52] Zhou, H.; Lange, KL, On the bumpy road to the dominant mode, Scand J Stat, 37, 612-631, (2010) · Zbl 1226.62027
[53] Zhu, C.; Byrd, R.; Lu, P.; Nocedal, J., Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, Soc Ind Appl Math J Sci Comput, 23, 550-560, (1997) · Zbl 0912.65057
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.