EM for mixtures. Initialization requires special care. (English) Zbl 1331.62301

Summary: Maximum likelihood via the EM algorithm is widely used to estimate the parameters of hidden structure models such as Gaussian mixture models. But the EM algorithm has well-documented drawbacks: its solution can depend strongly on its starting position, and it may fail as a result of degeneracies. We stress the practical dangers of these limitations and how carefully they should be dealt with. Our main conclusion is that no method addresses them satisfactorily in all situations. But improvements are introduced, first, by using a penalized log-likelihood of Gaussian mixture models in a Bayesian regularization perspective and, second, by choosing the best among several relevant initialization strategies. In this perspective, we also propose new recursive initialization strategies which prove helpful. They are compared with standard initialization procedures through numerical experiments, and their effects on model selection criteria are analyzed.
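The initialization sensitivity discussed in the summary can be illustrated with a minimal sketch (not the authors' code, and simplified to one dimension): a plain EM for a two-component Gaussian mixture, run from several random starting means, keeping the run with the highest final log-likelihood. The function name and all parameter choices below are illustrative assumptions.

```python
import numpy as np

def em_gmm_1d(x, k, mu0, n_iter=200, eps=1e-10):
    """Plain EM for a 1-D Gaussian mixture started from the means mu0.

    Returns (weights, means, variances, log-likelihood). The fit depends
    on mu0: different starts can reach different local maxima.
    """
    n = len(x)
    w = np.full(k, 1.0 / k)
    mu = np.asarray(mu0, dtype=float).copy()
    var = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibility of component j for observation i
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / (dens.sum(axis=1, keepdims=True) + eps)
        # M-step: responsibility-weighted updates of weights, means, variances
        nk = r.sum(axis=0) + eps
        w, mu = nk / n, (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + eps
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return w, mu, var, np.log(dens.sum(axis=1) + eps).sum()

# Well-separated two-component sample; run EM from several random starts
# (means drawn from the data) and keep the highest-log-likelihood run.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
runs = [em_gmm_1d(x, 2, rng.choice(x, size=2, replace=False)) for _ in range(10)]
best = max(runs, key=lambda fit: fit[3])
```

Selecting the best of several short runs in this way is the multiple-start idea compared in the paper; it does not by itself guard against the degeneracies (collapsing variances) that motivate the penalized, Bayesian-regularization approach also studied there.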


62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C60 Computational problems in statistics (MSC2010)
Full Text: DOI


[1] Banfield, JD; Raftery, AE, Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821, (1993) · Zbl 0794.62034
[2] Baudry, J.-P.: Sélection de modèle pour la classification non supervisée. Choix du nombre de classes. PhD thesis, Université Paris-Sud (2009) · Zbl 0937.62605
[3] Baudry, J-P; Maugis, C; Michel, B, Slope heuristics: overview and implementation, Stat. Comput., 22, 455-470, (2011) · Zbl 1322.62007
[4] Berchtold, A, Optimisation of mixture models: comparison of different strategies, Comput. Stat., 19, 385-406, (2004) · Zbl 1068.65020
[5] Biernacki, C; Celeux, G; Govaert, G, Assessing a mixture model for clustering with the integrated completed likelihood, IEEE Trans. Pattern Anal. Mach. Intell., 22, 719-725, (2000)
[6] Biernacki, C; Celeux, G; Govaert, G, Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Comput. Stat. Data Anal., 41, 561-575, (2003) · Zbl 1429.62235
[7] Birgé, L; Massart, P, Minimal penalties for Gaussian model selection, Probab. Theory Relat. Fields, 138, 33-73, (2007) · Zbl 1112.62082
[8] Celeux, G; Govaert, G, A classification EM algorithm for clustering and two stochastic versions, Comput. Stat. Data Anal., 14, 315-332, (1992) · Zbl 0937.62605
[9] Celeux, G; Govaert, G, Parsimonious Gaussian models in cluster analysis, Pattern Recognit., 28, 781-793, (1995)
[10] Ciuperca, G; Ridolfi, A; Idier, J, Penalized maximum likelihood estimator for normal mixtures, Scand. J. Stat., 30, 45-59, (2003) · Zbl 1034.62018
[11] Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B (Methodological), 39, 1-38, (1977) · Zbl 0364.62022
[12] Fraley, C; Raftery, A; Wehrens, RJ, Incremental model-based clustering for large datasets with small clusters, J. Comput. Graph. Stat., 14, 529-546, (2005)
[13] Fraley, C; Raftery, AE, Model-based clustering, discriminant analysis and density estimation, J. Am. Stat. Assoc., 97, 611-631, (2002) · Zbl 1073.62545
[14] Fraley, C; Raftery, AE, Bayesian regularization for normal mixture estimation and model-based clustering, J. Classif., 24, 155-181, (2007) · Zbl 1159.62302
[15] Frazee, AC; etal., ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets, BMC Bioinform., 12, 449, (2011)
[16] Graveley, BR; etal., The developmental transcriptome of Drosophila melanogaster, Nature, 471, 473-479, (2011)
[17] Keribin, C, Consistent estimation of the order of mixture models, Sankhya A, 62, 49-66, (2000) · Zbl 1081.62516
[18] McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, Hoboken (2008) · Zbl 1165.62019
[19] McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000) · Zbl 0963.62061
[20] Papastamoulis, P., Martin-Magniette, M.-L., Maugis-Rabusseau, C.: On the estimation of mixtures of Poisson regression models with large numbers of components. Comput. Stat. Data Anal. (to appear) (2014) · Zbl 1468.62154
[21] Pelleg, D., Moore, A.W.: X-means: Extending k-means with efficient estimation of the number of clusters. In: Langley, P. (ed.) ICML, pp. 727-734. Morgan Kaufmann (2000)
[22] Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux, G.: Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics. (to appear) (2015) · Zbl 1034.62018
[23] Roeder, K; Wasserman, L, Practical Bayesian density estimation using mixtures of normals, J. Am. Stat. Assoc., 92, 894-902, (1997) · Zbl 0889.62021
[24] Schwarz, G, Estimating the dimension of a model, Ann. Stat., 6, 461-464, (1978) · Zbl 0379.62005