Model-based clustering of longitudinal data. (English. French summary) Zbl 1190.62120

Summary: A family of mixture models for model-based clustering of longitudinal data is introduced. The covariance structures of eight members of this new family of models are given and the associated maximum likelihood estimates for the parameters are derived via Expectation-Maximization (EM) algorithms. The Bayesian information criterion is used for model selection and a convergence criterion based on the Aitken acceleration is used to determine the convergence of these EM algorithms. This new family of models is applied to yeast sporulation time course data, where the models give good clustering performance. Further constraints are then imposed on the decomposition to allow a deeper investigation of the correlation structure of the yeast data. These constraints greatly extend this new family of models, with the addition of many parsimonious models.
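The two stopping and selection rules named in the summary can be illustrated with a minimal sketch. The summary does not state the exact formulas used in the paper, so the following are assumptions: a common form of the Aitken-based stopping rule (compare the extrapolated limit of the log-likelihood with its latest value) and the BIC written in the "larger is better" convention, 2 log L - p log n. The function names and the tolerance `eps` are hypothetical.

```python
import math


def aitken_limit(l_prev, l_curr, l_next):
    """Aitken estimate of the limiting log-likelihood from three successive values.

    Assumed form: a = (l_next - l_curr) / (l_curr - l_prev),
    l_inf = l_curr + (l_next - l_curr) / (1 - a).
    Returns None when the estimate is undefined (no change, or a == 1).
    """
    denom = l_curr - l_prev
    if denom == 0:
        return None
    a = (l_next - l_curr) / denom
    if a == 1:
        return None
    return l_curr + (l_next - l_curr) / (1 - a)


def aitken_converged(logliks, eps=1e-2):
    """Return True when the Aitken limit is within eps of the latest log-likelihood.

    logliks: log-likelihood values from successive EM iterations.
    Comparing against the latest value is one variant; others compare
    against the previous value instead.
    """
    if len(logliks) < 3:
        return False
    l_inf = aitken_limit(logliks[-3], logliks[-2], logliks[-1])
    if l_inf is None:
        # Treat an exactly flat log-likelihood as converged; otherwise keep going.
        return logliks[-1] == logliks[-2]
    return abs(l_inf - logliks[-1]) < eps


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2*loglik - n_params*log(n_obs)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


# Illustrative log-likelihood trace approaching -100 geometrically (not real data).
trace = [-100.0 - 10.0 * 0.5 ** k for k in range(13)]
print(aitken_converged(trace[:3]), aitken_converged(trace))
```

In an EM loop, `aitken_converged` would be checked after each iteration on the running list of log-likelihoods, and `bic` would be evaluated once per fitted model so that candidate covariance structures and numbers of components can be compared.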


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H12 Estimation in multivariate analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
65C60 Computational problems in statistics (MSC2010)


R; mclust; nlme

