Semi-parametric estimation for conditional independence multivariate finite mixture models. (English) Zbl 1307.62090

Summary: The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.
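To make the conditional independence assumption concrete: each component density is taken to be a product of univariate marginal densities, one per coordinate. The sketch below is not the algorithm of this article, and it does not have the descent property mentioned in the summary. It illustrates the simpler EM-like kernel-smoothing iteration in the spirit of Benaglia, Chauveau, and Hunter (2009, Journal of Computational and Graphical Statistics). The function name `np_em_sketch`, the Gaussian kernel, the single fixed bandwidth, and the random initialization are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def np_em_sketch(x, m, bandwidth, n_iter=50, seed=0):
    """Illustrative EM-like iteration for an m-component mixture whose
    coordinates are independent given the component.

    x is an (n, r) data array. Returns (mixing weights, posterior matrix).
    """
    rng = np.random.default_rng(seed)
    n, r = x.shape
    # Random soft initial assignment; each row sums to 1 (arbitrary choice).
    post = rng.dirichlet(np.ones(m), size=n)
    # kern[k, i, l] = K_h(x[i, k] - x[l, k]) for every coordinate k.
    diffs = (x.T[:, :, None] - x.T[:, None, :]) / bandwidth
    kern = gaussian_kernel(diffs) / bandwidth  # shape (r, n, n)
    for _ in range(n_iter):
        # Mixing weights are the average posterior probabilities.
        lam = np.clip(post.mean(axis=0), 1e-12, None)
        # Weighted kernel density estimate of each marginal, evaluated at
        # the data: f[k, i, j] estimates f_jk(x[i, k]).
        f = kern @ post / (post.sum(axis=0) + 1e-300)  # shape (r, n, m)
        # Conditional independence: multiply marginals across coordinates
        # (done on the log scale for numerical stability).
        log_joint = np.log(lam) + np.log(f + 1e-300).sum(axis=0)  # (n, m)
        log_joint -= log_joint.max(axis=1, keepdims=True)
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)
    return post.mean(axis=0), post
```

A call such as `np_em_sketch(x, m=2, bandwidth=0.5)` on an `(n, 3)` array returns estimated mixing weights and an `(n, 2)` matrix of posterior component probabilities. The bandwidth value here is arbitrary; the article gives its own bandwidth selection procedure, which this sketch does not attempt to reproduce.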


MSC:
62G05 Nonparametric estimation
62G07 Density estimation
62H12 Estimation in multivariate analysis


Software: mixtools; R; UCI-ml

