CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality. (English) Zbl 1428.62182

This paper focuses on clustering and discriminant analysis for high-dimensional Gaussian mixtures. A procedure called CHIME is proposed, which is based on the EM algorithm and a direct estimation method for the discriminant vector. High-dimensional clustering problems arise, for instance, in genetic data. After a suitably chosen number of iterations, CHIME provides good estimates of the parameters. The discriminant vector estimator and the excess misclustering error attain minimax optimal rates of convergence. The CHIME algorithm requires two conditions to work well. First, the initial values of the mixture parameters must not be too far from their true values; the authors suggest using the Hardt-Price algorithm to obtain a satisfactory initialization. Second, the discriminant vector must be sparse. The algorithm is then adapted to the low-dimensional Gaussian mixture clustering problem, and it is shown that the optimality properties are preserved. It is noted that the estimators of the Gaussian mixture parameters given by CHIME achieve the same convergence rate as the maximum likelihood estimators obtained in the model with known sample labels. The paper first considers two-class Gaussian mixtures; the results are later extended to multi-class Gaussian mixtures. Simulation studies as well as an application to glioblastoma gene expression data are presented and discussed.
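
To make the EM backbone of such a procedure concrete, here is a minimal low-dimensional sketch in Python. It is not the CHIME procedure itself: it runs plain EM for a two-class Gaussian mixture with a common covariance matrix and computes the discriminant vector by solving a linear system, whereas the sparse direct estimation step used in the high-dimensional setting is not reproduced. The function name `em_two_class_gaussian`, the parametrization (`w` as the weight of the second class, `beta = Sigma^{-1}(mu2 - mu1)`), and all variable names are assumptions of this sketch.

```python
import numpy as np

def em_two_class_gaussian(X, mu1, mu2, w=0.5, n_iter=50):
    """Plain EM for the mixture (1 - w) N(mu1, Sigma) + w N(mu2, Sigma).

    Illustrative low-dimensional sketch only; mu1 and mu2 are initial
    values assumed to be reasonably close to the true class means.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    # Start from the pooled sample covariance (an assumption of this sketch).
    Sigma = np.cov(X, rowvar=False).reshape(p, p)

    def posterior(w, mu1, mu2, Sigma):
        # Discriminant vector for the common-covariance model.
        beta = np.linalg.solve(Sigma, mu2 - mu1)
        # Log-odds of class 2 versus class 1 for every observation.
        logit = np.log(w / (1.0 - w)) + (X - (mu1 + mu2) / 2.0) @ beta
        # Clip to avoid overflow in exp for well-separated points.
        gamma = 1.0 / (1.0 + np.exp(-np.clip(logit, -50.0, 50.0)))
        return beta, gamma

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to class 2.
        _, gamma = posterior(w, mu1, mu2, Sigma)
        # M-step: weighted updates of the weight, means and covariance.
        w = gamma.mean()
        mu1 = (1.0 - gamma) @ X / (1.0 - gamma).sum()
        mu2 = gamma @ X / gamma.sum()
        R1, R2 = X - mu1, X - mu2
        Sigma = ((R1 * (1.0 - gamma)[:, None]).T @ R1
                 + (R2 * gamma[:, None]).T @ R2) / n

    beta, gamma = posterior(w, mu1, mu2, Sigma)
    return w, mu1, mu2, Sigma, beta, gamma
```

Cluster labels can be read off as `gamma > 0.5`. As with the two conditions stated in the review, the sketch relies on initial means that are not too far from the truth; with a poor start, the roles of the two classes may swap or the iteration may settle on a poor solution.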


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62C20 Minimax procedures in statistical decision theory
62H35 Image analysis in multivariate analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis

