CHIME

swMATH ID: 28514
Software Authors: Cai, T. Tony; Ma, Jing; Zhang, Linjun
Description: CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality. Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess misclustering error and show that CHIME is minimax rate optimal. In addition, the optimality of the proposed estimator of the discriminant vector is also established. Simulation studies show that CHIME outperforms the existing methods under a variety of settings. The proposed CHIME procedure is also illustrated in an analysis of a glioblastoma gene expression data set and shown to have superior performance. Clustering of Gaussian mixtures in the conventional low-dimensional setting is also considered. The technical tools developed for the high-dimensional setting are used to establish the optimality of the clustering procedure that is based on the classical EM algorithm.
Homepage: http://www-stat.wharton.upenn.edu/~tcai/paper/CHIME.pdf
Keywords: high-dimensional data; unsupervised learning; Gaussian mixture model; EM algorithm; misclustering error; minimax optimality
Related Software: R; mvtnorm; ICtest; tidyr; REPPlab; ggpubr; Rmixmod; GGally; dplyr; RColorBrewer; ggplot2; MASS (R); mclust; VanHuffel; boost; libPLS; L1-MAGIC; ElemStatLearn; CLEMM; GitHub
Cited in: 9 Publications

Standard Articles
1 Publication describing the Software, including 1 Publication in zbMATH
CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality. Zbl 1428.62182. Cai, T. Tony; Ma, Jing; Zhang, Linjun (2019)

Cited by 26 Authors
2 Cai, Tony Tony; 2 Zhang, Linjun; 1 Cao, Lixiong; 1 Dwivedi, Raaz; 1 Gao, Chao; 1 Han, Feiyang; 1 Ho, Nhat; 1 Jordan, Michael Irwin; 1 Khamaru, Koulik; 1 Li, Yuanzhi; 1 Liu, Jie; 1 Ma, Jing; 1 Ma, Zongming; 1 Mai, Qing; 1 Meng, Xianghua; 1 Nordhausen, Klaus; 1 Radojičić, Una; 1 Sun, Yuekai; 1 Virta, Joni; 1 Wainwright, Martin J.; 1 Wang, Wenjing; 1 Yang, Dongmin; 1 Yu, Bin; 1 Yu, ZhongBo; 1 Zhang, Xin; 1 Zhao, Ruofei

Cited in 4 Serials
4 The Annals of Statistics; 3 Electronic Journal of Statistics; 1 Computer Methods in Applied Mechanics and Engineering; 1 Journal of Computational and Applied Mathematics

Cited in 4 Fields
9 Statistics (62-XX); 2 Probability theory and stochastic processes (60-XX); 2 Numerical analysis (65-XX); 1 Computer science (68-XX)