Jin, Jiashun; Wang, Wanjie
Influential features PCA for high dimensional clustering. (English) Zbl 1359.62249
Ann. Stat. 44, No. 6, 2323-2359 (2016).

Motivated by the problem of clustering with gene microarray data, the paper considers the setting in which the feature vectors come from different classes whose labels are unknown. The authors propose influential features PCA (IF-PCA), a new spectral clustering method that selects features by their Kolmogorov-Smirnov (K-S) scores before applying PCA. Since the performance of IF-PCA depends on the choice of the feature-selection threshold, the threshold is set by the Higher Criticism (H-C) technique. The method is applied to ten microarray medical datasets (brain, breast cancer, leukemia, etc.) and compared with other clustering methods, and it is shown to be effective.

Reviewer: Florin Gorunescu (Craiova)

MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis
62G10 Nonparametric hypothesis testing
62G32 Statistics of extreme values; tail inference
62E20 Asymptotic distribution theory in statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis

Keywords: empirical null; feature selection; gene microarray; Hamming distance; phase transition; post-selection spectral clustering; sparsity
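
To make the pipeline summarized in the review concrete, the following is a minimal sketch of K-S based feature screening followed by spectral clustering. It is not the authors' implementation. The function names (`ks_scores`, `kmeans`, `if_pca_sketch`) and the parameter `n_keep` are hypothetical, and several simplifications are assumed: a fixed top-`n_keep` rule stands in for the Higher Criticism threshold, no empirical-null renormalization of the scores is performed, and a plain Lloyd-type k-means with a deterministic start is applied to the leading `k - 1` left singular vectors.

```python
import math
import numpy as np


def ks_scores(X):
    """sqrt(n) times the K-S distance of each standardized column from N(0, 1)."""
    n, _ = X.shape
    W = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    W.sort(axis=0)  # sort every column in ascending order
    # Standard normal CDF evaluated at the sorted values.
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(W / math.sqrt(2.0)))
    upper = np.arange(1, n + 1)[:, None] / n  # empirical CDF just after each point
    lower = np.arange(0, n)[:, None] / n      # empirical CDF just before each point
    d = np.maximum(upper - cdf, cdf - lower).max(axis=0)
    return math.sqrt(n) * d


def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave the center of an empty cluster unchanged
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def if_pca_sketch(X, k, n_keep):
    """Screen features by K-S score, then cluster on leading singular vectors.

    ASSUMPTION: `n_keep` replaces the data-driven Higher Criticism threshold.
    """
    scores = ks_scores(X)
    keep = np.argsort(scores)[-n_keep:]  # indices of the highest-scoring features
    W = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, _, _ = np.linalg.svd(W[:, keep], full_matrices=False)
    return kmeans(U[:, : k - 1], k), keep
```

Calling `if_pca_sketch(X, k, n_keep)` on an n-by-p data matrix `X` (rows are samples, columns are features) returns the estimated cluster labels together with the indices of the retained features. The screening step relies on the fact that a feature whose mean differs across classes has a standardized marginal distribution that departs from the standard normal, which inflates its K-S score.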