Influential features PCA for high dimensional clustering. (English) Zbl 1359.62249

Motivated by the problem of clustering subjects from gene microarray data, the paper considers feature vectors that come from several classes whose labels are unknown. The authors propose influential features PCA (IF-PCA), a new spectral clustering method in which features are first screened by their Kolmogorov-Smirnov (K-S) scores and principal component analysis is then applied to the retained features only. Since the performance of IF-PCA depends on the choice of the screening threshold, the threshold is set in a data-driven way by the Higher Criticism (H-C) technique. The method is applied to ten microarray medical datasets (brain, breast cancer, leukemia, etc.) and compared with other clustering methods; it is found to be competitive.
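
The screen-then-project idea described in the review can be illustrated with a minimal sketch. It is not the authors' implementation, and it simplifies in several places, all of which are assumptions made here: the K-S score is the plain distance between each standardized feature and N(0, 1), a fixed count `n_keep` of top-scoring features stands in for the Higher Criticism threshold, only two classes are handled, and the sign of the leading left singular vector (found by power iteration) replaces a k-means step. The function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def standardize(column):
    """Center a feature to mean 0 and scale it to unit standard deviation."""
    mu, sd = fmean(column), pstdev(column)
    return [(v - mu) / sd for v in column]


def ks_score(column):
    """K-S distance between the standardized feature and N(0, 1)."""
    z = sorted(standardize(column))
    n = len(z)
    cdf = NormalDist().cdf
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(z))


def if_pca_sketch(rows, n_keep, n_iter=100, seed=0):
    """Keep the n_keep features with the largest K-S scores, then split the
    samples by the sign of the leading left singular vector (two classes)."""
    n, p = len(rows), len(rows[0])
    columns = [[row[j] for row in rows] for j in range(p)]
    scores = [ks_score(col) for col in columns]
    # Assumption: a fixed count replaces the Higher Criticism threshold.
    selected = sorted(range(p), key=lambda j: scores[j], reverse=True)[:n_keep]
    w = [standardize(columns[j]) for j in selected]  # n_keep features, n samples each
    # Power iteration on W W^T, where W is the n x n_keep post-selection matrix.
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        v = [sum(wk[i] * u[i] for i in range(n)) for wk in w]                # W^T u
        u = [sum(w[k][i] * v[k] for k in range(n_keep)) for i in range(n)]  # W v
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    labels = [1 if x > 0 else 0 for x in u]
    return selected, labels


# Simulated data (hypothetical): 100 samples, 200 features, of which only the
# first 10 carry class information through a mean shift of +/- 4.
rng = random.Random(1)
n, p, n_signal, shift = 100, 200, 10, 4.0
truth = [i % 2 for i in range(n)]


def make_row(label):
    center = shift if label else -shift
    return [rng.gauss(center if j < n_signal else 0.0, 1.0) for j in range(p)]


rows = [make_row(t) for t in truth]
selected, labels = if_pca_sketch(rows, n_keep=n_signal)
agree = sum(a == b for a, b in zip(labels, truth)) / n
accuracy = max(agree, 1 - agree)  # cluster labels are defined only up to swapping
print(sorted(selected), accuracy)
```

The screening step matters because the uninformative features would otherwise dominate the singular vectors; a feature whose values mix two class means looks bimodal after standardization and therefore sits far from the normal reference in K-S distance.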


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis
62G10 Nonparametric hypothesis testing
62G32 Statistics of extreme values; tail inference
62E20 Asymptotic distribution theory in statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis