zbMATH — the first resource for mathematics

Model-free feature screening for ultrahigh dimensional classification. (English) Zbl 1440.62110
Summary: A new model-free feature screening method for ultrahigh dimensional classification is proposed, based on the classification accuracy of marginal classifiers. Unlike existing methods, which use differences of means or differences of conditional cumulative distribution functions between classes as screening indexes, the proposed method ranks the importance of predictors by the classification accuracy of marginal classifiers. For each variable, the corresponding marginal classifier is constructed according to the Bayes rule, so the classification accuracies of these marginal classifiers can serve as effective screening indexes for selecting all important variables. The proposed method is shown to enjoy the sure screening property under some regularity conditions, not only for a fixed number of classes but also for a diverging number of classes. Simulations and a real data analysis demonstrate the good performance of the proposed method in comparison with existing methods.
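To illustrate the idea of the summary, the following is a minimal sketch of accuracy-based marginal screening, not the authors' exact procedure: for each predictor a one-dimensional Bayes-rule classifier is fitted (here, under a hypothetical Gaussian working model for each class-conditional density), and predictors are ranked by the in-sample accuracy of these marginal classifiers. The function name and the Gaussian assumption are illustrative choices, not taken from the paper.

```python
import numpy as np

def marginal_accuracy_screen(X, y):
    """Rank features by the in-sample accuracy of a marginal Bayes
    classifier built on each feature alone.

    X : (n, p) array of predictors; y : (n,) integer class labels.
    Returns (acc, order): per-feature accuracies and the feature
    indices sorted from most to least important.
    """
    classes = np.unique(y)
    n, p = X.shape
    # class prior probabilities, estimated by class frequencies
    priors = np.array([np.mean(y == c) for c in classes])
    acc = np.empty(p)
    for j in range(p):
        x = X[:, j]
        scores = np.empty((n, len(classes)))
        for k, c in enumerate(classes):
            # Gaussian working model per class (an assumption made
            # here for illustration; the paper is model-free)
            mu = x[y == c].mean()
            sd = x[y == c].std() + 1e-12
            # log class-conditional density plus log prior
            scores[:, k] = (-0.5 * ((x - mu) / sd) ** 2
                            - np.log(sd) + np.log(priors[k]))
        pred = classes[np.argmax(scores, axis=1)]  # marginal Bayes rule
        acc[j] = np.mean(pred == y)                # screening index
    order = np.argsort(-acc)
    return acc, order
```

In a screening workflow one would keep the top d features of `order` (e.g. d of order n/log n) and pass them to a downstream classifier; informative features separate the classes marginally and so attain accuracy well above the majority-class baseline, while noise features hover near it.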
MSC:
62G05 Nonparametric estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Software:
e1071
References:
[1] Alon, U.; Barkai, N.; Notterman, D. A.; Gish, K.; Ybarra, S.; Mack, D.; Levine, A. J., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci., 96, 12, 6745-6750 (1999)
[2] Cui, H.; Li, R.; Zhong, W., Model-free feature screening for ultrahigh dimensional discriminant analysis, J. Amer. Statist. Assoc., 110, 510, 630-641 (2015) · Zbl 1373.62305
[3] Fan, J.; Fan, Y., High dimensional classification using features annealed independence rules, Ann. Statist., 36, 6, 2605-2637 (2008) · Zbl 1360.62327
[4] Fan, J.; Feng, Y.; Jiang, J.; Tong, X., Feature augmentation via nonparametrics and selection (FANS) in high-dimensional classification, J. Amer. Statist. Assoc., 111, 513, 275-287 (2016)
[5] Fan, J.; Feng, Y.; Song, R., Nonparametric independence screening in sparse ultra-high-dimensional additive models, J. Amer. Statist. Assoc., 106, 494, 544-557 (2011) · Zbl 1232.62064
[6] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc., 96, 456, 1348-1360 (2001) · Zbl 1073.62547
[7] Fan, J.; Lv, J., Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Ser. B Stat. Methodol., 70, 5, 849-911 (2008) · Zbl 1411.62187
[8] Fan, J.; Song, R., Sure independence screening in generalized linear models with NP-dimensionality, Ann. Statist., 38, 6, 3567-3604 (2010) · Zbl 1206.68157
[9] Hoeffding, W., Probability inequalities for sums of bounded random variables, (The Collected Works of Wassily Hoeffding (1994), Springer), 409-426
[10] Li, R.; Zhong, W.; Zhu, L., Feature screening via distance correlation learning, J. Amer. Statist. Assoc., 107, 499, 1129-1139 (2012) · Zbl 1443.62184
[11] Mai, Q.; Zou, H., The Kolmogorov filter for variable screening in high-dimensional binary classification, Biometrika, 100, 1, 229-234 (2013) · Zbl 1452.62456
[12] Mai, Q.; Zou, H., The fused Kolmogorov filter: A nonparametric model-free screening method, Ann. Statist., 43, 4, 1471-1497 (2015) · Zbl 1431.62216
[13] Meyer, D.; Dimitriadou, E.; Hornik, K.; Weingessel, A.; Leisch, F.; Chang, C.-C.; Lin, C.-C., Package ‘e1071’, R package (2019)
[14] Nutt, C. L.; Mani, D.; Betensky, R. A.; Tamayo, P.; Cairncross, J. G.; Ladd, C.; Pohl, U.; Hartmann, C.; McLaughlin, M. E.; Batchelor, T. T., Gene expression-based classification of malignant gliomas correlates better with survival than histological classification, Cancer Res., 63, 7, 1602-1607 (2003)
[15] Pan, R.; Wang, H.; Li, R., Ultrahigh-dimensional multiclass linear discriminant analysis by pairwise sure independence screening, J. Amer. Statist. Assoc., 111, 513, 169-179 (2016)
[16] Rigollet, P.; Vert, R., Optimal rates for plug-in estimators of density level sets, Bernoulli, 15, 4, 1154-1178 (2009) · Zbl 1200.62034
[17] Tibshirani, R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Stat. Methodol., 58, 1, 267-288 (1996) · Zbl 0850.62538
[18] Tong, X., A plug-in approach to Neyman-Pearson classification, J. Mach. Learn. Res., 14, 1, 3011-3040 (2013) · Zbl 1318.62219
[19] Xu, C.; Chen, J., The sparse MLE for ultrahigh-dimensional feature screening, J. Amer. Statist. Assoc., 109, 507, 1257-1269 (2014) · Zbl 1368.62295
[20] Zhang, T., Statistical analysis of some multi-category large margin classification methods, J. Mach. Learn. Res., 5, 1225-1251 (2004) · Zbl 1222.68344
[21] Zhang, C.-H., Nearly unbiased variable selection under minimax concave penalty, Ann. Statist., 38, 2, 894-942 (2010) · Zbl 1183.62120
[22] Zhu, L.-P.; Li, L.; Li, R.; Zhu, L.-X., Model-free feature screening for ultrahigh-dimensional data, J. Amer. Statist. Assoc., 106, 496, 1464-1475 (2011) · Zbl 1233.62195
[23] Zou, H., The adaptive lasso and its oracle properties, J. Amer. Statist. Assoc., 101, 476, 1418-1429 (2006) · Zbl 1171.62326
[24] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol., 67, 2, 301-320 (2005) · Zbl 1069.62054