Feature elimination in kernel machines in moderately high dimensions. (English) Zbl 1420.68167

Summary: We develop an approach to feature elimination in statistical learning with kernel machines, based on recursively eliminating features. We establish theoretical properties of this method and show that it is uniformly consistent in finding the correct feature space under certain generalized assumptions. We give case studies showing that these assumptions are met in most practical situations, and report simulation results demonstrating the performance of the proposed approach.
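The recursive elimination described in the summary is in the spirit of the SVM-RFE scheme of Guyon et al. [9]: repeatedly fit a classifier, rank the remaining features by how much each contributes, and discard the least influential one. As a rough illustration only (not the paper's method), the sketch below uses a ridge-regularized linear classifier as a stand-in for the kernel machine and drops the feature with the smallest squared weight at each round; the helper `rfe_ranking`, the penalty `lam`, and the toy data are all hypothetical.

```python
import numpy as np

def rfe_ranking(X, y, n_keep=1, lam=1e-3):
    """Toy recursive feature elimination: repeatedly fit a ridge-
    regularized linear classifier (a stand-in for the kernel machine
    in the paper) and drop the feature with the smallest squared
    weight, until only n_keep features remain."""
    active = list(range(X.shape[1]))          # feature indices still in play
    eliminated = []                           # removal order, least useful first
    while len(active) > n_keep:
        Xa = X[:, active]
        # ridge solution: w = (Xa'Xa + lam*I)^{-1} Xa'y
        w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(len(active)), Xa.T @ y)
        worst = int(np.argmin(w ** 2))        # smallest squared weight
        eliminated.append(active.pop(worst))
    return active, eliminated

# hypothetical toy data: the labels depend only on features 0 and 1
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
kept, dropped = rfe_ranking(X, y, n_keep=2)
print(sorted(kept))   # the informative features 0 and 1 are typically retained
```

The paper's setting replaces this linear surrogate with a kernel machine and studies when such recursive elimination is uniformly consistent for recovering the correct feature space; the loop structure above only conveys the general idea.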


68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Full Text: DOI arXiv Euclid


[1] Aksu, Y. (2014). Fast SVM-based feature elimination utilizing data radius, hard-margin, soft-margin. Preprint. Available at arXiv:1210.4460v4.
[2] Aksu, Y., Miller, D. J., Kesidis, G. and Yang, Q. X. (2010). Margin-maximizing feature elimination methods for linear and nonlinear kernel-based discriminant functions. IEEE Trans. Neural Netw. 21 701–717.
[3] Allen, G. I. (2013). Automatic feature selection via weighted kernels and regularization. J. Comput. Graph. Statist. 22 284–299.
[4] Bradley, P. S. and Mangasarian, O. L. (1998). Feature selection via concave minimization and support vector machines. In Machine Learning Proceedings of the Fifteenth International Conference (ICML 1998) 82–90. Morgan Kaufmann, San Francisco, CA.
[5] Chapelle, O., Haffner, P. and Vapnik, V. N. (1999). Support vector machines for histogram-based image classification. IEEE Trans. Neural Netw. 10 1055–1064.
[6] Dasgupta, S., Goldberg, Y. and Kosorok, M. R. (2019). Supplement to “Feature elimination in kernel machines in moderately high dimensions.” DOI:10.1214/18-AOS1696SUPP.
[7] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 849–911.
[8] Goldberg, Y. and Kosorok, M. R. (2017). Support vector regression for right censored data. Electron. J. Stat. 11 532–569. · Zbl 1390.62195
[9] Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Mach. Learn. 46 389–422. · Zbl 0998.68111
[10] Hidalgo-Muñoz, A. R., López, M. M., Santos, I. M., Pereira, A. T., Vázquez-Marrufo, M., Galvao-Carmona, A. and Tomé, A. M. (2013). Application of SVM-RFE on EEG signals for detecting the most relevant scalp regions linked to affective valence processing. Expert Syst. Appl. 40 2102–2108.
[11] Hu, X., Schwarz, J. K., Lewis, J. S., Huettner, P. C., Rader, J. S., Deasy, J. O., Grigsby, P. W. and Wang, X. (2010). A microRNA expression signature for cervical cancer prognosis. Cancer Res. 70 1441–1448.
[12] Jolliffe, I. T. (2002). Principal Component Analysis, 2nd ed. Springer, New York. · Zbl 1011.62064
[13] Leslie, C. S., Eskin, E., Cohen, A., Weston, J. and Noble, W. S. (2004). Mismatch string kernels for discriminative protein classification. Bioinformatics 20 467–476.
[14] Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. J. Amer. Statist. Assoc. 107 1129–1139. · Zbl 1443.62184
[15] Liu, Y., Zhang, H. H., Park, C. and Ahn, J. (2007). Support vector machines with adaptive \(L_{q}\) penalty. Comput. Statist. Data Anal. 51 6380–6394. · Zbl 1446.62179
[16] Louw, N. and Steel, S. J. (2006). Variable selection in kernel Fisher discriminant analysis by means of recursive feature elimination. Comput. Statist. Data Anal. 51 2043–2055. · Zbl 1157.62440
[17] Micchelli, C. A., Xu, Y. and Zhang, H. (2006). Universal kernels. J. Mach. Learn. Res. 7 2651–2667. · Zbl 1222.68266
[18] Mundra, P. and Rajapakse, J. C. (2010). SVM-RFE with MRMR filter for gene selection. IEEE Trans. Nanobiosci. 9 31–37.
[19] Rakotomamonjy, A. (2003). Variable selection using SVM-based criteria. J. Mach. Learn. Res. 3 1357–1370. · Zbl 1102.68583
[20] Schiele, B. and Crowley, J. L. (1996). Object recognition using multidimensional receptive field histograms. In Computer Vision ECCV’96 610–619. Springer, Berlin.
[21] Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Springer, New York. · Zbl 1203.68171
[22] Steinwart, I. and Scovel, C. (2007). Fast rates for support vector machines using Gaussian kernels. Ann. Statist. 35 575–607. · Zbl 1127.68091
[23] Swain, M. J. and Ballard, D. H. (1992). Indexing via color histograms. In Active Perception and Robot Vision 261–273. Springer, Berlin.
[24] Tang, Y., Zhang, Y. Q. and Huang, Z. (2007). Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis. IEEE/ACM Trans. Comput. Biol. Bioinform. 4 365–381.
[25] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166. · Zbl 1105.62353
[26] Wang, L., Zhu, J. and Zou, H. (2006). The doubly regularized support vector machine. Statist. Sinica 16 589–615. · Zbl 1126.68070
[27] Weston, J., Elisseeff, A., Schölkopf, B. and Tipping, M. (2003). Use of the zero-norm with linear models and kernel methods. J. Mach. Learn. Res. 3 1439–1461. · Zbl 1102.68605
[28] Zhang, H. H., Ahn, J., Lin, X. and Park, C. (2006a). Gene selection using support vector machines with non-convex penalty. Bioinformatics 22 88–95.
[29] Zhang, X., Lu, X., Shi, Q., Xu, X., Hon-chiu, E. L., Harris, L. N., Iglehart, J. D., Miron, A., Liu, J. S. and Wong, W. H. (2006b). Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinform. 7 Art. ID 197.
[30] Zhu, J., Rosset, S., Hastie, T. and Tibshirani, R. (2003). 1-norm support vector machines. In Neural Information Processing Systems 16 49–56.