Feature selection for support vector machines via mixed integer linear programming. (English) Zbl 1354.68226

Summary: The performance of classification methods such as Support Vector Machines depends heavily on the proper choice of the feature set used to construct the classifier. Feature selection is an NP-hard problem that has been studied extensively in the literature. Most strategies eliminate features independently of classifier construction, either by exploiting statistical properties of the individual variables or via greedy search; all such strategies are heuristic by nature. In this work we propose two different Mixed Integer Linear Programming (MILP) formulations, based on extensions of Support Vector Machines, to overcome these shortcomings. The proposed approaches perform variable selection simultaneously with classifier construction using optimization models. In experiments on real-world benchmark datasets, our approaches obtained better predictions than well-known feature selection techniques while consistently selecting fewer relevant features.
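The general idea behind such formulations can be sketched as a big-M MILP: a binary variable z_j indicates whether feature j may receive nonzero weight, a budget constraint caps the number of selected features, and the hinge-loss slacks of a linear SVM are minimized. The sketch below is illustrative, not the paper's exact formulations; the toy data, big-M value M, feature budget B, and variable layout are all assumptions made for the example. It solves the model with SciPy's mixed-integer solver (`scipy.optimize.milp`, available in SciPy >= 1.9).

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)
n, p, B, M = 40, 5, 2, 10.0  # samples, features, feature budget, big-M

# Toy data: only the first two of the five features carry the class signal.
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
X = rng.normal(size=(n, p))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y

# Variable vector v = [w (p), b (1), xi (n), z (p)]
nv = p + 1 + n + p
c = np.zeros(nv)
c[p + 1 : p + 1 + n] = 1.0  # objective: minimize total hinge slack

# Margin constraints: y_i (w . x_i + b) + xi_i >= 1
A1 = np.zeros((n, nv))
A1[:, :p] = y[:, None] * X
A1[:, p] = y
A1[:, p + 1 : p + 1 + n] = np.eye(n)
margin = LinearConstraint(A1, lb=1.0, ub=np.inf)

# Big-M linking:  w_j <= M z_j  and  -w_j <= M z_j  (z_j = 0 forces w_j = 0)
A2 = np.zeros((2 * p, nv))
A2[:p, :p] = np.eye(p)
A2[p:, :p] = -np.eye(p)
A2[:p, p + 1 + n :] = -M * np.eye(p)
A2[p:, p + 1 + n :] = -M * np.eye(p)
bigm = LinearConstraint(A2, lb=-np.inf, ub=0.0)

# Budget: at most B features selected
A3 = np.zeros((1, nv))
A3[0, p + 1 + n :] = 1.0
budget = LinearConstraint(A3, lb=-np.inf, ub=float(B))

# w and b free, slacks nonnegative, z binary in [0, 1]
lb = np.r_[np.full(p, -np.inf), [-np.inf], np.zeros(n), np.zeros(p)]
ub = np.r_[np.full(p, np.inf), [np.inf], np.full(n, np.inf), np.ones(p)]
integrality = np.r_[np.zeros(p + 1 + n), np.ones(p)]  # only z is integer

res = milp(c, constraints=[margin, bigm, budget],
           integrality=integrality, bounds=Bounds(lb, ub))
w, b = res.x[:p], res.x[p]
selected = np.flatnonzero(res.x[p + 1 + n :] > 0.5)
print("selected features:", selected)
```

Because selection and training share one model, the solver keeps only the features that actually help separate the classes, rather than ranking features by a statistic computed before training.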


68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
90C11 Mixed integer programming


Full Text: DOI Link

