Integer programming models for feature selection: new extensions and a randomized solution algorithm. (English) Zbl 1346.90605

Summary: Feature selection methods are used in machine learning and data analysis to select a subset of features that may be successfully used in the construction of a model for the data. These methods are applied under the assumption that often many of the available features are redundant for the purpose of the analysis. In this paper, we focus on a particular method for feature selection in supervised learning problems, based on a linear programming model with integer variables. For the solution of the optimization problem associated with this approach, we propose a novel robust metaheuristics algorithm that relies on a Greedy Randomized Adaptive Search Procedure, extended with the adoption of short memory and a local search strategy. The performances of our heuristic algorithm are successfully compared with those of well-established feature selection methods, both on simulated and real data from biological applications. The obtained results suggest that our method is particularly suited for problems with a very large number of binary or categorical features.


90C10 Integer programming
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI


[1] Aha, D.; Kibler, D., Instance-based learning algorithms, Machine Learning, 6, 37-66, (1991)
[2] Almuallim, H.; Dietterich, T. G., Learning Boolean concepts in the presence of many irrelevant features, Artificial Intelligence, 69, 1, 279-305, (1994) · Zbl 0942.68657
[3] Bermejo, P.; Gámez, J. A.; Puerta, J. M., A grasp algorithm for fast hybrid (filter-wrapper) feature subset selection in high-dimensional datasets, Pattern Recognition Letters, 32, 5, 701-711, (2011)
[4] Bertolazzi, P.; Felici, G.; Festa, P.; Lancia, G., Logic classification and feature selection for biomedical data, Computers and Mathematics with Applications, 55, 889-899, (2008) · Zbl 1139.92013
[5] Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A., A review of feature selection methods on synthetic data, Knowledge and Information Systems, 34, 3, 483-519, (2013)
[6] Boros, E.; Ibaraki, T.; Makino, K., Logical analysis of binary data with missing bits, Artificial Intelligence, 107, 219-263, (1999) · Zbl 0996.68067
[7] Bradley, O.; Mangasarian, O., Feature selection via concave minimization and support vector machines, Proceedings of the fifteenth international conference on machine learning, ICML98, 82-90, (1998), Morgan Kaufmann San Francisco, California
[8] Carrizosa, E.; Martin-Barragan, B.; Morales, D. R., Multi-group support vector machines with measurement costs: a biobjective approach, Discrete Applied Mathematics, 156, 6, 950-966, (2008) · Zbl 1152.90536
[9] Cristianini, N.; Shawe-Taylor, J., An introduction to support vector machines and other kernel-based learning methods, (2000), Cambridge University Press
[10] Dagliyan, O.; Uney-Yuksektepe, F.; Kavakli, I. H.; Turkay, M., Optimization based tumor classification from microarray gene expression data, PLoS One, 6, 2, e14579, (2011)
[11] Dasarathy, B. V., Nearest neighbor NN norms: NN pattern classification techniques, (1991), IEEE Computer Society Press
[12] Devijver, P. A.; Kittler, J., Pattern recognition: a statistical approach: Vol. 761, (1982), Prentice-Hall London · Zbl 0476.68057
[13] Feo, T.; Resende, M., A probabilistic heuristic for a computationally difficult set covering problem, Operations Research Letters, 8, 67-71, (1989) · Zbl 0675.90073
[14] Feo, T.; Resende, M., Greedy randomized adaptive search procedures, Journal of Global Optimization, 6, 109-133, (1995) · Zbl 0822.90110
[15] Festa, P.; Resende, M., GRASP: an annotated bibliography, (Hansen, P.; Rebeiro, C. C., Essays and surveys on metaheuristics, (2002), Kluwer Academic Publishers), 325-367 · Zbl 1017.90001
[16] Festa, P.; Resende, M., GRASP: basic components and enhancements, Telecommunication Systems, 46, 253-271, (2011)
[17] Forman, G., An extensive empirical study of feature selection metrics for text classification, The Journal of Machine Learning Research, 3, 1289-1305, (2003) · Zbl 1102.68553
[18] Garey, M.; Johnson, D., Computers and intractability: a guide to the theory of NP-completeness (Series of books in the mathematical sciences), (1979), W.H. Freeman
[19] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, Journal of Machine Learning Research, 3, 1157-1182, (2003) · Zbl 1102.68556
[20] Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H., The WEKA data mining software: an update, ACM SIGKDD Explorations Newsletter, 11, 1, 10-18, (2009)
[21] Hall, M. A., Correlation-based feature selection for machine learning, (1999), The University of Waikato, Ph.D. thesis
[22] Hall, M. A.; Smith, L. A., Practical feature subset selection for machine learning, Proceedings of the 21st Australian computer science conference, 181-191, (1998), Springer
[23] Hastie, T.; Tibshirani, R.; Friedman, J.; Franklin, J., The elements of statistical learning: data mining, inference and prediction, The Mathematical Ingelligencer, 27-3, 83-85, (2005)
[24] Iannarilli Jr., F. J.; Rubin, P. A., Feature selection for multiclass discrimination via mixed-integer linear programming, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 6, 779-783, (2003)
[25] John, G. H.; Kohavi, R.; Pfleger, K., Irrelevant features and the subset selection problem, Proceedings of international conference on machine learning, ICML: Vol. 94, 121-129, (1994)
[26] Keerthi, S.; Shevade, S.; Bhattacharyya, C.; Murthy, K., Improvements to platt’s SMO algorithm for SVM classifier design, Neural Computation, 13, 3, 637-649, (2001) · Zbl 1085.68629
[27] Kira, K.; Rendell, L., A practical approach to feature selection, Proceedings of international conference on machine learning, ICML: Vol. 11, 249-256, (1992)
[28] Kira, K.; Rendell, L. A., The feature selection problem: traditional methods and a new algorithm, Proceedings of conference on artificial intelligence, AAAI, 129-134, (1992)
[29] Lan, L.; Vucetic, S., Multi-task feature selection in microarray data by binary integer programming, BMC Proceedings, 7, Suppl 7, S5, (2013)
[30] Liu, H.; Li, J.; Wong, L., A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns, Genome Informatics Series, 51-60, (2002)
[31] Liu, H.; Motoda, H., Feature selection for knowledge discovery and data mining, (2000), Kluwer Academic Publishers
[32] Liu, H.; Setiono, R., A probabilistic approach to feature selection-a filter solution, Proceedings of international conference on machine learning, ICML: Vol. 96, 319-327, (1996), Citeseer
[33] Maldonado, S.; Pérez, J.; Weber, R.; Labbé, M., Feature selection for support vector machines via mixed integer linear programming, Information Sciences, 279, 163-175, (2014) · Zbl 1354.68226
[34] Meiri, R.; Zahavi, J., Using simulated annealing to optimize the feature selection problem in marketing applications, European Journal of Operational Research, 171, 842-858, (2006) · Zbl 1116.90069
[35] Peter, T. J.; Somasundaram, K., Study and development of novel feature selection framework for heart disease prediction, International Journal of Scientific and Research Publications, 2, 10, 1-7, (2012)
[36] Li, S. H.; Yen, D. C.; Lu, W. H.; Wang, C., Identifying the signs of fraudulent accounts using data mining techniques, Computers in Human Behavior, 28, 3, 1002-1013, (2012)
[37] Piramuthu, S., Evaluating feature selection methods for learning in data mining applications, European Journal of Operational Research, 156, 483494, (2004)
[38] Platt, J., Fast training of support vector machines using sequential minimal optimization, (Schoelkopf, B.; Burges, C.; Smola, A., Advances in kernel methods - support vector learning, (1998), MIT Press)
[39] Quinlan, J. R., Improved use of continuous attributes in C4.5, Journal of Artificial Intelligence Research, 4, 77-90, (1996) · Zbl 0900.68112
[40] Quinlan, R., C4.5: programs for machine learning, (1993), Morgan Kaufmann Publishers San Mateo, CA
[41] Rubin, P. A., Heuristic solution procedures for a mixed-integer programming discriminant model, Managerial and Decision Economics, 11, 4, 255-266, (1990)
[42] Shannon, C., A mathematical theory of comunication, Bell System Technical Journal, 27, 379-423623-656, (1948)
[43] Swiniarski, R. W.; Skowron, A., Rough set methods in feature selection and recognition, Pattern Recognition Letters, 24, 6, 833-849, (2003) · Zbl 1053.68093
[44] Unler, A.; Murat, A., A discrete particle swarm optimization method for feature selection in binary classication problems, European Journal of Operational Research, 206, 528-539, (2010) · Zbl 1188.90280
[45] Weitschek, E.; Presti, A. L.; Drovandi, G.; Felici, G.; Ciccozzi, M.; Ciotti, M.; Bertolazzi, P., Human polyomaviruses identification by logic mining techniques, Virology Journal, 9, 1, 1-6, (2012)
[46] Weitschek, E.; Velzen, R.; Felici, G.; Bertolazzi, P., Blog 2.0: a software system for character-based species classification with DNA barcode sequences. what it does, how to use it, Molecular Ecology Resources, 13, 6, 1043-1046, (2013)
[47] Witten, I.; Frank, E., Data mining: practical machine learning tools and techniques, (2005), Morgan Kaufmann · Zbl 1076.68555
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.