
Feature subset selection for logistic regression via mixed integer optimization. (English) Zbl 1352.90068

Summary: This paper concerns a method of selecting a subset of features for a logistic regression model. Information criteria, such as the Akaike information criterion and the Bayesian information criterion, are employed as goodness-of-fit measures. The purpose of our work is to establish a computational framework for selecting a subset of features with an optimality guarantee. For this purpose, we devise mixed integer optimization formulations for feature subset selection in logistic regression. Specifically, by making a piecewise-linear approximation of the logistic loss function, we pose the problem as a mixed integer linear optimization problem, which can be solved with standard mixed integer optimization software. The computational results demonstrate that when the number of candidate features was less than 40, our method provided a feature subset sufficiently close to an optimal one in a reasonable amount of time. Furthermore, even with more candidate features, our method often found a better subset of features, in terms of information criteria, than the stepwise methods did.
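The modeling device behind the formulation is that the logistic loss f(v) = log(1 + exp(-v)) is convex, so every tangent line lies below it; replacing f by the pointwise maximum of a few tangent lines yields a piecewise-linear lower approximation, and minimizing the resulting AIC-style objective with binary on/off variables for the coefficients is a mixed integer linear program. The following is a minimal sketch of that idea, not the authors' code: the solver (CBC via PuLP), the breakpoint grid, the big-M bound, and names such as select_features_milp are all illustrative assumptions.

```python
# Sketch of piecewise-linear MILP feature subset selection (assumptions noted above).
# The logistic loss f(v) = log(1 + exp(-v)) is convex, so each tangent line lies
# below it; forcing t_i above several tangents and minimizing sum(t_i) makes t_i
# the pointwise maximum of those tangents, a lower approximation of the loss.
# Requires: pip install pulp numpy
from math import exp, log

import numpy as np
import pulp


def select_features_milp(X, y, breakpoints=(-4, -2, -1, 0, 1, 2, 4), M=20.0):
    """X: (n, p) array of features; y: array of labels in {-1, +1}.
    Returns (indices of selected features, fitted coefficient vector)."""
    n, p = X.shape
    prob = pulp.LpProblem("aic_feature_subset_selection", pulp.LpMinimize)

    b0 = pulp.LpVariable("intercept", lowBound=-M, upBound=M)
    b = [pulp.LpVariable(f"b_{j}", lowBound=-M, upBound=M) for j in range(p)]
    z = [pulp.LpVariable(f"z_{j}", cat="Binary") for j in range(p)]
    t = [pulp.LpVariable(f"t_{i}", lowBound=0) for i in range(n)]

    # AIC-style objective: 2 * (approximate negative log-likelihood)
    # + 2 * (subset size); for simplicity the intercept is not penalized here.
    prob += 2 * pulp.lpSum(t) + 2 * pulp.lpSum(z)

    for i in range(n):
        # classification margin y_i * (b0 + x_i . b) of sample i
        margin = float(y[i]) * (
            b0 + pulp.lpSum(float(X[i, j]) * b[j] for j in range(p))
        )
        for v in breakpoints:
            # tangent line of f at v:  f(v) + f'(v) * (margin - v)
            fv, dfv = log(1 + exp(-v)), -1.0 / (1 + exp(v))
            prob += t[i] >= fv + dfv * (margin - v)

    for j in range(p):
        # big-M link: b_j may be nonzero only when feature j is selected
        prob += b[j] <= M * z[j]
        prob += b[j] >= -M * z[j]

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    selected = [j for j in range(p) if z[j].value() > 0.5]
    return selected, np.array([bj.value() for bj in b])
```

Because the tangent lines bound the loss from below, the optimal objective value of this sketch is a lower bound on the true minimum AIC; the paper's actual formulation, and its optimality guarantee, rest on a more careful piecewise-linear construction than this simplified version.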

MSC:

90C11 Mixed integer programming
90C27 Combinatorial optimization

Software:

UCI-ml; aplore3

References:

[1] Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control. 19, 716-723 (1974). doi:10.1109/TAC.1974.1100705 · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[2] Arthanari, T.S., Dodge, Y.: Mathematical Programming in Statistics. Wiley, New York (1981) · Zbl 0549.62002
[3] Altman, E.I.: Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J. Financ. 23, 589-609 (1968). doi:10.1111/j.1540-6261.1968.tb00843.x · doi:10.1111/j.1540-6261.1968.tb00843.x
[4] Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2013). http://archive.ics.uci.edu/ml
[5] Beale, E.M.L.: Two transportation problems. In: Kreweras, G., Morlat, G. (eds.) Proceedings of the Third International Conference on Operational Research, pp. 780-788 (1963) · Zbl 1188.90280
[6] Beale, E.M.L., Tomlin, J.A.: Special facilities in a general mathematical programming system for non-convex problems using ordered sets of variables. In: Lawrence, J. (ed.) Proceedings of the Fifth International Conference on Operational Research, pp. 447-454. Tavistock Publications, London (1970)
[7] Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. arXiv preprint, arXiv:1507.03133 · Zbl 1335.62115
[8] Bertsimas, D., Shioda, R.: Algorithm for cardinality-constrained quadratic optimization. Comput. Optim. Appl. 43, 1-22 (2009). doi:10.1007/s10589-007-9126-9 · Zbl 1178.90262 · doi:10.1007/s10589-007-9126-9
[9] Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. Artif. Intell. 97, 245-271 (1997). doi:10.1016/S0004-3702(97)00063-5 · Zbl 0904.68142 · doi:10.1016/S0004-3702(97)00063-5
[10] Burnham, K.P., Anderson, D.R.: Model Selection and Multimodel Inference: A Practical Information-theoretic Approach, 2nd edn. Springer, New York (2002). doi:10.1007/b97636 · Zbl 1005.62007 · doi:10.1007/b97636
[11] Chen, X.: An improved branch and bound algorithm for feature selection. Pattern Recognit. Lett. 24, 1925-1933 (2003). doi:10.1016/S0167-8655(03)00020-5 · doi:10.1016/S0167-8655(03)00020-5
[12] Cormack, G.V.: Email spam filtering: a systematic review. Found. Trends Inf. Retr. 1, 335-455 (2007). doi:10.1561/1500000006 · doi:10.1561/1500000006
[13] Efroymson, M.A.: Multiple regression analysis. In: Ralston, A., Wilf, H.S. (eds.) Mathematical Methods for Digital Computers, pp. 191-203. Wiley, New York (1960)
[14] George, E.I.: The variable selection problem. J. Am. Stat. Assoc. 95, 1304-1308 (2000). doi:10.1080/01621459.2000.10474336 · Zbl 1018.62050 · doi:10.1080/01621459.2000.10474336
[15] Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389-422 (2002). doi:10.1023/A:1012487302797 · Zbl 0998.68111 · doi:10.1023/A:1012487302797
[16] Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157-1182 (2003) · Zbl 1102.68556
[17] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009). doi:10.1007/978-0-387-84858-7 · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[18] Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X.: Applied Logistic Regression, 3rd edn. Wiley, Hoboken (2013) · Zbl 1276.62050 · doi:10.1002/9781118548387
[19] Huberty, C.J.: Issues in the use and interpretation of discriminant analysis. Psychol. Bull. 95, 156-171 (1984) · doi:10.1037/0033-2909.95.1.156
[20] Huberty, C.J.: Problems with stepwise methods-better alternatives. Adv. Soc. Sci. Methodol. 1, 43-70 (1989)
[21] James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning. Springer, New York (2013) · Zbl 1281.62147 · doi:10.1007/978-1-4614-7138-7
[22] Koh, K., Kim, S., Boyd, S.: An interior-point method for large-scale \(\ell_1\)-regularized logistic regression. J. Mach. Learn. Res. 8, 1519-1555 (2007) · Zbl 1222.62092
[23] Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97, 273-324 (1997). doi:10.1016/S0004-3702(97)00043-X · Zbl 0904.68143 · doi:10.1016/S0004-3702(97)00043-X
[24] Konno, H., Takaya, Y.: Multi-step methods for choosing the best set of variables in regression analysis. Comput. Optim. Appl. 46, 417-426 (2010). doi:10.1007/s10589-008-9193-6 · Zbl 1200.62076 · doi:10.1007/s10589-008-9193-6
[25] Konno, H., Yamamoto, R.: A mean-variance-skewness model: algorithm and applications. Int. J. Theor. Appl. Financ. 8, 409-423 (2005). doi:10.1142/S0219024905003116 · Zbl 1139.91335 · doi:10.1142/S0219024905003116
[26] Konno, H., Yamamoto, R.: Choosing the best set of variables in regression analysis using integer programming. J. Glob. Optim. 44, 273-282 (2009). doi:10.1007/s10898-008-9323-9 · Zbl 1178.62069 · doi:10.1007/s10898-008-9323-9
[27] Lee, S., Lee, H., Abbeel, P., Ng, A.Y.: Efficient \(L_1\) regularized logistic regression. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence, pp. 401-408. AAAI Press, Menlo Park (2006)
[28] Liu, H., Motoda, H. (eds.): Computational Methods of Feature Selection. Chapman & Hall/CRC, Boca Raton (2007) · Zbl 1130.62118
[29] McFadden, D.: Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (ed.) Frontiers in Econometrics, pp. 105-142. Academic Press, New York (1974)
[30] Mallows, C.L.: Some comments on \(C_p\). Technometrics 15, 661-675 (1973). doi:10.1080/00401706.1973.10489103 · Zbl 0269.62061 · doi:10.1080/00401706.1973.10489103
[31] Miyashiro, R., Takano, Y.: Mixed integer second-order cone programming formulations for variable selection in linear regression. Eur. J. Oper. Res. 247, 721-731 (2015). doi:10.1016/j.ejor.2015.06.081 · Zbl 1346.90616 · doi:10.1016/j.ejor.2015.06.081
[32] Miyashiro, R., Takano, Y.: Subset selection by Mallows’ \(C_p\): a mixed integer programming approach. Expert Syst. Appl. 42, 325-331 (2015). doi:10.1016/j.eswa.2014.07.056 · doi:10.1016/j.eswa.2014.07.056
[33] Nakariyakul, S., Casasent, D.P.: Adaptive branch and bound algorithm for selecting optimal features. Pattern Recognit. Lett. 28, 1415-1427 (2007). doi:10.1016/j.patrec.2007.02.015 · doi:10.1016/j.patrec.2007.02.015
[34] Narendra, P.M., Fukunaga, K.: A branch and bound algorithm for feature subset selection. IEEE Trans. Comput. C-26, 917-922 (1977). doi:10.1109/TC.1977.1674939 · Zbl 0363.68059 · doi:10.1109/TC.1977.1674939
[35] Pacheco, J., Casado, S., Núñez, L.: A variable selection method based on tabu search for logistic regression models. Eur. J. Oper. Res. 199, 506-511 (2009). doi:10.1016/j.ejor.2008.10.007 · Zbl 1176.90268 · doi:10.1016/j.ejor.2008.10.007
[36] Sato, T., Takano, Y., Miyashiro, R.: Piecewise-linear approximation for feature subset selection in a sequential logit model. arXiv preprint, arXiv:1510.05417 · Zbl 1371.90068
[37] Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461-464 (1978) · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[38] Somol, P., Pudil, P., Kittler, J.: Fast branch & bound algorithms for optimal feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26, 900-912 (2004). doi:10.1109/TPAMI.2004.28 · doi:10.1109/TPAMI.2004.28
[39] Unler, A., Murat, A.: A discrete particle swarm optimization method for feature selection in binary classification problems. Eur. J. Oper. Res. 206, 528-539 (2010). doi:10.1016/j.ejor.2010.02.032 · Zbl 1188.90280 · doi:10.1016/j.ejor.2010.02.032
[40] Yamamoto, R., Konno, H.: An efficient algorithm for solving mean-variance model under transaction costs. Pac. J. Optim. 2, 367-384 (2006) · Zbl 1142.90022
[41] Yu, B., Yuan, B.: A more efficient branch and bound algorithm for feature selection. Pattern Recognit. 26, 883-889 (1993). doi:10.1016/0031-3203(93)90054-Z · doi:10.1016/0031-3203(93)90054-Z
[42] Yusta, S.C.: Different metaheuristic strategies to solve the feature selection problem. Pattern Recognit. Lett. 30, 525-534 (2009). doi:10.1016/j.patrec.2008.11.012 · doi:10.1016/j.patrec.2008.11.012
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases, the data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.