Logistic regression: from art to science. (English) Zbl 1442.62166

Summary: A high-quality logistic regression model has several desirable properties: predictive power, interpretability, significance, robustness to error in data, and sparsity, among others. To achieve these competing goals, modelers incorporate these properties iteratively as they home in on a final model. In the period 1991-2015, algorithmic advances in mixed-integer linear optimization (MILO) coupled with hardware improvements resulted in an astonishing 450 billion factor speedup in solving MILO problems. Motivated by this speedup, we propose modeling logistic regression problems algorithmically with a mixed-integer nonlinear optimization (MINLO) approach in order to explicitly incorporate these properties in a joint, rather than sequential, fashion. The resulting MINLO is flexible and can be adjusted based on the needs of the modeler. Using both real and synthetic data, we demonstrate that the overall approach is generally applicable and provides high-quality solutions in realistic timelines as well as a guarantee of suboptimality. When the MINLO is infeasible, we obtain a guarantee that imposing distinct statistical properties is simply not feasible.
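
As a rough illustration of what a joint formulation of this kind can look like, the following sketch adds a sparsity requirement to the logistic loss. It is an assumption made for illustration, not necessarily the formulation used in the paper. The notation is also assumed: observations \((x_i, y_i)\), \(i = 1, \dots, n\), with \(x_i \in \mathbb{R}^p\) and labels \(y_i \in \{-1, 1\}\); coefficients \(\beta \in \mathbb{R}^p\) and intercept \(\beta_0\); binary indicators \(z_j\) marking whether feature \(j\) is selected; a sparsity budget \(k\); and a sufficiently large bound \(M\) on the coefficient magnitudes.

\[
\begin{aligned}
\min_{\beta_0,\, \beta,\, z} \quad & \sum_{i=1}^{n} \log\bigl(1 + \exp\bigl(-y_i(\beta_0 + x_i^{\top}\beta)\bigr)\bigr) \\
\text{s.t.} \quad & -M z_j \le \beta_j \le M z_j, \quad j = 1, \dots, p, \\
& \sum_{j=1}^{p} z_j \le k, \\
& z_j \in \{0, 1\}, \quad j = 1, \dots, p.
\end{aligned}
\]

Under these assumptions, \(z_j = 0\) forces \(\beta_j = 0\), so at most \(k\) features enter the model. Further properties would enter as additional constraints on \(z\); for example, a constraint of the form \(z_j + z_l \le 1\) would keep two highly correlated features \(j\) and \(l\) from being selected together.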


62J12 Generalized linear models (logistic models)
62-08 Computational methods for problems pertaining to statistics

