Optimization problems for machine learning: a survey. (English) Zbl 1487.90004

Summary: This paper surveys the machine learning literature and presents several commonly used machine learning approaches in an optimization framework. In particular, mathematical optimization models are presented for regression, classification, clustering, deep learning, and adversarial learning, as well as for emerging applications in machine teaching, empirical model learning, and Bayesian network structure learning. Such models can benefit from advances in numerical optimization techniques, which have already played a distinctive role in several machine learning settings. The strengths and shortcomings of these models are discussed, and potential research directions and open problems are highlighted.

MSC:

90-08 Computational methods for problems pertaining to operations research and mathematical programming
68T05 Learning and adaptive systems in artificial intelligence
68T07 Artificial neural networks and deep learning
62H30 Classification and discrimination; cluster analysis (statistical aspects)
90-02 Research exposition (monographs, survey articles) pertaining to operations research and mathematical programming

References:

[1] Agostinelli, F.; Hoffman, M.; Sadowski, P.; Baldi, P., Learning activation functions to improve deep neural networks, Technical Report (2014), arXiv preprint 1412.6830
[2] Alam, M. A.; Lin, H.-Y.; Deng, H.-W.; Calhoun, V. D.; Wang, Y.-P., A kernel machine method for detecting higher order interactions in multimodal datasets: Application to schizophrenia, Journal of Neuroscience Methods, 309, 161-174 (2018)
[3] Aloise, D.; Hansen, P.; Liberti, L., An improved column generation algorithm for minimum sum-of-squares clustering, Mathematical Programming, 131, 1, 195-220 (2012) · Zbl 1236.90095
[4] Amaldi, E.; Coniglio, S., A distance-based point-reassignment heuristic for the k-hyperplane clustering problem, European Journal of Operational Research, 227, 1, 22-29 (2013) · Zbl 1292.90323
[5] Amaldi, E.; Coniglio, S.; Taccari, L., Discrete optimization methods to fit piecewise affine models to data points, Computers & Operations Research, 75, 214-230 (2016) · Zbl 1349.68209
[6] Aoki, Y.; Hayami, K.; Sterck, H. D.; Konagaya, A., Cluster Newton method for sampling multiple solutions of underdetermined inverse problems: application to a parameter identification problem in pharmacokinetics, SIAM Journal on Scientific Computing, 36, 1, 14-44 (2014) · Zbl 1290.65062
[7] Atamturk, A.; Gomez, A., Rank-one convexification for sparse regression, Technical Report (2019), arXiv preprint 1901.10334
[8] Aytug, H., Feature selection for support vector machines using generalized Benders decomposition, European Journal of Operational Research, 244, 1, 210-218 (2015) · Zbl 1347.62105
[9] Azad, M.; Moshkov, M., Minimization of decision tree average depth for decision tables with many-valued decisions, Procedia Computer Science, 35, 368-377 (2014)
[10] Azad, M.; Moshkov, M., Minimization of decision tree depth for multi-label decision tables, Proceedings of the IEEE international conference on granular computing, 7-12 (2014)
[11] Azad, M.; Moshkov, M., Classification and optimization of decision trees for inconsistent decision tables represented as MVD tables, Proceedings of the federated conference on computer science and information systems, 31-38 (2015)
[12] Azad, M.; Moshkov, M., Multi-stage optimization of decision and inhibitory trees for decision tables with many-valued decisions, European Journal of Operational Research, 263, 3, 910-921 (2017) · Zbl 1380.91054
[13] Bagirov, A. M.; Yearwood, J., A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems, European Journal of Operational Research, 170, 2, 578-596 (2006) · Zbl 1085.90045
[14] Bartlett, M.; Cussens, J., Advances in Bayesian network learning using integer programming, Proceedings of the conference on uncertainty in artificial intelligence, 182-191 (2013)
[15] Barreno, M.; Nelson, B.; Joseph, A. D.; Tygar, J. D., The security of machine learning, Machine Learning, 81, 2, 121-148 (2010) · Zbl 1470.68080
[16] Bartlett, M.; Cussens, J., Integer linear programming for the Bayesian network structure learning problem, Artificial Intelligence, 244, 258-271 (2017) · Zbl 1404.68094
[17] Bastani, O.; Ioannou, Y.; Lampropoulos, L.; Vytiniotis, D.; Nori, A.; Criminisi, A., Measuring neural net robustness with constraints, (Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; Garnett, R., Advances in neural information processing systems (2016), Curran Associates, Inc.), 2613-2621
[18] Baumann, P.; Hochbaum, D. S.; Yang, Y. T., A comparative study of the leading machine learning techniques and two new optimization algorithms, European Journal of Operational Research, 272, 3, 1041-1057 (2019) · Zbl 1403.90560
[19] Belhumeur, P. N.; Hespanha, J. P.; Kriegman, D. J., Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis & Machine Intelligence, 19, 7, 711-720 (1997)
[20] Benati, S.; García, S., A mixed integer linear model for clustering with variable selection, Computers & Operations Research, 43, 280-285 (2014) · Zbl 1349.62258
[21] Bengio, Y.; Lodi, A.; Prouvost, A., Machine learning for combinatorial optimization: a methodological tour d’horizon, Technical Report (2018), arXiv preprint 1811.06128
[22] Bennett, K. P., Decision tree construction via linear programming, Technical Report (1992), Center for Parallel Optimization, Computer Sciences Department, University of Wisconsin
[23] Bennett, K. P.; Blue, J., Optimal decision trees, Technical Report (1996), Rensselaer Polytechnic Institute
[24] Bennett, K. P.; Mangasarian, O. L., Robust linear programming discrimination of two linearly inseparable sets, Optimization Methods and Software, 1, 1, 23-34 (1992)
[25] Bennett, K. P.; Parrado-Hernández, E., The interplay of optimization and machine learning research, Journal of Machine Learning Research, 7, 1265-1281 (2006) · Zbl 1222.68146
[26] Bergstra, J.; Bengio, Y., Random search for hyper-parameter optimization, Journal of Machine Learning Research, 13, Feb, 281-305 (2012) · Zbl 1283.68282
[27] Bertsimas, D.; Copenhaver, M. S., Characterization of the equivalence of robustification and regularization in linear and matrix regression, European Journal of Operational Research, 270, 3, 931-942 (2018) · Zbl 1403.62040
[28] Bertsimas, D.; Dunn, J., Optimal classification trees, Machine Learning, 106, 7, 1039-1082 (2017) · Zbl 1455.68159
[29] Bertsimas, D.; Dunn, J.; Pawlowski, C.; Zhuo, Y. D., Robust classification, INFORMS Journal on Optimization, 1, 1, 2-34 (2019)
[30] Bertsimas, D.; Kallus, N., From predictive to prescriptive analytics, Management Science, 66, 3, 1025-1044 (2020)
[31] Bertsimas, D.; King, A., OR Forum: An algorithmic approach to linear regression, Operations Research, 64, 1, 2-16 (2016) · Zbl 1338.90272
[32] Bertsimas, D.; King, A.; Mazumder, R., Best subset selection via a modern optimization lens, The Annals of Statistics, 44, 2, 813-852 (2016) · Zbl 1335.62115
[33] Bertsimas, D.; Shioda, R., Classification and regression via integer optimization, Operations Research, 55, 2, 252-271 (2007) · Zbl 1167.90593
[34] Bertsimas, D.; Van Parys, B., Sparse high-dimensional regression: Exact scalable algorithms and phase transitions, The Annals of Statistics, 48, 1, 300-323 (2020) · Zbl 1444.62094
[35] Biggio, B.; Fumera, G.; Roli, F., Multiple classifier systems under attack, Proceedings of the international workshop on multiple classifier systems, 74-83 (2010)
[36] Biggio, B.; Nelson, B.; Laskov, P., Poisoning attacks against support vector machines, Proceedings of the international conference on machine learning, 1467-1474 (2012)
[37] Blanco, V.; Puerto, J.; Salmerón, R., Locating hyperplanes to fitting set of points: A general framework, Computers & Operations Research, 95, 172-193 (2018) · Zbl 1458.90467
[38] Blanquero, R.; Carrizosa, E.; Molero-Río, C.; Morales, D. R., Optimal randomized classification trees, Technical Report (2018)
[39] Blanquero, R.; Carrizosa, E.; Molero-Río, C.; Morales, D. R., Sparsity in optimal randomized classification trees, European Journal of Operational Research, 284, 1, 255-272 (2020) · Zbl 1441.62163
[40] Bonami, P.; Lodi, A.; Tramontani, A.; Wiese, S., On mathematical programming with indicator constraints, Mathematical Programming, 151, 1, 191-223 (2015) · Zbl 1328.90086
[41] Bonami, P.; Lodi, A.; Zarpellon, G., Learning a classification of mixed-integer quadratic programming problems, Proceedings of the international conference on the integration of constraint programming, artificial intelligence, and operations research, 595-604 (2018) · Zbl 1511.90304
[42] Boţ, R. I.; Lorenz, N., Optimization problems in statistical learning: Duality and optimality conditions, European Journal of Operational Research, 213, 2, 395-404 (2011) · Zbl 1222.90079
[43] Bottou, L.; Curtis, F. E.; Nocedal, J., Optimization methods for large-scale machine learning, SIAM Review, 60, 2, 223-311 (2018) · Zbl 1397.65085
[44] Bradley, P.; Mangasarian, O., Massive data discrimination via linear support vector machines, Optimization Methods and Software, 13, 1, 1-10 (2000) · Zbl 0986.90085
[45] Breiman, L.; Friedman, J.; Olshen, R.; Stone, C., Classification and regression trees (1984), Chapman and Hall/CRC, London · Zbl 0541.62042
[46] Brückner, M.; Kanzow, C.; Scheffer, T., Static prediction games for adversarial learning problems, Journal of Machine Learning Research, 13, 2617-2654 (2012) · Zbl 1433.68328
[47] Brückner, M.; Scheffer, T., Stackelberg games for adversarial prediction problems, Proceedings of the international conference on knowledge discovery and data mining, 547-555 (2011)
[48] Bunel, R. R.; Turkaslan, I.; Torr, P.; Kohli, P.; Mudigonda, P. K., A unified view of piecewise linear neural network verification, (Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K.; Cesa-Bianchi, N.; Garnett, R., Advances in neural information processing systems (2018), Curran Associates, Inc.), 4790-4799
[49] Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C., A limited memory algorithm for bound constrained optimization, SIAM Journal on Scientific Computing, 16, 5, 1190-1208 (1995) · Zbl 0836.65080
[50] Cafieri, S.; Costa, A.; Hansen, P., Reformulation of a model for hierarchical divisive graph modularity maximization, Annals of Operations Research, 222, 1, 213-226 (2014) · Zbl 1303.90111
[51] Cafieri, S.; Hansen, P.; Liberti, L., Improving heuristics for network modularity maximization using an exact algorithm, Discrete Applied Mathematics, 163, 65-72 (2014) · Zbl 1303.90112
[52] de Campos, C. P.; Ji, Q., Efficient structure learning of Bayesian networks using constraints, Journal of Machine Learning Research, 12, 663-689 (2011) · Zbl 1280.68226
[53] Carlini, N.; Wagner, D., Towards evaluating the robustness of neural networks, Proceedings of the IEEE symposium on security and privacy, 39-57 (2017)
[54] Carrizosa, E.; Guerrero, V., Biobjective sparse principal component analysis, Journal of Multivariate Analysis, 132, 151-159 (2014) · Zbl 1360.62301
[55] Carrizosa, E.; Guerrero, V., rs-Sparse principal component analysis: A mixed integer nonlinear programming approach with VNS, Computers & Operations Research, 52, 349-354 (2014) · Zbl 1349.62248
[56] Carrizosa, E.; Martín-Barragán, B.; Morales, D. R., Binarized support vector machines, INFORMS Journal on Computing, 22, 1, 154-167 (2010) · Zbl 1243.62088
[57] Carrizosa, E.; Martín-Barragán, B.; Morales, D. R., Detecting relevant variables and interactions in supervised classification, European Journal of Operational Research, 213, 1, 260-269 (2011)
[58] Carrizosa, E.; Mladenović, N.; Todosijević, R., Variable neighborhood search for minimum sum-of-squares clustering on networks, European Journal of Operational Research, 230, 2, 356-363 (2013) · Zbl 1317.91061
[59] Carrizosa, E.; Morales, D. R., Supervised classification and mathematical optimization, Computers & Operations Research, 40, 1, 150-165 (2013) · Zbl 1349.68135
[60] Chan, A. B.; Vasconcelos, N.; Lanckriet, G. R.G., Direct convex relaxations of sparse SVM, Proceedings of the international conference on machine learning, 145-153 (2007)
[61] Chatterjee, S.; Hadi, A. S., Regression analysis by example (2015), John Wiley & Sons, New York
[62] Chaves, A. A.; Lorena, L. A.N., Clustering search algorithm for the capacitated centered clustering problem, Computers & Operations Research, 37, 3, 552-558 (2010) · Zbl 1173.90413
[63] Chen, X.; Yang, J.; Zhang, D.; Liang, J., Complete large margin linear discriminant analysis using mathematical programming approach, Pattern Recognition, 46, 6, 1579-1594 (2013) · Zbl 1264.68140
[64] Chen, Y.; Florian, M., The nonlinear bilevel programming problem: Formulations, regularity and optimality conditions, Optimization, 32, 3, 193-209 (1995) · Zbl 0817.90101
[65] Cheng, C.-H.; Nührenberg, G.; Ruess, H., Maximum resilience of artificial neural networks, (D’Souza, D.; Narayan Kumar, K., Automated technology for verification and analysis (2017), Springer International Publishing, Cham), 251-268
[66] Chickering, D. M., Learning Bayesian networks is NP-complete, Learning from data, 121-130 (1996), Springer
[67] Chikalov, I.; Hussain, S.; Moshkov, M., Bi-criteria optimization of decision trees with applications to data analysis, European Journal of Operational Research, 266, 2, 689-701 (2018) · Zbl 1403.91106
[68] Chouldechova, A.; Hastie, T., Generalized additive model selection, Technical Report (2015), arXiv preprint 1506.03850
[69] Chu, W.; Keerthi, S. S., Support vector ordinal regression, Neural Computation, 19, 3, 792-815 (2007) · Zbl 1127.68080
[70] Claassen, G.; Hendriks, T. H., An application of special ordered sets to a periodic milk collection problem, European Journal of Operational Research, 180, 2, 754-769 (2007) · Zbl 1123.90319
[71] Corne, D.; Dhaenens, C.; Jourdan, L., Synergies between operations research and data mining: The emerging use of multi-objective approaches, European Journal of Operational Research, 221, 3, 469-479 (2012) · Zbl 1253.90002
[72] Corrente, S.; Greco, S.; Kadziński, M.; Słowiński, R., Robust ordinal regression in preference learning and ranking, Machine Learning, 93, 2-3, 381-422 (2013) · Zbl 1300.68040
[73] Cortes, C.; Vapnik, V., Support-vector networks, Machine Learning, 20, 3, 273-297 (1995) · Zbl 0831.68098
[74] Courbariaux, M.; Bengio, Y.; David, J.-P., BinaryConnect: Training deep neural networks with binary weights during propagations, (Cortes, C.; Lawrence, N. D.; Lee, D. D.; Sugiyama, M.; Garnett, R., Advances in neural information processing systems (2015), Curran Associates, Inc), 3123-3131
[75] Cox, L. A.; Qiu, Y.; Kuehner, W., Heuristic least-cost computation of discrete classification functions with uncertain argument values, Annals of Operations Research, 21, 1, 1-29 (1989) · Zbl 0705.90089
[76] Cunningham, J. P.; Ghahramani, Z., Linear dimensionality reduction: survey, insights, and generalizations, The Journal of Machine Learning Research, 16, 1, 2859-2900 (2015) · Zbl 1351.62123
[77] Curtis, F. E.; Scheinberg, K., Optimization methods for supervised machine learning: From linear models to deep learning, Leading developments from INFORMS communities, 89-114 (2017), INFORMS
[78] Cussens, J., Bayesian network learning with cutting planes, Proceedings of the conference on uncertainty in artificial intelligence, 153-160 (2011)
[79] Cybenko, G., Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, 2, 4, 303-314 (1989) · Zbl 0679.94019
[80] D’Ambrosio, C.; Lodi, A.; Wiese, S.; Bragalli, C., Mathematical programming techniques in water network optimization, European Journal of Operational Research, 243, 3, 774-788 (2015) · Zbl 1346.90211
[81] Dekel, O.; Shamir, O.; Xiao, L., Learning to classify with missing and corrupted features, Machine Learning, 81, 2, 149-178 (2010) · Zbl 1470.68095
[82] 9-1
[83] Díaz-Báñez, J.; Mesa, J. A.; Schöbel, A., Continuous location of dimensional structures, European Journal of Operational Research, 152, 1, 22-44 (2004) · Zbl 1040.90021
[84] Ding, C.; He, X., K-means clustering via principal component analysis, Proceedings of the international conference on machine learning, 29 (2004)
[85] Doshi-Velez, F.; Kim, B., Towards a rigorous science of interpretable machine learning, Technical Report (2017), arXiv preprint 1702.08608
[86] Dreiseitl, S.; Ohno-Machado, L., Logistic regression and artificial neural network classification models: a methodology review, Journal of Biomedical Informatics, 35, 5-6, 352-359 (2002)
[87] Dunbar, M.; Murray, J. M.; Cysique, L. A.; Brew, B. J.; Jeyakumar, V., Simultaneous classification and feature selection via convex quadratic programming with application to HIV-associated neurocognitive disorder assessment, European Journal of Operational Research, 206, 2, 470-478 (2010) · Zbl 1188.90235
[88] Edgeworth, F. Y., On observations relating to several quantities, Hermathena, 6, 13, 279-285 (1887)
[89] Edmunds, T. A.; Bard, J. F., An algorithm for the mixed-integer nonlinear bilevel programming problem, Annals of Operations Research, 34, 1, 149-162 (1992) · Zbl 0751.90054
[90] Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R., Least angle regression, The Annals of Statistics, 32, 2, 407-499 (2004) · Zbl 1091.62054
[91] Elsken, T.; Metzen, J. H.; Hutter, F., Neural architecture search: A survey, Journal of Machine Learning Research, 20, 55, 1-21 (2019) · Zbl 1485.68229
[92] Fanghänel, D.; Dempe, S., Bilevel programming with discrete lower level problems, Optimization, 58, 8, 1029-1047 (2009) · Zbl 1175.90315
[93] de Farias, I.; Zhao, M.; Zhao, H., A special ordered set approach for optimizing a discontinuous separable piecewise linear function, Operations Research Letters, 36, 2, 234-238 (2008) · Zbl 1163.90758
[94] Ferrari-Trecate, G.; Muselli, M.; Liberati, D.; Morari, M., A clustering technique for the identification of piecewise affine systems, Automatica, 39, 2, 205-217 (2003) · Zbl 1011.93508
[95] Fischetti, M.; Fraccaro, M., Machine learning meets mathematical optimization to predict the optimal production of offshore wind parks, Computers & Operations Research, 106, 289-297 (2019) · Zbl 1458.90671
[96] Fischetti, M.; Jo, J., Deep neural networks and mixed integer linear optimization, Constraints, 23, 3, 296-309 (2018) · Zbl 1402.90096
[97] Fischetti, M.; Lodi, A.; Zarpellon, G., Learning MILP resolution outcomes before reaching time-limit, Proceedings of the international conference on integration of constraint programming, artificial intelligence, and operations research, 275-291 (2019) · Zbl 1525.90283
[98] Franceschi, L.; Frasconi, P.; Salzo, S.; Grazzi, R.; Pontil, M., Bilevel programming for hyperparameter optimization and meta-learning, Technical Report (2018), arXiv preprint 1806.04910
[99] Friedman, J.; Hastie, T.; Tibshirani, R., The elements of statistical learning, 1 (2001), Springer Series in Statistics, New York, NY, USA · Zbl 0973.62007
[100] Fukunaga, K., Introduction to Statistical Pattern Recognition (2013), Elsevier
[101] Ganesh, K.; Narendran, T., CLOVES: A cluster-and-search heuristic to solve the vehicle routing problem with delivery and pick-up, European Journal of Operational Research, 178, 3, 699-717 (2007) · Zbl 1148.90303
[102] Gasse, M.; Aussem, A.; Elghazel, H., A hybrid algorithm for Bayesian network structure learning with application to multi-label learning, Expert Systems with Applications, 41, 15, 6755-6772 (2014)
[103] Gaudioso, M.; Gorgone, E.; Labbé, M.; Rodríguez-Chía, A. M., Lagrangian relaxation for SVM feature selection, Computers & Operations Research, 87, 137-145 (2017) · Zbl 1391.90430
[104] Gaudreau, P.; Hayami, K.; Aoki, Y.; Safouhi, H.; Konagaya, A., Improvements to the cluster Newton method for underdetermined inverse problems, Journal of Computational and Applied Mathematics, 283, 122-141 (2015) · Zbl 1311.65070
[105] Ghaddar, B.; Naoum-Sawaya, J., High dimensional data classification and feature selection using support vector machines, European Journal of Operational Research, 265, 3, 993-1004 (2018) · Zbl 1381.62170
[106] Globerson, A.; Roweis, S., Nightmare at test time: robust learning by feature deletion, Proceedings of the international conference on machine learning, 353-360 (2006)
[107] Goldman, S.; Kearns, M., On the complexity of teaching, Journal of Computer and System Sciences, 50, 1, 20-31 (1995) · Zbl 0939.68770
[108] Goodfellow, I.; Bengio, Y.; Courville, A., Deep learning, 1 (2016), MIT Press, Cambridge · Zbl 1373.68009
[109] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Bengio, Y., Generative adversarial nets, (Ghahramani, Z.; Welling, M.; Cortes, C.; Lawrence, N. D.; Weinberger, K. Q., Advances in neural information processing systems (2014), Curran Associates, Inc), 2672-2680
[110] Goodfellow, I. J.; Warde-Farley, D.; Mirza, M.; Courville, A.; Bengio, Y., Maxout networks, Proceedings of the international conference on machine learning, 1319-1327 (2013)
[111] Grossmann, I. E., Review of nonlinear mixed-integer and disjunctive programming techniques, Optimization and Engineering, 3, 3, 227-252 (2002) · Zbl 1035.90050
[112] Gu, S.; Rigazio, L., Towards deep neural network architectures robust to adversarial examples, Technical Report (2014), arXiv preprint 1412.5068
[113] Gümüş, Z. H.; Floudas, C. A., Global optimization of nonlinear bilevel programming problems, Journal of Global Optimization, 20, 1, 1-31 (2001) · Zbl 0987.90074
[114] Günlük, O.; Kalagnanam, J.; Menickelly, M.; Scheinberg, K., Optimal Decision Trees for Categorical Data via Integer Programming, Technical Report (2018), Optimization Online
[115] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V., Gene selection for cancer classification using support vector machines, Machine Learning, 46, 1-3, 389-422 (2002) · Zbl 0998.68111
[116] Hamm, J.; Noh, Y., K-Beam subgradient descent for minimax optimization, Technical Report (2018), arXiv preprint 1805.11640
[117] Hansen, P.; Jaumard, B., Cluster analysis and mathematical programming, Mathematical Programming, 79, 1-3, 191-215 (1997) · Zbl 0887.90182
[118] Har-Peled, S.; Roth, D.; Zimak, D., Constraint classification for multiclass classification and ranking, (Becker, S.; Thrun, S.; Obermayer, K., Advances in neural information processing systems (2003), MIT Press), 809-816
[119] Hastie, T.; Tibshirani, R., Generalized additive models, Statistical Science, 1, 3, 297-310 (1986) · Zbl 0645.62068
[120] Hastie, T.; Tibshirani, R.; Tibshirani, R. J., Extended comparisons of best subset selection, forward stepwise selection, and the lasso, Technical Report (2017), arXiv preprint 1707.08692
[121] Herbrich, R., Learning kernel classifiers: Theory and algorithms (2001), MIT Press
[122] Herbrich, R.; Graepel, T.; Obermayer, K., Large margin rank boundaries for ordinal regression, (Bartlett, P.; Schölkopf, B.; Schuurmans, D.; Smola, A. (2000), MIT Press)
[123] Hornik, K., Approximation capabilities of multilayer feedforward networks, Neural Networks, 4, 2, 251-257 (1991)
[124] Hyafil, L.; Rivest, R. L., Constructing optimal binary decision trees is NP-complete, Information Processing Letters, 5, 1, 15-17 (1976) · Zbl 0333.68029
[125] Icarte, R. T.; Illanes, L.; Castro, M. P.; Cire, A. A.; McIlraith, S. A.; Beck, J. C., Training binarized neural networks using MIP and CP, Proceedings of the international conference on principles and practice of constraint programming (2019)
[126] Izenman, A. J., Modern multivariate statistical techniques: Regression, classification and manifold learning, Springer Texts in Statistics, 10 (2008), Springer · Zbl 1155.62040
[127] Jaakkola, T.; Sontag, D.; Globerson, A.; Meila, M., Learning Bayesian network structure using LP relaxations, Proceedings of the international conference on artificial intelligence and statistics, 358-365 (2010)
[128] Jain, A. K.; Murty, M. N.; Flynn, P. J., Data clustering: a review, ACM Computing Surveys, 31, 3, 264-323 (1999)
[129] James, G.; Witten, D.; Hastie, T.; Tibshirani, R., An introduction to statistical learning, 112 (2013), Springer · Zbl 1281.62147
[130] Jan, R.-H.; Chern, M.-S., Nonlinear integer bilevel programming, European Journal of Operational Research, 72, 3, 574-587 (1994) · Zbl 0824.90097
[131] Jolliffe, I., Principal component analysis, International encyclopedia of statistical science, 1094-1096 (2011), Springer
[132] Karmitsa, N.; Bagirov, A. M.; Taheri, S., New diagonal bundle method for clustering problems in large data sets, European Journal of Operational Research, 263, 2, 367-379 (2017) · Zbl 1380.90226
[133] Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; Fei-Fei, L., Large-scale video classification with convolutional neural networks, Proceedings of the IEEE conference on computer vision and pattern recognition, 1725-1732 (2014)
[134] Katz, G.; Barrett, C.; Dill, D. L.; Julian, K.; Kochenderfer, M. J., Reluplex: An efficient SMT solver for verifying deep neural networks, Proceedings of the international conference on computer aided verification, 97-117 (2017) · Zbl 1494.68167
[135] Kawano, S.; Fujisawa, H.; Takada, T.; Shiroishi, T., Sparse principal component regression with adaptive loading, Computational Statistics & Data Analysis, 89, 192-203 (2015) · Zbl 1468.62098
[136] Kelley, C. T., Iterative methods for optimization (1999), Society for Industrial and Applied Mathematics · Zbl 0934.90082
[137] Keshvari, A., Segmented concave least squares: A nonparametric piecewise linear regression, European Journal of Operational Research, 266, 2, 585-594 (2018) · Zbl 1403.90546
[138] Khalil, E. B.; Bodic, P. L.; Song, L.; Nemhauser, G.; Dilkina, B., Learning to branch in mixed integer programming, Proceedings of the AAAI conference on artificial intelligence, 724-731 (2016)
[139] Khalil, E. B.; Dilkina, B.; Nemhauser, G. L.; Ahmed, S.; Shao, Y., Learning to run heuristics in tree search, Proceedings of the international joint conference on artificial intelligence, 659-666 (2017)
[140] Khalil, E. B.; Gupta, A.; Dilkina, B., Combinatorial attacks on binarized neural networks, Technical Report (2018), arXiv preprint 1810.03538
[141] Kingma, D. P.; Ba, J., Adam: A method for stochastic optimization, Technical Report (2014), arXiv preprint 1412.6980
[142] Klabjan, D.; Harmon, M., Activation ensembles for deep neural networks, Proceedings of the IEEE international conference on big data, 206-214 (2019)
[143] Klastorin, T. D., The p-median problem for cluster analysis: A comparative test using the mixture model approach, Management Science, 31, 1, 84-95 (1985) · Zbl 0612.62086
[144] Klatzer, T.; Pock, T., Continuous hyper-parameter learning for support vector machines, Proceedings of the computer vision winter workshop, 39-47 (2015)
[145] Kramer, S.; Widmer, G.; Pfahringer, B.; De Groeve, M., Prediction of ordinal classes using regression trees, Fundamenta Informaticae, 47, 1-2, 1-13 (2001) · Zbl 1016.68079
[146] Kraus, M.; Feuerriegel, S.; Oztekin, A., Deep learning in business analytics and operations research: models, applications and managerial implications, European Journal of Operational Research, 281, 3, 628-641 (2020)
[147] Krizhevsky, A.; Sutskever, I.; Hinton, G. E., Imagenet classification with deep convolutional neural networks, (Pereira, F.; Burges, C. J.C.; Bottou, L.; Weinberger, K. Q., Advances in neural information processing systems (2012), Curran Associates, Inc), 1097-1105
[148] Kurakin, A.; Goodfellow, I.; Bengio, S., Adversarial machine learning at scale, Technical Report (2016), arXiv preprint 1611.01236
[149] Kwatera, R. K.; Simeone, B., Clustering heuristics for set covering, Annals of Operations Research, 43, 5, 295-308 (1993) · Zbl 0784.90062
[150] Lanckriet, G. R.G.; Ghaoui, L. E.; Bhattacharyya, C.; Jordan, M. I., A robust minimax approach to classification, Journal of Machine Learning Research, 3, 555-582 (2002) · Zbl 1084.68657
[151] LeCun, Y.; Chopra, S.; Hadsell, R.; Ranzato, M.; Huang, F., A tutorial on energy-based learning, (Bakir, G.; Hofmann, T.; Schölkopf, B.; Smola, A.; Taskar, B. (2006), MIT Press)
[152] LeCun, Y., Generalization and network design strategies, Connectionism in Perspective, 19, 143-155 (1989)
[153] Leofante, F.; Narodytska, N.; Pulina, L.; Tacchella, A., Automated verification of neural networks: Advances, challenges and perspectives, Technical Report (2018), arXiv preprint 1805.09938
[154] Lewis, M.; Wang, H.; Kochenberger, G., Exact solutions to the capacitated clustering problem: A comparison of two models, Annals of Data Science, 1, 1, 15-23 (2014)
[155] Liang, T.; Poggio, T.; Rakhlin, A.; Stokes, J., Fisher-Rao metric, geometry, and complexity of neural networks, Proceedings of the international conference on artificial intelligence and statistics, 888-896 (2019)
[156] Lin, M.; Chen, Q.; Yan, S., Network in network, Technical Report (2013), arXiv preprint 1312.4400
[157] Liu, J.; Zhu, X., The teaching dimension of linear learners, The Journal of Machine Learning Research, 17, 1, 5631-5655 (2016) · Zbl 1392.68353
[158] Lodi, A.; Zarpellon, G., On learning and branching: A survey, TOP, 25, 2, 207-236 (2017) · Zbl 1372.90003
[159] Lombardi, M.; Milano, M.; Bartolini, A., Empirical decision model learning, Artificial Intelligence, 244, 343-367 (2017) · Zbl 1404.68113
[160] Lowd, D.; Meek, C., Adversarial learning, Proceedings of the international conference on knowledge discovery in data mining, 641-647 (2005)
[161] MacQueen, J., Some methods for classification and analysis of multivariate observations, Proceedings of the Berkeley symposium on mathematical statistics and probability, 281-297 (1967) · Zbl 0214.46201
[162] Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A., Towards deep learning models resistant to adversarial attacks, Technical Report (2017), arXiv preprint 1706.06083
[163] Mai, F.; Fry, M. J.; Ohlmann, J. W., Model-based capacitated clustering with posterior regularization, European Journal of Operational Research, 271, 2, 594-605 (2018) · Zbl 1403.90519
[164] Maldonado, S.; Pérez, J.; Weber, R.; Labbé, M., Feature selection for support vector machines via mixed integer linear programming, Information Sciences, 279, 163-175 (2014) · Zbl 1354.68226
[165] Mei, S.; Zhu, X., Using machine teaching to identify optimal training-set attacks on machine learners, Proceedings of the AAAI conference on artificial intelligence, 2871-2877 (2015)
[166] Mielke, P. W.; Berry, K. J., Permutation-based multivariate regression analysis: The case for least sum of absolute deviations regression, Annals of Operations Research, 74, 259 (1997) · Zbl 0893.62067
[167] Miller, A., Subset selection in regression (2002), Chapman and Hall/CRC · Zbl 1051.62060
[168] (In press)
[169] Miyashiro, R.; Takano, Y., Mixed integer second-order cone programming formulations for variable selection in linear regression, European Journal of Operational Research, 247, 3, 721-731 (2015) · Zbl 1346.90616
[170] Montúfar, G., Notes on the number of linear regions of deep neural networks, Technical Report (2017), Max Planck Institute for Mathematics in the Sciences
[171] Moore, G.; Bergeron, C.; Bennett, K. P., Model selection for primal SVM, Machine Learning, 85, 1-2, 175-208 (2011) · Zbl 1237.68159
[172] Mortenson, M. J.; Doherty, N. F.; Robinson, S., Operational research from Taylorism to terabytes: A research agenda for the analytics age, European Journal of Operational Research, 241, 3, 583-595 (2015) · Zbl 1339.90007
[173] Mulvey, J. M.; Crowder, H. P., Cluster analysis: An application of Lagrangian relaxation, Management Science, 25, 4, 329-340 (1979) · Zbl 0415.90085
[174] Natarajan, B. K., Sparse approximate solutions to linear systems, SIAM Journal on Computing, 24, 2, 227-234 (1995) · Zbl 0827.68054
[175] Negreiros, M.; Palhano, A., The capacitated centred clustering problem, Computers & Operations Research, 33, 6, 1639-1663 (2006) · Zbl 1087.90088
[176] Nie, S.; De Campos, C. P.; Ji, Q., Learning bounded tree-width Bayesian networks via sampling, Proceedings of the European conference on symbolic and quantitative approaches to reasoning and uncertainty, 387-396 (2015), Springer · Zbl 1465.68226
[177] Nie, S.; Mauá, D. D.; De Campos, C. P.; Ji, Q., Advances in learning Bayesian networks of bounded treewidth, Advances in neural information processing systems, 2285-2293 (2014)
[178] Olafsson, S.; Li, X.; Wu, S., Operations research and data mining, European Journal of Operational Research, 187, 3, 1429-1448 (2008) · Zbl 1137.90776
[179] Parviainen, P.; Farahani, H. S.; Lagergren, J., Learning bounded tree-width Bayesian networks using integer linear programming, Artificial intelligence and statistics, 751-759 (2014)
[180] Patil, K. R.; Zhu, J.; Kopeć, Ł.; Love, B. C., Optimal teaching for limited-capacity human learners, (Ghahramani, Z.; Welling, M.; Cortes, C.; Lawrence, N. D.; Weinberger, K. Q., Advances in neural information processing systems 27 (2014), Curran Associates, Inc), 2465-2473
[181] Payne, H. J.; Meisel, W. S., An algorithm for constructing optimal binary decision trees, IEEE Transactions on Computers, 26, 9, 905-916 (1977) · Zbl 0363.68131
[182] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Duchesnay, É., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12, 2825-2830 (2011) · Zbl 1280.68189
[183] Piramuthu, S., Evaluating feature selection methods for learning in data mining applications, European Journal of Operational Research, 156, 2, 483-494 (2004) · Zbl 1056.90091
[184] Poggio, T.; Kawaguchi, K.; Liao, Q.; Miranda, B.; Rosasco, L.; Boix, X.; Mhaskar, H., Theory of Deep Learning III: Explaining the non-overfitting puzzle, Technical Report (2017), arXiv preprint 1801.00173
[185] Reris, R.; Brooks, J. P., Principal component analysis and optimization: a tutorial, Technical Report (2015), Virginia Commonwealth University
[186] Robbins, H.; Monro, S., A stochastic approximation method, Herbert Robbins selected papers, 102-109 (1985), Springer
[187] Rovatti, R.; D’Ambrosio, C.; Lodi, A.; Martello, S., Optimistic MILP modeling of non-linear optimization problems, European Journal of Operational Research, 239, 1, 32-45 (2014) · Zbl 1339.90250
[188] Sağlam, B.; Salman, F. S.; Sayın, S.; Türkay, M., A mixed-integer programming approach to the clustering problem with an application in customer segmentation, European Journal of Operational Research, 173, 3, 866-879 (2006) · Zbl 1131.90434
[189] Santi, É.; Aloise, D.; Blanchard, S. J., A model for clustering data from heterogeneous dissimilarities, European Journal of Operational Research, 253, 3, 659-672 (2016) · Zbl 1347.62120
[190] Scanagatta, M.; Corani, G.; de Campos, C. P.; Zaffalon, M., Learning treewidth-bounded Bayesian networks with thousands of variables, (Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; Garnett, R., Advances in neural information processing systems (2016), Curran Associates, Inc), 1462-1470
[191] Scheuerer, S.; Wendolsky, R., A scatter search heuristic for the capacitated clustering problem, European Journal of Operational Research, 169, 2, 533-547 (2006) · Zbl 1079.90180
[192] Schöbel, A., Locating least-distant lines in the plane, European Journal of Operational Research, 106, 1, 152-159 (1998)
[193] Serra, T.; Tjandraatmadja, C.; Ramalingam, S., Bounding and counting linear regions of deep neural networks, Proceedings of the international conference on machine learning, 4558-4566 (2018)
[194] Shaham, U.; Yamada, Y.; Negahban, S., Understanding adversarial training: Increasing local stability of supervised models through robust optimization, Neurocomputing, 307, 195-204 (2018)
[195] Shashua, A.; Levin, A., Ranking with large margin principle: Two approaches, (Becker, S.; Thrun, S.; Obermayer, K., Advances in neural information processing systems (2003), MIT Press), 961-968
[196] Shinohara, A.; Miyano, S., Teachability in computational learning, New Generation Computing, 8, 4, 337-347 (1991) · Zbl 0712.68084
[197] Smola, A. J.; Schölkopf, B., A tutorial on support vector regression, Statistics and Computing, 14, 3, 199-222 (2004)
[198] Solomonoff, R. J., An inductive inference machine, IRE convention record, section on information theory, 2, 56-62 (1957)
[199] Song, H.; Triguero, I.; Özcan, E., A review on the self and dual interactions between machine learning and optimisation, Progress in Artificial Intelligence, 8, 2, 143-165 (2019)
[200] Steinhardt, J.; Koh, P. W.W.; Liang, P. S., Certified defenses for data poisoning attacks, (Guyon, I.; Luxburg, U. V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; Garnett, R., Advances in neural information processing systems (2017), Curran Associates, Inc), 3517-3529
[201] Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R., Intriguing properties of neural networks, Technical Report (2013), arXiv preprint 1312.6199
[202] Tamura, R.; Kobayashi, K.; Takano, Y.; Miyashiro, R.; Nakata, K.; Matsui, T., Best subset selection for eliminating multicollinearity, Journal of the Operations Research Society of Japan, 60, 3, 321-336 (2017) · Zbl 1382.90068
[203] Tamura, R.; Kobayashi, K.; Takano, Y.; Miyashiro, R.; Nakata, K.; Matsui, T., Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor, Journal of Global Optimization, 73, 2, 431-446 (2019) · Zbl 1421.90093
[204] Taylan, P.; Weber, G.-W.; Beck, A., New approaches to regression by generalized additive models and continuous optimization for modern applications in finance, science and technology, Optimization, 56, 5-6, 675-698 (2007) · Zbl 1123.62055
[205] Tiwari, S.; Wee, H.; Daryanto, Y., Big data analytics in supply chain management between 2010 and 2016: Insights to industries, Computers & Industrial Engineering, 115, 319-330 (2018)
[206] Tjeng, V.; Tedrake, R., Evaluating robustness of neural networks with mixed integer programming, Technical Report (2017), arXiv preprint 1711.07356
[207] Toriello, A.; Vielma, J. P., Fitting piecewise linear continuous functions, European Journal of Operational Research, 219, 1, 86-95 (2012) · Zbl 1244.90166
[208] Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I.; Boneh, D.; McDaniel, P., Ensemble adversarial training: Attacks and defenses, Technical Report (2017), arXiv preprint 1705.07204
[209] Vapnik, V., Statistical learning theory, 3 (1998), Wiley: New York · Zbl 0935.62007
[210] Vapnik, V., The nature of statistical learning theory (2013), Springer Science & Business Media · Zbl 0934.62009
[211] Verwer, S.; Zhang, Y., Learning decision trees with flexible constraints and objectives using integer optimization, Proceedings of the international conference on AI and OR techniques in constraint programming for combinatorial optimization problems, 94-103 (2017) · Zbl 1489.68259
[212] Verwer, S.; Zhang, Y.; Ye, Q. C., Auction optimization using regression trees and linear models as integer programs, Artificial Intelligence, 244, 368-395 (2017) · Zbl 1404.68122
[213] Vielma, J. P.; Ahmed, S.; Nemhauser, G., Mixed-integer models for nonseparable piecewise-linear optimization: Unifying framework and extensions, Operations Research, 58, 2, 303-315 (2010) · Zbl 1226.90046
[214] Václavík, R.; Novák, A.; Šucha, P.; Hanzálek, Z., Accelerating the branch-and-price algorithm using machine learning, European Journal of Operational Research, 271, 3, 1055-1069 (2018) · Zbl 1403.90372
[215] Wang, G.; Gunasekaran, A.; Ngai, E. W.; Papadopoulos, T., Big data analytics in logistics and supply chain management: Certain investigations for research and applications, International Journal of Production Economics, 176, 98-110 (2016)
[216] Wang, H.; Ding, C.; Huang, H., Multi-label linear discriminant analysis, Proceedings of the European conference on computer vision, 126-139 (2010)
[217] Wang, L.; Zhu, J.; Zou, H., The doubly regularized support vector machine, Statistica Sinica, 16, 2, 589 (2006) · Zbl 1126.68070
[218] Wang, Y.; Chaudhuri, K., Data poisoning attacks against online learning, Technical Report (2018), arXiv preprint 1808.08994
[219] Wang, Y.; Zhang, D.; Liu, Y.; Dai, B.; Lee, L. H., Enhancing transportation systems via deep learning: A survey, Transportation Research Part C: Emerging Technologies, 99, 144-163 (2019)
[220] Wistuba, M.; Rawat, A.; Pedapati, T., A survey on neural architecture search, Technical Report (2019), arXiv preprint 1905.01392
[221] Wright, S. J., Optimization algorithms for data analysis, (Mahoney, M.; Duchi, J.; Gilbert, A., The mathematics of data (2018), American Mathematical Society), 49-98 · Zbl 1448.68018
[222] Yuan, C.; Malone, B., Learning optimal Bayesian networks: A shortest path perspective, Journal of Artificial Intelligence Research, 48, 23-65 (2013) · Zbl 1361.68182
[223] Zhu, J., Machine teaching for Bayesian learners in the exponential family, (Burges, C. J.C.; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, K. Q., Advances in neural information processing systems (2013), Curran Associates, Inc), 1905-1913
[224] Zhu, J.; Rosset, S.; Tibshirani, R.; Hastie, T. J., 1-Norm support vector machines, (Thrun, S.; Saul, L. K.; Schölkopf, B., Advances in neural information processing systems (2004), MIT Press), 49-56
[225] Zhu, X., Machine teaching: An inverse problem to machine learning and an approach toward optimal education, Proceedings of the AAAI conference on artificial intelligence, 4083-4087 (2015)
[226] Zhu, X.; Singla, A.; Zilles, S.; Rafferty, A. N., An overview of machine teaching, Technical Report (2018), arXiv preprint 1801.05927
[227] Zinkevich, M., Online convex programming and generalized infinitesimal gradient ascent, Proceedings of the international conference on machine learning, 928-936 (2003)
[228] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 2, 301-320 (2005) · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.