Li, Qingna; Li, Zhen; Zemkoho, Alain Bilevel hyperparameter optimization for support vector classification: theoretical analysis and a solution method. (English) Zbl 1508.90102 Math. Methods Oper. Res. 96, No. 3, 315-350 (2022). Summary: Support vector classification (SVC) is a classical and well-performed learning method for classification problems. A regularization parameter, which significantly affects the classification performance, has to be chosen and this is usually done by the cross-validation procedure. In this paper, we reformulate the hyperparameter selection problem for support vector classification as a bilevel optimization problem in which the upper-level problem minimizes the average number of misclassified data points over all the cross-validation folds, and the lower-level problems are the \(l_1\)-loss SVC problems, with each one for each fold in T-fold cross-validation. The resulting bilevel optimization model is then converted to a mathematical program with equilibrium constraints (MPEC). To solve this MPEC, we propose a global relaxation cross-validation algorithm (GR-CV) based on the well-know Sholtes-type global relaxation method (GRM). It is proven to converge to a C-stationary point. Moreover, we prove that the MPEC-tailored version of the Mangasarian-Fromovitz constraint qualification (MFCQ), which is a key property to guarantee the convergence of the GRM, automatically holds at each feasible point of this MPEC. Extensive numerical results verify the efficiency of the proposed approach. In particular, compared with other methods, our algorithm enjoys superior generalization performance over almost all the data sets used in this paper. MSC: 90C33 Complementarity and equilibrium problems and variational inequalities (finite dimensions) (aspects of mathematical programming) 90C90 Applications of mathematical programming 49M20 Numerical methods of relaxation type Keywords:support vector classification; hyperparameter selection; bilevel optimization; mathematical program with equilibrium constraints; C-stationarity Software:SNOPT; Pegasos PDFBibTeX XMLCite \textit{Q. Li} et al., Math. Methods Oper. Res. 96, No. 3, 315--350 (2022; Zbl 1508.90102) Full Text: DOI arXiv References: [1] Anitescu M (2000) On solving mathematical programs with complementarity constraints as nonlinear programs. Preprint ANL/MCS-\(P864-1200\), Argonne National Laboratory, Argonne, IL 3 [2] Bennett KP, Hu J, Ji XY, Kunapuli G, Pang J-S (2006) Model selection via bilevel optimization. In: The 2006 IEEE International Joint Conference on Neural Network Proceedings, pp 1922-1929 . IEEE [3] Bennett KP, Kunapuli G, Hu J, Pang J-S (2008) Bilevel optimization and machine learning. In: IEEE World Congress on Computational Intelligence, pp 25-47 [4] Chapelle, O.; Vapnik, V.; Bousquet, O.; Mukherjee, S., Choosing multiple parameters for support vector machines, Mach Learn, 46, 1, 131-159 (2002) · Zbl 0998.68101 [5] Chauhan, VK; Dahiya, K.; Sharma, A., Problem formulations and solvers in linear SVM: a review, Artif Intell Rev, 52, 2, 803-855 (2019) [6] Colson, B.; Marcotte, P.; Savard, G., An overview of bilevel optimization, Ann Oper Res, 153, 1, 235-256 (2007) · Zbl 1159.90483 [7] Cortes, C.; Vapnik, V., Support-vector networks, Mach Learn, 20, 3, 273-297 (1995) · Zbl 0831.68098 [8] Couellan, N.; Wang, WJ, Bi-level stochastic gradient for large scale support vector machine, Neurocomputing, 153, 300-308 (2015) [9] Cristianini, N.; Shawe-Taylor, J., An Introduction to Support Vector Machines and Other Kernel-based Learning Methods (2000), Cambridge: Cambridge University Press, Cambridge · Zbl 0994.68074 [10] Crockett C, Fessler JA (2021) Bilevel methods for image reconstruction. arXiv preprint arXiv:2109.09610 [11] Dempe, S., Foundations of Bilevel Programming (2002), New York: Springer, New York · Zbl 1038.90097 [12] Dempe, S., Annotated bibliography on bilevel programming and mathematical programs with equilibrium constraints, Optimization, 52, 3, 333-359 (2003) · Zbl 1140.90493 [13] Dempe, S.; Zemkoho, AB, On the Karush-Kuhn-Tucker reformulation of the bilevel optimization problem, Nonlinear Analysis: Theory, Methods Appl, 75, 3, 1202-1218 (2012) · Zbl 1254.90222 [14] Dempe, S.; Zemkoho, AB, Bilevel Optimization Advances and Next Challenges (2020), New York: Springer, New York · Zbl 1470.90001 [15] Dong Y-L, Xia Z-Q, Wang M-Z (2007) An MPEC model for selecting optimal parameter in support vector machines. In: The First International Symposium on Optimization and Systems Biology, pp 351-357 [16] Duan, KB; Keerthi, SS; Poo, AN, Evaluation of simple performance measures for tuning SVM hyperparameters, Neurocomputing, 51, 41-59 (2003) [17] Facchinei, F.; Pang, J-S, Finite-dimensional Variational Inequalities and Complementarity Problems (2007), NewYork: Springer, NewYork [18] Fischer A, Zemkoho AB, Zhou S (2021) Semismooth Newton-type method for bilevel optimization: global convergence and extensive numerical experiments. Optimization Methods & Software, 1-35 [19] Flegel ML (2005) Constraint qualifications and stationarity concepts for mathematical programs with equilibrium constraints. PhD thesis, Universität Würzburg [20] Fletcher, R.; Leyffer, S.; Ralph, D.; Scholtes, S., Local convergence of SQP methods for mathematical programs with equilibrium constraints, SIAM J Optim, 17, 1, 259-286 (2006) · Zbl 1112.90098 [21] Fukushima, M.; Tseng, P., An implementable active-set algorithm for computing a B-stationary point of a mathematical program with linear complementarity constraints, SIAM J Optim, 12, 3, 724-739 (2002) · Zbl 1005.65064 [22] Galli L, Lin C-J (2021) A study on truncated newton methods for linear classification. IEEE Transactions on Neural Networks and Learning Systems [23] Gill, PE; Murray, W.; Saunders, MA, User’s Guide for Snopt version 6 (2002), California: A Fortran Package for Large-Scale Nonlinear Programming. University of California, California [24] Guo, L.; Lin, G-H; Ye, JJ, Solving mathematical programs with equilibrium constraints, J Optim Theory Appl, 166, 1, 234-256 (2015) · Zbl 1327.90233 [25] Harder, F.; Mehlitz, P.; Wachsmuth, G., Reformulation of the M-stationarity conditions as a system of discontinuous equations and its solution by a semismooth Newton method, SIAM J Optim, 31, 2, 1459-1488 (2021) · Zbl 1467.49024 [26] Hoheisel, T.; Kanzow, C.; Schwartz, A., Theoretical and numerical comparison of relaxation methods for mathematical programs with complementarity constraints, Math Program, 137, 1, 257-288 (2013) · Zbl 1262.65065 [27] Hsieh C-J, Chang K-W, Lin C-J, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning, pp 408-415 [28] Huang, X.; Shi, L.; Suykens, JA, Support vector machine classifier with pinball loss, IEEE Trans Pattern Anal Mach Intell, 36, 5, 984-997 (2013) [29] Jara-Moroni, F.; Pang, J-S; Wächter, A., A study of the difference-of-convex approach for solving linear programs with complementarity constraints, Math Program, 169, 1, 221-254 (2018) · Zbl 1397.90264 [30] Júdice, JJ, Algorithms for linear programming with linear complementarity constraints, TOP, 20, 1, 4-25 (2012) · Zbl 1267.90108 [31] Keerthi S, Sindhwani V, Chapelle O (2006) An efficient method for gradient-based adaptation of hyperparameters in SVM models. Advances in neural information processing systems 19 [32] Kunapuli, G., A Bilevel Optimization Approach to Machine Learning (2008), New York: Rensselaer Polytechnic Institute, New York [33] Kunapuli, G.; Bennett, KP; Hu, J.; Pang, J-S, Classification model selection via bilevel programming, Optim Methods Softw, 23, 4, 475-489 (2008) · Zbl 1151.90541 [34] Kunapuli, G.; Bennett, KP; Hu, J.; Pang, J-S, Bilevel model selection for support vector machines, Data mining mathe programming, 45, 129-158 (2008) · Zbl 1145.68488 [35] Kunisch, K.; Pock, T., A bilevel optimization approach for parameter learning in variational models, SIAM J Imag Sci, 6, 2, 938-983 (2013) · Zbl 1280.49053 [36] Lee, Y-C; Pang, J-S; Mitchell, JE, Global resolution of the support vector machine regression parameters selection problem with LPCC, EURO J Comput Optim, 3, 3, 197-261 (2015) · Zbl 1323.90055 [37] Li, JL; Huang, RS; Jian, JB, A superlinearly convergent QP-free algorithm for mathematical programs with equilibrium constraints, Appl Math Comput, 269, 885-903 (2015) · Zbl 1410.90216 [38] Lin, G-H; Xu, MW; Ye, JJ, On solving simple bilevel programs with a nonconvex lower level program, Math Program, 144, 1, 277-305 (2014) · Zbl 1301.65050 [39] Luo, G., A review of automatic selection methods for machine learning algorithms and hyper-parameter values, Netw Model Anal Health Inform Bioinform, 5, 1, 1-16 (2016) [40] Luo, Z-Q; Pang, J-S; Ralph, D., Mathematical Programs with Equilibrium Constraints (1996), Cambridge: Cambridge University Press, Cambridge · Zbl 0898.90006 [41] Mangasarian, OL, Misclassification minimization, J Global Optim, 5, 4, 309-323 (1994) · Zbl 0814.90081 [42] Mejía-de-Dios J-A, Mezura-Montes E (2019) A metaheuristic for bilevel optimization using tykhonov regularization and the quasi-newton method. In: 2019 IEEE Congress on Evolutionary Computation (CEC), pp 3134-3141 [43] Momma M, Bennett KP (2002) A pattern search method for model selection of support vector regression. In: Proceedings of the 2002 SIAM International Conference on Data Mining, pp 261-274 [44] Moore G, Bergeron C, Bennett KP. Gradient-type methods for primal SVM model selection · Zbl 1237.68159 [45] Moore G, Bergeron C, Bennett KP (2009) Nonsmooth bilevel programming for hyperparameter selection. In: 2009 IEEE International Conference on Data Mining Workshops, pp 374-381 [46] Ochs, P.; Ranftl, R.; Brox, T.; Pock, T., Techniques for gradient-based bilevel optimization with non-smooth lower level problems, J Mathe Imag Vision, 56, 2, 175-194 (2016) · Zbl 1352.65155 [47] Ochs P, Ranftl R, Brox T, Pock T (2015) Bilevel optimization with nonsmooth lower level problems. In: International Conference on Scale Space and Variational Methods in Computer Vision, pp 654-665 · Zbl 1444.94018 [48] Okuno T, Takeda A, Kawana A (2018) Hyperparameter learning for bilevel nonsmooth optimization. arXiv preprint arXiv:1806.01520 [49] Scholtes, S., Convergence properties of a regularization scheme for mathematical programs with complementarity constraints, SIAM J Optim, 11, 4, 918-936 (2001) · Zbl 1010.90086 [50] Shalev-Shwartz, S.; Singer, Y.; Srebro, N.; Cotter, A., Pegasos: Primal estimated sub-gradient solver for SVM, Math Program, 127, 1, 3-30 (2011) · Zbl 1211.90239 [51] Vapnik, V., The Nature of Statistical Learning Theory (2013), New York: Springer, New York [52] Wang H, Shao Y, Zhou S, Zhang C, Xiu N (2021) Support vector machine classifier via \({L}_{0/1}\) soft-margin loss. IEEE Transactions on Pattern Analysis and Machine Intelligence [53] Wu, J.; Zhang, LW; Zhang, Y., An inexact Newton method for stationary points of mathematical programs constrained by parameterized quasi-variational inequalities, Numer Algorithms, 69, 4, 713-735 (2015) · Zbl 1323.65080 [54] Yan, YQ; Li, QN, An efficient augmented Lagrangian method for support vector machine, Optim Methods Softw, 35, 4, 855-883 (2020) · Zbl 1454.90091 [55] Ye, JJ, Necessary and sufficient optimality conditions for mathematical programs with equilibrium constraints, J Math Anal Appl, 307, 1, 350-369 (2005) · Zbl 1112.90062 [56] Ye, JJ; Zhu, DL, New necessary optimality conditions for bilevel programs by combining the MPEC and value function approaches, SIAM J Optim, 20, 4, 1885-1905 (2010) · Zbl 1279.90187 [57] Yu, B.; Mitchell, JE; Pang, J-S, Solving linear programs with complementarity constraints using branch-and-cut, Math Program Comput, 11, 2, 267-310 (2019) · Zbl 1434.90160 [58] Yu T, Zhu H (2020) Hyper-parameter optimization: A review of algorithms and applications. arXiv preprint arXiv:2003.05689 [59] Zemkoho, AB; Zhou, SL, Theoretical and numerical comparison of the karush-kuhn-tucker and value function reformulations in bilevel optimization, Comput Optim Appl, 78, 2, 625-674 (2021) · Zbl 1469.90146 [60] Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the Twenty-first International Conference on Machine Learning, p 116 This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.