# zbMATH — the first resource for mathematics

Nonlinear optimization and support vector machines. (English) Zbl 1398.65126
Summary: Support vector machines (SVMs) form one of the most important classes of machine learning models and algorithms, and have been successfully applied in various fields. Nonlinear optimization plays a crucial role in SVM methodology, both in defining the machine learning models and in designing convergent and efficient algorithms for large-scale training problems. In this paper we present the convex programming problems underlying SVMs, focusing on supervised binary classification. We analyze the most important and most widely used optimization methods for SVM training problems, and we discuss how the properties of these problems can be exploited in the design of useful algorithms.
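The training problems the summary refers to can be made concrete with a small sketch. The following pure-Python example trains a linear SVM on toy data with a Pegasos-style stochastic subgradient method (L2-regularized hinge loss), one of the primal approaches covered by the references below [65]. The step size schedule follows Pegasos, but the regularization constant, epoch count, and toy data set are illustrative choices, not taken from the paper.

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for the primal
    L2-regularized hinge-loss objective (no bias term, for brevity):
        min_w  (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>)
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one pass in random order
            t += 1
            eta = 1.0 / (lam * t)          # classic Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # shrink w (subgradient of the regularization term) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... then move toward examples that violate the margin
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

# Toy linearly separable data (illustrative only)
X = [(2.0, 1.0), (3.0, 2.5), (-2.0, -1.5), (-3.0, -2.0)]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]
```

For kernelized problems, where w cannot be formed explicitly, attention shifts to the dual quadratic program and to decomposition methods with SMO-type working set selection, which occupy a large part of the reference list below.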

##### MSC:
- 65K05 Numerical mathematical programming methods
- 90C25 Convex programming
- 90C30 Nonlinear programming
##### Software:
bmrm; GPDT; LIBLINEAR; Pegasos; SSVM; TRON
##### References:
 [1] Astorino, A; Fuduli, A, Support vector machine polyhedral separability in semisupervised learning, J Optim Theory Appl, 164, 1039-1050, (2015) · Zbl 1320.90077
 [2] Bertsekas DP (1999) Nonlinear programming. Athena Scientific, Belmont
 [3] Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
 [4] Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, COLT ’92. ACM, New York, pp 144-152 · Zbl 06276209
 [5] Byrd, RH; Chin, GM; Neveitt, W; Nocedal, J, On the use of stochastic Hessian information in optimization methods for machine learning, SIAM J Optim, 21, 977-995, (2011) · Zbl 1245.65062
 [6] Carrizosa, E; Romero Morales, D, Supervised classification and mathematical optimization, Comput Oper Res, 40, 150-165, (2013) · Zbl 1349.68135
 [7] Cassioli, A; Chiavaioli, A; Manes, C; Sciandrone, M, An incremental least squares algorithm for large scale linear classification, Eur J Oper Res, 224, 560-565, (2013) · Zbl 1292.90198
 [8] Chang, CC; Hsu, CW; Lin, CJ, The analysis of decomposition methods for support vector machines, IEEE Trans Neural Netw Learn Syst, 11, 1003-1008, (2000)
 [9] Chang, KW; Hsieh, CJ; Lin, CJ, Coordinate descent method for large-scale L2-loss linear support vector machines, J Mach Learn Res, 9, 1369-1398, (2008) · Zbl 1225.68157
 [10] Chapelle, O, Training a support vector machine in the primal, Neural Comput, 19, 1155-1178, (2007) · Zbl 1123.68101
 [11] Chen, PH; Fan, RE; Lin, CJ, A study on SMO-type decomposition methods for support vector machines, IEEE Trans Neural Netw, 17, 893-908, (2006)
 [12] Chiang WL, Lee MC, Lin CJ (2016) Parallel dual coordinate descent method for large-scale linear classification in multi-core environments. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16. ACM, New York, pp 1485-1494
 [13] Cortes, C; Vapnik, V, Support-vector networks, Mach Learn, 20, 273-297, (1995) · Zbl 0831.68098
 [14] Dai, YH; Fletcher, R, New algorithms for singly linearly constrained quadratic programs subject to lower and upper bounds, Math Program, 106, 403-421, (2006) · Zbl 1134.90030
 [15] Fan, RE; Chang, KW; Hsieh, CJ; Wang, XR; Lin, CJ, LIBLINEAR: a library for large linear classification, J Mach Learn Res, 9, 1871-1874, (2008) · Zbl 1225.68175
 [16] Fan, RE; Chen, PH; Lin, CJ, Working set selection using second order information for training support vector machines, J Mach Learn Res, 6, 1889-1918, (2005) · Zbl 1222.68198
 [17] Ferris, MC; Munson, TS, Semismooth support vector machines, Math Program B, 101, 185-204, (2004) · Zbl 1076.90041
 [18] Ferris, MC; Munson, TS, Interior-point methods for massive support vector machines, SIAM J Optim, 13, 783-804, (2002) · Zbl 1039.90092
 [19] Fine, S; Scheinberg, K, Efficient SVM training using low-rank kernel representations, J Mach Learn Res, 2, 243-264, (2001) · Zbl 1037.68112
 [20] Fletcher R (1987) Practical methods of optimization, 2nd edn. Wiley, New York · Zbl 0905.65002
 [21] Franc, V; Sonnenburg, S, Optimized cutting plane algorithm for large-scale risk minimization, J Mach Learn Res, 10, 2157-2192, (2009) · Zbl 1235.68151
 [22] Franc V, Sonnenburg S (2008) Optimized cutting plane algorithm for support vector machines. In: Proceedings of the 25th international conference on machine learning, ICML ’08. ACM, New York, pp 320-327
 [23] Fung G, Mangasarian OL (2001) Proximal support vector machine classifiers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’01. ACM, New York, pp 77-86 · Zbl 1101.68758
 [24] Gaudioso, M; Gorgone, E; Labbé, M; Rodríguez-Chía, AM, Lagrangian relaxation for SVM feature selection, Comput Oper Res, 87, 137-145, (2017) · Zbl 1391.90430
 [25] Gertz, EM; Griffin, JD, Using an iterative linear solver in an interior-point method for generating support vector machines, Comput Optim Appl, 47, 431-453, (2010) · Zbl 1208.90184
 [26] Glasmachers, T; Igel, C, Maximum-gain working set selection for SVMs, J Mach Learn Res, 7, 1437-1466, (2006) · Zbl 1222.90040
 [27] Goldfarb, D; Scheinberg, K, Numerically stable LDL^T factorizations in interior point methods for convex quadratic programming, IMA J Numer Anal, 28, 806-826, (2008) · Zbl 1155.65350
 [28] Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
 [29] Hsieh CJ, Chang KW, Lin CJ, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th international conference on machine learning, ICML ’08. ACM, New York, pp 408-415
 [30] Hsu, CW; Lin, CJ, A comparison of methods for multiclass support vector machines, IEEE Trans Neural Netw, 13, 415-425, (2002)
 [31] Hsu, CW; Lin, CJ, A simple decomposition method for support vector machines, Mach Learn, 46, 291-314, (2002) · Zbl 0998.68108
 [32] Teo, CH; Vishwanathan, SVN; Smola, AJ; Le, QV, Bundle methods for regularized risk minimization, J Mach Learn Res, 11, 311-365, (2010) · Zbl 1242.68253
 [33] Joachims T (1999) Making large-scale support vector machine learning practical. In: Advances in kernel methods. MIT Press, Cambridge, pp 169-184
 [34] Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’06. ACM, New York, pp 217-226
 [35] Joachims, T; Finley, T; Yu, CNJ, Cutting-plane training of structural SVMs, Mach Learn, 77, 27-59, (2009) · Zbl 1235.68161
 [36] Joachims, T; Yu, CNJ, Sparse kernel SVMs via cutting-plane training, Mach Learn, 76, 179-193, (2009)
 [37] Keerthi, SS; DeCoste, D, A modified finite Newton method for fast solution of large scale linear SVMs, J Mach Learn Res, 6, 341-361, (2005) · Zbl 1222.68231
 [38] Keerthi, SS; Gilbert, EG, Convergence of a generalized SMO algorithm for SVM classifier design, Mach Learn, 46, 351-360, (2002) · Zbl 0998.68109
 [39] Keerthi, SS; Lin, CJ, Asymptotic behaviors of support vector machines with Gaussian kernel, Neural Comput, 15, 1667-1689, (2003) · Zbl 1086.68569
 [40] Kimeldorf, GS; Wahba, G, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, Ann Math Stat, 41, 495-502, (1970) · Zbl 0193.45201
 [41] Kiwiel, KC, Breakpoint searching algorithms for the continuous quadratic knapsack problem, Math Program, 112, 473-491, (2008) · Zbl 1190.90121
 [42] Le, QV; Smola, AJ; Vishwanathan, S, Bundle methods for machine learning. In: Platt JC, Koller D, Singer Y, Roweis ST (eds), pp 1377-1384, (2008), New York
 [43] Lee MC, Chiang WL, Lin CJ (2015) Fast matrix-vector multiplications for large-scale logistic regression on shared-memory systems. In: Aggarwal C, Zhou Z-H, Tuzhilin A, Xiong H, Wu X (eds) ICDM. IEEE Computer Society, pp 835-840
 [44] Lee, YJ; Mangasarian, OL, SSVM: a smooth support vector machine for classification, Comput Optim Appl, 20, 5-22, (2001) · Zbl 1017.90105
 [45] Lin, CJ; Lucidi, S; Palagi, L; Risi, A; Sciandrone, M, A decomposition algorithm model for singly linearly constrained problems subject to lower and upper bounds, J Optim Theory Appl, 141, 107-126, (2009) · Zbl 1168.90556
 [46] Lin, CJ, Formulations of support vector machines: a note from an optimization point of view, Neural Comput, 13, 307-317, (2001) · Zbl 0973.68201
 [47] Lin, CJ, On the convergence of the decomposition method for support vector machines, IEEE Trans Neural Netw, 12, 1288-1298, (2001)
 [48] Lin, CJ, Asymptotic convergence of an SMO algorithm without any assumptions, IEEE Trans Neural Netw, 13, 248-250, (2002)
 [49] Lin, CJ, A formal analysis of stopping criteria of decomposition methods for support vector machines, IEEE Trans Neural Netw, 13, 1045-1052, (2002)
 [50] Lin, CJ; Moré, JJ, Newton’s method for large bound-constrained optimization problems, SIAM J Optim, 9, 1100-1127, (1999) · Zbl 0957.65064
 [51] Lucidi, S; Palagi, L; Risi, A; Sciandrone, M, A convergent decomposition algorithm for support vector machines, Comput Optim Appl, 38, 217-234, (2007) · Zbl 1172.90443
 [52] Lucidi, S; Palagi, L; Risi, A; Sciandrone, M, A convergent hybrid decomposition algorithm model for SVM training, IEEE Trans Neural Netw, 20, 1055-1060, (2009)
 [53] Mangasarian OL (1994) Nonlinear programming. Classics in applied mathematics. Society for Industrial and Applied Mathematics. ISBN: 9780898713411
 [54] Mangasarian, OL, A finite Newton method for classification, Optim Methods Softw, 17, 913-929, (2002) · Zbl 1065.90078
 [55] Mangasarian, OL, Exact 1-norm support vector machines via unconstrained convex differentiable minimization, J Mach Learn Res, 7, 1517-1530, (2006) · Zbl 1211.68329
 [56] Mangasarian, OL; Musicant, DR, Successive overrelaxation for support vector machines, IEEE Trans Neural Netw, 10, 1032-1037, (1999)
 [57] Mangasarian, OL; Musicant, DR, Lagrangian support vector machines, J Mach Learn Res, 1, 161-177, (2001) · Zbl 0997.68108
 [58] Mavroforakis, ME; Theodoridis, S, A geometric approach to support vector machine (SVM) classification, IEEE Trans Neural Netw, 17, 671-682, (2006)
 [59] Osuna E, Freund R, Girosi F (1997) An improved training algorithm for support vector machines. In: Neural networks for signal processing VII. Proceedings of the 1997 IEEE signal processing society workshop, pp 276-285
 [60] Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition, pp 130-136
 [61] Palagi, L; Sciandrone, M, On the convergence of a modified version of the SVMlight algorithm, Optim Methods Softw, 20, 315-332, (2005) · Zbl 1072.90042
 [62] Pardalos, PM; Kovoor, N, An algorithm for a singly constrained class of quadratic programs subject to upper and lower bounds, Math Program, 46, 321-328, (1990) · Zbl 0711.90061
 [63] Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Advances in kernel methods. MIT Press, Cambridge, pp 185-208
 [64] Schölkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
 [65] Shalev-Shwartz, S; Singer, Y; Srebro, N; Cotter, A, Pegasos: primal estimated sub-gradient solver for SVM, Math Program, 127, 3-30, (2011) · Zbl 1211.90239
 [66] Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, New York
 [67] Suykens, JAK; Vandewalle, J, Least squares support vector machine classifiers, Neural Process Lett, 9, 293-300, (1999)
 [68] Glasmachers T, Dogan U (2013) Accelerated coordinate descent with adaptive coordinate frequencies. In: Asian conference on machine learning, ACML 2013, Canberra, ACT, Australia, pp 72-86
 [69] Serafini, T; Zanni, L, On the working set selection in gradient projection-based decomposition techniques for support vector machines, Optim Methods Softw, 20, 583-596, (2005) · Zbl 1116.90115
 [70] Tsang, IW; Kwok, JT; Cheung, PM, Core vector machines: fast SVM training on very large data sets, J Mach Learn Res, 6, 363-392, (2005) · Zbl 1222.68320
 [71] Vapnik VN (1998) Statistical learning theory. Wiley-Interscience, New York
 [72] Wang, PW; Lin, CJ, Iteration complexity of feasible descent methods for convex optimization, J Mach Learn Res, 15, 1523-1548, (2014) · Zbl 1319.90051
 [73] Wang, Z; Crammer, K; Vucetic, S, Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training, J Mach Learn Res, 13, 3103-3131, (2012) · Zbl 1433.68383
 [74] Woodsend, K; Gondzio, J, Hybrid MPI/OpenMP parallel linear support vector machine training, J Mach Learn Res, 10, 1937-1953, (2009) · Zbl 1235.68205
 [75] Woodsend, K; Gondzio, J, Exploiting separability in large-scale linear support vector machine training, Comput Optim Appl, 49, 241-269, (2011) · Zbl 1219.90210
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.