×

Incremental accelerated gradient methods for SVM classification: study of the constrained approach. (English) Zbl 1331.68182

Summary: We investigate constrained first order techniques for training support vector machines (SVM) for online classification tasks. The methods exploit the structure of the SVM training problem and combine ideas of incremental gradient technique, gradient acceleration and successive simple calculations of Lagrange multipliers. Both primal and dual formulations are studied and compared. Experiments show that the constrained incremental algorithms working in the dual space achieve the best trade-off between prediction accuracy and training time. We perform comparisons with an unconstrained large scale learning algorithm (Pegasos stochastic gradient) to emphasize that our choice can remain competitive for large scale learning due to the very special structure of the training problem.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
65K05 Numerical mathematical programming methods
90C20 Quadratic programming
90C30 Nonlinear programming
90C90 Applications of mathematical programming
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Balakrishna P, Raman S, Santosa B, Trafalis TB (2008) Support vector regression for determining the minimum zone sphericity. Int J Adv Manuf Technol 35:916-923 · doi:10.1007/s00170-006-0774-1
[2] Bertsekas DP (1996) A New Class of Incremental Gradient Methods for Least Squares Problems, Technical Report, Department of Electrical Engineering and Computer Science. MIT, Cambridge · Zbl 1211.90239
[3] Bertsekas DP (2010) Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey, Technical Report LIDS-2848. Laboratory for Information and Decision Systems, MIT, Cambridge
[4] Bordes A, Ertekin S, Weston J, Bottou L (2005) Fast Kernel classifiers with online and active learning. J Mach Learn Res 6:1579-1619 · Zbl 1222.68152
[5] Bottou L, Bousquet O (2008) The tradeoffs of large scale learning, advances in neural information processing systems, 20. MIT Press, Cambridge
[6] Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2:121-167 · doi:10.1023/A:1009715923555
[7] Chapelle O (2007) Training a support vector machine in the primal. Neural Comp 19:1155-1178 · Zbl 1123.68101 · doi:10.1162/neco.2007.19.5.1155
[8] Chang C-C, Lin C-J (2010) LIBSVM—A Library for Support Vector Machines. http://www.csie.ntu.edu.tw/cjlin/libsvmtools/datasets/ Department of Computer Science National Taiwan University, Taipei 106, Taiwan · Zbl 1267.65062
[9] Couellan NP, Trafalis TB (2013) Online SVM learning with an incremental primal-dual technique. Optim Method Softw 28(2):256-275 · Zbl 1267.65062 · doi:10.1080/10556788.2011.633705
[10] Couellan NP, Trafalis TB (2013) An incremental primal-dual method for nonlinear programming with special structure, Optim Lett 7(1): 51-62 · Zbl 1261.90054
[11] Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machine. Cambridge University Press, Cambridge · Zbl 0994.68074
[12] Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895-1924 · doi:10.1162/089976698300017197
[13] Frank A, Asuncion A (2010) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml, University of California, School of Information and Computer Science, Irvine · Zbl 1039.68104
[14] Gonzaga CC, Karas EW (2008) Optimal Steepest descent algorithms for unconstrained convex problems: fine tuning Nesterov’s method. Federal University of Paran, Brazil Technical Report
[15] Gonzaga CC, Karas EW, Rossetto DR (2011) An Optimal Algorithm for Constrained Differentiable Convex Optimization. Technical Report, Federal University of Paraná, Brazil · Zbl 1288.65087
[16] Joachims T (2006) Training Linear SVMs in Linear Time. In: Proceedings of the ACM conference on knowledge discovery and data mining (KDD). ACM, USA
[17] Lee YJ, Mangasarian OL (2001) RSVM: Reduced Support Vector Machines. In: Proceedings of the SIAM international conference on data mining. SIAM, Philadelphia
[18] Lee YJ, Mangasarian OL, Wolberg WH (2000) Breast Cancer Survival and Chemotherapy: A Support Vector Machine Analysis, Data Mining Institute Technical Report 99-10, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol 55. American Mathematical Society, pp 1-10 · Zbl 1133.62362
[19] Mangasarian OL, Musicant DR (1999) Successive overrelaxation for support vector machines. IEEE Trans Neural Netw 10:1032-1037 · doi:10.1109/72.788643
[20] Matlab (1994-2010) http://www.mathworks.com, The Math-Works, Inc., Natwick
[21] Nadeau C, Bengio Y (2003) Inference for the generalization error. Mach Learn 52(3):239-281 · Zbl 1039.68104 · doi:10.1023/A:1024068626366
[22] Nesterov Y (2004) Introductory lectures on convex optimization. Applied optimization. Kluwer Academic Publishers, Boston · Zbl 1086.90045 · doi:10.1007/978-1-4419-8853-9
[23] Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B and Burges, Christopher JC and Smola Alexander J (eds) Advances in kernel methods. MIT Press, Cambridge, pp 185-208
[24] Schölkopf B, Smola A (2002) Learning with Kernels. MIT, Cambridge
[25] Shalev-Shwartz SS, Singer Y, Srebro N, Cotter A (2011) Pegasos: primal estimated sub-gradient solver for SVM. Math Prog Ser B 127:3-30 · Zbl 1211.90239 · doi:10.1007/s10107-010-0420-4
[26] Son H-J, Trafalis TB (2006) Detection of tornados using an incremental revised support vector machine with filters. Int Conf Comput Sci 3:506-513
[27] Sra S, Nowozin S, Wright SJ (eds) (2011) Optimization for Machine Learning. MIT Press, Cambridge
[28] Trafalis TB, Adrianto I, Richman MB (2010) Machine Learning Techniques for Imbalanced Data: An Application for Tornado Detection, In: Dagli CH (ed) Part IV: General Engineering Systems from: Intelligent Engineering Systems through Artificial Neural Networks, vol 20
[29] Trafalis TB, Couellan NP, Li P-I, Stumpf G, White A (1997) Affine scaling neural network training algorithm for prediction of tornados. In: Intelligent engineering systems through artificial neural networks. New York, pp 213-218
[30] Tseng P (1998) An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM J Optim 8(2):506-531 · Zbl 0922.90131 · doi:10.1137/S1052623495294797
[31] Vapnik V (1998) Statistical learning theory. Wiley, New York · Zbl 0935.62007
[32] Zhang X, Saha A, Vishwanathan SVN (2010) Lower Bounds on Rate of Convergence of Cutting Plane Methods. In: Proceeding of advances in neural information processing systems 23: 24th annual conference on neural information processing systems 2010. Vancouver, British Columbia
[33] Zhou T, Tao D, Wu X (2010) NESVM: a fast gradient method for support vector machines In: Proceedings of the 2010 IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 679-688
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.