A semismooth Newton method for support vector classification and regression. (English) Zbl 1423.90276
Summary: The support vector machine is an important and fundamental technique in machine learning. In this paper, we apply a semismooth Newton method to solve two typical SVM models: the L2-loss SVC model and the \(\epsilon \)-L2-loss SVR model. The semismooth Newton method is widely used in the optimization community, where it is commonly regarded as converging fast but at a high computational cost per iteration. Our contribution in this paper is that, by exploiting the sparse structure of the models, we significantly reduce the computational complexity while keeping the quadratic convergence rate. Extensive numerical experiments demonstrate the outstanding performance of the semismooth Newton method, especially on large-scale problems (for the news20.binary problem, with 19,996 samples and 1,355,191 features, it takes only 3 s). In particular, for the \(\epsilon \)-L2-loss SVR model, the semismooth Newton method significantly outperforms the leading solvers DCD and TRON.
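
As a minimal sketch of the approach the summary describes — not the authors' implementation — the following NumPy code runs a semismooth Newton loop for the L2-loss SVC model \(\min_w \tfrac12\|w\|^2 + C\sum_i \max(0, 1 - y_i w^\top x_i)^2\). The function name ssn_l2_svc, the full unit Newton step, and the conjugate-gradient inner solve are illustrative assumptions; the point it illustrates is the one the summary makes: only samples violating the margin enter the generalized Hessian, which is the sparse structure that cuts the per-iteration cost.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def ssn_l2_svc(X, y, C=1.0, tol=1e-6, max_iter=50):
    """Illustrative semismooth Newton loop for the L2-loss SVC
    min_w 0.5*||w||^2 + C*sum_i max(0, 1 - y_i*w^T x_i)^2,
    with labels y_i in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        margins = 1.0 - y * (X @ w)        # e_i = 1 - y_i w^T x_i
        active = margins > 0               # samples with nonzero loss
        Xa, ea, ya = X[active], margins[active], y[active]
        grad = w - 2.0 * C * Xa.T @ (ya * ea)
        if np.linalg.norm(grad) < tol:
            break
        # Generalized Hessian H = I + 2C * Xa^T Xa involves only the
        # active samples; apply it matrix-free inside CG.
        H = LinearOperator((d, d),
                           matvec=lambda v: v + 2.0 * C * (Xa.T @ (Xa @ v)))
        step, _ = cg(H, -grad)
        w = w + step                       # full Newton step, for simplicity
    return w
```

On small random data, e.g. X = rng.standard_normal((200, 5)) with y = np.where(X @ np.ones(5) > 0, 1.0, -1.0), this loop typically stops within a handful of iterations; a production solver would add a line search for global convergence and sparse-matrix storage, which experiments at the scale of news20.binary would require.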

MSC:
90C53 Methods of quasi-Newton type
Software:
LIBLINEAR; SSVM
References:
[1] Al-Mubaid, H.; Umair, SA, A new text categorization technique using distributional clustering and learning logic, IEEE Trans. Knowl. Data Eng., 18, 1156-1165, (2006)
[2] Barzilai, J.; Borwein, JM, Two-point step size gradient methods, IMA J. Numer. Anal., 8, 141-148, (1988) · Zbl 0638.65055
[3] Basak, D.; Pal, S.; Patranabis, DC, Support vector regression, Neural Inf. Process.-Lett. Rev., 11, 203-224, (2007)
[4] Boser, B.E., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory (COLT '92), pp. 144-152. ACM, Pittsburgh (1992)
[5] Bottou, L.; Curtis, FE; Nocedal, J., Optimization methods for large-scale machine learning, SIAM Rev., 60, 223-311, (2018) · Zbl 1397.65085
[6] Chang, KW; Hsieh, CJ; Lin, CJ, Coordinate descent method for large-scale L2-loss linear support vector machines, J. Mach. Learn. Res., 9, 1369-1398, (2008) · Zbl 1225.68157
[7] Chen, Z.; Qi, L., A semismooth Newton method for tensor eigenvalue complementarity problem, Comput. Optim. Appl., 65, 109-126, (2016) · Zbl 1377.90096
[8] Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, Hoboken (1983) · Zbl 0582.49001
[9] Cortes, C.; Vapnik, V., Support-vector networks, Mach. Learn., 20, 273-297, (1995) · Zbl 0831.68098
[10] Cruz, JYB; Ferreira, OP; Prudente, LF, On the global convergence of the inexact semi-smooth Newton method for absolute value equation, Comput. Optim. Appl., 65, 1-16, (2016) · Zbl 1353.90155
[11] Facchinei, F.; Kanzow, C.; Karl, S.; etal., The semismooth Newton method for the solution of quasi-variational inequalities, Comput. Optim. Appl., 62, 85-109, (2015) · Zbl 1331.90083
[12] Fan, RE; Chang, KW; Hsieh, CJ; etal., LIBLINEAR: a library for large linear classification, J. Mach. Learn. Res., 9, 1871-1874, (2008) · Zbl 1225.68175
[13] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York (2001) · Zbl 0973.62007
[14] Gu, W.; Chen, WP; Ko, CH; etal., Two smooth support vector machines for \(\epsilon \)-insensitive regression, Comput. Optim. Appl., 70, 1-29, (2018)
[15] Hestenes, MR; Stiefel, E., Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand., 49, 409-436, (1952) · Zbl 0048.09901
[16] Ho, CH; Lin, CJ, Large-scale linear support vector regression, J. Mach. Learn. Res., 13, 3323-3348, (2012) · Zbl 1433.68349
[17] Hsia, C.Y., Zhu, Y., Lin, C.J.: A study on trust region update rules in Newton methods for large-scale linear classification. In: Proceedings of the Ninth Asian Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 77, pp. 33-48 (2017)
[18] Hsieh, C.J., Chang, K.W., Lin, C.J., et al.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning, pp. 408-415. ACM (2008)
[19] Johnson, R.; Zhang, T., Accelerating stochastic gradient descent using predictive variance reduction, Adv. Neural Inf. Process. Syst., 26, 315-323, (2013)
[20] Keerthi, SS; DeCoste, D., A modified finite Newton method for fast solution of large scale linear SVMs, J. Mach. Learn. Res., 6, 341-361, (2005) · Zbl 1222.68231
[21] Labusch, K.; Barth, E.; Martinetz, T., Simple method for high-performance digit recognition based on sparse coding, IEEE Trans. Neural Netw., 19, 1985-1989, (2008)
[22] Lee, YJ; Mangasarian, OL, SSVM: a smooth support vector machine for classification, Comput. Optim. Appl., 20, 5-22, (2001) · Zbl 1017.90105
[23] Li, QN; Qi, HD, A sequential semismooth Newton method for the nearest low-rank correlation matrix problem, SIAM J. Optim., 21, 1641-1666, (2011) · Zbl 1236.49070
[24] Li, XD; Sun, DF; Toh, KC, A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems, SIAM J. Optim., 28, 433-458, (2018) · Zbl 1392.65062
[25] Lin, C.J., Weng, R.C., Keerthi, S.S.: Trust region Newton methods for large-scale logistic regression. In: Proceedings of the 24th International Conference on Machine Learning, pp. 561-568. ACM (2007)
[26] Luo, Z.Y., Sun, D.F., Toh, K.C., et al.: Solving the OSCAR and SLOPE models using a semismooth Newton-based augmented Lagrangian method (2018). arXiv preprint arXiv:1803.10740
[27] Mangasarian, OL, A finite Newton method for classification, Optim. Methods Softw., 17, 913-929, (2002) · Zbl 1065.90078
[28] Mifflin, R., Semismooth and semiconvex functions in constrained optimization, SIAM J. Control Optim., 15, 959-972, (1977) · Zbl 0376.90081
[29] Qi, HD, A semismooth Newton method for the nearest Euclidean distance matrix problem, SIAM J. Matrix Anal. Appl., 34, 67-93, (2013) · Zbl 1266.49052
[30] Qi, H.D., Shen, J., Xiu, N.H.: A sequential majorization method for approximating weighted time series of finite rank. Stat. Interface (2017) · Zbl 06944671
[31] Qi, HD; Sun, DF, A quadratically convergent Newton method for computing the nearest correlation matrix, SIAM J. Matrix Anal. Appl., 28, 360-385, (2006) · Zbl 1120.65049
[32] Qi, L.: C-differentiability, C-differential operators and generalized Newton methods. Applied Mathematics Report AMR96/5, University of New South Wales, Sydney, Australia (1996)
[33] Qi, L.; Sun, J., A nonsmooth version of Newton’s method, Math. Program., 58, 353-367, (1993) · Zbl 0780.90090
[34] Robbins, H.; Monro, S., A stochastic approximation method, Ann. Math. Stat., 22, 400-407, (1951) · Zbl 0054.05901
[35] Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998) · Zbl 0888.49001
[36] Tan, C., Ma, S., Dai, Y.H., et al.: Barzilai-Borwein step size for stochastic gradient descent. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 685-693. Curran Associates Inc. (2016)
[37] Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995) · Zbl 0833.62008
[38] Vapnik, V.; Golowich, SE; Smola, AJ, Support vector method for function approximation, regression estimation and signal processing, Adv. Neural Inf. Process. Syst., 9, 281-287, (1997)
[39] Vapnik, V.N., Kotz, S.: Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York (1982) · Zbl 0499.62005
[40] Wu, K.; Yap, KH, Fuzzy SVM for content-based image retrieval: a pseudo-label support vector machine framework, IEEE Comput. Intell. Mag., 1, 10-16, (2006)
[41] Yuan, Y.B., Huang, T.Z.: A polynomial smooth support vector machine for classification. In: International Conference on Advanced Data Mining and Applications. Springer, Berlin, Heidelberg, pp. 157-164 (2005)
[42] Yuan, Y.C., Sun, D.F., Toh, K.C.: An efficient semismooth Newton based algorithm for convex clustering (2018). arXiv preprint arXiv:1802.07091
[43] Zhang, L.; Zhou, W., On the sparseness of 1-norm support vector machines, Neural Netw., 23, 373-385, (2010) · Zbl 1396.68109
[44] Zhao, X.Y.: A Semismooth Newton-CG Augmented Lagrangian Method for Large Scale Linear and Convex Quadratic SDPs. Ph.D. thesis, National University of Singapore (2009)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.