An efficient augmented Lagrangian method for support vector machine. (English) Zbl 1454.90091
Summary: Support vector machine (SVM) has proved to be a successful approach for machine learning. Two typical SVM models are the L1-loss model for support vector classification (SVC) and the \(\epsilon\)-L1-loss model for support vector regression (SVR). Due to the non-smoothness of the L1-loss function in the two models, most traditional approaches focus on solving the dual problem. In this paper, we propose an augmented Lagrangian method for the L1-loss model that solves the primal problem directly. By tackling the non-smooth term in the model with Moreau-Yosida regularization and the proximal operator, the subproblem in the augmented Lagrangian method reduces to a non-smooth linear system, which can be solved by the quadratically convergent semismooth Newton method. Moreover, the high computational cost of the semismooth Newton method can be significantly reduced by exploiting the sparse structure of the generalized Jacobian. Numerical results on various datasets from LIBLINEAR show that the proposed method is competitive with the most popular solvers in both speed and accuracy.
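To make the role of the proximal machinery concrete, here is a minimal NumPy sketch (illustrative code, not the authors' implementation; the names prox_plus and jacobian_element are ours). It computes the proximal operator of the scaled plus function \(u \mapsto \lambda \sum_i \max(u_i, 0)\) that underlies the L1-loss, together with one element of its generalized (Clarke) Jacobian, whose 0/1 diagonal is the sparse structure the semismooth Newton step exploits.

```python
import numpy as np

def prox_plus(v, lam):
    """Proximal operator of f(u) = lam * sum(max(u_i, 0)),
    computed elementwise (a one-sided soft threshold):
    v_i - lam if v_i > lam;  0 if 0 <= v_i <= lam;  v_i if v_i < 0."""
    return np.where(v > lam, v - lam, np.minimum(v, 0.0))

def jacobian_element(v, lam):
    """One element of the generalized (Clarke) Jacobian of prox_plus at v:
    a 0/1 diagonal that is 1 where the prox acts as the identity
    (v_i > lam or v_i < 0) and 0 on the flat piece (0 <= v_i <= lam)."""
    return ((v > lam) | (v < 0.0)).astype(float)

v = np.array([-1.0, 0.3, 2.0])
print(prox_plus(v, 0.5))         # [-1.   0.   1.5]
print(jacobian_element(v, 0.5))  # [1. 0. 1.]
```

Because the diagonal vanishes wherever a sample falls on the flat piece of the loss, the Newton linear system effectively involves only the remaining samples, which is why exploiting this structure cuts the per-iteration cost.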
MSC:
90C30 Nonlinear programming
References:
[1] Basak, D.; Pal, S.; Patranabis, D., Support vector regression, Neural Inform. Process. Lett. Rev., 11, 10, 203-224 (2007)
[2] Boser, B.E., Guyon, I.M., and Vapnik, V.N., A training algorithm for optimal margin classifiers, COLT’92: Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, Pittsburgh, PA, 27-29 July 1992, pp. 144-152.
[3] Chapelle, O., Training a support vector machine in the primal, Neural Comput., 19, 5, 1155-1178 (2007) · Zbl 1123.68101
[4] Chauhan, V. K.; Dahiya, K.; Sharma, A., Problem formulations and solvers in linear SVM: A review, Artif. Intell. Rev., 52, 2, 803-855 (2019)
[5] Clarke, F. H., Optimization and Nonsmooth Analysis (1983), John Wiley & Sons, New York · Zbl 0727.90045
[6] Collins, M.; Globerson, A.; Koo, T.; Carreras, X.; Bartlett, P., Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks, J. Mach. Learn. Res., 9, 1775-1822 (2008) · Zbl 1225.68167
[7] Cortes, C.; Vapnik, V. N., Support-vector networks, Mach. Learn., 20, 3, 273-297 (1995) · Zbl 0831.68098
[8] Drucker, H.; Burges, C. J.C.; Kaufman, L.; Smola, A. J.; Vapnik, V. N., Support vector regression machines, Adv. Neural Inf. Process. Syst., 9, 155-161 (1997)
[9] Fan, R.; Chang, K.; Hsieh, C.; Wang, X.; Lin, C., LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res., 9, 1871-1874 (2008) · Zbl 1225.68175
[10] Gu, W.; Chen, W. P.; Ko, C. H., Two smooth support vector machines for ε-insensitive regression, Comput. Optim. Appl., 70, 1-29 (2018) · Zbl 1418.90255
[11] Hestenes, M. R.; Stiefel, E., Methods of conjugate gradients for solving linear systems, J. Res. Natl. Bur. Stand., 49, 409-436 (1952) · Zbl 0048.09901
[12] Ho, C. H.; Lin, C.-J., Large-scale linear support vector regression, J. Mach. Learn. Res., 13, 3323-3348 (2012) · Zbl 1433.68349
[13] Hsieh, C.J., Chang, K.W., Lin, C.-J., Keerthi, S.S., and Sundararajan, S., A dual coordinate descent method for large-scale linear SVM, Proceedings of the 25th International Conference on Machine Learning, Helsinki, 2008, pp. 408-415.
[14] Ito, N.; Takeda, A.; Toh, K.-C., A unified formulation and fast accelerated proximal gradient method for classification, J. Mach. Learn. Res., 18, 1, 510-558 (2017)
[15] Joachims, T., Making large-scale support vector machine learning practical, in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, eds., MIT Press, Cambridge, MA, 1999, pp. 169-184.
[16] Joachims, T., Training linear SVMs in linear time, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, 2006, pp. 217-226.
[17] Kanzow, C., Inexact semismooth Newton methods for large-scale complementarity problems, Optim. Methods Softw., 19, 3-4, 309-325 (2004) · Zbl 1141.90558
[18] Kivinen, J.; Smola, A. J.; Williamson, R. C., Online learning with kernels, IEEE Trans. Signal Process., 52, 8, 2165-2176 (2004) · Zbl 1369.68281
[19] Li, Q. N.; Qi, H.-D., A sequential semismooth Newton method for the nearest low-rank correlation matrix problem, SIAM J. Optim., 21, 4, 1641-1666 (2011) · Zbl 1236.49070
[20] Li, X. D.; Sun, D. F.; Toh, K.-C., A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems, SIAM J. Optim., 28, 1, 433-458 (2018) · Zbl 1392.65062
[21] Liu, Y.-J.; Sun, D. F.; Toh, K.-C., An implementable proximal point algorithmic framework for nuclear norm minimization, Math. Program., 133, 1-2, 399-436 (2012) · Zbl 1262.90125
[22] Luo, Z.Y., Sun, D.F., and Toh, K.-C., Solving the OSCAR and SLOPE models using a semismooth Newton-based augmented Lagrangian method, preprint (2018). Available at arXiv:1803.10740. · Zbl 1434.68430
[23] Mifflin, R., Semismooth and semiconvex functions in constrained optimization, SIAM J. Control Optim., 15, 6, 959-972 (1977) · Zbl 0376.90081
[24] Moreau, J. J., Proximité et dualité dans un espace hilbertien, Bulletin de la Société Mathématique de France, 93, 273-299 (1965) · Zbl 0136.12101
[25] Nie, F., Huang, Y., Wang, X., and Huang, H., New primal SVM solver with linear computational cost for big data classifications, Proceedings of the 31st International Conference on Machine Learning, Beijing, Vol. 32, 2014, pp. II-505-II-513.
[26] Niu, D., Wang, C., Tang, P., Wang, Q., and Song, E., A sparse semismooth Newton based augmented Lagrangian method for large-scale support vector machines, preprint (2019). Available at arXiv:1910.01312.
[27] Platt, J.C., Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, eds., MIT Press, Cambridge, MA, 1998, pp. 185-208.
[28] Qi, H.-D., A semismooth Newton method for the nearest Euclidean distance matrix problem, SIAM J. Matrix Anal. Appl., 34, 1, 67-93 (2013) · Zbl 1266.49052
[29] Qi, L.; Sun, J., A nonsmooth version of Newton’s method, Math. Program., 58, 1-3, 353-367 (1993) · Zbl 0780.90090
[30] Qi, H.-D.; Sun, D. F., A quadratically convergent Newton method for computing the nearest correlation matrix, SIAM J. Matrix Anal. Appl., 28, 2, 360-385 (2006) · Zbl 1120.65049
[31] Qi, H.-D.; Yuan, X. M., Computing the nearest Euclidean distance matrix with low embedding dimensions, Math. Program., 147, 1-2, 351-389 (2014) · Zbl 1304.49051
[32] Rockafellar, R. T., Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Math. Oper. Res., 1, 2, 97-116 (1976) · Zbl 0402.90076
[33] Rockafellar, R. T., Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14, 5, 877-898 (1976) · Zbl 0358.90053
[34] Shalev-Shwartz, S., Singer, Y., and Srebro, N., Pegasos: Primal estimated sub-gradient solver for SVM, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. · Zbl 1211.90239
[35] Smola, A.J., Regression estimation with support vector learning machines, Master thesis, Technische Universität München, 1996.
[36] Smola, A.J., Vishwanathan, S.V.N., and Le, Q., Bundle methods for machine learning, in Advances in Neural Information Processing Systems 20, J.C. Platt, D. Koller, Y. Singer, and S. Roweis, eds., MIT Press, Cambridge, MA, 2008, pp. 1377-1384.
[37] Sun, J., On monotropic piecewise quadratic programming, PhD thesis, Department of Mathematics, University of Washington, 1986.
[38] Sun, D. F.; Sun, J., Semismooth matrix-valued functions, Math. Oper. Res., 27, 150-169 (2002) · Zbl 1082.49501
[39] Sun, D. F.; Toh, K.-C.; Yang, L. Q., An efficient inexact ABCD method for least squares semidefinite programming, SIAM J. Optim., 26, 2, 1072-1100 (2016) · Zbl 1346.90658
[40] Sun, D.F., Toh, K.-C., and Yuan, Y.C., Convex clustering: Model, theoretical guarantee and efficient algorithm, preprint (2018). Available at arXiv:1810.02677.
[41] Sun, D.F., Toh, K.-C., Yuan, Y.C., and Zhao, X.Y., SDPNAL+: A Matlab software for semidefinite programming with bound constraints (version 1.0), Optim. Methods Softw. (2019), pp. 1-29. Available at arXiv:1710.10604. · Zbl 1432.90104
[42] Takáč, M., Bijral, A., Richtárik, P., and Srebro, N., Mini-batch primal and dual methods for SVMs, Proceedings of the 30th International Conference on Machine Learning, Atlanta, 2013.
[43] Yang, L. Q.; Sun, D. F.; Toh, K.-C., SDPNAL+: A majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints, Math. Program. Comput., 7, 3, 331-366 (2015) · Zbl 1321.90085
[44] Yin, J.; Li, Q. N., A semismooth Newton method for support vector classification and regression, Comput. Optim. Appl., 73, 2, 477-508 (2019) · Zbl 1423.90276
[45] Yosida, K., Functional Analysis (1964), Springer-Verlag, Berlin · Zbl 0126.11504
[46] Zhai, F. Z.; Li, Q. N., A Euclidean distance matrix model for protein molecular conformation, J. Global Optim. (2019)
[47] Zhang, T., Solving large scale linear prediction problems using stochastic gradient descent algorithms, Proceedings of the 21st International Conference on Machine Learning, Banff, Alberta, 2004.
[48] Zhao, X.Y., A semismooth Newton-CG augmented Lagrangian method for large scale linear and convex quadratic SDPs, PhD thesis, National University of Singapore, 2009.
[49] Zhao, X. Y.; Sun, D. F.; Toh, K.-C., A Newton-CG augmented Lagrangian method for semidefinite programming, SIAM J. Optim., 20, 1737-1765 (2010) · Zbl 1213.90175
[50] Zhong, P.; Fukushima, M., Regularized nonsmooth Newton method for multi-class support vector machines, Optim. Methods Softw., 22, 1, 225-236 (2007) · Zbl 1188.68256
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.