Stream-suitable optimization algorithms for some soft-margin support vector machine variants. (English) Zbl 1430.62272

Summary: Soft-margin support vector machines (SVMs) are an important class of classification models that are well known to be highly accurate in a variety of settings and over many applications. The training of SVMs usually requires that the data be available all at once, in batch. The Stochastic majorization-minimization (SMM) algorithm framework allows for the training of SVMs on streamed data instead. We utilize the SMM framework to construct algorithms for training hinge loss, squared-hinge loss, and logistic loss SVMs. We prove that our three SMM algorithms are each convergent and demonstrate that the algorithms are comparable to some state-of-the-art SVM-training methods. An application to the famous MNIST data set is used to demonstrate the potential of our algorithms.


62R07 Statistical aspects of big data and data science
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence


Pegasos; Rcpp; R; MNIST; LIBLINEAR
Full Text: DOI


[1] Abe, S. (2010). Support Vector Machines for Pattern Classification. London: Springer. · Zbl 1191.68549
[2] Bohning, D.; Lindsay, BR, Monotonicity of quadratic-approximation algorithms, Annals of the Institute of Mathematical Statistics, 40, 641-663, (1988) · Zbl 0723.65150
[3] Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press. · Zbl 1058.90049
[4] Cappe, O.; Moulines, E., On-line expectation-maximizatoin algorithm for latent data models, Journal of the Royal Statistical Society B, 71, 593-613, (2009) · Zbl 1250.62015
[5] Chouzenoux, E.; Idier, J.; Moussaoui, S., A majorize-minimize strategy for subspace otpimization applied to image restoration, IEEE Transactions on Image Processing, 20, 1517-1528, (2011) · Zbl 1372.94051
[6] Chouzenoux, E.; Jezierska, A.; Pesquet, J-C; Talbot, H., A majorize-minimize subspace approach for \(l_2-l_0\) image regularization, SIAM Journal of Imaging Science, 6, 563-591, (2013) · Zbl 1281.65030
[7] Chouzenoux, E.; Pesquet, J-C, A stochastic majorize-minimize subspace algorithm for online penalized least squares estimation, IEEE Transactions on Signal Processing, 65, 4770-4783, (2017) · Zbl 1414.94135
[8] Cortes, C.; Vapnik, V., Support-vector networks, Machine Learning, 20, 273-297, (1995) · Zbl 0831.68098
[9] Pierro, AR, On the relation between the ISRA and the EM algorithm for positron emission tomography, IEEE Transactions on Medical Imaging, 12, 328-333, (1993)
[10] Eddelbuettel, D. (2013). Seamless R and C++ Integration with Rcpp. New York: Springer. · Zbl 1283.62001
[11] Fan, R-E; Chang, K-W; Hsieh, C-J; Wang, X-R; Lin, C-J, LIBLINEAR: a library for large linear classification, Journal of Machine Learning Research, 9, 1871-1874, (2008) · Zbl 1225.68175
[12] Groenen, PJF; Nalbantov, G.; Bioch, JC, SVM-Maj: a majorization approach to linear support vector machines with different hinge errors, Advances in Data Analysis and Classification, 2, 17-43, (2008) · Zbl 1151.90551
[13] Helleputte, T. (2017). LiblineaR: Linear Predictive Models Based on the LIBLINEAR C/C++ Library
[14] Hsia, C-Y; Zhu, Y.; Lin, C-J, A study on trust region update rules in Newton methods for large-scale linear classification, Proceedings of Machine Learning Research, 77, 33-48, (2017)
[15] Jolliffe, I. T. (2002). Principal Component Analysis. New York: Springer. · Zbl 1011.62064
[16] Kim, S., Pasupathy, R., & Henderson, S. G. (2015). Handbook of Simulation Optimization, chapter A guide to sample average approximation (pp. 207-243). New York: Springer.
[17] Lange, K. (2016). MM Optimization Algorithms. Philadelphia: SIAM. · Zbl 1357.90002
[18] LeCun, Y. (1998). The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
[19] Lin, C-J; Weng, RC; Keerthi, SS, Trust region Newton method for large-scale logistic regression, Journal of Machine Learning Research, 9, 627-650, (2008) · Zbl 1225.68195
[20] Mairal, J. (2013). Stochastic majorization-minimization algorithms for large-scale optimization. In Advances in Neural Information Processing Systems (pp. 2283-2291)
[21] Mairal, J., Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM Journal of Optimization, 25, 829-855, (2015) · Zbl 1320.90047
[22] McAfee, A.; Brynjolfsson, E.; Davenport, TH, Big data: the management revolution, Harvard Business Review, 90, 60-68, (2012)
[23] Navia-Vasquez, A.; Perez-Cruz, F.; Artes-Rodriguez, A.; Figueiras-Vidal, AR, Weighted least squares training of support vector classifiers leading to compact and adaptive schemes, IEEE Transactions on Neural Networks, 12, 1047-1059, (2001)
[24] Nguyen, H. D. & McLachlan, G. J. (2017). Iteratively-reweighted least-squares fitting of support vector machines: a majorization-minimization algorithm approach. In Proceedings of the 2017 Future Technologies Conference (FTC)
[25] Pollard, D., Asymptotics for least absolute deviation regression estimators, Econometric Theory, 7, 186-199, (1991)
[26] R Core Team (2016). R: a language and environment for statistical computing. R Foundation for Statistical Computing
[27] Razaviyayn, M.; Hong, M.; Luo, Z-Q, A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM Journal of Optimization, 23, 1126-1153, (2013) · Zbl 1273.90123
[28] Razaviyayn, M.; Sanjabi, M.; Luo, Z-Q, A stochastic successive minimization method for nonsmooth nonconvex optimization with applications to transceiver design in wireless communication networks, Mathematical Programming Series B, 157, 515-545, (2016) · Zbl 1357.90101
[29] Scholkopf, B., & Smola, A. J. (2002). Learning with Kernels. Cambridge: MIT Press. · Zbl 1019.68094
[30] Shalev-Shwartz, S.; Singer, Y.; Srebro, N.; Cotter, A., Pegasos: primal estimated sub-gradient solver for SVM, Mathematical Programming Series B, 127, 3-30, (2011) · Zbl 1211.90239
[31] Shawe-Taylor, J.; Sun, S., A review of optimization methodologies in support vector machines, Neurocomputing, 74, 3609-3618, (2011)
[32] Steinwart, I., & Christmann, A. (2008). Support Vector Machine. New York: Springer. · Zbl 1203.68171
[33] Titterington, DM, Recursive parameter estimation using incomplete data, Journal of the Royal Statistical Society B, 46, 257-267, (1984) · Zbl 0556.62061
[34] Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the twenty-first international conference on Machine learning
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.