an:07238980
Zbl 07238980
Huang, Lingxiao; Jin, Yifei; Li, Jian
SVM via saddle point optimization: new bounds and distributed algorithms
EN
Eppstein, David (ed.), 16th Scandinavian symposium and workshops on algorithm theory. SWAT 2018, June 18--20, 2018,
Malmö University, Malmö, Sweden. Wadern: Schloss Dagstuhl -- Leibniz-Zentrum für Informatik (ISBN 978-3-95977-068-2). LIPIcs -- Leibniz International Proceedings in Informatics 101, Article 25, 13 p. (2018).
2018
68Wxx
\(\nu\)-SVM; hard-margin SVM; saddle point optimization; distributed algorithm
Summary: We study two important SVM variants: hard-margin SVM (for linearly separable cases) and \(\nu\)-SVM (for linearly non-separable cases). We propose new algorithms from the perspective of saddle point optimization. Our algorithms achieve \((1-\epsilon)\)-approximations with running time \(\tilde{O}(nd+n\sqrt{d/\epsilon})\) for both variants, where \(n\) is the number of points and \(d\) is the dimensionality. To the best of our knowledge, the current best algorithm for \(\nu\)-SVM is based on a quadratic programming approach, which requires \(\Omega(n^2 d)\) time in the worst case [\textit{T. Joachims}, Making large-scale SVM learning practical. Techn. Rep., Technische Universität Dortmund (1998; \url{doi:10.17877/DE290R-5098}); \textit{J.C. Platt}, ``Fast training of support vector machines using sequential minimal optimization'', in: Advances in kernel methods: support vector learning. Cambridge: MIT Press. 185--208 (1999)]. In this paper, we provide the first nearly linear time algorithm for \(\nu\)-SVM. The current best algorithm for hard-margin SVM, the Gilbert algorithm [\textit{B. Gärtner} and \textit{M. Jaggi}, in: Proceedings of the 25th annual symposium on computational geometry, SCG 2009, Aarhus, Denmark, June 8--10, 2009. New York, NY: Association for Computing Machinery (ACM). 33--42 (2009; Zbl 1380.68396)], requires \(O(nd/\epsilon)\) time. Our algorithm improves the running time by a factor of \(\sqrt{d}/\sqrt{\epsilon}\). Moreover, our algorithms can be implemented in distributed settings naturally. We prove that our algorithms require \(\tilde{O}(k(d +\sqrt{d/\epsilon}))\) communication cost, where \(k\) is the number of clients, which almost matches the theoretical lower bound. Numerical experiments support our theory and show that our algorithms converge faster on high-dimensional, large, and dense data sets, as compared to previous methods.
For the entire collection see [Zbl 1390.68019].
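To make the saddle-point view concrete, the following is a minimal sketch (not the authors' algorithm) of hard-margin SVM phrased as the bilinear saddle-point problem \(\max_{\|w\|\le 1}\min_{p\in\Delta_n}\sum_i p_i y_i \langle w, x_i\rangle\), solved here by projected gradient ascent on \(w\) against multiplicative-weights (Hedge) updates on \(p\); the data, step size, and iteration count are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sketch: hard-margin SVM as a bilinear saddle-point game.
# The max player holds a unit-ball direction w; the min player holds a
# distribution p over the points, concentrating on small-margin points.
rng = np.random.default_rng(0)

# Toy linearly separable data: push the first coordinate away from zero
# so the margin along the axis direction e_1 is at least 1.
X = rng.normal(size=(40, 2))
X[:, 0] += np.where(X[:, 0] >= 0, 1.0, -1.0)
y = np.sign(X[:, 0])

n, d = X.shape
w = np.zeros(d)
p = np.full(n, 1.0 / n)   # distribution over points (the min player)
w_sum = np.zeros(d)       # running sum for iterate averaging
eta, T = 0.01, 20000

for _ in range(T):
    # Gradient ascent on the p-weighted margin, projected onto the unit ball.
    w += eta * (p * y) @ X
    w /= max(1.0, np.linalg.norm(w))
    w_sum += w
    # Hedge step: shift mass toward the points with the smallest margin.
    p *= np.exp(-eta * y * (X @ w))
    p /= p.sum()

w_avg = w_sum / T
margin = np.min(y * (X @ w_avg))  # positive iff w_avg separates the data
print(margin)
```

The averaged iterate is used because, for bilinear games, the average of the gradient-descent-ascent trajectory converges to the saddle point while the last iterate may oscillate.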