zbMATH — the first resource for mathematics

Pegasos: primal estimated sub-gradient solver for SVM. (English) Zbl 1211.90239
Summary: We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy \({\varepsilon}\) is \({\tilde{O}(1 / \varepsilon)}\), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require \({\Omega(1 / \varepsilon^2)}\) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with \(1/\lambda \), where \(\lambda \) is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is \({\tilde{O}(d/(\lambda \varepsilon))}\), where \(d\) is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
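The update analyzed in the summary can be sketched as follows. This is an illustrative reimplementation, not the authors' code: at step \(t\) one random example is drawn, the step size is \(\eta_t = 1/(\lambda t)\), the weight vector is shrunk by the regularizer and, if the margin is violated, pushed along the example; the optional projection onto the ball of radius \(1/\sqrt{\lambda}\) follows the paper's description. Function and parameter names are hypothetical.

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=1000, seed=0):
    """Sketch of the Pegasos stochastic sub-gradient SVM solver
    (illustrative names/defaults, not the authors' reference code).

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    Minimizes  lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)               # one random training example per iteration
        eta = 1.0 / (lam * t)             # step size 1/(lambda * t)
        if y[i] * X[i].dot(w) < 1:        # margin violated: hinge loss is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                             # only the regularization term contributes
            w = (1 - eta * lam) * w
        # optional projection onto the ball of radius 1/sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```

Each iteration touches a single example, so its cost is proportional to the number of non-zero features in that example, which is why the total run-time bound does not depend on the training set size.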

MSC:
90C30 Nonlinear programming
Software:
Pegasos; SVMlight
References:
[1] Amari S.: Natural gradient works efficiently in learning. Neural Comput. 10, 251–276 (1998) · doi:10.1162/089976698300017746
[2] Bordes A., Ertekin S., Weston J., Bottou L.: Fast kernel classifiers with online and active learning. J. Mach. Learn. Res. 6, 1579–1619 (2005) · Zbl 1222.68152
[3] Bottou L.: Online Algorithms and Stochastic Approximations. In: Saad, D. (eds) Online learning and neural networks, Cambridge University Press, Cambridge (1998) · Zbl 0968.68127
[4] Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Advances in Neural Information Processing Systems 20, pp. 161–168 (2008)
[5] Bottou L., LeCun Y.: Large scale online learning. In: Thrun, S., Saul, L., Schölkopf, B. (eds) Advances in Neural Information Processing Systems 16, MIT Press, Cambridge (2004)
[6] Bottou L., Murata N.: Stochastic approximations and efficient learning. In: Arbib, M.A. (eds) The Handbook of Brain Theory and Neural Networks, The MIT Press, Cambridge (2002)
[7] Boyd S., Vandenberghe L.: Convex Optimization, 2nd edn. Cambridge University Press, Cambridge (2004) · Zbl 1058.90049
[8] Censor Y., Zenios S.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, New York (1997) · Zbl 0945.90064
[9] Cesa-Bianchi N., Conconi A., Gentile C.: On the generalization ability of on-line learning algorithms. IEEE Trans. Inf. Theory 50(9), 2050–2057 (2004) · Zbl 1295.68182 · doi:10.1109/TIT.2004.833339
[10] Chapelle, O.: Training a support vector machine in the primal. Neural Comput. 19(5), 1155–1178 (2007). doi: 10.1162/neco.2007.19.5.1155 . http://www.mitpressjournals.org/doi/abs/10.1162/neco.2007.19.5.1155 · Zbl 1123.68101
[11] Crammer K., Dekel O., Keshet J., Shalev-Shwartz S., Singer Y.: Online passive-aggressive algorithms. J. Mach. Learn. Res. 7, 551–585 (2006) · Zbl 1222.68177
[12] Cristianini N., Shawe-Taylor J.: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge (2000) · Zbl 0994.68074
[13] Do, C., Le, Q., Foo, C.: Proximal regularization for online and batch learning. In: Proceedings of the 26th International Conference on Machine Learning (2009)
[14] Duda R.O., Hart P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973) · Zbl 0277.68056
[15] Fine S., Scheinberg K.: Efficient SVM training using low-rank kernel representations. J. Mach. Learn. Res. 2, 242–264 (2001) · Zbl 1037.68112
[16] Freund Y., Schapire R.E.: Large margin classification using the perceptron algorithm. Mach. Learn. 37(3), 277–296 (1999) · Zbl 0944.68535 · doi:10.1023/A:1007662407062
[17] Hazan, E., Kalai, A., Kale, S., Agarwal, A.: Logarithmic regret algorithms for online convex optimization. In: Proceedings of the Nineteenth Annual Conference on Computational Learning Theory (2006) · Zbl 1143.90371
[18] Hsieh, C., Chang, K., Lin, C., Keerthi, S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: ICML, pp. 408–415 (2008)
[19] Hush, D., Kelly, P., Scovel, C., Steinwart, I.: QP algorithms with guaranteed accuracy and run time for support vector machines. J. Mach. Learn. Res. (2006) · Zbl 1222.68221
[20] Joachims T.: Making large-scale support vector machine learning practical. In: Schölkopf, B., Burges, C., Smola, A. (eds) Advances in Kernel Methods–Support Vector Learning, MIT Press, Cambridge (1998)
[21] Joachims, T.: Training linear SVMs in linear time. In: Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), pp. 216–226 (2006)
[22] Kakade, S., Tewari, A.: On the generalization ability of online strongly convex programming algorithms. In: Advances in Neural Information Processing Systems 22 (2009)
[23] Kimeldorf G., Wahba G.: Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. 33, 82–95 (1971) · Zbl 0201.39702 · doi:10.1016/0022-247X(71)90184-3
[24] Kivinen J., Smola A.J., Williamson R.C.: Online learning with kernels. IEEE Trans. Signal Process. 52(8), 2165–2176 (2004) · Zbl 1369.68281 · doi:10.1109/TSP.2004.830991
[25] Kushner H., Yin G.: Stochastic Approximation Algorithms and Applications. Springer, New York (1997) · Zbl 0914.60006
[26] Murata N.: A statistical study of on-line learning. In: Saad, D. (eds) Online Learning and Neural Networks, Cambridge University Press, Cambridge (1998) · Zbl 0966.68170
[27] Murata N., Amari S.: Statistical analysis of learning dynamics. Signal Process. 74(1), 3–28 (1999) · Zbl 0922.68094 · doi:10.1016/S0165-1684(98)00206-0
[28] Nesterov, Y.: Primal-dual subgradient methods for convex problems. Tech. rep., Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL) (2005) · Zbl 1191.90038
[29] Platt J.C.: Fast training of Support Vector Machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds) Advances in Kernel Methods–Support Vector Learning, MIT Press, Cambridge (1998)
[30] Rockafellar R.: Convex Analysis. Princeton University Press, Princeton (1970) · Zbl 0193.18401
[31] Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In: Proceedings of the 24th International Conference on Machine Learning, pp. 807–814 (2007) · Zbl 1211.90239
[32] Shalev-Shwartz, S., Srebro, N.: SVM optimization: inverse dependence on training set size. In: Proceedings of the 25th International Conference on Machine Learning, pp. 928–935 (2008)
[33] Smola, A., Vishwanathan, S., Le, Q.: Bundle methods for machine learning. In: Advances in Neural Information Processing Systems 21 (2007)
[34] Spall J.C.: Introduction to Stochastic Search and Optimization. Wiley, New York (2003) · Zbl 1088.90002
[35] Sridharan, K., Srebro, N., Shalev-Shwartz, S.: Fast rates for regularized objectives. In: Advances in Neural Information Processing Systems 22 (2009)
[36] Vapnik V.N.: Statistical Learning Theory. Wiley, New York (1998) · Zbl 0935.62007
[37] Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the Twenty-First International Conference on Machine Learning (2004)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.