zbMATH — the first resource for mathematics

The common-directions method for regularized empirical risk minimization. (English) Zbl 07064038
Summary: State-of-the-art first- and second-order optimization methods achieve either a fast global linear convergence rate or local quadratic convergence, but not both. In this work, we propose an interpolation between first- and second-order methods for regularized empirical risk minimization that exploits the problem structure to efficiently combine multiple update directions. Our method attains both the optimal global linear convergence rate for first-order methods and local quadratic convergence. Experimental results show that our method outperforms state-of-the-art first- and second-order optimization methods in terms of the number of data accesses, while being competitive in training time.
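A minimal sketch of the core idea for the reader, not the paper's actual algorithm: at each iteration, a set of directions P is enlarged (here simply with the current gradient, orthonormalized), and the second-order model is minimized over span(P) by solving the small k×k system (PᵀHP)t = −Pᵀg. For L2-regularized logistic regression the Hessian has the form λI + XᵀDX, so PᵀHP needs only the product XP. The function names `common_directions` and `logreg_fun_grad_hess`, the gradient-only direction set, and the Armijo line search are illustrative assumptions; the paper's variants and cost analysis differ.

```python
import numpy as np

def logreg_fun_grad_hess(w, X, y, lam):
    """L2-regularized logistic loss f(w) = (lam/2)||w||^2 + sum_i log(1 + exp(-y_i x_i^T w)).
    Returns f, its gradient, and the diagonal D with Hessian = lam*I + X^T diag(D) X."""
    z = y * (X @ w)
    f = 0.5 * lam * (w @ w) + np.sum(np.logaddexp(0.0, -z))
    sig = 0.5 * (1.0 + np.tanh(0.5 * z))          # numerically stable sigmoid(z)
    g = lam * w - X.T @ (y * (1.0 - sig))
    D = sig * (1.0 - sig)
    return f, g, D

def common_directions(X, y, lam=1.0, iters=30, tol=1e-8):
    """Each step enlarges an orthonormal direction set P with the new gradient, then
    minimizes the quadratic model over span(P): solve (P^T H P) t = -(P^T g).
    The data is touched only through X @ P; the linear system is just k x k."""
    n, d = X.shape
    w = np.zeros(d)
    P = np.empty((d, 0))
    for _ in range(iters):
        f, g, D = logreg_fun_grad_hess(w, X, y, lam)
        if np.linalg.norm(g) < tol:
            break
        p = g - P @ (P.T @ g)                     # Gram-Schmidt against current directions
        if np.linalg.norm(p) > 1e-10:
            P = np.hstack([P, p[:, None] / np.linalg.norm(p)])
        XP = X @ P                                # the only O(n d k) data pass
        PtHP = lam * np.eye(P.shape[1]) + (XP * D[:, None]).T @ XP
        t = np.linalg.solve(PtHP, -(P.T @ g))
        step = P @ t                              # combined update direction
        eta = 1.0                                 # Armijo backtracking line search
        for _ in range(50):
            if logreg_fun_grad_hess(w + eta * step, X, y, lam)[0] <= f + 1e-4 * eta * (g @ step):
                break
            eta *= 0.5
        w = w + eta * step
    return w
```

Once P spans the whole space, the combined step coincides with the Newton step (hence local quadratic convergence), while each individual step is at least as good as a gradient step within the model, which is where the first-order global rate comes from.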
68T05 Learning and adaptive systems in artificial intelligence
Full Text: Link