zbMATH — the first resource for mathematics

A heuristic adaptive fast gradient method in stochastic optimization problems. (English. Russian original) Zbl 1451.90107
Comput. Math. Math. Phys. 60, No. 7, 1108-1115 (2020); translation from Zh. Vychisl. Mat. Mat. Fiz. 60, No. 7, 1143-1150 (2020).
Summary: A fast adaptive heuristic stochastic gradient descent method is proposed. It is shown that this algorithm has a higher convergence rate in practical problems than currently popular optimization methods. Furthermore, a justification of this method is given, and difficulties that prevent obtaining optimal estimates for the proposed algorithm are described.
90C15 Stochastic programming
90C52 Methods of reduced gradient type
AdaGrad; Adam; CIFAR
Full Text: DOI
[1] Nesterov, Yu. E., Introduction to Convex Optimization (2010), Moscow: Mosk. Tsentr Nepreryvnogo Matematicheskogo Obrazovaniya, Moscow
[2] Goodfellow, I.; Bengio, Y.; Courville, A., Deep Learning (2016), Cambridge, Mass.: MIT Press, Cambridge, Mass. · Zbl 1373.68009
[3] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems (2012), pp. 1097-1105.
[4] Gasnikov, A. V.; Dvurechensky, P. E.; Usmanova, I. N., “On the nontriviality of fast randomized methods,” Trudy Mosk. Fiz.-, Teknh. Inst., 8, 67-100 (2016)
[5] A. V. Gasnikov, Modern Numerical Optimization Methods: The Universal Gradient Descent Method (Mosc. Fiziko-tekhnicheskii Institut, 2018). arXiv:1711.00394
[6] Bach, F.; Levy, K. Y., A universal algorithm for variational inequalities adaptive to smoothness and noise (2019)
[7] Vaswani, S.; Mishkin, A.; Laradji, I.; Schmidt, M.; Gidel, G.; Lacoste-Julien, S., Painless stochastic gradient: Interpolation, line-search, and convergence rates (2019)
[8] Nocedal, J.; Wright, S., Numerical Optimization (2006) · Zbl 1104.65059
[9] Ward, R.; Wu, X.; Bottou, L., AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization (2019)
[10] Duchi, J.; Hazan, E.; Singer, Y., Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learning Res., 12, 2121-2159 (2011) · Zbl 1280.68164
[11] Q. Deng, Y. Cheng, and G. Lan, “Optimal adaptive and accelerated stochastic gradient descent,” 2018. arXiv:1810.00553.
[12] Levy, K. Y.; Yurtsever, A.; Cevher, V., Online adaptive methods, universality and acceleration (2018)
[13] Iusem, A. N.; Jofre, A.; Oliveira, R. I.; Thompson, P., Variance-based extragradient methods with line search for stochastic variational inequalities, SIAM J. Optim., 29, 175-206 (2019) · Zbl 1415.65145
[14] Boucheron, S.; Lugosi, G.; Massart, P., Concentration Inequalities: A Nonasymptotic Theory Of Independence (2013) · Zbl 1279.60005
[15] Panchenko, D., Symmetrization approach to concentration inequalities for empirical processes, Annals Probab., 31, 2068-2081 (2003) · Zbl 1042.60008
[16] Kingma, D. P.; Ba, J., Adam: A method for stochastic optimization (2015)
[17] M. D. Gupta and T. Huang, “Bregman distance to l1 regularized logistic regression,” in 19th International Conference on Pattern Recognition, 2008, pp. 1-4.
[18] O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods with inexact oracle: The strongly convex case,” CORE Discussion Paper 2013/16. 2013. https://www.uclouvain.be/cps/ucl/doc/core/documents/coredp2013_16web.pdf · Zbl 1317.90196
[19] Devolder, O.; Glineur, F.; Nesterov, Yu., First-order methods of smooth convex optimization with inexact oracle, Math. Program., 146, 37-75 (2014) · Zbl 1317.90196
[20] O. Devolder, “Exactness, inexactness and stochasticity in first-order methods for largescale convex optimization,” PhD thesis, CORE UCL, 2013.
[21] Li, M.; Zhang, T.; Chen, Y.; Smola, A. J., Efficient mini-batch training for stochastic optimization (2014)
[22] A. Juditsky and A. Nemirovski, “Large deviations of vector-valued martingales in 2-smooth normed spaces,” 2008. arXiv:0809.0813
[23] Gasnikov, A. V.; Tyurin, A. I., Fast gradient descent for convex minimization problems with an oracle producing a (δ; L)-model of function at the requested point, Comput. Math. Math. Phys., 59, 1085-1097 (2019) · Zbl 07139606
[24] Robert, C.; Casella, G., Monte Carlo Statistical Methods (2013)
[25] Lan, G.; Nemirovski, A.; Shapiro, A., Validation analysis of mirror descent stochastic approximation method, Math. Program., 134, 425-458 (2012) · Zbl 1273.90154
[26] Griewank, A., “On automatic differentiation,” Math, Program: Recent Developments Appl., 6, 83-107 (1989)
[27] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278-2324 (1998)
[28] A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, PhD thesis, University of Toronto, 2009.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.