
A stochastic extra-step quasi-Newton method for nonsmooth nonconvex optimization. (English) Zbl 1494.90061

Summary: In this paper, a novel stochastic extra-step quasi-Newton method is developed to solve a class of nonsmooth nonconvex composite optimization problems. We assume that the gradient of the smooth part of the objective function can only be approximated by stochastic oracles. The proposed method combines general stochastic higher-order steps, derived from an underlying proximal-type fixed-point equation, with additional stochastic proximal gradient steps to guarantee convergence. Based on suitable bounds on the step sizes, we establish global convergence to stationary points in expectation, and an extension of the approach using variance reduction techniques is discussed. Motivated by large-scale and big data applications, we investigate a stochastic coordinate-type quasi-Newton scheme that allows us to generate cheap and tractable stochastic higher-order directions. Finally, numerical results on large-scale logistic regression and deep learning problems show that our proposed algorithm compares favorably with other state-of-the-art methods.
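The Python sketch below is only meant to illustrate the general structure described in the summary: a trial step based on the proximal fixed-point residual followed by a safeguarding stochastic proximal gradient step. It is not the paper's algorithm; the quasi-Newton model is replaced here by a simple fixed scaling, and all names (prox_l1, stochastic_extra_step, stoch_grad) and step-size choices are illustrative assumptions.

import numpy as np

def prox_l1(x, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def stochastic_extra_step(x0, stoch_grad, lam, alpha=0.1, beta=0.05,
                          scale=1.0, max_iter=500):
    # Illustrative sketch, not the method of the paper: the quasi-Newton
    # direction on the fixed-point residual is replaced by a fixed scaling.
    x = x0.copy()
    for _ in range(max_iter):
        g = stoch_grad(x)                                   # stochastic gradient oracle
        residual = x - prox_l1(x - alpha * g, alpha * lam)  # proximal fixed-point residual
        z = x - scale * residual                            # higher-order-type trial (extra) step
        gz = stoch_grad(z)                                  # fresh stochastic gradient at trial point
        x = prox_l1(z - beta * gz, beta * lam)              # safeguarding stochastic proximal gradient step
    return x

# Hypothetical usage: l1-regularized logistic regression with mini-batch gradients.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
b = 2 * rng.integers(0, 2, size=200) - 1

def stoch_grad(x, batch=32):
    idx = rng.choice(200, size=batch, replace=False)
    Ai, bi = A[idx], b[idx]
    return -(Ai * bi[:, None]).T @ (1.0 / (1.0 + np.exp(bi * (Ai @ x)))) / batch

x_hat = stochastic_extra_step(np.zeros(20), stoch_grad, lam=0.01)

In the method actually proposed, the fixed scaling would be replaced by a (possibly coordinate-type) quasi-Newton model of the smooth part, and the step sizes would be chosen to satisfy the bounds required for convergence to stationary points in expectation.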

MSC:

90C15 Stochastic programming
90C26 Nonconvex programming, global optimization
90C53 Methods of quasi-Newton type
90C06 Large-scale problems in mathematical programming
