Stochastic proximal splitting algorithm for composite minimization. (English) Zbl 1475.90106

Summary: Supported by recent contributions in multiple domains, first-order splitting methods have become the algorithms of choice for structured nonsmooth optimization. Large-scale noisy contexts make stochastic information on the objective function available, and thus the extension of proximal gradient schemes to stochastic oracles relies heavily on the tractability of the proximal operator of the nonsmooth component, a setting that has been extensively exploited in the literature. However, questions remain about the complexity of composite models with proximally intractable terms. In this paper we tackle composite optimization problems, assuming access only to stochastic information on both the smooth and the nonsmooth components, via a stochastic first-order scheme with stochastic proximal updates. We establish sublinear \(\mathcal{O}(\frac{1}{k})\) convergence rates (in expectation of the squared distance to the optimal set) under a strong convexity assumption on the objective function. Moreover, linear convergence is achieved for convex feasibility problems. The empirical behavior is illustrated by numerical tests on parametric sparse representation models.
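The scheme summarized above belongs to the stochastic proximal family. As a minimal illustration only (not the paper's algorithm, which handles proximally intractable terms via stochastic proximal updates on both components), the following sketch runs a plain stochastic proximal gradient iteration on a sparse least-squares model with a tractable \(\ell_1\) prox, sampling one data row per step with an illustrative \(1/k\) step-size schedule; all problem data and parameter choices are assumptions for the example.

```python
import numpy as np

def soft_threshold(v, t):
    # Closed-form proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_prox_grad(A, b, lam, iters=20000, step0=0.2, seed=0):
    """Stochastic proximal gradient sketch for
        min_x (1/2n) ||A x - b||^2 + lam ||x||_1,
    sampling one row per iteration and using an O(1/k) step-size
    schedule (illustrative choices, not the paper's exact scheme)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for k in range(iters):
        i = rng.integers(n)                  # stochastic oracle: one sampled row
        g = (A[i] @ x - b[i]) * A[i]         # gradient of the sampled smooth term
        alpha = step0 / (k + 1)              # decaying 1/k step size
        x = soft_threshold(x - alpha * g, alpha * lam)  # proximal update
    return x
```

Under strong convexity of the smooth part (here, a well-conditioned least-squares term), iterates of this kind contract in expectation toward the minimizer, which is the regime in which the \(\mathcal{O}(1/k)\) rate above applies.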

MSC:

90C30 Nonlinear programming
90C15 Stochastic programming

Software:

Pegasos
Full Text: DOI arXiv

References:

[1] Asi, H.; Duchi, JC, Stochastic (approximate) proximal point methods: convergence, optimality, and adaptivity, SIAM J. Optim., 29, 3, 2257-2290 (2019) · Zbl 07105236 · doi:10.1137/18M1230323
[2] Bauschke, H.; Deutsch, F.; Hundal, H.; Park, S-H, Accelerating the convergence of the method of alternating projections, Trans. Am. Math. Soc., 355, 9, 3433-3461 (2003) · Zbl 1033.41019 · doi:10.1090/S0002-9947-03-03136-2
[3] Bauschke, HH; Borwein, JM; Li, Wu, Strong conical hull intersection property, bounded linear regularity, Jameson's property (G), and error bounds in convex optimization, Math. Program., 86, 1, 135-160 (1999) · Zbl 0998.90088 · doi:10.1007/s101070050083
[4] Beck, A.; Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2, 1, 183-202 (2009) · Zbl 1175.94009 · doi:10.1137/080716542
[5] Bianchi, P., Ergodic convergence of a stochastic proximal point algorithm, SIAM J. Optim., 26, 4, 2235-2260 (2016) · Zbl 1355.90062 · doi:10.1137/15M1017909
[6] Davis, D.; Drusvyatskiy, D., Stochastic model-based minimization of weakly convex functions, SIAM J. Optim., 29, 1, 207-239 (2019) · Zbl 1415.65136 · doi:10.1137/18M1178244
[7] Elad, M., Sparse and redundant representations: from theory to applications in signal and image processing (2010), New York: Springer, New York · Zbl 1211.94001 · doi:10.1007/978-1-4419-7011-4
[8] Hallac, D., Leskovec, J., Boyd, S.: Network lasso: clustering and optimization in large graphs. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 387-396 (2015)
[9] Koshal, J.; Nedic, A.; Shanbhag, UV, Regularized iterative stochastic approximation methods for stochastic variational inequality problems, IEEE Trans. Autom. Control, 58, 3, 594-609 (2012) · Zbl 1369.49012 · doi:10.1109/TAC.2012.2215413
[10] Moulines, E., Bach, F.R.: Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In: Advances in Neural Information Processing Systems, pp. 451-459 (2011)
[11] Nedić, A.: Random projection algorithms for convex set intersection problems. In: 49th IEEE Conference on Decision and Control (CDC), pp. 7655-7660. IEEE (2010)
[12] Nedić, A., Random algorithms for convex minimization problems, Math. Program., 129, 2, 225-253 (2011) · Zbl 1229.90128 · doi:10.1007/s10107-011-0468-9
[13] Nemirovski, A.; Juditsky, A.; Lan, G.; Shapiro, A., Robust stochastic approximation approach to stochastic programming, SIAM J. Optim., 19, 4, 1574-1609 (2009) · Zbl 1189.90109 · doi:10.1137/070704277
[14] Nesterov, Y., Gradient methods for minimizing composite functions, Math. Program., 140, 1, 125-161 (2013) · Zbl 1287.90067 · doi:10.1007/s10107-012-0629-5
[15] Nesterov, Y., Introductory Lectures on Convex Optimization: A Basic Course (2013), US: Springer, US · Zbl 1086.90045
[16] Nguyen, L.M., Nguyen, P.H., van Dijk, M., Richtárik, P., Scheinberg, K., Takáč, M.: SGD and Hogwild! convergence without the bounded gradients assumption. arXiv preprint arXiv:1802.03801 (2018)
[17] Pătraşcu, A.: New nonasymptotic convergence rates of stochastic proximal point algorithm for stochastic convex optimization. Optimization, 1-29 (2020)
[18] Patrascu, A.; Necoara, I., Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization, J. Mach. Learn. Res., 18, 1, 7204-7245 (2017) · Zbl 1475.90048
[19] Rockafellar, RT; Wets, RJ-B, Variational Analysis (2009), Berlin: Springer, Berlin · Zbl 0888.49001
[20] Rockafellar, RT, Convex Analysis (1988), Princeton, NJ: Princeton University Press
[21] Rockafellar, RT; Wets, RJ-B, On the interchange of subdifferentiation and conditional expectation for convex functionals, Stochastics, 7, 1, 173-182 (1982) · Zbl 0487.49006 · doi:10.1080/17442508208833217
[22] Rosasco, L.; Villa, S.; Vũ, BC, Convergence of stochastic proximal gradient algorithm, Appl. Math. Optim., 82, 1-27 (2019) · Zbl 1465.90101
[23] Ryu, E.K., Boyd, S.: Stochastic proximal iteration: a non-asymptotic improvement upon stochastic gradient descent. Author website, early draft (2016)
[24] Salim, A.; Bianchi, P.; Hachem, W., Snake: a stochastic proximal gradient algorithm for regularized problems over large graphs, IEEE Trans. Autom. Control, 64, 5, 1832-1847 (2019) · Zbl 1482.90156 · doi:10.1109/TAC.2019.2890888
[25] Shalev-Shwartz, S.; Singer, Y.; Srebro, N.; Cotter, A., Pegasos: primal estimated sub-gradient solver for SVM, Math. Program., 127, 1, 3-30 (2011) · Zbl 1211.90239 · doi:10.1007/s10107-010-0420-4
[26] Shi, W.; Ling, Q.; Gang, W.; Yin, W., A proximal gradient algorithm for decentralized composite optimization, IEEE Trans. Signal Process., 63, 22, 6013-6023 (2015) · Zbl 1394.94531 · doi:10.1109/TSP.2015.2461520
[27] Stoican, F.; Irofti, P., Aiding dictionary learning through multi-parametric sparse representation, Algorithms, 12, 7, 131 (2019) · doi:10.3390/a12070131
[28] Toulis, P., Tran, D., Airoldi, E.: Towards stability and optimality in stochastic gradient descent. In: Artificial Intelligence and Statistics, pp. 1290-1298 (2016)
[29] Varma, R., Lee, H., Kovacevic, J., Chi, Y.: Vector-valued graph trend filtering with non-convex penalties. IEEE Trans. Signal Inf. Process. Over Netw. 6, 48-62 (2019)
[30] Wang, M.; Bertsekas, DP, Stochastic first-order methods with random constraint projection, SIAM J. Optim., 26, 1, 681-717 (2016) · Zbl 1333.90098 · doi:10.1137/130931278
[31] Wang, X.; Wang, S.; Zhang, H., Inexact proximal stochastic gradient method for convex composite optimization, Comput. Optim. Appl., 68, 3, 579-618 (2017) · Zbl 1390.90432 · doi:10.1007/s10589-017-9932-7
[32] Yankelevsky, Y.; Elad, M., Dual graph regularized dictionary learning, IEEE Trans. Signal Inf. Process. Over Netw., 2, 4, 611-624 (2016) · doi:10.1109/TSIPN.2016.2605763
[33] Zhong, W., Kwok, J.: Accelerated stochastic gradient method for composite regularization. In: Artificial Intelligence and Statistics, pp. 1086-1094 (2014)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or perfect matching.