Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization.

*(English)* Zbl 06982954

Summary: A popular approach for solving stochastic optimization problems is the stochastic gradient descent (SGD) method. Although the SGD iteration is computationally cheap and its practical performance may be satisfactory under certain circumstances, there is recent evidence of its convergence difficulties and instability for an inappropriate choice of parameters. To avoid some of the drawbacks of SGD, stochastic proximal point (SPP) algorithms have recently been considered. We introduce a new variant of the SPP method for solving stochastic convex problems subject to an (in)finite intersection of constraints satisfying a linear regularity condition. For the newly introduced SPP scheme we prove new nonasymptotic convergence results. In particular, for convex Lipschitz continuous objective functions, we prove nonasymptotic convergence rates in terms of the expected value function gap of order \(\mathcal{O}\left(\frac{1}{k^{1/2}}\right)\), where \(k\) is the iteration counter. We also derive better nonasymptotic convergence rates in terms of the expected quadratic distance from the iterates to the optimal solution for smooth strongly convex objective functions, which in the best case is of order \(\mathcal{O}\left(\frac{1}{k}\right)\). Since these convergence rates can be attained by our SPP algorithm only under some natural restrictions on the stepsize, we also introduce a restarting variant of SPP that overcomes these difficulties and derive the corresponding nonasymptotic convergence rates. Numerical evidence supports the effectiveness of our methods on real problems.
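The SPP scheme summarized above alternates an exact proximal step on a randomly sampled objective term with a projection onto a randomly sampled constraint set, under a diminishing stepsize. A minimal one-dimensional sketch under illustrative assumptions (the function name `spp`, the quadratic loss terms, the interval constraints, and the \(\mathcal{O}(1/k)\) stepsize rule are all chosen here for concreteness so the proximal step has a closed form; the paper's scheme covers general convex objectives and constraint sets):

```python
import random

def spp(a, b, intervals, x0=0.0, gamma0=1.0, iters=5000, seed=0):
    """Stochastic proximal point sketch for
        min_x (1/n) * sum_i 0.5 * (a[i]*x - b[i])**2
        s.t.  x in the intersection of the given intervals.
    Each iteration: exact proximal step on one sampled quadratic term,
    then Euclidean projection onto one sampled interval."""
    rng = random.Random(seed)
    x = x0
    for k in range(1, iters + 1):
        gamma = gamma0 / k  # diminishing O(1/k) stepsize (strongly convex case)
        i = rng.randrange(len(a))
        # prox_{gamma f_i}(x) = argmin_z 0.5*(a_i*z - b_i)^2 + (z - x)^2/(2*gamma)
        # has the closed form below for a quadratic term
        x = (x + gamma * a[i] * b[i]) / (1.0 + gamma * a[i] ** 2)
        lo, hi = intervals[rng.randrange(len(intervals))]
        x = min(max(x, lo), hi)  # projection onto the sampled interval
    return x
```

For example, with `a = [1, 2]`, `b = [2, 2]` and feasible set \([0,1] \cap [-1,1.5] = [0,1]\), the average objective is minimized at \(x = 1.2\), so the constrained optimum is \(x^* = 1\) and the iterates approach it as the stepsize vanishes.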

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

##### Keywords:

stochastic convex optimization; intersection of convex constraints; stochastic proximal point; nonasymptotic convergence analysis; rates of convergence

\textit{A. Patrascu} and \textit{I. Necoara}, J. Mach. Learn. Res. 18, Paper No. 198, 42 p. (2018; Zbl 06982954)


##### References:

[1] | Y.F. Atchade, G. Fort and E. Moulines, On perturbed proximal gradient algorithms, Journal of Machine Learning Research, 18(1):310–342, 2017. · Zbl 1433.90199 |

[2] | F. Bach, Self-concordant analysis for logistic regression, Electronic Journal of Statistics, 4:384–414, 2010. · Zbl 1329.62324 |

[3] | F. Bach, G. Lanckriet and M. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, International Conference on Machine Learning (ICML), 2004. |

[4] | D.P. Bertsekas, Incremental proximal methods for large scale convex optimization, Mathematical Programming, 129(2):163–195, 2011. · Zbl 1229.90121 |

[5] | P. Bianchi, Ergodic convergence of a stochastic proximal point algorithm, SIAM Journal on Optimization, 26(4):2235–2260, 2016. · Zbl 1355.90062 |

[6] | D. Blatt and A.O. Hero III, Energy based sensor network source localization via projection onto convex sets (POCS), IEEE Transactions on Signal Processing, 54(9):3614–3619, 2006. · Zbl 1375.94045 |

[7] | L. Bottou, F.E. Curtis and J. Nocedal, Optimization methods for large-scale machine learning, arXiv:1606.04838, 2016. · Zbl 1397.65085 |

[8] | J. Brodie, I. Daubechies, C. de Mol, D. Giannone and I. Loris, Sparse and stable Markowitz portfolios, Proc. Natl. Acad. Sci, 106:12267–12272, 2009. · Zbl 1203.91271 |

[9] | P.S. Bullen, Handbook of means and their inequalities, Kluwer Academic Publishers, Dordrecht, 2003. · Zbl 1035.26024 |

[10] | Y. Censor, W. Chen, P. L. Combettes, R. Davidi and G. T. Herman, On the effectiveness of projection methods for convex feasibility problems with linear inequality constraints, Computational Optimization and Applications, 51(3):1065–1088, 2012. · Zbl 1244.90155 |

[11] | P.L. Combettes and J.C. Pesquet, Stochastic approximations and perturbations in forward-backward splitting for monotone operators, Pure and Applied Functional Analysis, 1(1):13–37, 2016. · Zbl 1377.90109 |

[12] | A. Defazio, F. Bach and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 1646–1654, 2014. |

[13] | J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, 10:2899–2934, 2009. · Zbl 1235.62151 |

[14] | O. Guler, On the convergence of the proximal point algorithm for convex minimization, SIAM Journal on Control and Optimization, 29(2):403–419, 1991. · Zbl 0737.90047 |

[15] | R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 315–323, 2013. |

[16] | A. Karimi and C. Kammer, A data-driven approach to robust control of multivariable systems by convex optimization, Automatica, 85:227–233, 2017. · Zbl 1375.93041 |

[17] | J. Koshal, A. Nedic and U.V. Shanbhag, Regularized iterative stochastic approximation methods for stochastic variational inequality problems, IEEE Transactions on Automatic Control, 58(3):594–609, 2013. · Zbl 1369.49012 |

[18] | A. Mokhtari and A. Ribeiro, RES: Regularized stochastic BFGS algorithm, IEEE Transactions on Signal Processing, 62(23):6089–6104, 2014. · Zbl 1394.94405 |

[19] | E. Moulines and F.R. Bach, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 451– 459, 2011. |

[20] | I. Necoara, Random algorithms for convex minimization over intersection of simple sets, submitted to European Control Conference (ECC18), 2017. |

[21] | I. Necoara, V. Nedelcu and I. Dumitrache, Parallel and distributed optimization methods for estimation and control in networks, Journal of Process Control, 21(5):756–766, 2011. |

[22] | I. Necoara, Yu. Nesterov and F. Glineur, Linear convergence of first order methods for non-strongly convex optimization, Mathematical Programming, in press, 2017a. · Zbl 1370.90145 |

[23] | I. Necoara, P. Richtarik, and A. Patrascu, Randomized projection methods for convex feasibility problems, Technical Report, UPB:1–30, 2017b. |

[24] | A. Nedic, Random algorithms for convex minimization problems, Mathematical Programming, 129(2):225–253, 2011. · Zbl 1229.90128 |

[25] | A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19(4):1574–1609, 2009. · Zbl 1189.90109 |

[26] | Yu. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, Boston, 2004. · Zbl 1086.90045 |

[27] | F. Niu, B. Recht, C. Re, and S. J. Wright, HOGWILD!: A lock-free approach to parallelizing stochastic gradient descent, In Advances in Neural Information Processing Systems (NIPS), pages 693–701, 2011. |

[28] | J. C. Platt, Fast training of support vector machines using sequential minimal optimization, In Advances in Kernel Methods - Support Vector Learning, Cambridge, MA, 1998. |

[29] | B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, 30(4):838–855, 1992. · Zbl 0762.62022 |

[30] | R.T. Rockafellar and R.J.-B. Wets, Variational Analysis, Springer-Verlag, Berlin Heidelberg, 1998. · Zbl 0888.49001 |

[31] | L. Rosasco, S. Villa, and B. C. Vu, Convergence of stochastic proximal gradient algorithm, arXiv:1403.5074, 2014. |

[32] | L. Rosasco, S. Villa, and B. C. Vu, A first-order stochastic primal-dual algorithm with correction step, Numerical Functional Analysis and Optimization, 38(5):602–626, 2017. · Zbl 1369.47074 |

[33] | N. L. Roux, M. Schmidt, and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, In Advances in Neural Information Processing Systems (NIPS), 2672–2680, 2012. |

[34] | E. Ryu and S. Boyd, Stochastic proximal iteration: A non-asymptotic improvement upon stochastic gradient descent, www.math.ucla.edu/eryu/papers/spi.pdf, 2016. |

[35] | S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss, Journal of Machine Learning Research, 14(1):567–599, 2013. · Zbl 1307.68073 |

[36] | S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, Large scale multiple kernel learning, Journal of Machine Learning Research, 7:1531–1565, 2006. |

[37] | P. Toulis, D. Tran, and E. M. Airoldi, Towards stability and optimality in stochastic gradient descent, In International Conference on Artificial Intelligence and Statistics (AISTATS), 1290–1298, 2016. |

[38] | N. Denizcan Vanli, M. Gurbuzbalaban, and A. Ozdaglar, Global convergence rate of proximal incremental aggregated gradient methods, arXiv:1608.01713, 2016. · Zbl 1390.90524 |

[39] | W. Xu, Towards optimal one pass large scale learning with averaged stochastic gradient descent, CoRR, abs/1107.2490, 2011. |

[40] | T. Yang and Q. Lin, Stochastic subgradient methods with linear convergence for polyhedral convex optimization, arXiv:1510.01444, 2016. |

[41] | A. Yurtsever, B. C. Vu, and V. Cevher, Stochastic three-composite convex minimization, In Advances in Neural Information Processing Systems (NIPS), 4322–4330, 2016. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.