# zbMATH — the first resource for mathematics

Stochastic methods for composite and weakly convex optimization problems. (English) Zbl 06989166

##### MSC:
 65K10 Numerical optimization and variational techniques
##### References:
 [1] J.-P. Aubin and A. Cellina, Differential Inclusions: Set-Valued Maps and Viability Theory, Springer-Verlag, Berlin, 1984. · Zbl 0538.34007
 [2] M. Benaïm, J. Hofbauer, and S. Sorin, Stochastic approximations and differential inclusions, SIAM J. Control Optim., 44 (2005), pp. 328–348. · Zbl 1087.62091
 [3] D. P. Bertsekas, Stochastic optimization problems with nondifferentiable cost functionals, J. Optim. Theory Appl., 12 (1973), pp. 218–231. · Zbl 0248.90043
 [4] J. Bolte, A. Daniilidis, A. S. Lewis, and M. Shiota, Clarke subgradients of stratifiable functions, SIAM J. Optim., 18 (2007), pp. 556–572. · Zbl 1142.49006
 [5] J. Bolte, A. Daniilidis, O. Ley, and L. Mazet, Characterizations of Łojasiewicz inequalities: Subgradient flows, talweg, convexity, Trans. Amer. Math. Soc., 362 (2010), pp. 3319–3363. · Zbl 1202.26026
 [6] V. Borkar, Stochastic Approximation, Cambridge University Press, Cambridge, UK, 2008. · Zbl 1181.62119
 [7] J. Burke, Descent methods for composite nondifferentiable optimization problems, Math. Program., 33 (1985), pp. 260–279. · Zbl 0581.90084
 [8] E. Candès, T. Strohmer, and V. Voroninski, PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming, Comm. Pure Appl. Math., 66 (2013), no. 8.
 [9] E. J. Candès, X. Li, and M. Soltanolkotabi, Phase retrieval via Wirtinger flow: Theory and algorithms, IEEE Trans. Inform. Theory, 61 (2015), pp. 1985–2007. · Zbl 1359.94069
 [10] Y. Chen and E. Candès, Solving random quadratic systems of equations is nearly as easy as solving linear systems, Comm. Pure Appl. Math., 2015, to appear.
 [11] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust Region Methods, MOS-SIAM Ser. Optim., SIAM, Philadelphia, 2000.
 [12] D. Davis and D. Drusvyatskiy, Stochastic subgradient method converges at the rate $O(k^{-1/4})$ on weakly convex functions, preprint, [cs.LG], 2018.
 [13] D. Davis and D. Drusvyatskiy, Stochastic model-based minimization of weakly convex functions, preprint, [cs.LG], 2018.
 [14] D. Davis and B. Grimmer, Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems, preprint, [cs.LG], 2017.
 [15] D. Davis, D. Drusvyatskiy, S. Kakade, and J. D. Lee, Stochastic subgradient method converges on tame functions, preprint, [cs.LG], 2018.
 [16] A. Dembo, Probability Theory: STAT310/MATH230, lecture notes, Stanford University, Stanford, CA, 2016.
 [17] P. A. Dorofeyev, Some properties of the generalized gradient method, Comput. Math. Math. Phys., 25 (1985), pp. 117–122.
 [18] D. Drusvyatskiy, The proximal point method revisited, preprint, [math.OC], 2018.
 [19] D. Drusvyatskiy and A. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, Math. Oper. Res., 43 (2018), pp. 919–948.
 [20] D. Drusvyatskiy and C. Paquette, Efficiency of minimizing compositions of convex functions and smooth maps, preprint, [math.OC], 2016.
 [21] D. Drusvyatskiy, A. Ioffe, and A. Lewis, Nonsmooth optimization using Taylor-like models: Error bounds, convergence, and termination criteria, preprint, [math.OC], 2016.
 [22] J. C. Duchi and F. Ruan, Stochastic methods for composite and weakly convex optimization problems, preprint, [math.OC], 2017.
 [23] J. C. Duchi and F. Ruan, Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval, Inf. Inference, 2018, to appear.
 [24] J. C. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learning Res., 12 (2011), pp. 2121–2159. · Zbl 1280.68164
 [25] Y. M. Ermoliev, On the stochastic quasi-gradient method and stochastic quasi-Fejér sequences, Kibernetika, 2 (1969), pp. 72–83.
 [26] Y. M. Ermoliev and V. Norkin, Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization, Cybernet. Systems Anal., 34 (1998), pp. 196–215. · Zbl 0930.90074
 [27] Y. M. Ermoliev and V. Norkin, Solution of nonconvex nonsmooth stochastic optimization problems, Cybernet. Systems Anal., 39 (2003), pp. 701–715. · Zbl 1066.90071
 [28] R. Fletcher, A model algorithm for composite nondifferentiable optimization problems, in Nondifferential and Variational Techniques in Optimization, D. C. Sorensen and R. J.-B. Wets, eds., Math. Program. Studies 17, Springer, Berlin, 1982, pp. 67–76. · Zbl 0478.90063
 [29] R. Fletcher and G. A. Watson, First and second order conditions for a class of nondifferentiable optimization problems, Math. Program., 18 (1980), pp. 291–307. · Zbl 0433.90066
 [30] A. M. Gupal, Stochastic Methods for Solving Nonsmooth Extremal Problems, Naukova Dumka, Kiev, 1979 (in Ukrainian).
 [31] M. Hardt and T. Ma, Identity matters in deep learning, in Proceedings of the Fifth International Conference on Learning Representations, 2017.
 [32] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer, New York, 2009. · Zbl 1273.62005
 [33] J. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I & II, Springer, New York, 1993.
 [34] A. D. Ioffe, Critical values of set-valued maps with stratifiable graphs. Extensions of Sard and Smale–Sard theorems, Proc. Amer. Math. Soc., 136 (2008), pp. 3111–3119. · Zbl 1191.49015
 [35] A. D. Ioffe, Variational Analysis of Regular Mappings, Springer, New York, 2017. · Zbl 1381.49001
 [36] M. Kunze, Non-Smooth Dynamical Systems, Lect. Notes Math. 1744, Springer, New York, 2000. · Zbl 0965.34026
 [37] H. J. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed., Springer, New York, 2003. · Zbl 1026.62084
 [38] A. S. Lewis and S. J. Wright, A proximal method for composite minimization, preprint, [math.OC], 2018.
 [39] L. Ljung, Analysis of recursive stochastic algorithms, IEEE Trans. Automat. Control, 22 (1977), pp. 551–575. · Zbl 0362.93031
 [40] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM J. Optim., 19 (2009), pp. 1574–1609. · Zbl 1189.90109
 [41] Y. Nesterov, Introductory Lectures on Convex Optimization, Kluwer Academic Publishers, Dordrecht, the Netherlands, 2004. · Zbl 1086.90045
 [42] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, New York, 2006. · Zbl 1104.65059
 [43] R. Poliquin and R. T. Rockafellar, Prox-regular functions in variational analysis, Trans. Amer. Math. Soc., 348 (1996), pp. 1805–1838. · Zbl 0861.49015
 [44] H. Robbins and D. Siegmund, A convergence theorem for non-negative almost supermartingales and some applications, in Optimizing Methods in Statistics, Academic Press, New York, 1971, pp. 233–257.
 [45] R. T. Rockafellar, Measurable dependence of convex sets and functions on parameters, J. Math. Anal. Appl., 28 (1969), pp. 4–25. · Zbl 0202.33804
 [46] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14 (1976), pp. 877–898. · Zbl 0358.90053
 [47] R. T. Rockafellar and R. J. B. Wets, Variational Analysis, Springer, New York, 1998. · Zbl 0888.49001
 [48] A. Ruszczyński, A linearization method for nonsmooth stochastic programming problems, Math. Oper. Res., 12 (1987), pp. 32–49. · Zbl 0624.90078
 [49] Y. Schechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, Phase retrieval with application to optical imaging, IEEE Signal Process. Magazine, 2015, pp. 87–109.
 [50] W. Su, S. Boyd, and E. Candès, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, in Advances in Neural Information Processing Systems 27, 2014, pp. 2510–2518.
 [51] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B. Stat. Methodol., 58 (1996), pp. 267–288. · Zbl 0850.62538
 [52] M. Udell, K. Mohan, D. Zeng, J. Hong, S. Diamond, and S. Boyd, Convex optimization in Julia, in First Workshop on High Performance Technical Computing in Dynamic Languages, IEEE, Piscataway, NJ, 2014, pp. 18–28.
 [53] M. Wang, J. Liu, and E. Fang, Accelerating stochastic composition optimization, in Advances in Neural Information Processing Systems 29, 2016, pp. 1714–1722.
 [54] M. Wang, E. Fang, and H. Liu, Stochastic compositional gradient descent: Algorithms for minimizing compositions of expected-value functions, Math. Program. Ser. A, 161 (2017), pp. 419–449. · Zbl 1356.90099
 [55] H. Whitney, A function not constant on a connected set of critical points, Duke Math. J., 1 (1935), pp. 514–517. · Zbl 0013.05801
 [56] A. Wibisono, A. Wilson, and M. I. Jordan, A variational perspective on accelerated methods in optimization, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. 7351–7358.