
Recent advances in stochastic Riemannian optimization. (English) Zbl 1512.90149

Grohs, Philipp (ed.) et al., Handbook of variational methods for nonlinear geometric data. Cham: Springer. 527-554 (2020).
Summary: Stochastic and finite-sum optimization problems are central to machine learning. Numerous specializations of these problems involve nonlinear constraints in which the parameters of interest lie on a manifold. Consequently, stochastic manifold optimization algorithms have recently witnessed rapid growth, in part due to their computational performance. This chapter outlines a range of stochastic optimization algorithms on manifolds, from the basic stochastic gradient method to more advanced variance-reduced stochastic methods. In particular, we present a unified summary of convergence results. Finally, we provide several basic applications of these methods to machine learning problems, including learning the parameters of Gaussian mixtures, principal component analysis, and Wasserstein barycenters.
For the entire collection see [Zbl 07115003].
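As a hedged illustration of the basic method the summary refers to (a sketch, not code from the chapter), Riemannian stochastic gradient descent can be written down for the PCA example: the leading principal component solves min −E[(aᵀx)²] over the unit sphere, the Euclidean stochastic gradient is projected onto the tangent space, and renormalization serves as the retraction. All names, data, and hyperparameters below are illustrative.

```python
import numpy as np

# Synthetic data: columns scaled so the sample covariance has a clear top eigenvector.
rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.normal(size=(n, d)) @ np.diag(np.linspace(1.0, 3.0, d))

def riemannian_sgd(A, step=0.001, n_epochs=100, seed=1):
    """Riemannian SGD for min_{||x||=1} -E[(a^T x)^2] on the sphere S^{d-1}."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                   # initialize on the sphere
    for _ in range(n_epochs):
        for i in rng.permutation(n):         # one stochastic sample per step
            a = A[i]
            egrad = -2.0 * (a @ x) * a       # Euclidean stochastic gradient
            rgrad = egrad - (x @ egrad) * x  # project onto tangent space at x
            x = x - step * rgrad             # gradient step in the tangent space
            x /= np.linalg.norm(x)           # retraction: renormalize to the sphere
    return x

x = riemannian_sgd(A)

# Compare with the top eigenvector of the sample covariance (via a full eigendecomposition).
w, V = np.linalg.eigh(A.T @ A)
v_top = V[:, -1]
print(abs(x @ v_top))  # alignment with the top eigenvector; approaches 1 as SGD converges
```

The variance-reduced methods surveyed in the chapter (Riemannian SVRG, SRG, SPIDER) modify only the gradient estimate in the inner loop; the projection-plus-retraction structure of the update remains the same.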

MSC:

90C15 Stochastic programming
90C48 Programming in abstract spaces
62R30 Statistics on manifolds
62H25 Factor analysis and principal components; correspondence analysis

Software:

RTRMC; Saga; Manopt
