##
**Alternating minimization methods for strongly convex optimization.**
*(English)*
Zbl 1478.90092

Summary: We consider alternating minimization procedures for convex and non-convex optimization problems with the vector of variables divided into several blocks, each block being amenable for minimization with respect to its variables while maintaining other variables blocks constant. In the case of two blocks, we prove a linear convergence rate for an alternating minimization procedure under the Polyak-Łojasiewicz (PL) condition, which can be seen as a relaxation of the strong convexity assumption. Under the strong convexity assumption in the many-blocks setting, we provide an accelerated alternating minimization procedure with linear convergence rate depending on the square root of the condition number as opposed to just the condition number for the non-accelerated method. We also consider the problem of finding an approximate non-negative solution to a linear system of equations \(Ax=y\) with alternating minimization of Kullback-Leibler (KL) divergence between \(Ax\) and \(y\).

### MSC:

90C25 | Convex programming |

### Keywords:

convex optimization; nonconvex optimization; alternating minimization; block-coordinate method; complexity analysis### References:

[1] | J. Altschuler, J. Weed and P. Rigollet, Near-linear time approxfimation algorithms for optimal transport via Sinkhorn iteration, Proceedings of the 31st International Conference on Neural Information Processing, ACM, New York (2017), 1961-1971. |

[2] | A. Andresen and V. Spokoiny, Convergence of an alternating maximization procedure, J. Mach. Learn. Res. 17 (2016), Paper No. 63. · Zbl 1360.62082 |

[3] | A. S. Anikin, A. V. Gasnikov, P. E. Dvurechensky, A. I. Tyurin and A. V. Chernov, Dual approaches to the minimization of strongly convex functionals with a simple structure under affine constraints, Comput. Math. Math. Phys. 57 (2017), no. 8, 1262-1276. · Zbl 1380.49046 |

[4] | A. Bayandina, P. Dvurechensky, A. Gasnikov, F. Stonyakin and A. Titov, Mirror descent and convex optimization problems with non-smooth inequality constraints, Large-Scale and Distributed Optimization, Lecture Notes in Math. 2227, Springer, Cham (2018), 181-213. · Zbl 1421.90112 |

[5] | A. Beck, On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes, SIAM J. Optim. 25 (2015), no. 1, 185-209. · Zbl 1358.90094 |

[6] | A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. 2 (2009), no. 1, 183-202. · Zbl 1175.94009 |

[7] | J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna and G. Peyré, Iterative Bregman projections for regularized transportation problems, SIAM J. Sci. Comput. 37 (2015), no. 2, A1111-A1138. · Zbl 1319.49073 |

[8] | D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Upper Saddle River, 1989. · Zbl 0743.65107 |

[9] | C. L. Byrne, Iterative reconstruction algorithms based on cross-entropy minimization, Image Models (and Their Speech Model Cousins), Springer, New York (1996), 1-11. · Zbl 0925.62030 |

[10] | C. L. Byrne, Iterative Optimization in Inverse Problems, CRC Press, Boca Raton, 2014. · Zbl 1285.65035 |

[11] | J. Carroll and J.-J. Chang, Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition, Psychometrika 35 (1970), no. 3, 283-319. · Zbl 0202.19101 |

[12] | A. Chambolle, P. Tan and S. Vaiter, Accelerated alternating descent methods for Dykstra-like problems, J. Math. Imaging Vision 59 (2017), no. 3, 481-497. · Zbl 1382.65167 |

[13] | A. Chernov, P. Dvurechensky and A. Gasnikov, Fast primal-dual gradient method for strongly convex minimization problems with linear constraints, Discrete Optimization and Operations Research, Lecture Notes in Comput. Sci. 9869, Springer, Cham (2016), 391-403. · Zbl 1391.90471 |

[14] | A. Cichocki, N. Lee, I. V. Oseledets, A. H. Phan, Q. Zhao and D. Mandic, Low-rank tensor networks for dimensionality reduction and large-scale optimization problems: Perspectives and challenges Part 1, preprint (2016), https://arxiv.org/abs/1609.00893. |

[15] | I. Csiszár and G. E. Tusnády, Information geometry and alternating minimization procedures, Recent Results in Estimation Theory and Related Topics, Oldenbourg Verlag, München (1984), 205-237. · Zbl 0547.60004 |

[16] | M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Proceedings of the 26th International Conference on Neural Information Processing, ACM, New York (2013), 2292-2300. |

[17] | M. Cuturi and A. Doucet, Fast computation of Wasserstein barycenters, Proceedings of the 31st International Conference on Machine Learning, Proc. Mach. Learn. Res. (PMLR) 32, ML Research Press, Beijing (2014), 685-693. |

[18] | M. Danilova, P. Dvurechensky, A. Gasnikov, E. Gorbunov, S. Guminov, D. Kamzolov and I. Shibaev, Recent theoretical advances in non-convex optimization, preprint (2020), https://arxiv.org/abs/2012.06188. |

[19] | J. Diakonikolas and L. Orecchia, Alternating randomized block coordinate descent, Proceedings of the 35th International Conference on Machine Learning, Proc. Mach. Learn. Res. (PMLR) 80, ML Research Press, Stockholm (2018), 1224-1232. |

[20] | D. Dvinskikh, A. Ogaltsov, A. Gasnikov, P. Dvurechensky and V. Spokoiny, Adaptive gradient descent for convex and non-convex stochastic optimization, preprint (2020), https://arxiv.org/abs/1911.08380v5. |

[21] | P. Dvurechensky, D. Dvinskikh, A. Gasnikov, C. A. Uribe and A. Nedić, Decentralize and randomize: Faster algorithm for Wasserstein barycenters, Advances in Neural Information Processing Systems 31-NeurIPS 2018, NeurIPS, San Diego (2018), 10783-10793. |

[22] | P. Dvurechensky, A. Gasnikov and A. Kroshnin, Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm, Proceedings of the 35th International Conference on Machine Learning, Proc. Mach. Learn. Res. (PMLR) 80, ML Research Press, Stockholm (2018), 1367-1376. |

[23] | P. Dvurechensky, A. Gasnikov, E. A. Nurminski and F. S. Stonyakin, Advances in Low-Memory Subgradient Optimization, Numerical Nonsmooth Optimization, Springer, Cham (2020), 19-59. |

[24] | P. Dvurechensky, A. Gasnikov, S. Omelchenko and A. Tiurin, A stable alternative to Sinkhorn’s algorithm for regularized optimal transport, Mathematical Optimization Theory and Operations Research, Springer, Cham (2020), 406-423. · Zbl 1460.90029 |

[25] | A. V. Gasnikov, P. E. Dvurechensky, F. S. Stonyakin and A. A. Titov, An adaptive proximal method for variational inequalities, Comput. Math. Math. Phys. 59 (2019), no. 5, 836-841. · Zbl 1477.47066 |

[26] | R. Gordon, R. Bender and G. T. Herman, Algebraic reconstruction techniques (art) for three-dimensional electron microscopy and x-ray photography, J. Theoret. Biol. 29 (1970), no. 3, 471-481. |

[27] | L. Grasedyck, D. Kressner and C. Tobler, A literature survey of low-rank tensor approximation techniques, GAMM-Mitt. 36 (2013), no. 1, 53-78. · Zbl 1279.65045 |

[28] | S. Guminov, P. Dvurechensky, N. Tupitsa and A. Gasnikov, Accelerated alternating minimization, accelerated Sinkhorn’s algorithm and accelerated iterative Bregman projections, preprint (2019), https://arxiv.org/abs/1906.03622. |

[29] | S. V. Guminov, Y. E. Nesterov, P. E. Dvurechensky and A. V. Gasnikov, Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems, Dokl. Math. 99 (2019), no. 2, 125-128. · Zbl 1418.90192 |

[30] | N. Hao, Lior Horesh and M. Kilmer, Nonnegative tensor decomposition, Compressed Sensing & Sparse Filtering, Springer, Berlin (2014), 123-148. · Zbl 1352.65164 |

[31] | M. Hong, M. Razaviyayn, Z. Luo and J. Pang, A unified algorithmic framework for block-structured optimization involving big data: With applications in machine learning and signal processing, IEEE Signal Proc. Mag. 33 (2016), no. 1, 57-77. |

[32] | M. Hong, X. Wang, M. Razaviyayn and Z.-Q. Luo, Iteration complexity analysis of block coordinate descent methods, Math. Program. 163 (2017), no. 1-2, 85-114. · Zbl 1367.49010 |

[33] | Y. Hu, Y. Koren and C. Volinsky, Collaborative filtering for implicit feedback datasets, 8th IEEE International Conference on Data Mining, IEEE Press, Piscataway (2008), 263-272. |

[34] | A. Ivanova, P. Dvurechensky, A. Gasnikov and D. Kamzolov, Composite optimization for the resource allocation problem, Optim. Methods Softw. (2020), 10.1080/10556788.2020.1712599. · Zbl 1489.90116 · doi:10.1080/10556788.2020.1712599 |

[35] | A. Ivanova, D. Pasechnyuk, P. Dvurechensky, A. Gasnikov and E. Vorontsova, Numerical methods for the resource allocation problem in networks, Comp. Math. and Math. Phys. 61 (2021), no. 2, 147-179. · Zbl 1460.90051 |

[36] | D. Kamzolov, A. Gasnikov and P. Dvurechensky, Optimal combination of tensor optimization methods, Optimization and Applications, Springer, Cham (2020), 166-183. · Zbl 1511.90388 |

[37] | H. Karimi, J. Nutini and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Comput. Sci. 9851, Springer, Cham (2016), 795-811. |

[38] | J. A. Kelner, Y. T. Lee, L. Orecchia and A. Sidford, An almost-linear-time algorithm for approximate max flow in undirected graphs, and its multicommodity generalizations, Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York (2014), 217-226. · Zbl 1423.05177 |

[39] | S. Khenissi and O. Nasraoui, Modeling and counteracting exposure bias in recommender systems, Master’s Thesis, University of Louisville, 2019. |

[40] | T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455-500. · Zbl 1173.65029 |

[41] | R. Krawtschenko, C. A. Uribe, A. Gasnikov and P. Dvurechensky, Distributed optimization with quantization for computing Wasserstein barycenters, preprint (2020), https://arxiv.org/abs/2010.14325. |

[42] | A. Kroshnin, N. Tupitsa, D. Dvinskikh, P. Dvurechensky, A. Gasnikov and C. Uribe, On the complexity of approximating Wasserstein barycenters, Proceedings of the 36th International Conference on Machine Learning, Proc. Mach. Learn. Res. (PMLR) 97, ML Research Press, Long Beach (2019), 3530-3540. |

[43] | K. Lange, M. Bahn and R. Little, A theoretical study of some maximum likelihood algorithms for emission and transmission tomography, IEEE Trans. Med. Imag. 6 (1987), no. 2, 106-114. |

[44] | T. Lin, N. Ho, X. Chen, M. Cuturi and M. I. Jordan, On the complexity of approximating multimarginal optimal transport, preprint (2019), https://arxiv.org/abs/1910.00152. |

[45] | T. Lin, N. Ho, X. Chen, M. Cuturi and M. I. Jordan, Fixed-support Wasserstein barycenters: Computational hardness and fast algorithm, preprint (2020), https://arxiv.org/abs/2002.04783. |

[46] | T. Lin, N. Ho and M. Jordan, On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms, Proceedings of the 36th International Conference on Machine Learning, Proc. Mach. Learn. Res. (PMLR) 97, ML Research Press, Long Beach (2019), 3982-3991. |

[47] | Z.-Q. Luo and P. Tseng, Error bounds and convergence analysis of feasible descent methods: A general approach, Ann. Oper. Res. 46 (1993), 157-178. · Zbl 0793.90076 |

[48] | Y. Nesterov, A method of solving a convex programming problem with convergence rate \(o(1/k^2)\), Sov. Math. Dokl. 27 (1983), 372-376. · Zbl 0535.90071 |

[49] | Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer, New York, 2014. |

[50] | Y. Nesterov, Lectures on Convex Optimization, 2nd ed., Springer, New York, 2018. · Zbl 1427.90003 |

[51] | Y. Nesterov, A. Gasnikov, S. Guminov and P. Dvurechensky, Primal-dual accelerated gradient methods with small-dimensional relaxation oracle, Optim. Methods Softw. (2020), 10.1080/10556788.2020.1731747. · Zbl 1489.90124 · doi:10.1080/10556788.2020.1731747 |

[52] | Y. Nesterov and B. T. Polyak, Cubic regularization of Newton method and its global performance, Math. Program. 108 (2006), 177-205. · Zbl 1142.90500 |

[53] | J. Nutini, M. Schmidt, I. Laradji, M. Friedlander and H. Koepke, Coordinate descent converges faster with the gauss-southwell rule than random selection, Proceedings of the 32nd International Conference on Machine Learning, Proc. Mach. Learn. Res. (PMLR) 37, ML Research Press, Lille (2015), 1632-1641. |

[54] | J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, SIAM, Philadelphia, 2000. · Zbl 0949.65053 |

[55] | I. Perros, R. Chen, R. Vuduc and J. Sun, Sparse hierarchical tucker factorization and its application to healthcare, Proceedings of the 15th IEEE International Conference on Data Mining—ICDM 2015, IEEE Press, Piscataway (2016), 943-948. |

[56] | B. T. Polyak, Introduction to Optimization, Optimization Software, New York, 1987. · Zbl 0625.62093 |

[57] | A. Saha and A. Tewari, On the nonasymptotic convergence of cyclic coordinate descent methods, SIAM J. Optim. 23 (2013), no. 1, 576-601. · Zbl 1270.90032 |

[58] | S. Sra and I. S. Dhillon, Generalized nonnegative matrix approximations with bregman divergences, Proceedings of the 18th International Conference on Neural Information Processing, MIT Press, Cambridge (2006), 283-290. |

[59] | F. Stonyakin, A. Gasnikov, P. Dvurechensky, M. Alkousa and A. Titov, Generalized mirror prox for monotone variational inequalities: Universality and inexact oracle, preprint (2018), https://arxiv.org/abs/1806.05140. |

[60] | W. Su, S. Boyd and E. J. Candès, A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights, J. Mach. Learn. Res. 17 (2016), Paper No. 153. · Zbl 1391.90667 |

[61] | R. Sun and M. Hong, Improved iteration complexity bounds of cyclic block coordinate descent for convex problems, Proceedings of the 28th International Conference on Neural Information Processing Systems, MIT Press, Cambridge (2015), 1306-1314. |

[62] | N. Tupitsa, P. Dvurechensky, A. Gasnikov and C. A. Uribe, Multimarginal optimal transport by accelerated alternating minimization, IEEE 59th Conference on Decision and Control, IEEE Press, Piscataway (2020), 6132-6137. |

[63] | N. Tupitsa, A. Gasnikov, P. Dvurechensky and S. Guminov, Strongly convex optimization for the dual formulation of optimal transport, Mathematical Optimization Theory and Operations Research, Springer, Cham (2020), 192-204. |

[64] | C. A. Uribe, D. Dvinskikh, P. Dvurechensky, A. Gasnikov and A. Nedić, Distributed computation of Wasserstein barycenters over networks, 2018 IEEE Conference on Decision and Control, IEEE Press, Piscataway (2018), 6544-6549. |

[65] | Y. Vardi, L. A. Shepp and L. Kaufman, A statistical model for positron emission tomography, J. Amer. Statist. Assoc. 80 (1985), no. 389, 8-37. · Zbl 0561.62094 |

[66] | C. R. Vogel, Computational Methods for Inverse Problems, Front. Appl. Math. 23, SIAM, Philadelphia, 2002. · Zbl 1008.65103 |

[67] | N. Ye, F. Roosta-Khorasani and T. Cui, Optimization methods for inverse problems, 2017 MATRIX Annals, MATRIX Book Ser. 2, Springer, Cham (2019), 121-140. |

[68] | Y. Zhou, H. Zhang and Y. Liang, Geometrical properties and accelerated gradient solvers of non-convex phase retrieval, 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), IEEE Press, Piscataway (2016), 331-335. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.