
zbMATH — the first resource for mathematics

Coordinate descent algorithms. (English) Zbl 1317.49038
Summary: Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate descent approach, together with variants and extensions and their convergence properties, mostly with reference to convex objectives. We pay particular attention to a certain problem structure that arises frequently in machine learning applications, showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type. We also present some parallel variants and discuss their convergence properties under several models of parallel execution.
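The scheme the summary describes, successive approximate minimization along coordinate directions, can be sketched on the simplest nontrivial case: a strongly convex quadratic, where the one-dimensional minimization along each coordinate has a closed form. The matrix, vector, and iteration count below are illustrative choices, not taken from the paper:

```python
# Minimal sketch of cyclic coordinate descent on a strongly convex
# quadratic f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite.
# Exact minimization of f along coordinate i gives the update
#   x_i <- x_i - ((A x)_i - b_i) / A_ii,
# i.e. each coordinate step zeroes the i-th component of the gradient.

def coordinate_descent(A, b, iters=100):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):  # one cyclic sweep over all coordinates
            g_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]  # i-th gradient entry
            x[i] -= g_i / A[i][i]  # exact one-dimensional minimization
    return x

A = [[4.0, 1.0], [1.0, 3.0]]  # symmetric positive definite
b = [1.0, 2.0]
x = coordinate_descent(A, b)
print(x)  # approximates the solution of A x = b
```

For this quadratic, the cyclic sweep with exact coordinate minimization coincides with the Gauss-Seidel iteration for \(Ax=b\), which converges for symmetric positive definite \(A\); the randomized, accelerated, and parallel variants surveyed in the paper change the coordinate selection rule and add momentum or concurrency on top of this basic step.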

MSC:
49M20 Numerical methods of relaxation type
90C25 Convex programming
References:
[1] Attouch, H; Bolte, J; Redont, P; Soubeyran, A, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Math. Oper. Res., 35, 438-457, (2010) · Zbl 1214.65036
[2] Beck, A; Teboulle, M, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2, 183-202, (2009) · Zbl 1175.94009
[3] Beck, A; Tetruashvili, L, On the convergence of block coordinate descent methods, SIAM J. Optim., 23, 2037-2060, (2013) · Zbl 1297.90113
[4] Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999) · Zbl 1015.90077
[5] Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall Inc, Englewood Cliffs (1989) · Zbl 0743.65107
[6] Bolte, J; Sabach, S; Teboulle, M, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program. Ser. A, 146, 1-36, (2014) · Zbl 1297.90125
[7] Bouman, CA; Sauer, K, A unified approach to statistical tomography using coordinate descent optimization, IEEE Trans. Image Process., 5, 480-492, (1996)
[8] Boyd, S; Parikh, N; Chu, E; Peleato, B; Eckstein, J, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3, 1-122, (2011) · Zbl 1229.90122
[9] Bradley, J.K., Kyrola, A., Bickson, D., Guestrin, C.: Parallel coordinate descent for \(\ell_1\)-regularized loss minimization. In: Proceedings of the 28th International Conference on Machine Learning (ICML 2011) (2011)
[10] Breheny, P; Huang, J, Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, Ann. Appl. Stat., 5, 232-252, (2011) · Zbl 1220.62095
[11] Canutescu, AA; Dunbrack, RL, Cyclic coordinate descent: a robotics algorithm for protein loop closure, Protein Sci., 12, 963-972, (2003)
[12] Chang, K; Hsieh, C; Lin, C, Coordinate descent method for large-scale L2-loss linear support vector machines, J. Mach. Learn. Res., 9, 1369-1398, (2008) · Zbl 1225.68157
[13] Eckstein, J; Bertsekas, DP, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Math. Program., 55, 293-318, (1992) · Zbl 0765.90073
[14] Eckstein, J., Yao, W.: Understanding the convergence of the alternating direction method of multipliers: theoretical and computational perspectives. Technical Report, RUTCOR, Rutgers University (2014) · Zbl 1330.90074
[15] Fercoq, O., Qu, Z., Richtarik, P., Takac, M.: Fast distributed coordinate descent for non-strongly convex losses (2014). arxiv:1405.5300
[16] Fercoq, O., Richtarik, P.: Accelerated, parallel, and proximal coordinate descent. Technical Report, School of Mathematics, University of Edinburgh (2013). arXiv:1312.5799 · Zbl 1327.65108
[17] Florian, M; Chen, Y, A coordinate descent method for the bilevel O-D matrix adjustment problem, Int. Trans. Oper. Res., 2, 165-179, (1995) · Zbl 0868.90038
[18] Friedman, J; Hastie, T; Tibshirani, R, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 432-441, (2008) · Zbl 1143.62076
[19] Friedman, JH; Hastie, T; Tibshirani, R, Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw., 33, 1-22, (2010)
[20] Jaggi, M., Smith, V., Takác, M., Terhorst, J., Krishnan, S., Hoffman, T., Jordan, M.I.: Communication-efficient distributed dual coordinate ascent. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 3068-3076. Curran Associates (2014)
[21] Jain, P., Netrapalli, P., Sanghavi, S.: Low-rank matrix completion using alternating minimization. Technical Report (2012). arXiv:1212.0467 · Zbl 1293.65073
[22] Kaczmarz, S, Angenäherte Auflösung von Systemen linearer Gleichungen, Bulletin International de l’Académie Polonaise des Sciences et des Lettres, 35, 355-357, (1937) · Zbl 0017.31703
[23] Lee, Y.T., Sidford, A.: Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In: 54th Annual Symposium on Foundations of Computer Science, pp. 147-156 (2013)
[24] Leventhal, D; Lewis, AS, Randomized methods for linear constraints: convergence rates and conditioning, Math. Oper. Res., 35, 641-654, (2010) · Zbl 1216.15006
[25] Lin, Q., Lu, Z., Xiao, L.: An accelerated proximal coordinate gradient method and its application to empirical risk minimization. Technical Report, Microsoft Research (2014). arXiv:1407.1296 · Zbl 1329.65127
[26] Liu, H., Palatucci, M., Zhang, J.: Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. In: Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09, pp. 649-656. ACM, New York, NY, USA (2009)
[27] Liu, J., Wright, S.J.: Asynchronous stochastic coordinate descent: parallelism and convergence properties. Technical Report, University of Wisconsin, Madison. (2014). (To appear in SIAM Journal on Optimization). arXiv:1403.3862 · Zbl 1358.90098
[28] Liu, J., Wright, S.J., Ré, C., Bittorf, V., Sridhar, S.: An asynchronous parallel stochastic coordinate descent algorithm. Technical Report, Computer Sciences Department, University of Wisconsin-Madison (2013). (To appear in Journal of Machine Learning Research). arXiv:1311.1873 · Zbl 1337.68286
[29] Liu, J., Wright, S.J., Sridhar, S.: An accelerated randomized Kaczmarz algorithm. Technical Report, Computer Sciences Department, University of Wisconsin-Madison (2013). (To appear in Mathematics of Computation). arXiv:1310.2887
[30] Luo, ZQ; Tseng, P, On the convergence of the coordinate descent method for convex differentiable minimization, J. Optim. Theory Appl., 72, 7-35, (1992) · Zbl 0795.90069
[31] Luo, ZQ; Tseng, P, Error bounds and convergence analysis of feasible descent methods: a general approach, Ann. Oper. Res., 46, 157-178, (1993) · Zbl 0793.90076
[32] Marecek, J., Richtarik, P., Takac, M.: Distributed block coordinate descent for minimizing partially separable functions. Technical Report arXiv:1406.0238 (2014) · Zbl 1330.65089
[33] Mazumder, R; Friedman, JH; Hastie, T, SparseNet: coordinate descent with nonconvex penalties, J. Am. Stat. Assoc., 106, 1125-1138, (2011) · Zbl 1229.62091
[34] Necoara, I., Clipici, D.: Distributed random coordinate descent method for composite minimization. Technical Report 1-41, University Politehnica Bucharest (2013) · Zbl 1329.90108
[35] Nesterov, Y, A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\), Doklady AN SSSR, 269, 543-547, (1983)
[36] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, New York (2004) · Zbl 1086.90045
[37] Nesterov, Y, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22, 341-362, (2012) · Zbl 1257.90073
[38] Niu, F., Recht, B., Ré, C., Wright, S.J.: Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 24, pp. 693-701. Curran Associates (2011)
[39] Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970) · Zbl 0241.65046
[40] Patrascu, A., Necoara, I.: Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization. J. Glob. Optim. (2013). doi:10.1007/s10898-014-0151-9 · Zbl 1335.90074
[41] Platt, JC; Schölkopf, B (ed.); Burges, CJC (ed.); Smola, AJ (ed.), Fast training of support vector machines using sequential minimal optimization, 185-208, (1999), Cambridge
[42] Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
[43] Powell, MJD, On search directions for minimization algorithms, Math. Program., 4, 193-201, (1973) · Zbl 0258.90043
[44] Razaviyayn, M; Hong, M; Luo, ZQ, A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM J. Optim., 23, 1126-1153, (2013) · Zbl 1273.90123
[45] Recht, B; Fazel, M; Parrilo, P, Guaranteed minimum-rank solutions to linear matrix equations via nuclear norm minimization, SIAM Rev., 52, 471-501, (2010) · Zbl 1198.90321
[46] Richtarik, P., Takac, M.: Parallel coordinate descent methods for big data optimization. Technical Report, School of Mathematics, University of Edinburgh (2013). arXiv:1212.0873 · Zbl 1342.90102
[47] Richtarik, P; Takac, M, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Math. Program. Ser. A, 144, 1-38, (2014) · Zbl 1301.65051
[48] Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970) · Zbl 0193.18401
[49] Sardy, S; Bruce, A; Tseng, P, Block coordinate relaxation methods for nonparametric wavelet denoising, J. Comput. Graph. Stat., 9, 361-379, (2000)
[50] Shalev-Shwartz, S; Tewari, A, Stochastic methods for \(\ell_1\)-regularized loss minimization, J. Mach. Learn. Res., 12, 1865-1892, (2011) · Zbl 1280.62081
[51] Shalev-Shwartz, S; Zhang, T, Stochastic dual coordinate ascent methods for regularized loss minimization, J. Mach. Learn. Res., 14, 437-469, (2013)
[52] Strohmer, T; Vershynin, R, A randomized Kaczmarz algorithm with exponential convergence, J. Fourier Anal. Appl., 15, 262-278, (2009) · Zbl 1169.68052
[53] Tibshirani, R, Regression shrinkage and selection via the LASSO, J. R. Stat. Soc. B, 58, 267-288, (1996) · Zbl 0850.62538
[54] Tseng, P, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., 109, 475-494, (2001) · Zbl 1006.65062
[55] Tseng, P; Yun, S, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. Ser. B, 117, 387-423, (2009) · Zbl 1166.90016
[56] Ye, JC; Webb, KJ; Bouman, CA; Millane, RP, Optical diffusion tomography by iterative-coordinate-descent optimization in a Bayesian framework, J. Opt. Soc. Am. A, 16, 2400-2412, (1999)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.