zbMATH — the first resource for mathematics

Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. (English) Zbl 1301.65051
The authors improve, extend and simplify the iteration complexity results of Yu. Nesterov [SIAM J. Optim. 22, No. 2, 341–362 (2012; Zbl 1257.90073)] for the problem of minimizing the sum of a smooth convex function and a simple nonsmooth convex block-separable function. They focus exclusively on simple (as opposed to accelerated) methods, because the per-iteration work of the accelerated algorithm of Nesterov [loc. cit.] is excessive on huge-scale instances of problems with sparse data. A randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth convex block-separable function is developed, extending the results of Nesterov [loc. cit.], which cover the smooth case. More importantly, in contrast with the aforementioned work, in which the author obtains his results by applying the method to a regularized version of the objective function with an unknown scaling factor, it is shown here that this is not necessary, thus yielding the first true iteration complexity bounds. For a strongly convex function the method converges linearly. Each algorithm in the paper is supported by a high-probability iteration complexity result. Numerical experiments demonstrate that the algorithm can solve huge-scale $$\ell_1$$-regularized least squares problems with a billion variables.
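To make the setting concrete, the following is a minimal sketch of randomized proximal coordinate descent applied to the $$\ell_1$$-regularized least squares problem $$\min_x \frac12\|Ax-b\|^2 + \lambda\|x\|_1$$, an instance of the composite model discussed above with single-coordinate blocks. The function names and the dense NumPy formulation are illustrative assumptions, not the authors' implementation (which targets sparse, billion-variable data):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*|.| (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def rcd_lasso(A, b, lam, iters=1000, seed=0):
    """Randomized coordinate descent for (1/2)||Ax-b||^2 + lam*||x||_1."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                 # residual Ax - b, maintained incrementally
    L = (A ** 2).sum(axis=0)      # coordinate-wise Lipschitz constants ||A_i||^2
    for _ in range(iters):
        i = rng.integers(n)                       # pick a coordinate uniformly at random
        g = A[:, i] @ r                           # partial derivative of the smooth part
        x_new = soft_threshold(x[i] - g / L[i], lam / L[i])
        r += A[:, i] * (x_new - x[i])             # O(m) residual update
        x[i] = x_new
    return x
```

Each iteration touches only one column of $$A$$, which is what makes the per-iteration cost low enough for huge-scale sparse problems; the accelerated variants discussed in the review would require full-vector updates per step.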

MSC:
 65K05 Numerical mathematical programming methods
 90C05 Linear programming
 90C06 Large-scale problems in mathematical programming
 90C25 Convex programming
 65Y20 Complexity and performance of numerical algorithms
References:
 [1] Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific (1999) · Zbl 1225.68157
 [2] Canutescu, A.A., Dunbrack, R.L.: Cyclic coordinate descent: a robotics algorithm for protein loop closure. Protein Sci. 12, 963-972 (2003)
 [3] Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: Coordinate descent method for large-scale $$l_2$$-loss linear support vector machines. J. Mach. Learn. Res. 9, 1369-1398 (2008) · Zbl 1225.68157
 [4] Friedman, J., Hastie, T., Tibshirani, R.: A Note on the Group Lasso and a Sparse Group Lasso. Technical report (2010)
 [5] Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Sathiya Keerthi, S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: ICML 2008, pp. 408-415 (2008) · Zbl 1391.94442
 [6] Leventhal, D., Lewis, A.S.: Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res. 35, 641-654 (2010) · Zbl 1216.15006
 [7] Lewis, A.S., Wright, S.J.: A Proximal Method for Composite Minimization. Technical report (2008) · Zbl 1345.49041
 [8] Li, Y., Osher, S.: Coordinate descent optimization for $$l_1$$ minimization with application to compressed sensing: a greedy algorithm. Inverse Probl. Imaging 3, 487-503 (2009) · Zbl 1188.90196
 [9] Luo, Z.Q., Tseng, P.: A coordinate gradient descent method for nonsmooth separable minimization. J. Optim. Theory Appl. 72(1) (2002) · Zbl 1220.90092
 [10] Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. B 70, 53-71 (2008) · Zbl 1400.62276
 [11] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. Springer, Netherlands (2004) · Zbl 1086.90045
 [12] Nesterov, Y.: Gradient Methods for Minimizing Composite Objective Function. CORE Discussion Paper 2007/76, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007)
 [13] Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341-362 (2012) · Zbl 1257.90073
 [14] Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient Block-Coordinate Descent Algorithms for the Group Lasso. Technical report (2010) · Zbl 1275.90059
 [15] Richtárik, P., Takáč, M.: Efficiency of randomized coordinate descent methods on minimization problems with a composite objective function. In: 4th Workshop on Signal Processing with Adaptive Sparse Structured Representations (2011)
 [16] Richtárik, P., Takáč, M.: Efficient serial and parallel coordinate descent method for huge-scale truss topology design. In: Operations Research Proceedings 2011, pp. 27-32. Springer (2012) · Zbl 1306.74043
 [17] Saha, A., Tewari, A.: On the Finite Time Convergence of Cyclic Coordinate Descent Methods. CoRR abs/1005.2146 (2010) · Zbl 1137.62045
 [18] Shalev-Shwartz, S., Tewari, A.: Stochastic methods for $$l_1$$ regularized loss minimization. In: Proceedings of the 26th International Conference on Machine Learning (2009) · Zbl 1280.62081
 [19] Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262-278 (2009) · Zbl 1169.68052
 [20] Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 268-288 (1996) · Zbl 0850.62538
 [21] Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475-494 (2001) · Zbl 1006.65062
 [22] Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140, 513-535 (2009). doi:10.1007/s10957-008-9458-3 · Zbl 1190.90279
 [23] Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387-423 (2009) · Zbl 1166.90016
 [24] Wen, Z., Goldfarb, D., Scheinberg, K.: Block coordinate descent methods for semidefinite programming. In: Anjos, M.F., Lasserre, J.B. (eds.) Handbook on Semidefinite, Conic and Polynomial Optimization: Theory, Algorithms, Software and Applications. Springer (forthcoming) · Zbl 1334.90118
 [25] Wright, S.J.: Accelerated Block-Coordinate Relaxation for Regularized Optimization. Technical report, University of Wisconsin (2010) · Zbl 1242.62065
 [26] Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479-2493 (2009) · Zbl 1391.94442
 [27] Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224-244 (2008) · Zbl 1137.62045
 [28] Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale $$l_1$$-regularized linear classification. J. Mach. Learn. Res. 11, 3183-3234 (2010) · Zbl 1242.62065
 [29] Yuan, G.-X., Ho, C.-H., Lin, C.-J.: Recent Advances of Large-Scale Linear Classification. Technical report (2012)
 [30] Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49-67 (2006) · Zbl 1141.62030
 [31] Yun, S., Toh, K.-C.: A coordinate gradient descent method for $$l_1$$-regularized convex minimization. Comput. Optim. Appl. 48, 273-307 (2011) · Zbl 1220.90092
 [32] Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301-320 (2005) · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.