×

Pathwise coordinate optimization. (English) Zbl 1378.90064

Summary: We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the \(L_{1}\)-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso,” however, so we derive a generalized algorithm that yields the solution in much less time that a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.

MSC:

90C25 Convex programming
62J07 Ridge regression; shrinkage estimators (Lasso)

Software:

SQOPT; PDCO

References:

[1] Bertsekas, D. (1999). Nonlinear Programming . Athena Scientific. · Zbl 1015.90077
[2] Breiman, L. (1995). Better subset selection using the nonnegative garrote. Technometrics 37 738-754. · Zbl 0862.62059 · doi:10.2307/1269730
[3] Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 33-61. · Zbl 0919.94002 · doi:10.1137/S1064827596304010
[4] Daubechies, I., Defrise, M. and De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57 1413-1457. · Zbl 1077.65055 · doi:10.1002/cpa.20042
[5] Donoho, D. and Johnstone, I. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200-1224. · Zbl 0869.62024 · doi:10.2307/2291512
[6] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[7] Friedlander, M. and Saunders, M. (2007). Discussion of “Dantzig selector” by E. Candes and T. Tao. Ann. Statist. 35 2385-2391. · doi:10.1214/009053607000000479
[8] Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397-416.
[9] Gill, P., Murray, W. and Saunders, M. (1999). Users guide for sqopt 5.3: A fortran package for large-scale linear and quadratic programming. Technical report, Stanford Univ.
[10] Li, Y. and Arce, G. (2004). A maximum likelihood approach to least absolute deviation regression. URASIP J. Appl. Signal Processing 2004 1762-1769. · doi:10.1155/S1110865704401139
[11] Osborne, M., Presnell, B. and Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-404. · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
[12] Owen, A. (2006). A robust hybrid of lasso and ridge regression. Technical report, Stanford Univ.
[13] Rudin, L. I., Osher, S. and Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Phys. D 60 259-268. · Zbl 0780.49028 · doi:10.1016/0167-2789(92)90242-F
[14] Schlegel, P. (1970). The explicit inverse of a tridiagonal matrix. Math. Comput. 24 665-665. · Zbl 0215.55404 · doi:10.2307/2004842
[15] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[16] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B 67 91-108. · Zbl 1060.62049 · doi:10.1111/j.1467-9868.2005.00490.x
[17] Tibshirani, R. and Wang, P. (2007). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics . Advance Access published May 18, 2007. · Zbl 1274.62886
[18] Tseng, P. (1988). Coordinate ascent for maximizing nondifferentiable concave functions. Technical Report LIDS-P, 1840, Massachusetts Institute of Technology, Laboratory for Information and Decision Systems.
[19] Tseng, P. (2001). Convergence of block coordinate descent method for nondifferentiable maximation. J. Opt. Theory Appl. 109 474-494. · Zbl 1006.65062 · doi:10.1023/A:1017501703105
[20] Van der Kooij, A. (2007). Prediction accuracy and stability of regresssion with optimal scaling transformations. Technical report, Dept. Data Theory, Leiden Univ.
[21] Wang, H., Li, G. and Jiang, G. (2006). Robust regression shrinkage and consistent variable selection via the lad-lasso. J. Business Econom. Statist. 11 1-6.
[22] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49-67. · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[23] Zhou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.