Modular proximal optimization for multidimensional total-variation regularization.

*(English)*Zbl 1411.90314Summary: We study TV regularization, a widely used technique for eliciting structured sparsity. In particular, we propose efficient algorithms for computing prox-operators for \(\ell_p\)-norm TV. The most important among these is \(\ell_1\)-norm TV, for whose prox-operator we present a new geometric analysis which unveils a hitherto unknown connection to taut-string methods. This connection turns out to be remarkably useful as it shows how our geometry guided implementation results in efficient weighted and unweighted 1D-TV solvers, surpassing state-of-the-art methods. Our 1D-TV solvers provide the backbone for building more complex (two or higher-dimensional) TV solvers within a modular proximal optimization approach. We review the literature for an array of methods exploiting this strategy, and illustrate the benefits of our modular design through extensive suite of experiments on (i) image denoising, (ii) image deconvolution, (iii) four variants of fused-lasso, and (iv) video denoising. To underscore our claims and permit easy reproducibility, we provide all the reviewed and our new TV solvers in an easy to use multi-threaded C++, Matlab and Python library.

##### MSC:

90C30 | Nonlinear programming |

##### Keywords:

proximal optimization; total variation; regularized learning; sparsity; non-smooth optimization##### References:

[1] | M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo. Fast image recovery using variable splitting and constrained optimization. IEEE Transactions on Image Processing, 19(9), September 2010. · Zbl 1371.94018 |

[2] | C. M. Alaız, ´A. Barbero, and J. R. Dorronsoro.Group fused lasso.Artificial Neural Networks and Machine Learning–ICANN 2013, page 66, 2013. · Zbl 1341.68302 |

[3] | E. Anderson et al. LAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999. ISBN 0-89871-447-8 (paperback). |

[4] | F. Bach. Structured sparsity-inducing norms through submodular functions. In NIPS, 2010. |

[5] | Bach, Francis Learning with Submodular Functions: A Convex Optimization Perspective arXiv preprint arXiv:1111.6453 |

[6] | F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Convex optimization with sparsityinducing norms.In S. Sra, S. Nowozin, and S. J. Wright, editors, Optimization for Machine Learning. MIT Press, 2011. · Zbl 1280.68179 |

[7] | A. Barbero, J. L´´opez, and J. R. Dorronsoro. Finding Optimal Model Parameters by Discrete Grid Search. In Advances in Soft Computing: Innovations in Hybrid Intelligent Systems 44, pages 120–127. Springer, 2008. |

[8] | Barbero, A., Sra, S.Fast Newton-type methods for total variation regularization.In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 313-320). |

[9] | A. Barbero, J. L´´opez, and J. R. Dorronsoro. Finding Optimal Model Parameters by Deterministic and Annealed Focused Grid Search. Neurocomputing, 72(13-15):2824–2832, 2009. ISSN 0925-2312. doi: DOI:10.1016/j.neucom.2008.09.024. |

[10] | Barlow, R. E., Bartholomew, D. J., Bremner, J. M., Brunk, H. D. Statistical inference under order restrictions: The theory and application of isotonic regression New York: Wiley, 1972 · Zbl 0246.62038 |

[11] | H. H. Bauschke and P. L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics. Springer, 2011. · Zbl 1218.47001 |

[12] | Heinz H. Bauschke, Patrick L. Combettes, D. Russell Luke Finding best approximation pairs relative to two closed convex sets in Hilbert spaces Journal of Approximation Theory 127 (2004) 178–192 · Zbl 1050.46021 |

[13] | A. Beck and M. Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal of Imgaging Sciences, 2(1):183–202, 2009. · Zbl 1175.94009 |

[14] | D. P. Bertsekas. Projected newton methods for optimization problems with simple constraints. SIAM Journal on Control and Optimization, 20(2), March 1982. · Zbl 0507.49018 |

[15] | D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, September 1999. · Zbl 1015.90077 |

[16] | J. M. Bioucas-Dias and M. A. T. Figueiredo. A new twist: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Transactions on Image Processing, 16(12):2992–3004, December 2007. |

[17] | J. M. Bioucas-Dias, M. A. T. Figueiredo, and J. P. Oliveira. Total variation-based image deconvolution: A majorization-minimization approach. In ICASSP Proceedings, 2006. · Zbl 1178.94029 |

[18] | BM3D. Bm3d software and test sequences, 2013. URL http://www.cs.tut.fi/ foi/ GCF-BM3D/. 77 |

[19] | R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. Technical report, Northwestern University, 1994. · Zbl 0836.65080 |

[20] | E. J. Cand‘es and T. Tao. Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Trans. Info. Theory, 52:5406–5425, 2004. · Zbl 1309.94033 |

[21] | A. Chambolle and J. Darbon. On total variation minimization and surface evolution using parametric maximum flows. International Journal of Computer Vision, 84(3), 2009. · Zbl 1371.94073 |

[22] | A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011. · Zbl 1255.68217 |

[23] | Chambolle, A., Pock, T. On the ergodic convergence rates of a first-order primal-dual algorithm Mathematical Programming. September 2016, Volume 159, Issue 1, pp 253–287 · Zbl 1350.49035 |

[24] | Chambolle, A., Pock, T. A remark on accelerated block coordinate descent for computing the proximity operators of a sum of convex functions. SMAI Journal of Computational Mathematics, 129-54 |

[25] | V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The Convex Geometry of Linear Inverse Problems. Foundations of Computational Mathematics, 12(6), 2012. · Zbl 1280.52008 |

[26] | R. Choksi, Y. van Gennip, and A. Oberman. Anisotropic Total Variation Regularized L1Approximation and Denoising/Deblurring of 2D Bar Codes. Technical report, McGill University, July 2010. · Zbl 1226.49034 |

[27] | P. L. Combettes. Iterative construction of the resolvent of a sum of maximal monotone operators. Journal of Convex Analysis, 16:727–748, 2009. · Zbl 1193.47067 |

[28] | P. L. Combettes and J.-C. Pesquet.Proximal splitting methods in signal processing. arXiv:0912.3522, 2009. · Zbl 1242.90160 |

[29] | L. Condat. A direct algorithm for 1d total variation denoising. Technical report, GREYC laboratory, CNRS-ENSICAEN-Univ. of Caen, 2012. |

[30] | L. Condat. A generic proximal algorithm for convex optimization - application to total variation minimization. IEEE SIGNAL PROC. LETTERS, 21(8):985–989, 2014. |

[31] | L. Condat. Fast projection onto the simplex and the ‘1ball. Mathematical Programming, 158(1-2):575-585, 2016. · Zbl 1347.49050 |

[32] | A. R. Conn, N. I. M. Gould, and P. L. Toint. Trust-Region Methods. SIAM, 2000. |

[33] | J. Dahl, P. C. Hansen, S. H. Jensen, and T. L. Jensen. Algorithms and software for total variation image reconstruction via first-order methods. Numer Algor, 53:67–92, 2010. · Zbl 1181.94009 |

[34] | P. L. Davies and A. Kovac. Local extremes, runs, strings and multiresolution. The Annals of Statistics, 29(1):1–65, 2001. · Zbl 1029.62038 |

[35] | Y. Duan and X.-C. Tai.Domain decomposition methods with graph cuts algorithms for total variation minimization. Adv Comput Math, 36:175–199, 2012. doi: 10.1007/ s10444-011-9213-4. · Zbl 1242.65123 |

[36] | Esedoglu, Selim and Osher, Stanley J. Decomposition of images by the anisotropic RudinOsher-Fatemi model Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 77:(12): 1609–1626, 2014. · Zbl 1083.49029 |

[37] | J. Friedman, T. Hastie, H. H¨ofling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1(2):302–332, Aug. 2007. 78 · Zbl 1378.90064 |

[38] | D. Goldfarb and W. Yin. Parametric maximum flow algorithms for fast total variation minimization. SIAM Journal on Scientific Computing, 31(5):3712–3743, 2009. · Zbl 1198.49040 |

[39] | O. S. Goldstein T. The Split Bregman Method for L1 Regularized Problems. SIAM Journal on Imaging Sciences, 2(2):323–343, 2009. · Zbl 1177.65088 |

[40] | T. R. Golub et al. Molecular classification of cancer. Science, 286(5439):531–537, October 1999. |

[41] | M. Grasmair. The equivalence of the taut string algorithm and bv-regularization. Journal of Mathematical Imaging and Vision, 27(1):59–66, 2007. ISSN 0924-9907. doi: 10.1007/ s10851-006-9796-4. URL http://dx.doi.org/10.1007/s10851-006-9796-4. |

[42] | Z. Harchaoui and C. L´evy-Leduc. Multiple Change-Point Estimation With a Total Variation Penalty. Journal of the American Statistical Association, 105(492):1480–1493, 2010. · Zbl 1388.62211 |

[43] | J. Hua, W. D. Tembe, and E. R. Dougherty. Performance of feature-selection methods in the classification of high-dimension data. Pattern Recognition, 42:409–424, 2009. · Zbl 1181.68240 |

[44] | K. Ito and K. Kunisch. An active set strategy based on the augmented lagrangian formulation for image restoration. ESAIM: Mathematical Modelling and Numerical Analysis, 33 (1):1–21, 1999. URL http://eudml.org/doc/193911. · Zbl 0918.65050 |

[45] | M. Jaggi. Revisiting frank-wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning,, 2013. |

[46] | S. Jegelka, F. Bach, and S. Sra. Reflection methods for user-friendly submodular optimization. Advances in Neural Information Processing Systems 2013: 1313–1321. |

[47] | Jegou, H., Douze, M., Schmid, C. Hamming Embedding and Weak geometry consistency for large scale image search Proceedings of the 10th European conference on Computer vision, October, 2008 http://lear.inrialpes.fr/ jegou/data.php#holidays |

[48] | N. A. Johnson. A dynamic programming algorithm for the fused Lasso and l0-segmentation. J. Computational and Graphical Statistics, 2013. |

[49] | D. Kim, S. Sra, and I. Dhillon. A scalable trust-region algorithm with application to mixednorm regression. In International Conference on Machine Learning, 2010. |

[50] | S. Kim, K. Koh, S. Boyd, and D. Gorinevsky. ‘1trend filtering. SIAM Review, 51(2): 339–360, 2009. doi: 10.1137/070690274. |

[51] | K. C. Kiwiel. Variable fixing algorithms for the continuous quadratic knapsack problem. J. Optim. Theory Appl., 136:445–458, 2008. · Zbl 1145.90078 |

[52] | Knuth, Donald E. The art of computer programming, volume 1: fundamental algorithms. CA, USA: Addison Wesley Longman Publishing Co., Inc · Zbl 0895.68055 |

[53] | M. Kolar, L. Song, A. Ahmed, and E. Xing. Estimaging time-varying networks. The Annals of Applied Statistics, 4(1):94–123, 2010. · Zbl 1189.62142 |

[54] | Kolmogorov, V., Pock, T., Rolinek, M. Total variation on a tree SIAM J. Imaging Sci., 9(2), 605–636. · Zbl 1367.68331 |

[55] | D. Krishnan and R. Fergus. Fast image deconvolution using hyper-laplacian priors. In Advances in Neural Information Processing Systems, 2009. |

[56] | Kumar, K.S., Barbero, A., Jegelka, S., Sra, S., and Bach, F. Convex optimization for parallel energy minimization. arXiv preprint arXiv:1503.01563. |

[57] | S. R. Land and J. H. Friedman. Variable fusion: A new adaptive signal regression method. Technical Report 656, Department of Statistics, Carnegie Mellon University Pittsburgh, 79 1997. |

[58] | Y. Li and F. Santosa. A computational algorithm for minimizing total variation in image restoration. IEEE Transactions on Image Processing, 5(6):987–995, 1996. URL http: //dblp.uni-trier.de/db/journals/tip/tip5.html#LiS96. |

[59] | C.-J. Lin and J. J. Mor´e. Newton’s method for large bound-constrained optimization problems. SIAM Journal on Optimization, 9(4):1100–1127, 1999. · Zbl 0957.65064 |

[60] | H. Liu and J. Zhang. Estimation Consistency of the Group Lasso and its Applications. In Int. Conf. Mach. Learning (ICML), 2009. |

[61] | J. Liu and J. Ye. Efficient Euclidean projections in linear time. In ICML, Jun. 2009. |

[62] | J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009. http://www.public.asu.edu/ jye02/Software/SLEP. |

[63] | J. Liu, L. Yuan, and J. Ye. An efficient algorithm for a class of fused lasso problems. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010. |

[64] | J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Network Flow Algorithms for Structured Sparsity. In NIPS, 2010. To appear. · Zbl 1280.68179 |

[65] | B. Martinet. R´egularisation d’in´equations variationnelles par approximations successives. Mod´elisation Math´ematique et Analyse Num´erique, 4(R3):154–158, 1970. |

[66] | L. Meier, S. van de Geer, and P. B¨uhlmann. The group lasso for logistic regression. J. R. Statist. Soc., 70:53–71, 2008. · Zbl 1400.62276 |

[67] | J. J. Mor´e and D. C. Sorensen. Computing a trust region step. SIAM Journal of Scientific Computing, 4(3), September 1983. |

[68] | J. J. Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien. C. R. Acad. Sci. Paris S´er. A Math., 255:2897–2899, 1962. · Zbl 0118.10502 |

[69] | Y. Nesterov. Gradient methods for minimizing composite objective function. Technical Report 76, Catholic University of Louvain, CORE, 2007. |

[70] | J. Nocedal and S. J. Wright. Numerical Optimization. Springer Verlag, 2000. · Zbl 0930.65067 |

[71] | N. Parikh, S. Boyd, et al. Proximal algorithms. Foundations and Trends in Optimization,R 1(3):127–239, 2014. |

[72] | G. Pierra. Decomposition through formalization in a product space. Mathematical Programming, 28(1):96–115, 1984. · Zbl 0523.49022 |

[73] | C. Pontow and O. Scherzer. A derivative free approach for total variation regularization. arXiv:0911.1293, 2009. URL http://arxiv.org/abs/0911.1293. · Zbl 1250.65045 |

[74] | A. Ramdas and R. J. Tibshirani. Fast and flexible admm algorithms for trend filtering. arXiv:1406.2082, 2014. |

[75] | F. Rapaport and E. B. J.-P. Vert. Classification of arrayCGH data using fused SVM. Bioinformatics, 24(13):i375–i382, 2008. |

[76] | A. Rinaldo. Properties and refinements of the fused lasso. Annals of Statistics, 37(5B): 2922–2952, 2009. · Zbl 1173.62027 |

[77] | R. T. Rockafellar. Monotone operators and hte proximal point algorithm. SIAM J. Control and Opt., 14(5):877–898, 1976. · Zbl 0358.90053 |

[78] | S. Rogers, M. Girolami, C. Campbell, and R. Breitling. The latent process decomposition of cdna microarray data sets. IEEE/ACM Trans. Comp. Bio. and Bioinformatics, 2(2), April-June 2005. 80 |

[79] | L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992. · Zbl 0780.49028 |

[80] | Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L. ImageNet Large Scale Visual Recognition Challenge International Journal of Computer Vision (IJCV), Year 2015, Volume 115, Number 3, pages 211-252http://image-net.org/challenges/LSVRC/ 2010/download-public |

[81] | S. Salzo and S. Villa. Inexact and accelerated proximal point algorithms. J. Convex Analysis, 19(4), 2012. · Zbl 1283.90030 |

[82] | M. Schmidt, N. L. Roux, and F. Bach. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization. In Advances in Neural Information Processing Systems (NIPS), 2011. |

[83] | S. Sra. Scalable nonconvex inexact proximal splitting. In Advances in Neural Information Processing Systems, 2012. |

[84] | S. Sra, S. Nowozin, and S. Wright, editors. Optimization for machine learning. MIT Press, 2011. |

[85] | G. Steidl, S. Didas, and J. Neumann. Relations between higher order tv regularization and support vector regression. In Scale-Space, pages 515–527, 2005. · Zbl 1119.68507 |

[86] | G. Steidl and T. Teuber Anisotropic smoothing using double orientations. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 477–489, 2009. Springer, Berlin, Heidelberg. · Zbl 1371.94352 |

[87] | N. Stransky et al.Regional copy number-independent deregulation of transcription in cancer. Nature Genetics, 38(12):1386–1396, December 2006. |

[88] | R. Tibshirani. Regression shrinkage and selection via the lasso. J. R. Statist. Soc., 58(1): 267–288, 1996. · Zbl 0850.62538 |

[89] | R. Tibshirani and P. Wang. Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics, 9(1):18–29, 2008. · Zbl 1274.62886 |

[90] | R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. J. Royal Stat. Soc.: Series B, 67(1):91–108, 2005. · Zbl 1060.62049 |

[91] | R. J. Tibshirani. Adaptive piecewise polynomial estimation via trend filtering. The Annals of Statistics, 42(1):285–323, 02 2014. doi: 10.1214/13-AOS1189. · Zbl 1307.62118 |

[92] | U. Alon et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 96:6745–6750, June 1999. |

[93] | J.-P. Vert and K. Bleakley. Fast detection of multiple change-points shared by many signals using group LARS. In Advances in Neural Information Processing Systems, 2010. |

[94] | C. R. Vogel and M. E. Oman. Iterative methods for total variation denoising. SIAM Journal on Scientific Computing, 17(1):227–238, 1996. · Zbl 0847.65083 |

[95] | B. Wahlberg, S. Boyd, M. Annergren, and Y. Wang. An ADMM Algorithm for a Class of Total Variation Regularized Estimation Problems. In Proceedings 16th IFAC Symposium on System Identification, volume 16, 2012. |

[96] | J. Wang and Q. Li and S. Yang and W. Fan and P. Wonka and J. Ye. A Highly Scalable Parallel Algorithm for Isotropic Total Variation Models In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 235-243, 2014. 81 |

[97] | S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo. Sparse reconstruction by separable approximation. IEEE Trans. Sig. Proc., 57(7):2479–2493, 2009. · Zbl 1391.94442 |

[98] | M. Wytock, S. Sra, and J. Z. Kolter. Fast Newton Methods for the Group Fused Lasso. In Conference on Uncertainty in Artificial Intelligence, 2014. |

[99] | S. Yang, J. Wang, W. Fan, X. Zhang, P. Wonka, and J. Ye. An Efficient ADMM Algorithm for Multidimensional Anisotropic Total Variation Regularization Problems. In ACM Knowledge Discovery and Data Mining (KDD), Chicago, Illinois, USA, August 2013. |

[100] | Y. Yu. On decomposing the proximal map. In Advances in Neural Information Processing Systems, 2013. |

[101] | M. Yuan and Y. Lin. Model Selection and Estimation in Regression with Grouped Variables. J. R. Statist. Soc. B, 68(1):49–67, 2006. · Zbl 1141.62030 |

[102] | M. Zhu and T. Chan. An efficient primal-dual hybrid gradient algorithm for total variation image restoration. Technical report, UCLA CAM, 2008. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.