
Iteratively reweighted group Lasso based on log-composite regularization. (English) Zbl 1476.62157

Summary: The paper considers supervised learning problems with labeled data whose input features are grouped. The groups do not overlap, so the model coefficients corresponding to the input features form disjoint groups. The coefficients have a group sparsity structure in the sense that the coefficients within each group are either all zero or all nonzero. To make effective use of such group sparsity structure, given a priori, we introduce a novel log-composite regularizer, which can be minimized by an iterative algorithm. In particular, our algorithm iteratively solves a traditional group least absolute shrinkage and selection operator (Lasso) problem, whose penalty sums the \(l_2\) norms of the groups, until convergence. By updating the group weights, our approach makes a group with smaller coefficients in the previous iterate more likely to be set to zero than under the group Lasso. Theoretical results include a minimizing property of the proposed model as well as convergence of the iterative algorithm to a stationary solution under mild conditions. Extensive experiments on synthetic and real datasets indicate that our method outperforms state-of-the-art methods in linear regression and binary classification.
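
The iterative scheme described above can be illustrated with a minimal sketch: each outer pass solves a weighted group Lasso (here by a plain proximal gradient loop with a blockwise soft-threshold), and the weights for the next pass grow as the previous group norms shrink, so groups with small coefficients are pushed toward zero. The weight update 1/(||b_g|| + eps), the parameter names, and the toy data below are illustrative assumptions in the spirit of reweighted-\(l_1\) schemes, not the paper's exact log-composite rule.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-threshold: shrink the group vector v toward zero by t in the l2 sense."""
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= t else (1.0 - t / nrm) * v

def weighted_group_lasso(X, y, groups, weights, lam, n_iter=500):
    """Proximal gradient (ISTA) for 0.5*||y - X b||^2 + lam * sum_g w_g ||b_g||_2."""
    b = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1/L, with L the gradient's Lipschitz constant
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))      # gradient step on the least-squares term
        for g, w in zip(groups, weights):       # the prox separates over the disjoint groups
            b[g] = group_soft_threshold(z[g], step * lam * w)
    return b

def reweighted_group_lasso(X, y, groups, lam, eps=1e-2, n_outer=5):
    """Outer loop: re-solve a weighted group Lasso with weights from the previous iterate."""
    weights = np.ones(len(groups))
    for _ in range(n_outer):
        b = weighted_group_lasso(X, y, groups, weights, lam)
        # Groups whose previous norms are small receive large weights and are
        # therefore more likely to be set to zero at the next pass.
        weights = np.array([1.0 / (np.linalg.norm(b[g]) + eps) for g in groups])
    return b

# Toy usage: three disjoint groups of two features each; only the first group is active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
b_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ b_true + 0.1 * rng.standard_normal(100)
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
print(np.round(reweighted_group_lasso(X, y, groups, lam=5.0), 2))
```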

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62-08 Computational methods for problems pertaining to statistics
65K05 Numerical mathematical programming methods
90C26 Nonconvex programming, global optimization
92C60 Medical epidemiology
92D10 Genetics and epigenetics
