Efficient nonconvex sparse group feature selection via continuous and discrete optimization. (English) Zbl 1343.68210

Summary: Sparse feature selection has proven to be effective in analyzing high-dimensional data. While promising, most existing works apply convex methods, which may be suboptimal in terms of the accuracy of feature selection and parameter estimation. In this paper, we consider both continuous and discrete nonconvex paradigms to sparse group feature selection, which are motivated by applications that require identifying the underlying group structure and performing feature selection simultaneously. The main contribution of this article is twofold: (1) computationally, we develop efficient optimization algorithms for both continuous and discrete formulations, of which the key step is a projection with two coupled constraints; (2) statistically, we show that the proposed continuous model reconstructs the oracle estimator. Therefore, consistent feature selection and parameter estimation are achieved simultaneously. Numerical results on synthetic and real-world data suggest that the proposed nonconvex methods compare favorably against their competitors, thus achieving desired goal of delivering high performance.


68T05 Learning and adaptive systems in artificial intelligence
90C26 Nonconvex programming, global optimization
92C50 Medical applications (general)
Full Text: DOI


[1] Bach, Francis; Jenatton, Rodolphe; Mairal, Julien; Obozinski, Guillaume, Convex Optimization with Sparsity-Inducing Norms (2010) · Zbl 06064248
[2] Barzilai, Jonathan; Borwein, Jonathan M., Two-point step size gradient methods, IMA J. Numer. Anal., 8, 1, 141-148 (1988) · Zbl 0638.65055
[3] Beck, A.; Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2, 1, 183-202 (2009) · Zbl 1175.94009
[4] Bickel, P. J.; Ritov, Y.; Tsybakov, A. B., Simultaneous analysis of Lasso and Dantzig selector, Ann. Stat., 37, 4, 1705-1732 (2009) · Zbl 1173.62022
[5] Blumensath, Thomas; Davies, Mike E., Iterative hard thresholding for compressed sensing, Appl. Comput. Harmon. Anal., 27, 3, 265-274 (2009) · Zbl 1174.94008
[6] Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J., Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers (2011), Now Publishers
[7] Boyd, S.; Vandenberghe, L., Convex Optimization (2004), Cambridge University Press · Zbl 1058.90049
[8] Breheny, P.; Huang, J., Penalized methods for bi-level variable selection, Stat. Interface, 2, 3, 369-380 (2009) · Zbl 1245.62034
[9] Brucker, Peter, An \(O(n)\) algorithm for quadratic knapsack problems, Oper. Res. Lett., 3, 3, 163-166 (1984) · Zbl 0544.90086
[10] Candes, Emmanuel J.; Tao, Terence, Decoding by linear programming, IEEE Trans. Inf. Theory, 51, 12, 4203-4215 (2005) · Zbl 1264.94121
[11] Combettes, P. L.; Pesquet, J. C., Proximal splitting methods in signal processing, (Fixed-Point Algorithms for Inverse Problems in Science and Engineering (2010))
[12] Donoho, D. L., De-noising by soft-thresholding, IEEE Trans. Inf. Theory, 41, 3, 613-627 (2002) · Zbl 0820.62002
[13] Duchi, J.; Shalev-Shwartz, S.; Singer, Y.; Chandra, T., Efficient projections onto the \(\ell_1\)-ball for learning in high dimensions, (Proceedings of the 25th International Conference on Machine Learning (2008), ACM), 272-279
[14] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., 96, 456, 1348-1360 (2001) · Zbl 1073.62547
[15] Foucart, Simon, Sparse recovery algorithms: sufficient conditions in terms of restricted isometry constants, (Approximation Theory XIII: San Antonio 2010 (2012), Springer), 65-77 · Zbl 1250.65057
[16] Frank, A.; Asuncion, A., (UCI Machine Learning Repository (2010))
[17] Friedman, J.; Hastie, T.; Tibshirani, R., A note on the group lasso and a sparse group lasso (2010), arXiv preprint
[18] Gong, Pinghua; Zhang, Changshui; Lu, Zhaosong; Huang, Jianhua; Ye, Jieping, A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems, (The 30th International Conference on Machine Learning. The 30th International Conference on Machine Learning, ICML (2013)), 37-45
[20] Huang, J.; Ma, S.; Xie, H.; Zhang, C. H., A group bridge approach for variable selection, Biometrika, 96, 2, 339-355 (2009) · Zbl 1163.62050
[21] Huang, J.; Zhang, T., The benefit of group sparsity, Ann. Stat., 38, 4, 1978-2004 (2010) · Zbl 1202.62052
[22] Huang, Jian; Breheny, Patrick; Ma, Shuangge, A selective review of group selection in high-dimensional models, Stat. Sci., 27, 4 (2012) · Zbl 1331.62347
[23] Jenatton, Rodolphe; Mairal, Julien; Obozinski, Guillaume; Bach, Francis, Proximal methods for hierarchical sparse coding, J. Mach. Learn. Res., 12, 2297-2334 (2011) · Zbl 1280.94029
[24] Kolmogorov, A. N.; Tikhomirov, V. M., e-Entropy and e-capacity of sets in functional spaces, Transl. Am. Math. Soc. (1961) · Zbl 0133.06703
[25] Kolmogorov, Andrei Nikolaevich; Tikhomirov, Vladimir Mikhailovich, \(ε\)-Entropy and \(ε\)-capacity of sets in function spaces, Usp. Mat. Nauk, 14, 2, 3-86 (1959)
[26] Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford; Cormen, Thomas H., Introduction to Algorithms (2001), The MIT Press · Zbl 1047.68161
[27] Liu, J.; Ji, S.; Ye, J., SLEP: Sparse Learning with Efficient Projections (2009), Arizona State University
[28] Lu, Zhaosong, Iterative hard thresholding methods for \(\ell_0\) regularized convex cone programming (2013), arXiv preprint · Zbl 1308.65094
[29] Mazumder, R.; Friedman, J. H.; Hastie, T., Sparsenet: coordinate descent with nonconvex penalties, J. Am. Stat. Assoc., 106, 495, 1125-1138 (2011) · Zbl 1229.62091
[30] Meier, L.; Van De Geer, S.; Bühlmann, P., The group lasso for logistic regression, J. R. Stat. Soc., Ser. B, Stat. Methodol., 70, 1, 53-71 (2008) · Zbl 1400.62276
[31] Natarajan, B. K., Sparse approximation solutions to linear systems, SIAM J. Comput., 24, 2, 227-234 (1995) · Zbl 0827.68054
[32] Nesterov, Y., Gradient methods for minimizing composite objective function (2007), CORE discussion papers
[33] Nocedal, Jorge; Wright, Stephen J., Numerical Optimization (2000), Springer · Zbl 0930.65067
[34] Schmidt, M.; Le Roux, N.; Bach, F., Convergence rates of inexact proximal-gradient methods for convex optimization, (NIPS’11—25th Annual Conference on Neural Information Processing Systems (2011))
[35] Shen, X.; Pan, W.; Zhu, Y., Likelihood-based selection and sharp parameter estimation, J. Am. Stat. Assoc., 107, 223-232 (2012) · Zbl 1261.62020
[36] Shen, X.; Pan, W.; Zhu, Y.; Zhou, H., On constrained and regularized high-dimensional regression, Ann. Inst. Stat. Math., 1, 1-26 (2013)
[37] Su, H.; Yu, W.; Li, F. F., Efficient Euclidean projections onto the intersection of norm balls, (Proceedings of the 29th International Conference on Machine Learning, vol. 951 (2012), International Machine Learning Society), 12
[38] Tao, P. D.; An, L. T.H., Convex analysis approach to dc programming: theory, algorithms and applications, Acta Math. Vietnam., 22, 1, 289-355 (1997) · Zbl 0895.90152
[39] Tibshirani, R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc., Ser. B, Stat. Methodol., 267-288 (1996) · Zbl 0850.62538
[40] Turlach, Berwin A.; Venables, William N.; Wright, Stephen J., Simultaneous variable selection, Technometrics, 47, 3, 349-363 (2005)
[41] Van De Geer, S. A.; Bühlmann, P., On the conditions used to prove oracle results for the lasso, Electron. J. Stat., 3, 1360-1392 (2009) · Zbl 1327.62425
[42] Wang, L.; Chen, G.; Li, H., Group SCAD regression analysis for microarray time course gene expression data, Bioinformatics, 23, 12, 1486-1494 (2007)
[43] Wong, W. H.; Shen, X., Probability inequalities for likelihood ratios and convergence rates of sieve MLEs, Ann. Stat., 23, 2, 339-362 (1995) · Zbl 0829.62002
[44] Wright, Stephen J.; Nowak, Robert D.; Figueiredo, Mário A. T., Sparse reconstruction by separable approximation, IEEE Trans. Signal Process., 57, 7, 2479-2493 (2009) · Zbl 1391.94442
[45] Xiang, S.; Shen, X.; Ye, J., Efficient sparse group feature selection via nonconvex optimization, (The 30th International Conference on Machine Learning. The 30th International Conference on Machine Learning, ICML (2013)), 284-292
[46] Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M.; Ye, Jieping, Bi-level multi-source learning for heterogeneous block-wise missing data, NeuroImage (2013)
[47] Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M.; Ye, Jieping, Multi-source learning with block-wise missing data for Alzheimer’s disease prediction, (Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2013), ACM), 185-193
[48] Yang, H.; Xu, Z.; King, I.; Lyu, M., Online learning for group lasso, (Proceedings of the 27th International Conference on Machine Learning (2010), ACM), 1191-1198
[49] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J. R. Stat. Soc., Ser. B, Stat. Methodol., 68, 1, 49-67 (2006) · Zbl 1141.62030
[50] Zhang, T., Analysis of multi-stage convex relaxation for sparse regularization, J. Mach. Learn. Res., 11, 1081-1107 (2010) · Zbl 1242.68262
[51] Zhang, T., Multi-stage convex relaxation for feature selection, Bernoulli, 19, 5B, 2277-2293 (2013) · Zbl 1359.62293
[52] Zhao, P.; Yu, B., On model selection consistency of lasso, J. Mach. Learn. Res., 7, 2541-2563 (2006) · Zbl 1222.62008
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.