An interior point method for \(L_{1 / 2}\)-SVM and application to feature selection in classification. (English) Zbl 1442.68208

Summary: This paper studies feature selection for support vector machine (SVM). By the use of the \(L_{1 / 2}\) regularization technique, we propose a new model \(L_{1 / 2}\)-SVM. To solve this nonconvex and non-Lipschitz optimization problem, we first transform it into an equivalent quadratic constrained optimization model with linear objective function and then develop an interior point algorithm. We establish the convergence of the proposed algorithm. Our experiments with artificial data and real data demonstrate that the \(L_{1 / 2}\)-SVM model works well and the proposed algorithm is more effective than some popular methods in selecting relevant features and improving classification performance.


68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)


Full Text: DOI


[1] Yang, Y. M.; Pedersen, J. O., A comparative study on feature selection in text categorization, Proceedings of the 14th International Conference on Machine Learning (ICML ’97)
[2] Forman, G., An extensive empirical study of feature selection metrics for text classification, The Journal of Machine Learning Research, 3, 1289-1305 (2003) · Zbl 1102.68553
[3] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V., Gene selection for cancer classification using support vector machines, Machine Learning, 46, 1-3, 389-422 (2002) · Zbl 0998.68111
[4] Zhang, H. H.; Ahn, J.; Lin, X.; Park, C., Gene selection using support vector machines with non-convex penalty, Bioinformatics, 22, 1, 88-95 (2006)
[5] Saeys, Y.; Inza, I.; Larrañaga, P., A review of feature selection techniques in bioinformatics, Bioinformatics, 23, 19, 2507-2517 (2007)
[6] Weston, J.; Pérez-Cruz, F.; Bousquet, O.; Chapelle, O.; Elisseeff, A.; Schölkopf, B., Feature selection and transduction for prediction of molecular bioactivity for drug design, Bioinformatics, 19, 6, 764-771 (2003)
[7] Liu, Y., A comparative study on feature selection methods for drug discovery, Journal of Chemical Information and Computer Sciences, 44, 5, 1823-1828 (2004)
[8] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V., Feature Extraction: Foundations and Applications, 207 (2006), Springer
[9] Rakotomamonjy, A., Variable selection using SVM-based criteria, Journal of Machine Learning Research, 3, 7-8, 1357-1370 (2003) · Zbl 1102.68583
[10] Weston, J.; Mukherjee, S.; Chapelle, O.; Pontil, M.; Poggio, T.; Vapnik, V., Feature selection for SVMs, Proceedings of the Conference on Neural Information Processing Systems
[11] Peleg, D.; Meir, R., A feature selection algorithm based on the global minimization of a generalization error bound, Proceedings of the Conference on Neural Information Processing Systems
[12] Bradley, P. S.; Mangasarian, O. L., Feature selection via concave minimization and support vector machines, Proceedings of the International Conference on Machine Learning
[13] Weston, J.; Elisseeff, A.; Schölkopf, B.; Tipping, M., Use of the zero-norm with linear models and kernel methods, Journal of Machine Learning Research, 3, 1439-1461 (2003) · Zbl 1102.68605
[14] Chan, A. B.; Vasconcelos, N.; Lanckriet, G. R. G., Direct convex relaxations of sparse SVM, Proceedings of the 24th International Conference on Machine Learning (ICML ’07)
[15] Fung, G. M.; Mangasarian, O. L., A feature selection Newton method for support vector machine classification, Computational Optimization and Applications, 28, 2, 185-202 (2004) · Zbl 1056.90103
[16] Zhu, J.; Rosset, S.; Hastie, T.; Tibshirani, R., 1-norm support vector machines, Advances in Neural Information Processing Systems, 16, 1, 49-56 (2004)
[17] Tibshirani, R., Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society B: Methodological, 58, 1, 267-288 (1996) · Zbl 0850.62538
[18] Zhang, T., Analysis of multi-stage convex relaxation for sparse regularization, Journal of Machine Learning Research, 11, 1081-1107 (2010) · Zbl 1242.68262
[19] Chartrand, R., Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Processing Letters, 14, 10, 707-710 (2007)
[20] Chartrand, R., Nonconvex regularizaron for shape preservation, Proceedings of the 14th IEEE International Conference on Image Processing (ICIP ’07)
[21] Xu, Z.; Zhang, H.; Wang, Y.; Chang, X.; Liang, Y., L1/2 regularization, Science China: Information Sciences, 53, 6, 1159-1169 (2010)
[22] Chen, W. J.; Tian, Y. J., Lp-norm proximal support vector machine and its applications, Procedia Computer Science, 1, 1, 2417-2423 (2010)
[23] Liu, J.; Li, J.; Xu, W.; Shi, Y., A weighted L_q adaptive least squares support vector machine classifiers-Robust and sparse approximation, Expert Systems with Applications, 38, 3, 2253-2259 (2011)
[24] Liu, Y.; Zhang, H. H.; Park, C.; Ahn, J., Support vector machines with adaptive Lq penalty, Computational Statistics & Data Analysis, 51, 12, 6380-6394 (2007) · Zbl 1446.62179
[25] Liu, Z.; Lin, S.; Tan, M., Sparse support vector machines with Lp penalty for biomarker identification, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7, 1, 100-107 (2010)
[26] Rakotomamonjy, A.; Flamary, R.; Gasso, G.; Canu, S., l_p-l_q penalty for sparse linear and sparse multiple kernel multitask learning, IEEE Transactions on Neural Networks, 22, 8, 1307-1320 (2011)
[27] Tan, J. Y.; Zhang, Z.; Zhen, L.; Zhang, C. H.; Deng, N. Y., Adaptive feature selection via a new version of support vector machine, Neural Computing and Applications, 23, 3-4, 937-945 (2013)
[28] Xu, Z. B.; Chang, X. Y.; Xu, F. M.; Zhang, H., L1/2 regularization: an iterative half thresholding algorithm, IEEE Transactions on Neural Networks and Learning Systems, 23, 7, 1013-1027 (2012)
[29] Ge, D.; Jiang, X.; Ye, Y., A note on the complexity of Lp minimization, Mathematical Programming, 129, 2, 285-299 (2011) · Zbl 1226.90076
[30] Vapnik, V. N., The Nature of Statistical Learning Theory (1995), New York, NY, USA: Springer, New York, NY, USA · Zbl 0833.62008
[31] Fan, J.; Peng, H., Nonconcave penalized likelihood with a diverging number of parameters, The Annals of Statistics, 32, 3, 928-961 (2004) · Zbl 1092.62031
[32] Knight, K.; Fu, W., Asymptotics for lasso-type estimators, The Annals of Statistics, 28, 5, 1356-1378 (2000) · Zbl 1105.62357
[33] Tian, B. S.; Yang, X. Q., An interior-point l L1/2-penalty method for nonlinear programming, Technical Report (2013), Department of Applied Mathematics, Hong Kong Polytechnic University
[34] Armijo, L., Minimization of functions having Lipschitz continuous first partial derivatives, Pacific Journal of Mathematics, 16, 1-3 (1966) · Zbl 0202.46105
[35] John, F., Extremum problems with inequalities as side-conditions, Studies and Essays. Courant Anniversary Volume, 187-1204 (1948)
[36] Newman, D. J.; Hettich, S.; Blake, C. L.; Merz, C. J., UCI repository of machine learning databases, Technical Report, 9702 (1998), Irvine, Calif, USA: Department of Information and Computer Science, Univerisity of California, Irvine, Calif, USA
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.