
Sparse high-dimensional fractional-norm support vector machine via DC programming. (English) Zbl 1471.62080

Summary: This paper considers a class of feature-selecting support vector machines (SVMs) based on \(L_q\)-norm regularization, where \(q \in(0, 1)\). The standard SVM [V. N. Vapnik, The nature of statistical learning theory. Berlin: Springer-Verlag (1995; Zbl 0833.62008)] minimizes the hinge loss function subject to the \(L_2\)-norm penalty. More recently, the \(L_1\)-norm SVM (\(L_1\)-SVM) [P. Bradley and O. Mangasarian, “Feature selection via concave minimization and support vector machines”, in: Machine Learning Proceedings of the Fifteenth International Conference (ICML98). Citeseer. 82–90 (1998)] was suggested for feature selection and has gained great popularity since its introduction. \(L_0\)-norm penalization would result in more powerful sparsification, but obtaining an exact solution is NP-hard. This raises the question of whether fractional-norm (\(L_q\) for \(q\) between 0 and 1) penalization can yield benefits over the existing \(L_1\) and approximate \(L_0\) approaches for SVMs. The major obstacle to answering this is that the resulting objective functions are non-convex. This paper addresses the difficult optimization problems of the fractional-norm SVM by introducing a new algorithm based on Difference of Convex functions (DC) programming techniques [Pham Dinh Tao and Le Thi Hoai An, SIAM J. Optim. 8, No. 2, 476–505 (1998; Zbl 0913.65054); Le Thi Hoai An and Pham Dinh Tao, Discrete Appl. Math. 156, No. 3, 325–338 (2008; Zbl 1190.90142)], which efficiently solves a reweighted \(L_1\)-SVM problem at each iteration. Numerical results on seven real-world biomedical datasets support the effectiveness of the proposed approach compared to other commonly used sparse SVM methods, including \(L_1\)-SVM and recent approximate \(L_0\)-SVM approaches.
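The iterative scheme described in the summary — linearizing the concave part of the \(L_q\) penalty and solving a reweighted \(L_1\)-SVM subproblem at each DC iteration — can be illustrated with a minimal sketch. This is not the authors' implementation; the solver for the convex subproblem (proximal subgradient descent with soft-thresholding), the smoothing constant `eps`, and all step-size choices here are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(z, t):
    # Elementwise soft-thresholding operator, prox of the weighted L1 term.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lq_svm_dca(X, y, q=0.5, lam=0.1, outer_iters=10,
               inner_iters=500, lr=0.01, eps=0.1):
    """Illustrative sketch of an L_q-penalized linear SVM via reweighted L1.

    Outer loop (DC step): the concave part of |w_j|^q is linearized,
    giving per-feature weights r_j = q * (|w_j| + eps)^(q - 1)
    (eps is a hypothetical smoothing constant to keep r_j finite at 0).
    Inner loop: the convex subproblem
        min_{w,b}  mean(hinge loss) + lam * sum_j r_j |w_j|
    is solved approximately by proximal subgradient descent.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(outer_iters):
        # Reweighting from the DC linearization of the L_q penalty.
        r = q * (np.abs(w) + eps) ** (q - 1.0)
        for _ in range(inner_iters):
            margins = y * (X @ w + b)
            active = margins < 1.0            # margin-violating points
            grad_w = -(y[active, None] * X[active]).sum(axis=0) / n
            grad_b = -y[active].sum() / n
            w = soft_threshold(w - lr * grad_w, lr * lam * r)
            b -= lr * grad_b
    return w, b
```

On synthetic data where only two of many features carry signal, the reweighting drives most coefficients toward (or exactly to) zero while the informative ones survive, which is the sparsification behavior the paper compares against \(L_1\)-SVM.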

MSC:

62-08 Computational methods for problems pertaining to statistics
68T05 Learning and adaptive systems in artificial intelligence
90C26 Nonconvex programming, global optimization

Software:

UCI-ml; LIBSVM

References:

[1] Abeel, T.; Helleputte, T.; Van de Peer, Y.; Dupont, P.; Saeys, Y., Robust biomarker identification for cancer diagnosis with ensemble feature selection methods, Bioinformatics, 26, 3, 392-398, (2010)
[2] Alon, U.; Barkai, N.; Notterman, D.; Gish, K.; Ybarra, S.; Mack, D.; Levine, A., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences of the United States of America, 96, 12, 6745, (1999)
[3] Amaldi, E.; Kann, V., On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems, Theoretical Computer Science, 209, 1-2, 237-260, (1998) · Zbl 0915.68072
[4] Ben-Israel, A.; Greville, T., Generalized inverses: theory and applications, (2003), Springer Verlag · Zbl 1026.15004
[5] Bradley, P., Mangasarian, O., 1998. Feature selection via concave minimization and support vector machines. In: Machine Learning Proceedings of the Fifteenth International Conference (ICML98). Citeseer, pp. 82-90.
[6] Candes, E.; Wakin, M.; Boyd, S., Enhancing sparsity by reweighted \(L_1\) minimization, Journal of Fourier Analysis and Applications, 14, 5, 877-905, (2008) · Zbl 1176.94014
[7] Chang, C., Lin, C., 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm.
[8] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 456, 1348-1361, (2001) · Zbl 1073.62547
[9] Fung, G.; Mangasarian, O., A feature selection Newton method for support vector machine classification, Computational Optimization and Applications, 28, 2, 185-202, (2004) · Zbl 1056.90103
[10] Gasso, G.; Rakotomamonjy, A.; Canu, S., Recovering sparse signals with a certain family of nonconvex penalties and DC programming, IEEE Transactions on Signal Processing, 57, 12, 4686-4698, (2009) · Zbl 1391.90489
[11] Guan, W.; Zhou, M.; Hampton, C.; Benigno, B.; Walker, L.; Gray, A.; McDonald, J.; Fernández, F., Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines, BMC Bioinformatics, 10, 1, 259, (2009)
[12] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, The Journal of Machine Learning Research, 3, 1157-1182, (2003) · Zbl 1102.68556
[13] Guyon, I.; Gunn, S.; Ben-Hur, A.; Dror, G., Result analysis of the nips 2003 feature selection challenge, Advances in Neural Information Processing Systems, 17, 545-552, (2005)
[14] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V., Gene selection for cancer classification using support vector machines, Machine Learning, 46, 1, 389-422, (2002) · Zbl 0998.68111
[15] Hingorani, S.; Petricoin, E.; Maitra, A.; Rajapakse, V.; King, C.; Jacobetz, M.; Ross, S.; Conrads, T.; Veenstra, T.; Hitt, B., Preinvasive and invasive ductal pancreatic cancer and its early detection in the mouse, Cancer Cell, 4, 6, 437-450, (2003)
[16] Knight, K.; Fu, W., Asymptotics for lasso-type estimators, Annals of Statistics, 1356-1378, (2000) · Zbl 1105.62357
[17] Langford, J.; Li, L.; Zhang, T., Sparse online learning via truncated gradient, The Journal of Machine Learning Research, 10, 777-801, (2009) · Zbl 1235.68167
[18] Le Thi, H.; Le, H.; Nguyen, V.; Pham Dinh, T., A DC programming approach for feature selection in support vector machines learning, Advances in Data Analysis and Classification, 2, 3, 259-278, (2008) · Zbl 1284.90057
[19] Le Thi, H.; Pham Dinh, T., A continuous approach for the concave cost supply problem via DC programming and DCA, Discrete Applied Mathematics, 156, 3, 325-338, (2008) · Zbl 1190.90142
[20] Liu, Y.; Zhang, H., Support vector machines with adaptive \(L_q\) penalty, Computational Statistics & Data Analysis, 51, 12, 6380-6394, (2007) · Zbl 1446.62179
[21] Loscalzo, S.; Yu, L.; Ding, C., Consensus group stable feature selection, (Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2009), ACM), 567-576
[22] Mangasarian, O., Exact 1-norm support vector machines via unconstrained convex differentiable minimization, The Journal of Machine Learning Research, 7, 1530, (2006) · Zbl 1211.68329
[23] Murphy, P., Aha, D., 1992. UCI repository of machine learning databases.
[24] Neumann, J.; Schnörr, C.; Steidl, G., Combined SVM-based feature selection and classification, Machine Learning, 61, 1, 129-150, (2005) · Zbl 1137.90643
[25] Pham Dinh, T.; Le Thi, H., A DC optimization algorithm for solving the trust-region subproblem, SIAM Journal on Optimization, 8, 2, 476-505, (1998) · Zbl 0913.65054
[26] Rockafellar, R., Convex analysis, (1997), Princeton University Press, Princeton Landmarks in Mathematics · Zbl 0932.90001
[27] Saab, R., Chartrand, R., Yilmaz, O., 2008. Stable sparse approximations via nonconvex optimization. In: Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on. IEEE, pp. 3885-3888.
[28] Shalev-Shwartz, S.; Tewari, A., Stochastic methods for \(L_1\) regularized loss minimization, (Proceedings of the 26th Annual International Conference on Machine Learning, (2009), ACM), 929-936
[29] Singh, D.; Febbo, P.; Ross, K.; Jackson, D.; Manola, J.; Ladd, C.; Tamayo, P.; Renshaw, A.; D’Amico, A.; Richie, J., Gene expression correlates of clinical prostate cancer behavior, Cancer Cell, 1, 2, 203-209, (2002)
[30] Vapnik, V., The nature of statistical learning theory, (1995), Springer NY · Zbl 0833.62008
[31] Weston, J.; Mukherjee, S.; Chapelle, O.; Pontil, M.; Poggio, T.; Vapnik, V., Feature selection for svms, Advances in Neural Information Processing Systems, 668-674, (2001)
[32] Yu, L.; Ding, C.; Loscalzo, S., Stable feature selection via dense feature groups, (Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2008), ACM), 803-811
[33] Yuille, A.; Rangarajan, A., The concave-convex procedure, Neural Computation, 15, 4, 915-936, (2003) · Zbl 1022.68112
[34] Zhang, H.; Ahn, J.; Lin, X.; Park, C., Gene selection using support vector machines with non-convex penalty, Bioinformatics, 22, 1, 88, (2005)
[35] Zhang, Z.; Kwok, J.; Yeung, D., Surrogate maximization/minimization algorithms and extensions, Machine Learning, 69, 1, 1-33, (2007)
[36] Zhu, J., Rosset, S., Hastie, T., Tibshirani, R., 2004. 1-norm support vector machines. In: Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference. pp. 49-56.
[37] Zou, H.; Li, R., One-step sparse estimates in nonconcave penalized likelihood models, Annals of Statistics, 36, 4, 1509, (2008) · Zbl 1142.62027