×

Double regularization methods for robust feature selection and SVM classification via DC programming. (English) Zbl 1436.68310

Summary: In this work, two novel formulations for embedded feature selection are presented. A second-order cone programming approach for Support Vector Machines is extended by adding a second regularizer to encourage feature elimination. The one- and the zero-norm penalties are used in combination with the Tikhonov regularization under a robust setting designed to correctly classify instances, up to a predefined error rate, even for the worst data distribution. The use of the zero norm leads to a nonconvex formulation, which is solved by using Difference of Convex (DC) functions, extending DC programming to second-order cones. Experiments on high-dimensional microarray datasets were performed, and the best performance was obtained with our approaches compared with well-known feature selection methods for Support Vector Machines.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
90C22 Semidefinite programming

Software:

SeDuMi
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Alizadeh, A.; Eisen, M.; Davis, R.; et al., Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling, Nature, 403, 503-511 (2000)
[2] Alizadeh, F.; Goldfarb, D., Second-order cone programming, Mathematical Programming, 95, 3-51 (2003) · Zbl 1153.90522
[3] Alon, U.; Barkai, N.; Notterman, D.; Gish, K.; Ybarra, S.; Mack, D.; Levine, A., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences, 96, 12, 6745-6750 (1999)
[4] Alvarez, F.; López, J.; Ramírez C., H., Interior proximal algorithm with variable metric for second-order cone programming: applications to structural optimization and support vector machines, Optimization Methods Software, 25, 6, 859-881 (2010) · Zbl 1226.90041
[5] Bhattacharyya, C., Second order cone programming formulations for feature selection, Journal of Machine Learning Research, 5, 1417-1433 (2004) · Zbl 1222.68147
[6] Bradley, P.; Mangasarian, O., Feature selection va concave minimization and support vector machines, Machine Learning proceedings of the fifteenth International Conference (ICML’98) 82-90, San Francisco, California, Morgan Kaufmann (1998)
[7] Carrasco, M.; López, J.; Maldonado, S., A multi-class svm approach based on the l1-norm minimization of the distances between the reduced convex hulls, Pattern Recognition, 48, 5, 1598-1607 (2015) · Zbl 1374.68367
[8] Cortes, C.; Vapnik, V., Support-vector networks, Machine Learning, 20, 273-297 (1995) · Zbl 0831.68098
[9] Demšar, J., Statistical comparisons of classifiers over multiple data set, Journal of Machine Learning Research, 7, 1-30 (2006) · Zbl 1222.68184
[10] Dinh, T. P.; Souad, E., Algorithms for solving a class of nonconvex optimization problems. methods of subgradients, (Hiriart-Urruty, J.-B., Fermat Days 85: Mathematics for Optimization. Fermat Days 85: Mathematics for Optimization, North-Holland Mathematics Studies, 129 (1986), North-Holland), 249-271 · Zbl 0638.90087
[11] Dinh, T. P.; Thi, H. L., Convex analysis approaches to dc programming: Theory, algorithms and applications, Acta Mathematica Vietnamica, 22, 1, 287-367 (1997)
[12] Dinh, T. P.; Thi, H. L., A d.c. optimization algorithm for solving the trust-region subproblem, SIAM Journal on Optimization, 8, 2, 476-505 (1998) · Zbl 0913.65054
[13] Duda, R.; Hard, P.; Stork, D., Pattern Classification (2001), Wiley-Interscience Publication · Zbl 0968.68140
[14] Gravier, E.; Pierron, G.; Vincent-Salomon, A.; Gruel, N.; Raynal, V.; Savignoni, A.; De Rycke, Y.; Pierga, J.-Y.; Lucchesi, C.; Reyal, F.; Fourquet, A.; Roman-Roman, S.; Radvanyi, F.; Sastre-Garau, X.; Asselain, O., A prognostic dna signature for t1t2 node-negative breast cancer patients, Genes, Chromosomes and Cancer, 49, 12, 1125 (2010)
[15] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, Journal of Machine Learning research, 3, 1157-1182 (2003) · Zbl 1102.68556
[16] Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L. A., Feature extraction, foundations and applications (2006), Springer, Berlin
[17] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V., Gene selection for cancer classification using support vector machines., Machine Learning, 46, 1-3, 389-422 (2002) · Zbl 0998.68111
[18] Hastie, T.; Tibshirani, R.; Friedman, J., Elements of Statistical Learning (2009), Springer · Zbl 1273.62005
[19] Krstajic, D.; Buturovic, L.; Leahy, D.; Thomas, S., Cross-validation pitfalls when selecting and assessing regression and classification models, Journal of Cheminformatics, 6, 10, 1-15 (2014)
[20] Lanckriet, G.; Ghaoui, L.; Bhattacharyya, C.; Jordan, M., A robust minimax approach to classification, Journal of Machine Learning Research, 3, 555-582 (2003) · Zbl 1084.68657
[21] López, J.; Maldonado, S., Robust feature selection for multi-class second-order cone programming support vector machines, Intelligent Data Analysis., 19, S1, S117-S133 (2015)
[22] López, J.; Maldonado, S., Multi-class second-order cone programming support vector machines, Information Sciences, 330, 328-341 (2016) · Zbl 06871048
[23] Maldonado, S.; López, J., Alternative second-order cone programming formulations for support vector classification, Information Sciences, 268, 328-341 (2014) · Zbl 1341.68164
[24] Maldonado, S.; López, J., An embedded feature selection approach for support vector classification via second-order cone programming, Intelligent Data Analysis, 19, 6, 1259-1273 (2015)
[25] Maldonado, S.; Weber, R.; Basak, J., Kernel-penalized SVM for feature selection, Information Sciences, 181, 1, 115-128 (2011)
[26] Martin-Barragan, B.; Lillo, R.; Romo, J., Interpretable support vector machines for functional data, European Journal of Operational Research, 232, 1, 146-155 (2014)
[27] Meier, L.; Van De Geer, S.; Buehlmann, P., The group lasso for logistic regression, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 1, 53-71 (2008) · Zbl 1400.62276
[28] Neumann, J.; Schnorr, C.; Steidl, G., Combined svm-based feature selection and classification, Machine Learning, 61, 1-3, 129-150 (2005) · Zbl 1137.90643
[29] Pomeroy, S. L.; Tamayo, P.; Gaasenbeek, M.; Sturla, L. M.; Angelo, M.; McLaughlin, M. E.; Kim, J. Y.H.; Goumnerova, L. C.; Black, P. M.; Lau, C.; Allen, J.; Zagzag, D.; Olson, J.; Curran, T.; Wetmore, C.; Biegel, J. A.; Poggio, T.; Mukherjee, S.; Rifkin, R.; Califano, A.; Stolovitzky, G.; Louis, D. N.; Mesirov, J.; Lander, E.; Golub, T., Prediction of central nervous system embryonal tumour outcome based on gene expression, Nature, 415, 6870, 436-442 (2002)
[30] Rinaldi, F.; Schoen, F.; Sciandrone, M., Concave programming for minimizing the zero-norm over polyhedral sets, Computational Optimization and Applications, 46, 3, 467-486 (2010) · Zbl 1229.90170
[31] Rockafellar, R., Convex analysis, Princeton Mathematical Series, No. 28 (1970), Princeton University Press: Princeton University Press Princeton, N.J. · Zbl 0193.18401
[32] Saketha Nath, J.; Bhattacharyya, C., Maximum margin classifiers with specified false positive and false negative error rates, Proceedings of the SIAM International Conference on Data mining (2007)
[33] Shipp, M. A.; Ross, K. N.; Tamayo, P.; Weng, A. P.; Kutok, J. L.; Aguiar, R. C.T.; Gaasenbeek, M.; Angelo, M.; Reich, M.; Pinkus, G. S.; Ray, T. S.; Koval, M. A.; Last, K. W.; Norton, A.; Lister, T. A.; Mesirov, J.; Neuberg, D. S.; Lander, E. S.; Aster, J. C.; Golub, T. R., Diffuse large b-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning, Nature Medicine, 8, 1, 68-74 (2002)
[34] Sturm, J., Using sedumi 1.02, a matlab toolbox for optimization over symmetric cones, Optimization Methods and Software, 11, 12, 625-653 (1999) · Zbl 0973.90526
[35] Thi, H. L.; Dinh, T. P., Solving a class of linearly constrained indefinite quadratic problems by dc algorithms, Journal of Global Optimization, 11, 3, 253-285 (1997) · Zbl 0905.90131
[36] Thi, H. L.; Dinh, T. P., The dc (difference of convex functions) programming and dca revisited with dc models of real world nonconvex optimization problems, Annals of Operations Research, 133, 1, 23-46 (2005) · Zbl 1116.90122
[37] Thi, H. L.; Dinh, T. P.; Muu, L., Numerical solution for optimization over the efficient set by d.c. optimization algorithms, Operations Research Letters, 19, 3, 117-128 (1996) · Zbl 0871.90074
[38] Vapnik, V., Statistical Learning Theory (1998), John Wiley and Sons · Zbl 0935.62007
[39] West, M. M.; Blanchette, C. C.; Dressman, H. H.; Huang, E. E.; Ishida, S. S.; Spang, R. R.; Zuzan, H.; Olson, J.; Marks, J.; Nevins, J., Predicting the clinical status of human breast cancer by using gene expression profiles, Proceedings of the National Academy of Sciences of the United States of America, 98, 20, 11462-11467 (2001)
[40] Weston, J.; Elisseeff, A.; Schlkopf, B.; Tipping, M., The use of zero-norm with linear models and kernel methods, Journal of Machine Learning Research, 3, 1439-1461 (2003) · Zbl 1102.68605
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.