zbMATH — the first resource for mathematics

Sensitivity and specificity based multiobjective approach for feature selection: application to cancer diagnosis. (English) Zbl 1205.68494
Summary: The study of the sensitivity and the specificity of a classification test constitutes a powerful kind of analysis, since it provides specialists with very detailed information useful for cancer diagnosis. In this work, we propose the use of a multiobjective genetic algorithm for gene selection in microarray datasets. This algorithm performs gene selection from the point of view of the sensitivity and the specificity, both used as quality indicators of the classification test applied to the previously selected genes. In this algorithm, the classification task is accomplished by Support Vector Machines; in addition, a 10-fold cross-validation is applied to the resulting subsets. The emerging behavior of all these techniques used together is noticeable, since this approach is able to offer, in an original and easy way, a wide range of accurate solutions to professionals in this area. The effectiveness of this approach is shown on public cancer datasets, where it yields new and promising results. A comparative analysis of our approach using two and three objectives, and with other existing algorithms, suggests that our proposal is highly appropriate for solving this problem.
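
To make the two objectives concrete, the following is a minimal sketch of how candidate gene subsets could be scored by sensitivity and specificity and then filtered to the non-dominated (Pareto) set. It is not the authors' implementation: the genetic algorithm, the SVM classifier and the cross-validation are omitted, and the subset names, labels and predictions are hypothetical values chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, with 1 = positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Indices of the score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(o, s) for j, o in enumerate(scores) if j != i)
    ]


# Hypothetical ground truth: four tumor samples (1) and four normal samples (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predictions obtained with three candidate gene subsets.
predictions: Dict[str, List[int]] = {
    "subset_A": [1, 1, 1, 1, 1, 1, 0, 0],  # finds every tumor, some false alarms
    "subset_B": [1, 1, 0, 0, 0, 0, 0, 0],  # no false alarms, misses some tumors
    "subset_C": [1, 1, 0, 0, 1, 1, 0, 0],  # worse than or equal to both on each objective
}

names = list(predictions)
scores = [sensitivity_specificity(y_true, predictions[n]) for n in names]
front = pareto_front(scores)

for i in front:
    print(names[i], scores[i])
```

Treating the two measures as separate objectives, rather than collapsing them into a single accuracy figure, keeps both subset_A and subset_B as candidate solutions, so a specialist can choose the trade-off between missed tumors and false alarms.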

68W05 Nonnumerical algorithms
68T05 Learning and adaptive systems in artificial intelligence
68U99 Computing methodologies and applications