Improved binary PSO for feature selection using gene expression data. (English) Zbl 1142.92319
Summary: Gene expression profiles, which represent the state of a cell at the molecular level, have great potential as a medical diagnosis tool. Compared to the number of genes involved, available training data sets in cancer type classification generally have a fairly small sample size. These training data limitations constitute a challenge to certain classification methodologies. A reliable method for selecting genes relevant to sample classification is needed to speed up processing, reduce the predictive error rate, and avoid the incomprehensibility caused by the large number of genes investigated. In this study, improved binary particle swarm optimization (IBPSO) is used to perform feature selection, and the K-nearest neighbor (K-NN) method serves as the evaluator of the IBPSO on gene expression data classification problems. Experimental results show that this method effectively simplifies feature selection and reduces the total number of features needed. Compared with the best previously published results, the proposed method achieves the highest classification accuracy in nine of the 11 gene expression data test problems and is comparable on the remaining two.
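To make the summarized pipeline concrete, the following is a minimal Python sketch of binary PSO feature selection with a K-NN fitness evaluator. It implements the standard discrete binary PSO update of Kennedy and Eberhart (1997) rather than the paper's specific IBPSO refinement, and it uses scikit-learn for the K-NN evaluator; the swarm size, inertia weight w, acceleration coefficients c1 and c2, neighbor count k, and the helper names knn_accuracy and binary_pso are illustrative assumptions, not values taken from the paper.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def knn_accuracy(X, y, mask, k=1):
        """Fitness of a binary feature mask: cross-validated K-NN accuracy."""
        if not mask.any():                       # an empty gene subset is invalid
            return 0.0
        clf = KNeighborsClassifier(n_neighbors=k)
        return cross_val_score(clf, X[:, mask], y, cv=5).mean()

    def binary_pso(X, y, n_particles=20, n_iters=50, w=0.9, c1=2.0, c2=2.0, seed=0):
        """Select a feature subset via the Kennedy-Eberhart discrete binary PSO."""
        n_feat = X.shape[1]
        rng = np.random.default_rng(seed)
        pos = rng.integers(0, 2, size=(n_particles, n_feat)).astype(bool)
        vel = rng.uniform(-1.0, 1.0, size=(n_particles, n_feat))
        pbest = pos.copy()
        pbest_fit = np.array([knn_accuracy(X, y, p) for p in pos])
        g = pbest_fit.argmax()
        gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
        for _ in range(n_iters):
            r1 = rng.random((n_particles, n_feat))
            r2 = rng.random((n_particles, n_feat))
            vel = (w * vel
                   + c1 * r1 * (pbest.astype(float) - pos.astype(float))
                   + c2 * r2 * (gbest.astype(float) - pos.astype(float)))
            vel = np.clip(vel, -6.0, 6.0)        # Vmax clamp keeps the sigmoid responsive
            # Discrete update: each bit turns on with probability sigmoid(velocity)
            pos = rng.random((n_particles, n_feat)) < 1.0 / (1.0 + np.exp(-vel))
            fit = np.array([knn_accuracy(X, y, p) for p in pos])
            improved = fit > pbest_fit
            pbest[improved] = pos[improved]
            pbest_fit[improved] = fit[improved]
            if pbest_fit.max() > gbest_fit:      # track the swarm-wide best mask
                g = pbest_fit.argmax()
                gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
        return gbest, gbest_fit

Here each particle's position is a binary mask over genes, the fitness of a mask is the cross-validated K-NN accuracy on the selected columns, and the returned gbest mask is the best feature subset the swarm found.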

MSC:
92C40 Biochemistry, molecular biology
92-08 Computational methods for problems pertaining to biology
92-04 Software, source code, etc. for problems pertaining to biology
Software:
DistAl; GeneSrF