A hybrid system with filter approach and multiple population genetic algorithm for feature selection in credit scoring. (English) Zbl 1377.62201

Summary: The financial crisis of 2007 exposed the banking sector to massive credit risks, and credit scoring has attracted increasing attention as a result. Banks hold large amounts of customer data, which a credit scoring model can use to judge applicants’ credit risk accurately. However, these data are often high dimensional and contain irrelevant features that degrade classifier accuracy, so feature selection is an important topic. This paper proposes HMPGA, a two-phase hybrid approach based on filter approaches and a multiple population genetic algorithm (MPGA). In phase 1, it introduces the idea of the wrapper approach into three filter approaches to acquire important prior information for setting the initial populations of MPGA. In phase 2, it takes advantage of MPGA’s global optimization and quick convergence to find an optimal feature subset. The paper uses two real credit scoring datasets from the UCI databases to compare HMPGA, MPGA and GA. It verifies that the accuracies of the feature subsets acquired by HMPGA, MPGA and GA are superior to those of the three filter approaches. In addition, a nonparametric Wilcoxon signed rank test confirms that HMPGA is better than MPGA and GA. HMPGA can be applied not only to feature selection in credit scoring but also to other fields of data mining.
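
The two-phase scheme described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the single filter score (difference of class means), the nearest-centroid wrapper fitness, the ring migration rule, all parameter values, and the synthetic toy data are assumptions chosen only to keep the example self-contained. The paper itself uses three filter approaches and UCI credit datasets, whose details the summary does not give.

```python
import random

def make_toy_data(n=60, n_noise=4, seed=1):
    """Synthetic stand-in data (assumption): 2 informative + n_noise noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        lab = i % 2
        informative = [rng.gauss(2.0 * lab, 1.0), rng.gauss(-1.5 * lab, 1.0)]
        X.append(informative + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(lab)
    return X, y

def filter_scores(X, y):
    """Hypothetical filter: absolute difference of class means per feature.
    Assumes both classes (0 and 1) occur in y."""
    scores = []
    for j in range(len(X[0])):
        pos = [r[j] for r, lab in zip(X, y) if lab == 1]
        neg = [r[j] for r, lab in zip(X, y) if lab == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

def accuracy(mask, X, y):
    """Wrapper fitness (assumption): nearest-centroid accuracy on the selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    def centroid(label):
        rows = [r for r, lab in zip(X, y) if lab == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]
    c0, c1 = centroid(0), centroid(1)
    correct = 0
    for r, lab in zip(X, y):
        d0 = sum((r[j] - c) ** 2 for j, c in zip(cols, c0))
        d1 = sum((r[j] - c) ** 2 for j, c in zip(cols, c1))
        correct += int((1 if d1 < d0 else 0) == lab)
    return correct / len(y)

def seeded_population(scores, size, rng):
    """Phase 1: bias initial bit masks toward features the filter ranks highly."""
    top = max(scores) or 1.0
    probs = [0.1 + 0.8 * s / top for s in scores]
    return [[int(rng.random() < p) for p in probs] for _ in range(size)]

def evolve(pop, fitness, rng, p_mut=0.05):
    """One GA generation: elitism, truncation selection, one-point crossover, bit-flip mutation."""
    ranked = sorted(pop, key=fitness, reverse=True)
    nxt = [ranked[0][:]]  # keep the best individual unchanged
    while len(nxt) < len(pop):
        a, b = rng.sample(ranked[: len(pop) // 2], 2)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        nxt.append([bit ^ (rng.random() < p_mut) for bit in child])
    return nxt

def hmpga_sketch(X, y, n_pops=3, pop_size=8, generations=15, seed=0):
    """Phase 2: several populations evolve in parallel and exchange their best members."""
    rng = random.Random(seed)
    fitness = lambda m: accuracy(m, X, y)
    scores = filter_scores(X, y)
    pops = [seeded_population(scores, pop_size, rng) for _ in range(n_pops)]
    for _ in range(generations):
        pops = [evolve(p, fitness, rng) for p in pops]
        bests = [max(p, key=fitness) for p in pops]
        for i, p in enumerate(pops):
            # Ring migration (assumption): the previous population's best replaces this one's worst.
            worst = min(range(len(p)), key=lambda k: fitness(p[k]))
            p[worst] = bests[i - 1][:]
    best = max((m for p in pops for m in p), key=fitness)
    return best, fitness(best)

X, y = make_toy_data()
mask, acc = hmpga_sketch(X, y)
print("selected feature mask:", mask, "accuracy:", acc)
```

Seeding from filter scores is what distinguishes this sketch from a plain multiple population GA with uniformly random initial masks. In a real study the fitness would be estimated on held-out data rather than on the training set.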

MSC:

62P05 Applications of statistics to actuarial sciences and financial mathematics
62G10 Nonparametric hypothesis testing
91G40 Credit risk

Software:

C4.5; Matlab
