
Neighborhood rough set based heterogeneous feature subset selection. (English) Zbl 1154.68466

Summary: Feature subset selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Most research focuses on homogeneous feature selection, that is, on purely numerical or purely categorical features. In this paper, we introduce a neighborhood rough set model to deal with the problem of heterogeneous feature subset selection. As the classical rough set model can only evaluate categorical features, we generalize it with neighborhood relations and obtain a neighborhood rough set model. The proposed model degenerates to the classical one if the neighborhood size is set to zero. The neighborhood model is used to reduce numerical and categorical features by assigning different thresholds to the different kinds of attributes. In this model, the sizes of the neighborhood lower and upper approximations of the decision reflect the discriminating capability of a feature subset. The size of the lower approximation is computed as the dependency between the decision and the condition attributes. We use this neighborhood dependency to evaluate the significance of a subset of heterogeneous features and construct forward feature subset selection algorithms. The proposed algorithms are compared with some classical techniques. Experimental results show that the neighborhood-model-based method is more flexible in dealing with heterogeneous data.
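The dependency-driven forward search described in the summary can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: it assumes numerical features are min-max normalized and compared with Euclidean distance, categorical features are compared with the overlap (match/mismatch) metric, and a single radius `delta` (a hypothetical parameter name) controls the neighborhood on the numerical part. A sample belongs to the lower approximation of the decision when its neighborhood is pure, the dependency is the fraction of such samples, and forward selection greedily adds the feature with the largest dependency gain.

```python
# Sketch of neighborhood-dependency-based forward feature selection.
# Assumptions (not taken from the paper's text): X_num holds min-max
# normalized numerical features, X_cat holds categorical codes, features
# are indexed with numerical columns first, and neighbors must agree
# exactly on every selected categorical feature.
import numpy as np

def neighborhood_dependency(X_num, X_cat, y, features, delta=0.15):
    """gamma_B(D): fraction of samples whose delta-neighborhood (w.r.t.
    the selected features B) contains only samples of the same class."""
    n = len(y)
    n_num = X_num.shape[1]
    num_idx = [f for f in features if f < n_num]
    cat_idx = [f - n_num for f in features if f >= n_num]
    consistent = 0
    for i in range(n):
        # numerical part: Euclidean distance on the selected numerical features
        if num_idx:
            d_num = np.linalg.norm(X_num[:, num_idx] - X_num[i, num_idx], axis=1)
        else:
            d_num = np.zeros(n)
        # categorical part: neighbors must match on every selected categorical feature
        if cat_idx:
            same_cat = np.all(X_cat[:, cat_idx] == X_cat[i, cat_idx], axis=1)
        else:
            same_cat = np.ones(n, dtype=bool)
        neighbors = (d_num <= delta) & same_cat
        if np.all(y[neighbors] == y[i]):   # neighborhood lies in one decision class
            consistent += 1                # sample is in the lower approximation
    return consistent / n

def forward_selection(X_num, X_cat, y, delta=0.15, eps=1e-6):
    """Greedy forward search: repeatedly add the feature with the largest
    dependency gain; stop when no candidate improves the dependency."""
    all_feats = list(range(X_num.shape[1] + X_cat.shape[1]))
    selected, best = [], 0.0
    while True:
        gains = [(neighborhood_dependency(X_num, X_cat, y, selected + [f], delta), f)
                 for f in all_feats if f not in selected]
        if not gains:
            break
        gamma, f = max(gains)
        if gamma - best <= eps:            # no remaining feature adds discriminating power
            break
        selected.append(f)
        best = gamma
    return selected, best
```

Here `X_num` is an (n, p) array of normalized numerical features, `X_cat` an (n, q) array of categorical codes, and `y` the decision labels; `forward_selection(X_num, X_cat, y)` returns the indices of the selected features (numerical columns first) together with the achieved dependency.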

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

UCI-ml
