zbMATH — the first resource for mathematics

Heuristic-based feature selection for rough set approach. (English) Zbl 07264275
Summary: The paper presents a research methodology that applies greedy heuristics to gather information about the available features. The discovered knowledge, represented as generated decision rules, was used to support the feature selection and reduction process for the induction of decision rules with the classical rough set approach. Experiments were performed on input data sets discretised by several methods. The experimental results show that eliminating less relevant attributes through the proposed methodology led to rule sets with reduced cardinalities, while maintaining the rule quality necessary for satisfactory classification.
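The summary describes the pipeline only at a high level. The following is a minimal sketch of the general idea, under several assumptions that are not taken from the paper. It assumes a small, already discretised decision table. It uses a set-cover style greedy heuristic for building one decision rule per row, which is a common choice in the rough set literature but not necessarily the exact heuristic the authors used. It also assumes that attribute relevance is estimated by how often each attribute appears in the induced rules. The function names `greedy_rule` and `rank_attributes` and the toy table are hypothetical.

```python
from collections import Counter


def greedy_rule(rows, decisions, r):
    """Greedily build one decision rule for row r (hypothetical heuristic).

    At each step, pick the unused attribute whose value in row r separates
    the largest number of not-yet-excluded rows that have a different
    decision, then keep only the rows that still match the rule.
    """
    target = decisions[r]
    # Rows with a different decision that the rule must still exclude.
    remaining = {i for i, d in enumerate(decisions) if d != target}
    used = []
    while remaining:
        best_attr, best_sep = None, 0
        for a in range(len(rows[r])):
            if a in used:
                continue
            sep = sum(1 for i in remaining if rows[i][a] != rows[r][a])
            if sep > best_sep:
                best_attr, best_sep = a, sep
        if best_attr is None:
            break  # no attribute separates further (inconsistent table)
        used.append(best_attr)
        remaining = {i for i in remaining if rows[i][best_attr] == rows[r][best_attr]}
    # Rule: conjunction of (attribute index, value) pairs implying the decision.
    return [(a, rows[r][a]) for a in used], target


def rank_attributes(rows, decisions):
    """Rank attributes by how often they occur in the greedily induced rules."""
    counts = Counter()
    for r in range(len(rows)):
        conditions, _ = greedy_rule(rows, decisions, r)
        counts.update(a for a, _ in conditions)
    n_attrs = len(rows[0])
    # Attributes never used in any rule keep a count of zero (Counter default).
    ranking = sorted(range(n_attrs), key=lambda a: (-counts[a], a))
    return ranking, counts


# Toy table: attribute 0 determines the decision; attributes 1 and 2 do not.
rows = [("low", "a", 0), ("low", "b", 1), ("high", "a", 0), ("high", "b", 1)]
decisions = ["no", "no", "yes", "yes"]

ranking, counts = rank_attributes(rows, decisions)
selected = ranking[:1]  # keep only the top-ranked attribute
print(ranking, dict(counts), selected)
```

In this toy table every rule uses only attribute 0, so attributes 1 and 2 never occur in any rule and would be the first candidates for elimination before rules are induced again on the reduced table. Counting occurrences is only one possible relevance measure; weighting each occurrence by rule support or rule length would be an equally plausible alternative under the same assumptions.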
MSC:
68T37 Reasoning under uncertainty in the context of artificial intelligence
Software:
C4.5; GuideR