×

Distributed approach for computing rough set approximations of big incomplete information systems. (English) Zbl 1475.68381

Summary: The size of information gathered from real world applications today is staggering. To make matters worse, this information may also be incomplete, due to errors in measurement or lack of discipline. The two phenomena give rise to a big incomplete information system (IIS). The processing of a big IIS is difficult because of its two problems, big size and incompleteness, and the present work introduces an approach that addresses both. Specifically, we develop an efficient rough set theoretic (RST) algorithm to compute the approximation space of the IIS, which addresses the incompleteness problem. Then we distribute the computational chores of the algorithm using the MapReduce framework, which addresses the size problem. The approach is explained fully, and a detailed illustrative example is provided. For validation and performance analysis, the approach has been implemented and tested on four publicly-accessible big IISs for many metrics including sizeup, scaleup, and speedup. The experimental results attest to its validity, accuracy and efficiency. A comparison test with similar approaches shows that it has superior performance.

MSC:

68T37 Reasoning under uncertainty in the context of artificial intelligence
68T09 Computational aspects of data analysis and big data
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] J.G. Bazan, R. Latkowski, M. Szczuka, Missing template decomposition method and its implementation in rough set exploration system, in: International Conference on Rough Sets and Current Trends in Computing, Granada and Madrid, Spain, 2006, pp. 254-263. https://doi.org/10.1007/11908029_28 · Zbl 1162.68684
[2] A.K. Shekar, M. Pappik, P.I. Sanchez, E. Maller, Selection of relevant and non-redundant multivariate ordinal patterns for time series classification, in: International Conference on Discovery Science, Limassol, Cyprus, 2018, pp. 224-240. https://doi.org/10.1007/978-3-030-01771-2_15
[3] Zhang, Q.; Xie, Q.; Wang, G., A survey on rough set theory and its applications, CAAI Trans. Intell. Technol., 1, 4, 323-333 (2016)
[4] Grzegorowski, M.; Slezak, D., On resilient feature selection: Computational foundations of rC-reducts, Inf. Sci., 499, 25-44 (2019) · Zbl 1451.68224
[5] Cao, T.; Yamada, K.; Unehara, M.; Suzuki, I., Semi-supervised based rough set to handle missing decision data, (IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Vancouver, BC, Canada (2017)), 1948-1954
[6] Cao, T.; Yamada, K.; Unehara, M.; Suzuki, I.; Do, V. N., Rough set model in incomplete decision systems, J. Adv. Comput. Intell. Intell. Inform., 21, 7, 1221-1231 (2018)
[7] Clark, P. G.; Gao, C.; Grzymala-Busse, J. W.; Mroczek, T., Characteristic sets and generalized maximal consistent blocks in mining incomplete data, Inf. Sci., 453, 66-79 (2018) · Zbl 1440.68204
[8] Gupta, K.; Janghel, R. R., Dimensionality reduction-based breast cancer classification using machine learning, computational intelligence: theories, Appl. Future Directions, 133-146 (2019)
[9] Rajesh, T.; Malar, R. S.M.; Geetha, M. R., Brain tumor detection using optimisation classification based on rough set theory, Cluster Comput., 22, 6, 13853-13859 (2019)
[10] Slezak, D.; Grzegorowski, M.; Janusz, A.; Kozielski, M.; Nguyen, S. H.; Sikora, M.; Stawicki, S.; Wrobel, L., A framework for learning and embedding multi-sensor forecasting models into a decision support system: A case study of methane concentration in coal mines, Inf. Sci., 451, 112-133 (2018)
[11] Attia, A. H.; Sherif, A. S.; El-Tawel, G. S., Maximal limited similarity-based rough set model, Soft. Comput., 20, 8, 3153-3161 (2016) · Zbl 1370.68269
[12] Alarabi, L.; Mokbel, M. F.; Musleh, M., St-hadoop: a mapreduce framework for spatio-temporal data, GeoInformatica, 22, 4, 785-813 (2018)
[13] Fekade, B.; Maksymyuk, T.; Kyryk, M.; Jo, M., Probabilistic recovery of incomplete sensed data in IoT, IEEE Internet Things J., 5, 4, 2282-2292 (2017)
[14] Yelipe, U.; Porika, S.; Golla, M., An efficient approach for imputation and classification of medical data values using class-based clustering of medical records, Comput. Electr. Eng., 66, 487-504 (2018)
[15] Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y., Recurrent neural networks for multivariate time series with missing values, Sci. Rep., Nature, 8, 1, 1-12 (2018)
[16] Imani, F.; Cheng, C.; Chen, R.; Yang, H., Nested gaussian process modeling for high-dimensional data imputation in healthcare systems, (Institute of Industrial and Systems Engineers, Annual Conference and Expo (2018), IISE: IISE Orlando, United States), 19-22
[17] H. Sakai, M. Nakata, D. Slezak, The Lower and the Upper Systems of Rules in Tables with Missing Values, in: Database Theory and Application, Bio-Science and Bio-Technology, Jeju Island, Korea, 2010, pp. 132-141. doi: 10.1007/978-3-642-17622-7_14.
[18] Nakata, M.; Sakai, H., Describing rough approximations by indiscernibility relations in information tables with incomplete information, (International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Eindhoven, The Netherlands (2016)), 355-366 · Zbl 1455.68213
[19] R. Saedudin, H. Mahdin, S. Kasim, E. Sutoyo, E., I.T.R. Yanto, R. Hassan, A Relative Tolerance Relation of Rough Set for Incomplete Information Systems, in: International Conference on Soft Computing and Data Mining, Johor, Malaysia, 2018, pp. 72-81. https://doi.org/10.1007/978-3-319-72550-5_8
[20] Dai, J.; Gao, S.; Zheng, G., Generalized rough set models determined by multiple neighborhoods generated from a similarity relation, Soft. Comput., 22, 7, 2081-2094 (2018) · Zbl 1398.68534
[21] Riza, L. S.; Janusz, A.; Bergmeir, C.; Cornelis, C.; Herrera, F.; Śle, D.; Benítez, J. M., Implementing algorithms of rough set theory and fuzzy rough set theory in the R package “RoughSets, Inf. Sci., 287, 68-89 (2014)
[22] A. Wojna, R. Latkowski, Rseslib 3: Library of rough set and machine learning methods with extensible architecture, in: Lecture Notes in Computer Science, 10810, Springer, Berlin, Heidelberg, 2019, pp. 301-323 doi: 10.1007/978-3-662-58768-3_7.
[23] Villuendas-Rey, Y., Maximal similarity granular rough sets for mixed and incomplete information systems, Soft. Comput., 23, 13, 4617-4631 (2019) · Zbl 1418.68213
[24] Jing, S. Y.; Li, G. L.; Zeng, K.; Pan, W.; Liu, C. M., Efficient parallel algorithm for computing rough set approximation on GPU, Soft. Comput., 22, 22, 7553-7569 (2018)
[25] Zhang, J.; Zhu, Y.; Pan, Y.; Li, T., Efficient parallel boolean matrix based algorithms for computing composite rough set approximations, Inf. Sci., 329, 287-302 (2016) · Zbl 1390.68687
[26] Jing, S.; Liu, C.; Li, G.; Yan, G.; Zhang, Y., An efficient algorithm for parallel computation of rough entropy using cuda, (13th International Conference on Computational Intelligence and Security (CIS), Hong Kong, China (2017)), 1-15
[27] Yang, J.; Jing, S., Acceleration of Feature Subset Selection Using CUDA, (14th International Conference on Computational Intelligence and Security (CIS), Hangzhou, China (2018)), 140-144
[28] Jing, S. Y.; Yang, J., High-performance attribute reduction on graphics processing unit, J. Exp. Theor. Artif. Intell., 1-20 (2020)
[29] Yuan, J.; Chen, M.; Jiang, T.; Li, T., Complete tolerance relation based parallel filling for incomplete energy big data, Knowl.-Based Syst., 132, 215-225 (2017)
[30] Chen, M.; Yuan, J.; Li, L.; Liu, D.; Li, T., A fast heuristic attribute reduction algorithm using spark, (IEEE 37th International Conference on Distributed Computing Systems (ICDCS), GA, USA (2017)), 2393-2398
[31] Z.C. Dagdia, C. Zarges, B. Schannes, M. Micalef, L. Galiana, B. Rolland, M. Benchoufi, Rough set theory as a data mining technique: A case study in epidemiology and cancer incidence prediction, in: Part of the Lecture Notes in Computer Science, 11053, Springer, Cham, 2019, pp. 440-455. doi: 10.1007/978-3-030-10997-4_27.
[32] Sowkuntla, P.; Prasad, P. S., MapReduce based improved quick reduct algorithm with granular refinement using vertical partitioning scheme, Knowl.-Based Syst., 189 (2020)
[33] Dai, G.; Jiang, T.; Mu, Y.; Zhang, N.; Liu, H.; Hassanien, A. E., A Novel Rough Sets Positive Region Based Parallel Multi-reduction Algorithm, (Part of the Advances in Intelligent Systems and Computing, 548 (2018), Springer: Springer Cham), 515-524
[34] Qian, J.; Xia, M.; Yue, X., Parallel knowledge acquisition algorithms for big data using MapReduce, Int. J. Mach. Learning Cybern., 9, 6, 1007-1021 (2018)
[35] El-Alfy, E. S.M.; Alshammari, M. A., Towards scalable rough set based attribute subset selection for intrusion detection using parallel genetic algorithm in MapReduce, Simul. Model. Pract. Theory, 64, 18-29 (2016)
[36] Bhuvaneshwarri, I.; Tamilarasi, A., Optimization of big data using rough set theory and data mining for textile applications, Artif. Intell. Evol. Comput. Eng. Syst., 69-77 (2020)
[37] Yang, X.; Yang, J., Incomplete information system and rough set theory, Models and attribute reductions, Science Press: Springer (2012) · Zbl 1262.68008
[38] D. Newman, S. Hettich, C. Blake, C. Merz, UCI Repository of Machine Learning Databases, University of California, Department of Information and Computer Science, Irvine, CA, 1998. https://archive.ics.uci.edu/ml/index.php. Retrieved October 11, 2019.
[39] Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H., The weka data mining software: an update, ACM SIGKDD Explorations Newsletter, 11, 1, 10-18 (2009)
[40] Chen, H.; Li, T.; Cai, Y.; Luo, C.; Fujita, H., Parallel features reduction in dominance-based neighborhood rough set, Inf. Sci., 373, 351-368 (2016) · Zbl 1429.68280
[41] Qian, J.; Miao, D. Q.; Zhang, Z. H.; Yue, X. D., Parallel attribute reduction algorithms using mapreduce, Inf. Sci., 279, 671-690 (2014) · Zbl 1354.68265
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.