×

Local rough set: a solution to rough data analysis in big data. (English) Zbl 1445.68223

Summary: As a supervised learning method, classical rough set theory often requires a large amount of labeled data, in which concept approximation and attribute reduction are two key issues. With the advent of the age of big data however, labeling data is an expensive and laborious task and sometimes even infeasible, while unlabeled data are cheap and easy to collect. Hence, techniques for rough data analysis in big data using a semi-supervised approach, with limited labeled data, are desirable. Although many concept approximation and attribute reduction algorithms have been proposed in the classical rough set theory, quite often, these methods are unable to work well in the context of limited labeled big data. The challenges to classical rough set theory can be summarized with three issues: limited labeled property of big data, computational inefficiency and over-fitting in attribute reduction. To address these three challenges, we introduce a theoretic framework called local rough set, and develop a series of corresponding concept approximation and attribute reduction algorithms with linear time complexity, which can efficiently and effectively work in limited labeled big data. Theoretical analysis and experimental results show that each of the algorithms in the local rough set significantly outperforms its original counterpart in classical rough set theory. It is worth noting that the performances of the algorithms in the local rough set become more significant when dealing with larger data sets.

MSC:

68T37 Reasoning under uncertainty in the context of artificial intelligence
03E72 Theory of fuzzy sets, etc.
68T09 Computational aspects of data analysis and big data

Software:

LERS; UCI-ml
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Ben-David, A.; Sterling, L.; Tran, T. D., Adding monotonicity to learning algorithms may impair their accuracy, Expert Syst. Appl., 36, 3, 6627-6634, (2009)
[2] Bhatt, R. B.; Gopal, M., On fuzzy-rough sets approach to feature selection, Pattern Recognit. Lett., 26, 7, 965-975, (2005)
[3] Dubois, D.; Prade, H., Rough fuzzy sets and fuzzy rough sets, Int. J. Gen. Syst., 17, 191-209, (1990) · Zbl 0715.04006
[4] Düntsch, I.; Gediga, G., Uncertainty measures of rough set prediction, Artif. Intell., 106, 1, 109-137, (1998) · Zbl 0909.68040
[5] Eskandari, S.; Javidi, M. M., Online streaming feature selection using rough sets, Int. J. Approx. Reason., 69, 35-57, (2016) · Zbl 1344.68187
[6] Gediga, G.; Düntsch, I., Rough approximation quality revisited, Artif. Intell., 132, 2, 219-234, (2001) · Zbl 0983.68194
[7] Greco, S.; Matarazzo, B.; Slowinski, R., Rough approximation of a preference relation by dominance relations, Eur. J. Oper. Res., 117, 1, 63-83, (1999) · Zbl 0998.90044
[8] Grzymala-Busse, J. W., LERS—A system for learning from examples based on rough sets, (1992), Springer Netherlands
[9] Guyon, I.; Elisseeff, A., An introduction to variable feature selection, J. Mach. Learn. Res., 3, 1157-1182, (2003) · Zbl 1102.68556
[10] Hu, Q.; Xie, Z.; Yu, D., Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recognit., 3509-3521, (2007) · Zbl 1129.68073
[11] Hu, Q.; Yu, D.; Liu, J.; Wu, C., Neighborhood rough set based heterogeneous feature subset selection, Inf. Sci., 178, 18, 3577-3594, (2008) · Zbl 1154.68466
[12] Hu, X.; Cercone, N., Learning in relational databases: a rough set approach, Comput. Intell., 11, 2, 323-338, (1995)
[13] Jensen, R.; Shen, Q., Computational intelligence and feature selection: rough and fuzzy approaches, (2008), Wiley-IEEE Press
[14] Kohavi, R.; John, G. H., Wrappers for feature subset selection, Artif. Intell., 97, 1-2, 273-324, (1997) · Zbl 0904.68143
[15] Kryszkiewicz, M., Rough set approach to incomplete information systems, Inf. Sci., 112, 1-4, 39-49, (1998) · Zbl 0951.68548
[16] Kryszkiewicz, M., Rules in incomplete information systems, Inf. Sci., 113, 3-4, 271-292, (1999) · Zbl 0948.68214
[17] Lezak, D.; Ziarko, W., The investigation of the Bayesian rough set model, Int. J. Approx. Reason., 40, 1, 81-91, (2005) · Zbl 1099.68089
[18] Liang, D.; Liu, D.; Kobina, A., Three-way group decisions with decision-theoretic rough sets, Inf. Sci., 345, 46-64, (2016)
[19] Liang, J.; Wang, F.; Dang, C.; Qian, Y., An efficient rough feature selection algorithm with a multi-granulation view, Int. J. Approx. Reason., 53, 6, 912-926, (2012)
[20] Liang, J.; Wang, F.; Dang, C.; Qian, Y., A group incremental approach to feature selection applying rough set technique, IEEE Trans. Knowl. Data Eng., 26, 2, 294-308, (2013)
[21] Lichma, M., UCI machine learning repository, (2013), University of California, Irvine, School of Information and Computer Sciences
[22] Lin, T. Y., Data mining and machine oriented modeling: a granular computing approach, Appl. Intell., 13, 2, 113-124, (2000)
[23] Liu, D.; Li, T.; Liang, D., Incorporating logistic regression to decision-theoretic rough sets for classifications, Int. J. Approx. Reason., 55, 1, 197-210, (2014) · Zbl 1316.68185
[24] Liu, H.; Setiono, R., Feature selection via discretization, IEEE Trans. Knowl. Data Eng., 9, 4, 642-645, (1997)
[25] Pavlenko, T., On feature selection, curse-of-dimensionality and error probability in discriminant analysis, J. Stat. Plan. Inference, 115, 2, 565-584, (2001) · Zbl 1015.62066
[26] Pawlak, Z., Rough sets, Int. J. Parallel Program., 11, 5, 341-356, (1982) · Zbl 0501.68053
[27] Pedrycz, W., Granular computing: analysis and design of intelligent systems, (2013), CRC Press
[28] Pedrycz, W.; Bargiela, A., Granular clustering: a granular signature of data, IEEE Trans. Syst. Man Cybern., Part B, Cybern., 32, 2, 212-224, (2002), a publication of the IEEE Systems Man & Cybernetics Society
[29] Pedrycz, W.; Vukovich, G., Feature analysis through information granulation and fuzzy sets, Pattern Recognit., 35, 4, 825-834, (2002) · Zbl 0997.68114
[30] Polkowski, L.; Skowron, A., Rough mereology: a new paradigm for approximate reasoning, Int. J. Approx. Reason., 15, 4, 333-365, (1996) · Zbl 0938.68860
[31] Qian, Y.; Cheng, H.; Wang, J.; Liang, J.; Pedrycz, W.; Dang, C., Grouping granular structures in human granulation intelligence, Inf. Sci., 382-383, 150-169, (2017)
[32] Qian, Y.; Li, S.; Liang, J.; Shi, Z.; Wang, F., Pessimistic rough set based decisions: a multigranulation fusion strategy, Inf. Sci., 264, 6, 196-210, (2014) · Zbl 1335.68270
[33] Qian, Y.; Liang, J.; Dang, C., Incomplete multigranulation rough set, IEEE Trans. Syst. Man Cybern., Part A, 40, 2, 420-431, (2010)
[34] Qian, Y.; Liang, J.; Pedrycz, W.; Dang, C., Positive approximation: an accelerator for attribute reduction in rough set theory, Artif. Intell., 174, 9, 597-618, (2010) · Zbl 1205.68310
[35] Qian, Y.; Liang, J.; Pedrycz, W.; Dang, C., An efficient accelerator for attribute reduction from incomplete data in rough set framework, Pattern Recognit., 44, 8, 1658-1670, (2011) · Zbl 1218.68152
[36] Qian, Y.; Liang, J.; Yao, Y.; Dang, C., Mgrs: a multi-granulation rough set, Inf. Sci., 180, 6, 949-970, (2010) · Zbl 1185.68695
[37] Qian, Y.; Zhang, H.; Li, F.; Hu, Q.; Liang, J., Set-based granular computing: a lattice model, Int. J. Approx. Reason., 55, 3, 834-852, (2014) · Zbl 1316.68189
[38] She, Y.; He, X., On the structure of the multigranulation rough set model, Knowl.-Based Syst., 36, 6, 81-92, (2012)
[39] Skowron, A., Extracting laws from decision tables: a rough set approach, Comput. Intell., 11, 2, 371-388, (1995)
[40] Skowron, A.; Stepaniuk, J., Tolerance approximation spaces, Fundam. Inform., 27, 2-3, 245-253, (1996) · Zbl 0868.68103
[41] Sun, B.; Ma, W.; Xiao, X., Three-way group decision making based on multigranulation fuzzy decision-theoretic rough set over two universes, Int. J. Approx. Reason., 81, 87-102, (2016) · Zbl 1401.68330
[42] Swiniarski, R. W.; Skowron, A., Rough set methods in feature selection and recognition, Pattern Recognit. Lett., 24, 6, 833-849, (2003) · Zbl 1053.68093
[43] Wang, G. Y.; Yu, H.; Yang, D. C., Decision table reduction based on conditional information entropy, Chinese J. Comput., 25, 7, 759-766, (2002)
[44] Wang, G. Y.; Zhao, J.; An, J.; Wu, Y., A comparative study of algebra viewpoint and information viewpoint in attribute reduction, Fundam. Inform., 68, 3, 289-301, (2005) · Zbl 1098.68134
[45] Wu, W. Z.; Zhang, M.; Li, H. Z.; Mi, J. S., Knowledge reduction in random information systems via Dempster-Shafer theory of evidence, Inf. Sci., 174, 3-4, 143-164, (2005) · Zbl 1088.68169
[46] Xu, J.; Miao, D.; Zhang, Y.; Zhang, Z., A three-way decisions model with probabilistic rough sets for stream computing, Int. J. Approx. Reason., 88, 1-22, (2017) · Zbl 1418.68214
[47] Xu, Z.-Y.; Liu, Z.-P.; Yang, B.-R.; Song, W., A quick attribute reduction algorithm with complexity of \(m a x(o(| c | | u |), o(| c | 2 | u / c |))\), Chinese J. Comput., 29, 3, 391-399, (2006)
[48] Yao, Y., Probabilistic rough set approximations, Int. J. Approx. Reason., 49, 2, 255-271, (2008) · Zbl 1191.68702
[49] Yao, Y., Three-way decisions with probabilistic rough sets, Inf. Sci., 180, 3, 341-353, (2010)
[50] Yao, Y., The superiority of three-way decisions in probabilistic rough set models, Inf. Sci., 181, 6, 1080-1096, (2011) · Zbl 1211.68442
[51] Yao, Y. Y., Probabilistic approaches to rough sets, Expert Syst., 20, 5, 287-297, (2003)
[52] Yue, X.; Chen, Y.; Miao, D.; Qian, J., Tri-partition neighborhood covering reduction for robust classification, Int. J. Approx. Reason., 83, 371-384, (2017) · Zbl 1404.68125
[53] Zhang, W. X.; Leung, Y., Theory of including degrees and its applications to uncertainty inferences, Int. J. Approx. Reason., 496-501, (1996)
[54] Zhang, X.; Miao, D., Two basic double-quantitative rough set models of precision and grade and their investigation using granular computing, Int. J. Approx. Reason., 54, 8, 1130-1148, (2013) · Zbl 1316.68201
[55] Ziarko, W., Variable precision rough set model, J. Comput. Syst. Sci., 46, 1, 39-59, (1993) · Zbl 0764.68162
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.