## Attribute reduction of data with error ranges and test costs

Zbl 1250.68227

Summary: In data mining applications, a data item can often be obtained by several measurement methods with different test costs and different error ranges. Test costs refer to the time, money, or other resources spent in obtaining data items related to some object; observational errors correspond to the differences between the measured and the true values of a data item. In supervised learning, we need to decide which data items to obtain and which measurement methods to employ so as to minimize the total test cost while still supporting the construction of classifiers. This paper studies this problem in four steps. First, data models are built to address error ranges and test costs. Second, an error-range-based covering rough set is constructed to define lower and upper approximations, positive regions, and relative reducts. A closely related theory, the neighborhood rough set, has been successfully applied to heterogeneous attribute reduction; the major difference between the two theories is the definition of the neighborhood. Third, the minimal test cost attribute reduction problem is redefined in the new theory. Fourth, both backtracking and heuristic algorithms are proposed to deal with the new problem. The algorithms are tested on ten UCI (University of California, Irvine) datasets. Experimental results show that the backtracking algorithm is efficient on reasonably sized datasets, that the weighting mechanism for the heuristic information is effective, and that the competition approach improves the quality of the result significantly. This study suggests new research trends concerning attribute reduction and covering rough sets.
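The pipeline described in the summary can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' algorithm: the helpers `neighborhood`, `positive_region`, and `greedy_reduct` are hypothetical names, indistinguishability within twice the per-attribute error range stands in for the paper's precise neighborhood definition, and purity-gain-per-unit-cost stands in for its weighted heuristic information.

```python
def neighborhood(i, data, attrs, error):
    """Objects indistinguishable from object i on every attribute in attrs.

    Two values on attribute a are treated as indistinguishable when they
    differ by at most twice the error range of a (an assumed convention).
    """
    return {
        j for j in range(len(data))
        if all(abs(data[i][a] - data[j][a]) <= 2 * error[a] for a in attrs)
    }

def positive_region(data, labels, attrs, error):
    """Objects whose neighborhood is consistent with their class label.

    With no attributes selected we conservatively return the empty set.
    """
    if not attrs:
        return set()
    return {
        i for i in range(len(data))
        if all(labels[j] == labels[i] for j in neighborhood(i, data, attrs, error))
    }

def greedy_reduct(data, labels, error, cost):
    """Cost-weighted greedy heuristic for a low-test-cost attribute subset.

    Repeatedly adds the attribute with the best positive-region gain per
    unit test cost until the full-attribute positive region is matched.
    """
    all_attrs = list(range(len(data[0])))
    target = len(positive_region(data, labels, all_attrs, error))
    chosen = []
    while len(positive_region(data, labels, chosen, error)) < target:
        base = len(positive_region(data, labels, chosen, error))
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: (len(positive_region(data, labels, chosen + [a], error))
                           - base) / cost[a],
        )
        chosen.append(best)
    return chosen

# Toy data: 4 objects, 2 numeric attributes, binary labels.
data = [(0.0, 0.0), (0.1, 5.0), (3.0, 0.1), (3.1, 5.1)]
labels = [0, 0, 1, 1]
error = [0.2, 0.2]   # measurement error range per attribute
cost = [1.0, 3.0]    # test cost per attribute

print(greedy_reduct(data, labels, error, cost))  # attribute 0 alone separates the classes
```

In the toy run, attribute 0 already makes every neighborhood label-consistent and is the cheaper test, so the greedy heuristic selects it alone; the paper's backtracking algorithm would instead search subsets exhaustively with pruning to guarantee minimal total cost.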

### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |


*F. Min* and *W. Zhu*, Inf. Sci. 211, 48--67 (2012; Zbl 1250.68227)

