
Learning classification rules from data. (English) Zbl 1041.68075

Summary: We present ELEM2, a machine learning system that induces classification rules from a set of data based on a heuristic search over a hypothesis space. ELEM2 is distinguished from other rule induction systems in three aspects. First, it uses a new heuristic function to guide the search. The function reflects the degree of relevance of an attribute-value pair to a target concept and leads to selection of the most relevant pairs for formulating rules. Second, ELEM2 handles inconsistent training examples by defining an unlearnable region of a concept based on the probability distribution of that concept in the training data. The unlearnable region is used as a stopping criterion for the concept learning process, which resolves conflicts without removing inconsistent examples. Third, ELEM2 employs a new rule quality measure in its post-pruning process to prevent rules from overfitting the data. The rule quality formula measures the extent to which a rule can discriminate between the positive and negative examples of a class. We describe the features of ELEM2, its rule induction algorithm, and its classification procedure. We report experimental results that compare ELEM2 with C4.5 and CN2 on a number of datasets.

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

UCI-ml; C4.5; LERS

References:

[1] Quinlan, J. R., C4.5: Programs for Machine Learning (1993), Morgan Kaufmann, San Mateo, CA
[2] Clark, P.; Niblett, T., The CN2 induction algorithm, Machine Learning, 3, 261-283 (1989)
[3] Cendrowska, J., PRISM: An algorithm for inducing modular rules, (Gaines, B.; Boose, J., Knowledge Acquisition for Knowledge-Based Systems (1988), Academic Press) · Zbl 0638.68110
[4] Quinlan, J. R., Learning efficient classification procedures and their application to chess end games, (Michalski, R. S.; Carbonell, J. G.; Mitchell, T. M., Machine Learning: An Artificial Intelligence Approach, Volume 1 (1983))
[5] Grzymala-Busse, J. W., LERS—A system for learning from examples based on rough sets, (Slowinski, R., Intelligent Decision Support: Handbook of Applications and Advances of Rough Sets Theory (1992), Kluwer Academic), 3-18 · Zbl 0820.68001
[6] Holte, R.; Acker, L.; Porter, B., Concept learning and the problem of small disjuncts, (Proceedings of the 11th International Joint Conference on Artificial Intelligence, Detroit, MI (1989)) · Zbl 0709.68057
[7] Michalski, R. S.; Mozetic, I.; Hong, J.; Lavrac, N., The multi-purpose incremental learning system AQ15 and its testing application to three medical domains, (Proceedings of AAAI 1986 (1986)), 1041-1045
[8] An, A.; Cercone, N., Discretization of continuous attributes for learning classification rules, (Proceedings of the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-99), Beijing, China (1999))
[9] An, A., Analysis Methodologies for Integrated and Enhanced Problem Solving, (Ph.D. Thesis (1997), Dept. of Computer Science, University of Regina, Regina, Canada)
[10] Hamilton, H. J.; Shan, N.; Cercone, N., RIAC: A rule induction algorithm based on approximate classification, (Technical Report CS-96-06 (1996), University of Regina)
[11] Robertson, S. E.; Sparck Jones, K., Relevance weighting of search terms, Journal of the American Society for Information Science, 27, 129-146 (1976)
[12] Murphy, P. M.; Aha, D. W., UCI Repository of Machine Learning Database (1994), For information contact ml-repository@ics.uci.edu.