iRIPPER: an improved rule-based text categorization algorithm. (Chinese. English summary) Zbl 1174.68442

Summary: The rule-based text categorization algorithm RIPPER is easy to understand, quick to optimize, and efficient. However, when its rules refer to too many features, these advantages are noticeably weakened and the performance of the algorithm decreases. The hierarchy-based hRIPPER uses hierarchical feature selection, but it still cannot filter features fully. An improved text categorization algorithm, iRIPPER, is therefore proposed to address these problems in the learning process of RIPPER and hRIPPER; it filters features more thoroughly during learning. The experiments show that it selects features effectively, generates fewer rules, and reduces the time spent in the growing process, and thus improves the performance of rule-based text categorization.


68Q32 Computational learning theory
68T05 Learning and adaptive systems in artificial intelligence