Itemset mining: a constraint programming perspective.

*(English)*Zbl 1353.68233Summary: The field of data mining has become accustomed to specifying constraints on patterns of interest. A large number of systems and techniques has been developed for solving such constraint-based mining problems, especially for mining itemsets. The approach taken in the field of data mining contrasts with the constraint programming principles developed within the artificial intelligence community. While most data mining research focuses on algorithmic issues and aims at developing highly optimized and scalable implementations that are tailored towards specific tasks, constraint programming employs a more declarative approach. The emphasis lies on developing high-level modeling languages and general solvers that specify what the problem is, rather than outlining how a solution should be computed, yet are powerful enough to be used across a wide variety of applications and application domains.

This paper contributes a declarative constraint programming approach to data mining. More specifically, we show that it is possible to employ off-the-shelf constraint programming techniques for modeling and solving a wide variety of constraint-based itemset mining tasks, such as frequent, closed, discriminative, and cost-based itemset mining. In particular, we develop a basic constraint programming model for specifying frequent itemsets and show that this model can easily be extended to realize the other settings. This contrasts with typical procedural data mining systems where the underlying procedures need to be modified in order to accommodate new types of constraint, or novel combinations thereof. Even though the performance of state-of-the-art data mining systems outperforms that of the constraint programming approach on some standard tasks, we also show that there exist problems where the constraint programming approach leads to significant performance improvements over state-of-the-art methods in data mining and as well as to new insights into the underlying data mining problems. Many such insights can be obtained by relating the underlying search algorithms of data mining and constraint programming systems to one another. We discuss a number of interesting new research questions and challenges raised by the declarative constraint programming approach to data mining.

This paper contributes a declarative constraint programming approach to data mining. More specifically, we show that it is possible to employ off-the-shelf constraint programming techniques for modeling and solving a wide variety of constraint-based itemset mining tasks, such as frequent, closed, discriminative, and cost-based itemset mining. In particular, we develop a basic constraint programming model for specifying frequent itemsets and show that this model can easily be extended to realize the other settings. This contrasts with typical procedural data mining systems where the underlying procedures need to be modified in order to accommodate new types of constraint, or novel combinations thereof. Even though the performance of state-of-the-art data mining systems outperforms that of the constraint programming approach on some standard tasks, we also show that there exist problems where the constraint programming approach leads to significant performance improvements over state-of-the-art methods in data mining and as well as to new insights into the underlying data mining problems. Many such insights can be obtained by relating the underlying search algorithms of data mining and constraint programming systems to one another. We discuss a number of interesting new research questions and challenges raised by the declarative constraint programming approach to data mining.

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

68T20 | Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) |

PDF
BibTeX
XML
Cite

\textit{T. Guns} et al., Artif. Intell. 175, No. 12--13, 1951--1983 (2011; Zbl 1353.68233)

Full Text:
DOI

##### References:

[1] | Agrawal, Rakesh; Imielinski, Tomasz; Swami, Arun N., Mining association rules between sets of items in large databases, (), 207-216 |

[2] | Agrawal, Rakesh; Mannila, Hiekki; Srikant, Ramakrishnan; Toivonen, Hannu; Inkeri Verkamo, A., Fast discovery of association rules, (), 307-328 |

[3] | Apt, Krzysztof R.; Wallace, Mark, Constraint logic programming using eclipse, (2007), Cambridge University Press New York, NY, USA · Zbl 1119.68044 |

[4] | Bay, Stephen D.; Pazzani, Michael J., Detecting change in categorical data: mining contrast sets, (), 302-306 |

[5] | Bayardo, Roberto J.; Agrawal, Rakesh; Gunopulos, Dimitrios, Constraint-based rule mining in large, dense databases, Data mining and knowledge discovery, 4, 2/3, 217-240, (2000) |

[6] | Beldiceanu, Nicolas; Carlsson, Mats; Demassey, Sophie; Petit, Thierry, Global constraint catalogue: past, present and future, Constraints, 12, 21-62, (March 2007) |

[7] | Bessiere, Christian; Hebrard, Emmanuel; OʼSullivan, Barry, Minimising decision tree size as combinatorial optimisation, (), 173-187 |

[8] | Bonchi, Francesco; Goethals, Bart, FP-bonsai: the art of growing and pruning small fp-trees, (), 155-160 |

[9] | Bonchi, Francesco; Lucchese, Claudio, Extending the state-of-the-art of constraint-based pattern discovery, Data and knowledge engineering, 60, 2, 377-399, (2007) |

[10] | Brailsford, Sally C.; Potts, Chris N.; Smith, Barbara M., Constraint satisfaction problems: algorithms and applications, European journal of operational research, 119, 3, 557-581, (1999) · Zbl 0938.90055 |

[11] | Bucila, Cristian; Gehrke, Johannes; Kifer, Daniel; White, Walker M., Dualminer: a dual-pruning algorithm for itemsets with constraints, Data mining and knowledge discovery, 7, 3, 241-272, (2003) |

[12] | Chang, Ming-Wei; Ratinov, Lev-Arie; Rizzolo, Nicholas; Roth, Dan, Learning and inference with constraints, (), 1513-1518 |

[13] | Cheng, Hong; Xifeng, Yan; Han, Jiawei; Hsu, Chih-Wei, Discriminative frequent pattern analysis for effective classification, (), 716-725 |

[14] | Cheng, Hong; Xifeng, Yan; Han, Jiawei; Yu, P.S., Direct discriminative pattern mining for effective classification, (), 169-178 |

[15] | Cussens, James, Bayesian network learning by compiling to weighted MAX-sat, (), 105-112 |

[16] | De Raedt, Luc; Guns, Tias; Nijssen, Siegfried, Constraint programming for itemset mining, (), 204-212 · Zbl 1353.68233 |

[17] | De Raedt, Luc; Guns, Tias; Nijssen, Siegfried, Constraint programming for data mining and machine learning, (), 1513-1518 · Zbl 1353.68233 |

[18] | De Raedt, Luc; Kramer, Stefan, The levelwise version space algorithm and its application to molecular fragment finding, (), 853-862 |

[19] | De Raedt, Luc; Zimmermann, Albrecht, Constraint-based pattern set mining, (), 1-12 |

[20] | Dong, Guozhu; Li, Jinyan, Efficient mining of emerging patterns: discovering trends and differences, (), 43-52 |

[21] | Fan, Wei; Zhang, Kun; Cheng, Hong; Gao, Jing; Xifeng, Yan; Han, Jiawei; Yu, Philip S.; Verscheure, Olivier, Direct mining of discriminative and essential frequent patterns via model-based search tree, (), 230-238 |

[22] | Frisch, Alan M.; Grum, Matthew; Jefferson, Christopher; Martínez Hernández, Bernadette; Miguel, Ian, The design of essence: a constraint language for specifying combinatorial problems, (), 80-87 |

[23] | Fürnkranz, Johannes; Flach, Peter A., ROC ‘n’ rule learning - towards a better understanding of covering algorithms, Machine learning, 58, 1, 39-77, (2005) · Zbl 1075.68071 |

[24] | () |

[26] | Gent, Ian P.; Jefferson, Christopher; Miguel, Ian, MINION: a fast scalable constraint solver, (), 98-102 |

[27] | Bart Goethals, Mohammed J. Zaki, Advances in frequent itemset mining implementations: report on FIMIʼ03, in: SIGKDD Explorations Newsletter, vol. 6, 2004, pp. 109-117. |

[28] | Grosskreutz, Henrik; Rüping, Stefan; Wrobel, Stefan, Tight optimistic estimates for fast subgroup discovery, (), 440-456 |

[29] | Tias Guns, Siegfried Nijssen, Luc De Raedt, k-Pattern set mining under constraints, CW Reports CW596, Department of Computer Science, K.U. Leuven, October 2010. · Zbl 1211.68142 |

[30] | Han, J.; Pei, J.; Yin, Y., Mining frequent patterns without candidate generation, (), 1-12 |

[31] | Han, Jiawei; Cheng, Hong; Xin, Dong; Yan, Xifeng, Frequent pattern mining: current status and future directions, Data mining and knowledge discovery, 15, 1, 55-86, (2007) |

[32] | Kavsek, Branko; Lavrac, Nada; Jovanoski, Viktor, APRIORI-SD: adapting association rule learning to subgroup discovery, (), 230-241 |

[33] | Mannila, Heikki; Toivonen, Hannu, Levelwise search and borders of theories in knowledge discovery, Data mining and knowledge discovery, 1, 3, 241-258, (1997) |

[34] | Morimoto, Yasuhiko; Fukuda, Takeshi; Matsuzawa, Hirofumi; Tokuyama, Takeshi; Yoda, Kunikazu, Algorithms for mining association rules for binary segmentations of huge categorical databases, (), 380-391 |

[35] | Morishita, Shinichi; Sese, Jun, Traversing itemset lattice with statistical metric pruning, (), 226-236 |

[36] | Nijssen, Siegfried; Fromont, Élisa, Optimal constraint-based decision tree induction from itemset lattices, Data mining and knowledge discovery, 21, 1, 9-51, (2010) |

[37] | Nijssen, Siegfried; Guns, Tias, Integrating constraint programming and itemset mining, (), 467-482 · Zbl 1353.68233 |

[38] | Nijssen, Siegfried; Guns, Tias; De Raedt, Luc, Correlated itemset mining in ROC space: a constraint programming approach, (), 647-656 · Zbl 1353.68233 |

[39] | Nijssen, Siegfried; Kok, Joost N., Multi-class correlated pattern mining, (), 165-187 · Zbl 1178.68206 |

[40] | Pasquier, Nicolas; Bastide, Yves; Taouil, Rafik; Lakhal, Lotfi, Discovering frequent closed itemsets for association rules, (), 398-416 · Zbl 0983.68511 |

[41] | Pei, Jian; Han, Jiawei, Can we push more constraints into frequent pattern mining?, (), 350-354 |

[42] | Pei, Jian; Han, Jiawei; Lakshmanan, Laks V.S., Mining frequent item sets with convertible constraints, (), 433-442 |

[43] | Pei, Jian; Han, Jiawei; Mao, Runying, Closet: an efficient algorithm for mining frequent closed itemsets, (), 21-30 |

[44] | Perron, Laurent, Search procedures and parallelism in constraint programming, (), 346-360 |

[45] | Rossi, Francesca; van Beek, Peter; Walsh, Toby, Handbook of constraint programming (foundations of artificial intelligence), (2006), Elsevier Science Inc. · Zbl 1175.90011 |

[46] | Schulte, Christian, Programming constraint services: high-level programming of standard and new constraint services, Lecture notes in computer science, vol. 2302, (2002), Springer · Zbl 0994.68038 |

[47] | Schulte, Christian; Stuckey, Peter J., Efficient constraint propagation engines, Transactions on programming languages and systems, 31, 1, 1-43, (2008) |

[48] | Sese, Jun; Morishita, Shinichi, Answering the most correlated n association rules efficiently, (), 410-422 · Zbl 1020.68886 |

[49] | Shenoy, Pradeep; Haritsa, Jayant R.; Sudarshan, S.; Bhalotia, Gaurav; Bawa, Mayank; Devavrat, Shah, Turbo-charging vertical mining of large databases, (), 22-33 |

[50] | Soulet, Arnaud; Crømilleux, Bruno, An efficient framework for mining flexible constraints, (), 43-64 |

[51] | Uno, Takeaki; Kiyomi, Masashi; Arimura, Hiroki, LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining, (), 77-86 |

[52] | Van Hentenryck, Pascal; Deville, Yves, (), 383-403 |

[53] | Van Hentenryck, Pascal; Perron, Laurent; Puget, Jean-Francois, Search and strategies in OPL, ACM transations computational logic, 1, 2, 285-320, (2000) · Zbl 1365.90281 |

[54] | Van Hentenryck, Pascal; Saraswat, Vijay A.; Deville, Yves, Design, implementation, and evaluation of the constraint language CC(FD), Journal of logic programming, 37, 1-3, 139-164, (1998) · Zbl 0920.68026 |

[55] | Wrobel, Stefan, An algorithm for multi-relational discovery of subgroups, (), 78-87 |

[56] | Javeed Zaki, Mohammed; Gouda, Karam, Fast vertical mining using diffsets, (), 326-335 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.