Krimp

swMATH ID: 28422
Software Authors: Vreeken J, Van Leeuwen M, Siebes A
Description: Krimp: mining itemsets that compress. One of the major problems in pattern mining is the explosion of the number of results. Tight constraints reveal only common knowledge, while loose constraints lead to an explosion in the number of returned patterns. This is caused by large groups of patterns essentially describing the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of patterns is that set that compresses the database best. For this task we introduce the Krimp algorithm. Experimental evaluation shows that typically only hundreds of itemsets are returned; a dramatic reduction, up to seven orders of magnitude, in the number of frequent item sets. These selections, called code tables, are of high quality. This is shown with compression ratios, swap-randomisation, and the accuracies of the code table-based Krimp classifier, all obtained on a wide range of datasets. Further, we extensively evaluate the heuristic choices made in the design of the algorithm.
Homepage: https://link.springer.com/article/10.1007/s10618-010-0202-x
Related Software: UCI-ml; MDL4BMF; StreamKrimp; gSpan; PrefixSpan; C4.5; MODL; ElemStatLearn; DBpedia; KONECT; OddBall; WebGraph; PEGASUS; MapReduce; LCM; Eclat; AlexNet; PASCAL VOC; BinaryNet; CNN-RNN
Cited in: 19 Documents

Standard Articles
1 Publication describing the Software, including 1 Publication in zbMATH
Krimp: mining itemsets that compress. Zbl 1235.68071. Vreeken, Jilles; Van Leeuwen, Matthijs; Siebes, Arno (2011)

Cited by 50 Authors
6 Vreeken, Jilles
2 Tatti, Nikolaj
1 Akoglu, Leman
1 Bloem, Peter
1 Böhm, Klemens
1 Calders, Toon
1 De Raedt, Luc
1 de Rooij, Steven
1 Dries, Anton
1 Faloutsos, Christos
1 Fradkin, Dmitriy
1 Fürnkranz, Johannes
1 Grünwald, Peter D.
1 Guns, Tias
1 Hess, Sibylle
1 Kang, U.
1 Kliegr, Tomáš
1 Koutra, Danai
1 Kuznetsov, Sergei O.
1 Lam, Hoang Thanh
1 Li, Geng
1 Li, Tao
1 Li, Yao
1 Lijffijt, Jefrey
1 Liu, Lingqiao
1 Macha, Meghanath
1 Makhalova, Tatiana
1 Mampaey, Michael
1 Mörchen, Fabian
1 Morik, Katharina
1 Müller, Emmanuel
1 Napoli, Amedeo
1 Nguyen, Hoang-Vu
1 Nijssen, Siegfried
1 Papapetrou, Panagiotis
1 Paulheim, Heiko
1 Petitjean, François
1 Piatkowski, Nico
1 Puolamäki, Kai
1 Roos, Teemu
1 Shen, Chunhua
1 Siebes, Arno P. J. M.
1 Tack, Guido
1 Tomczak, Jakub M.
1 van den Hengel, Anton
1 van Leeuwen, Matthijs
1 Webb, Geoffrey I.
1 Zaki, Mohammed Javeed
1 Zięba, Maciej
1 Zimek, Arthur

Cited in 6 Serials
11 Data Mining and Knowledge Discovery
3 Machine Learning
2 Statistical Analysis and Data Mining
1 Artificial Intelligence
1 International Journal of Computer Vision
1 International Journal of Mathematics for Industry

Cited in 8 Fields
16 Computer science (68-XX)
10 Statistics (62-XX)
1 Combinatorics (05-XX)
1 Linear and multilinear algebra; matrix theory (15-XX)
1 Numerical analysis (65-XX)
1 Game theory, economics, finance, and other social and behavioral sciences (91-XX)
1 Biology and other natural sciences (92-XX)
1 Information and communication theory, circuits (94-XX)