×

Biclustering meets triadic concept analysis. (English) Zbl 1302.68253

Summary: Biclustering numerical data became a popular data-mining task at the beginning of 2000’s, especially for gene expression data analysis and recommender systems. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So-called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address a complete, correct and non-redundant enumeration of such patterns, a well-known intractable problem, while no formal framework exists. We introduce important links between biclustering and Formal Concept Analysis (FCA). Indeed, FCA is known to be, among others, a methodology for biclustering binary data. Handling numerical data is not direct, and we argue that Triadic Concept Analysis (TCA), the extension of FCA to ternary relations, provides a powerful mathematical and algorithmic framework for biclustering numerical data. We discuss hence both theoretical and computational aspects on biclustering numerical data with triadic concept analysis. These results also scale to \(n\)-dimensional numerical datasets.

MSC:

68T30 Knowledge representation
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence

Software:

ParTriCluster
PDFBibTeX XMLCite
Full Text: DOI HAL

References:

[1] Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734-749 (2005) · doi:10.1109/TKDE.2005.99
[2] Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 11(1), 5-33 (2005) · doi:10.1007/s10618-005-1396-1
[3] Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Buneman, P., Jajodia, S. (eds.) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 207-216. ACM Press (1993)
[4] Alqadah, F., Bhatnagar, R.: Similarity measures in formal concept analysis. Ann. Math. Artif. Intell. 61(3), 245-256 (2011) · Zbl 1234.62095 · doi:10.1007/s10472-011-9257-7
[5] Besson, J., Robardet, C., Boulicaut, J.-F.: Mining a new fault-tolerant pattern type as an alternative to formal concept discovery. In: Schärfe, H., Hitzler, P., Øhrstrøm, P. (eds.) Conceptual Structures: Inspiration and Application, 14th International Conference on Conceptual Structures (ICCS). Lecture Notes in Computer Science, vol. 4068, pp. 144-157. Springer (2006) · Zbl 1194.68219
[6] Besson, J., Robardet, C., Raedt, L.D., Boulicaut, J.-F.: Mining bi-sets in numerical data. In: Dzeroski, S., Struyf, J. (eds.) KDID. Lecture Notes in Computer Science, vol. 4747, pp. 11-23. Springer (2007)
[7] Blachon, S., Pensa, R., Besson, J., Robardet, C., Boulicaut, J.-F., Gandrillon, O.: Clustering formal concepts to discover biologically relevant knowledge from gene expression data. Silico Biology 7(4-5), 467-483 (2007)
[8] Braga Araújo, R., Trielli Ferreira, G., Orair, G., Meira, J., Wagner, R., Ferreira, C., Olavo Guedes Neto, D., Zaki, M.: The partricluster algorithm for gene expression analysis. Int. J. Parallel Prog. 36, 226-249 (2008) · Zbl 1147.68672 · doi:10.1007/s10766-007-0067-9
[9] Califano, A., Stolovitzky, G., Tu, Y.: Analysis of gene expression microarrays for phenotype classification. In: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 75-85. AAAI (2000) · Zbl 1024.68020
[10] Cerf, L., Besson, J., Robardet, C., Boulicaut, J.-F.: Closed patterns meet n-ary relations. TKDD 3(1), 3:1-3:36 (2009) · doi:10.1145/1497577.1497580
[11] Cheng, Y., Church, G.: Biclustering of expression data. In: Proc. 8th International Conference on Intelligent Systems for Molecular Biology (ISBM), pp. 93-103 (2000)
[12] Ganter, B., Wille, R.: Formal Concept Analysis. Springer (1999) · Zbl 0909.06001
[13] Hartigan, J.A.: Direct clustering of a data matrix. J. Am. Stat. Assoc. 67(337), 123-129 (1972) · doi:10.1080/01621459.1972.10481214
[14] Ignatov, D.I., Kuznetsov, S.O., Poelmans, J.: Concept-based biclustering for internet advertisement. In: Vreeken, J., Ling, C., Zaki, M.J., Siebes, A., Yu, J.X., Goethals, B., Webb, G.I., Wu, X. (eds.) ICDM Workshops, pp. 123-130. IEEE Computer Society (2012)
[15] Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: Trias—an algorithm for mining iceberg tri-lattices. In: ICDM, pp. 907-911 (2006)
[16] Ji, L., Tan, K.-L., Tung, A.K.H.: Mining frequent closed cubes in 3d datasets. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), pp. 811-822. ACM (2006)
[17] Kaytoue, M., Assaghir, Z., Napoli, A., Kuznetsov, S.O.: Embedding tolerance relations in formal concept analysis: an application in information fusion. In: CIKM, pp. 1689-1692. ACM (2010)
[18] Kaytoue, M., Kuznetsov, S.O., Macko, J., Meira,W., Napoli, A.: Mining biclusters of similar values with Triadic Concept Analysis. In: Napoli, A., Vychodil, V. (eds.) The Eighth International Conference on Concept Lattices and their Applications—CLA 2011. INRIA Nancy Grand Est - LORIA, Nancy, France (2011) · Zbl 1147.68672
[19] Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Biclustering numerical data in formal concept analysis. In: Valtchev, P., Jäschke, R. (eds.) ICFCA. LNCS, vol. 6628, pp. 135-150. Springer (2011) · Zbl 1326.68286
[20] Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181(10), 1989-2001 (2011) · doi:10.1016/j.ins.2010.07.007
[21] Kaytoue-Uberall, M., Duplessis, S., Kuznetsov, S.O., Napoli, A.: Two fca-based methods for mining gene expression data. In: Ferré, S., Rudolph, S. (eds.) Proceedings of the 7th International Conference on Formal Concept Analysis (ICFCA). Lecture Notes in Computer Science, vol. 5548, pp. 251-266. Springer (2009) · Zbl 1248.68408
[22] Krajca, P., Vychodil, V.: Distributed algorithm for computing formal concepts using map-reduce framework. In: IDA, pp. 333-344. Springer (2009) · Zbl 1273.68379
[23] Kuznetsov, S.O.: A fast algorithm for computing all intersections of objects in a finite semi-lattice. Autom. Doc. Math. Linguist. 27(5), 11-21 (1993)
[24] Kuznetsov, S.O., Obiedkov, S.A.: Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell. 14(2-3), 189-216 (2002) · Zbl 1024.68020 · doi:10.1080/09528130210164170
[25] Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In: ICCS. LNCS, vol. 954, pp. 32-43. Springer (1995)
[26] Madeira, S., Oliveira, A.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinforma. 1(1), 24-45 (2004) · doi:10.1109/TCBB.2004.2
[27] Mirkin, B.: Mathematical Classification and Clustering. Kluwer Academic Publisher, Boston (1996) · Zbl 0874.90198 · doi:10.1007/978-1-4613-0457-9
[28] Mirkin, B.: Clustering for Data Mining: A Data Recovery Approach. Chapman & Hall/Crc Computer Science (2005) · Zbl 1083.68099
[29] Mirkin, B., Kramarenko, A.V.: Approximate bicluster and tricluster boxes in the analysis of binary data. In: Kuznetsov, S.O., Slezak, D., Hepting, D.H., Mirkin, B. (eds.) Proceedings of the 13th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2011). Lecture Notes in Computer Science, vol. 6743, pp. 248-256. Springer (2011) · Zbl 1234.62095
[30] Motameny, S., Versmold, B., Schmutzler, R.: Formal concept analysis for the identification of combinatorial biomarkers in breast cancer. In: Medina, R., Obiedkov, S.A. (eds.) Formal Concept Analysis, 6th International Conference (ICFCA). Lecture Notes in Computer Science, vol. 4933, pp. 229-240. Springer (2008) · Zbl 1131.68542
[31] Pensa, R.G., Leschi, C., Besson, J., Boulicaut, J.-F.: Assessment of discretization techniques for relevant pattern discovery from gene expression data. In: Zaki, M.J., Morishita, S., Rigoutsos, I. (eds.) Proceedings of the 4th ACM SIGKDD Workshop on Data Mining in Bioinformatics (BIOKDD 2004), pp. 24-30 (2004)
[32] Prelic, A., Bleuler, S., Zimmermann, P., Wille, A., Buhlmann, P., Gruissem, W., Hennig, L., Thiele, L., Zitzler, E.: A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9), 1122-1129 (2006) · doi:10.1093/bioinformatics/btl060
[33] Raïssi, C., Pei, J., Kister, T.: Computing closed skycubes. PVLDB 3(1), 838-847 (2010)
[34] Soulet, A., Raïssi, C., Plantevit, M., Crémilleux, B.: Mining dominant patterns in the sky. In: Cook, D.J., Pei, J., Wang, W., Zaïane, O.R., Wu, X. (eds.) In: 11th IEEE International Conference on Data Mining (ICDM), pp. 655-664. IEEE (2011)
[35] Tchagang, A.B., Phan, S., Famili, F., Shearer, H., Fobert, P.R., Huang, Y., Zou, J., Huang, D., Cutler, A., Liu, Z., Pan, Y.: Mining biological information from 3d short time-series gene expression data: the optricluster algorithm. BMC Bioinforma. 13, 54 (2012) · doi:10.1186/1471-2105-13-54
[36] Valtchev, P., Missaoui, R., Godin, R.: Formal concept analysis for knowledge discovery and data mining: the new challenges. In: Eklund, P.W. (ed.) ICFCA. LNCS, vol. 2961, pp. 352-371. Springer (2004) · Zbl 1198.68246
[37] Voutsadakis, G.: Polyadic concept analysis. Order 19(3), 295-304 (2002) · Zbl 1013.06006 · doi:10.1023/A:1021252203599
[38] Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (eds.) Ordered Sets, pp. 445-470. Reidel (1982) · Zbl 0491.06008
[39] Wille, R.: Why can concept lattices support knowledge discovery in databases? J. Exp. Theor. Artif. Intell. 14(2-3), 81-92 (2002) · Zbl 1022.68042 · doi:10.1080/09528130210164161
[40] Zhao, L.; Zaki, MJ, Tricluster: an effective algorithm for mining coherent clusters in 3d microarray data, 694-705 (2005), New York, USA · doi:10.1145/1066157.1066236
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.