×

Approximation algorithms for tensor clustering. (English) Zbl 1262.68151

Gavaldà, Ricard (ed.) et al., Algorithmic learning theory. 20th international conference, ALT 2009, Porto, Portugal, October 3–5, 2009. Proceedings. Berlin: Springer (ISBN 978-3-642-04413-7/pbk). Lecture Notes in Computer Science 5809. Lecture Notes in Artificial Intelligence, 368-383 (2009).
Summary: We present the first (to our knowledge) approximation algorithm for tensor clustering – a powerful generalization to basic 1D clustering. Tensors are increasingly common in modern applications dealing with complex heterogeneous data and clustering them is a fundamental tool for data analysis and pattern discovery. Akin to their 1D cousins, common tensor clustering formulations are NP-hard to optimize. But, unlike the 1D case, no approximation algorithms seem to be known. We address this imbalance and build on recent co-clustering work to derive a tensor clustering algorithm with approximation guarantees, allowing metrics and divergences (e.g., Bregman) as objective functions. Therewith, we answer two open questions by A. Anagnostopoulos et al. [“Approximation algorithms for co-clustering”, in: Proceedings of the 27th ACM symposium on principles of database systems, PODS 2008. New York, NY: ACM. 201–210 (2008; doi:10.1145/1376916.1376945)]. Our analysis yields a constant approximation factor independent of data size; a worst-case example shows this factor to be tight for Euclidean co-clustering. However, empirically the approximation factor is observed to be conservative, so our method can also be used in practice.
For the entire collection see [Zbl 1176.68006].

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68W25 Approximation algorithms

Software:

k-means++
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Banerjee, A., Basu, S., Merugu, S.: Multi-way Clustering on Relation Graphs. In: SIAM Conf. Data Mining, SDM (2007) · doi:10.1137/1.9781611972771.14
[2] Shashua, A., Zass, R., Hazan, T.: Multi-way Clustering Using Super-Symmetric Non-negative Tensor Factorization. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 595–608. Springer, Heidelberg (2006) · doi:10.1007/11744085_46
[3] Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoretic co-clustering. In: KDD, pp. 89–98 (2003) · doi:10.1145/956750.956764
[4] Banerjee, A., Dhillon, I.S., Ghosh, J., Merugu, S., Modha, D.S.: A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation. JMLR 8, 1919–1986 (2007) · Zbl 1222.68139
[5] Ackermann, M.R., Blömer, J.: Coresets and Approximate Clustering for Bregman Divergences. In: ACM-SIAM Symp. on Disc. Alg., SODA (2009) · doi:10.1137/1.9781611973068.118
[6] Ackermann, M.R., Blömer, J., Sohler, C.: Clustering for metric and non-metric distance measures. In: ACM-SIAM Symp. on Disc. Alg. (SODA) (April 2008) · Zbl 1192.68633
[7] Arthur, D., Vassilvitskii, S.: k-means++: The Advantages of Careful Seeding. In: ACM-SIAM Symp. on Discete Algorithms (SODA), pp. 1027–1035 (2007) · Zbl 1302.68273
[8] Nock, R., Luosto, P., Kivinen, J.: Mixed Bregman clustering with approximation guarantees. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML / PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 154–169. Springer, Heidelberg (2008) · Zbl 05373019 · doi:10.1007/978-3-540-87481-2_11
[9] Sra, S., Jegelka, S., Banerjee, A.: Approximation algorithms for Bregman clustering, co-clustering and tensor clustering. Technical Report 177, MPI for Biological Cybernetics (2008) · Zbl 1262.68151
[10] Ben-David, S.: A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering. Mach. Learn. 66(2-3), 243–257 (2007) · Zbl 1470.62080 · doi:10.1007/s10994-006-0587-3
[11] Puolamäki, K., Hanhijärvi, S., Garriga, G.C.: An approximation ratio for biclustering. Inf. Process. Letters 108(2), 45–49 (2008) · Zbl 1191.68873 · doi:10.1016/j.ipl.2008.03.013
[12] Anagnostopoulos, A., Dasgupta, A., Kumar, R.: Approximation algorithms for co-clustering. In: Symp. on Principles of Database Systems, PODS (2008) · Zbl 1297.68256 · doi:10.1145/1376916.1376945
[13] Zha, H., Ding, C., Li, T., Zhu, S.: Workshop on Data Mining using Matrices and Tensors. In: KDD (2008)
[14] Hasan, M., Velazquez-Armendariz, E., Pellacini, F., Bala, K.: Tensor Clustering for Rendering Many-Light Animations. In: Eurographics Symp. on Rendering, vol. 27 (2008)
[15] Kolda, T.G., Bader, B.W.: Tensor Decompositions and Applications. SIAM Review 51(3) (to appear, 2009) · Zbl 1173.65029 · doi:10.1137/07070111X
[16] Hartigan, J.A.: Direct clustering of a data matrix. J. of the Am. Stat. Assoc. 67(337), 123–129 (1972) · doi:10.1080/01621459.1972.10481214
[17] Cheng, Y., Church, G.: Biclustering of expression data. In: Proc. ISMB, pp. 93–103. AAAI Press, Menlo Park (2000)
[18] Dhillon, I.S.: Co-clustering documents and words using bipartite spectral graph partitioning. In: KDD, pp. 269–274 (2001) · doi:10.1145/502512.502550
[19] Bekkerman, R., El-Yaniv, R., McCallum, A.: Multi-way distributional clustering via pairwise interactions. In: ICML (2005) · doi:10.1145/1102351.1102357
[20] Agarwal, S., Lim, J., Zelnik-Manor, L., Perona, P., Kriegman, D., Belongie, S.: Beyond pairwise clustering. In: IEEE CVPR (2005) · doi:10.1109/CVPR.2005.89
[21] Govindu, V.M.: A tensor decomposition for geometric grouping and segmentation. In: IEEE CVPR (2005) · doi:10.1109/CVPR.2005.50
[22] Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2001) · Zbl 1019.68094
[23] Hein, M., Bousquet, O.: Hilbertian metrics and positive definite kernels on probability measures. In: AISTATS (2005)
[24] Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, Oxford (1997) · Zbl 0945.90064
[25] Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman Divergences. JMLR 6(6), 1705–1749 (2005) · Zbl 1190.62117
[26] de Silva, V., Lim, L.H.: Tensor Rank and the Ill-Posedness of the Best Low-Rank Approximation Problem. SIAM J. Matrix Anal. & Appl. 30(3), 1084–1127 (2008) · Zbl 1167.14038 · doi:10.1137/06066518X
[27] Jegelka, S., Sra, S., Banerjee, A.: Approximation algorithms for Bregman co-clustering and tensor clustering (2009); arXiv:cs.DS/0812.0389v3 · Zbl 1262.68151
[28] Chaudhuri, K., McGregor, A.: Finding metric structure in information theoretic clustering. In: Conf. on Learning Theory, COLT (July 2008)
[29] Cho, H., Dhillon, I.S., Guan, Y., Sra, S.: Minimum Sum Squared Residue based Co-clustering of Gene Expression data. In: SDM, 114–125 (2004)
[30] Kluger, Y., Basri, R., Chang, J.T.: Spectral biclustering of microarray data: Coclustering genes and conditions. Genome Research 13, 703–716 (2003) · doi:10.1101/gr.648603
[31] Cho, H., Dhillon, I.: Coclustering of human cancer microarrays using minimum sum-squared residue coclustering. IEEE/ACM Tran. Comput. Biol. Bioinf. 5(3), 385–400 (2008) · Zbl 05335779 · doi:10.1109/TCBB.2007.70268
[32] Baranzini, S.E., et al: Transcription-based prediction of response to IFN{\(\beta\)} using supervised computational methods. PLoS Biology 3(1) (2004) · doi:10.1371/journal.pbio.0030002
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.