×

zbMATH — the first resource for mathematics

Dimensionality reduction and topographic mapping of binary tensors. (English) Zbl 1328.92037
Summary: In this paper, a decomposition method for binary tensors, generalized multi-linear model for principal component analysis (GMLPCA) is proposed. To the best of our knowledge at present there is no other principled systematic framework for decomposition or topographic mapping of binary tensors. In the model formulation, we constrain the natural parameters of the Bernoulli distributions for each tensor element to lie in a sub-space spanned by a reduced set of basis (principal) tensors. We evaluate and compare the proposed GMLPCA technique with existing real-valued tensor decomposition methods in two scenarios: (1) in a series of controlled experiments involving synthetic data; (2) on a real-world biological dataset of DNA sub-sequences from different functional regions, with sequences represented by binary tensors. The experiments suggest that the GMLPCA model is better suited for modelling binary tensors than its real-valued counterparts. Furthermore, we extended our GMLPCA model to the semi-supervised setting by forcing the model to search for a natural parameter subspace that represents a user-specified compromise between the modelling quality and the degree of class separation.
MSC:
92C55 Biomedical imaging and signal processing
68T05 Learning and adaptive systems in artificial intelligence
62H25 Factor analysis and principal components; correspondence analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
Software:
POIMs
PDF BibTeX Cite
Full Text: DOI
References:
[1] Lu, H; Plataniotis, KN; Venetsanopoulos, AN, MPCA: multilinear principal component analysis of tensor objects, IEEE Trans Neural Netw, 19, 18-39, (2008)
[2] Nolker, C; Ritter, H, Visual recognition of continuous hand postures, IEEE Trans Neural Netw, 13, 983-994, (2002)
[3] Jia K, Gong S (2005) Multi-modal tensor face for simultaneous super-resolution and recognition. In: 10th IEEE international conference computer vision 2:1683-1690 (Beijing)
[4] Renard N, Bourennane S (2008) An ICA-based multilinear algebra tools for dimensionality reduction in hyperspectral imagery. In: IEEE international conference acoustics, speech and signal processing, vol 4. Las Vegas, NV, pp 1345-1348
[5] Cai D, He X, Han J (2006) Tensor space model for document analysis. In: Proceedings 29th annual ACM SIGIR international conference research and development in information retrieval. Seatlle, WA, pp 625-626
[6] Pearson, K, On lines and planes of closest fit to systems of points in space, Philos Mag, 2, 559-572, (1901) · JFM 32.0710.04
[7] Lu, H; Plataniotis, KN; Venetsanopoulos, AN, Uncorrelated multilinear discriminant analysis with regularization and aggregation for tensor object recognition, IEEE Trans Neural Netw, 20, 103-123, (2009)
[8] Zafeiriou, S, Discriminant nonnegative tensor factorization algorithms, IEEE Trans Neural Netw, 20, 217-235, (2009)
[9] Panagakis, Y; Kotropoulos, C; Arce, G, Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification, IEEE Trans Audio Speech Lang Process, 18, 576-588, (2010)
[10] Brachat J, Comon P, Mourrain B, Tsigaridas E (2010) Symmetric tensor decomposition. Linear Algebra Appl (In Press, Corrected Proof) · Zbl 1206.65141
[11] Schein A, Saul L, Ungar L (2003) A generalized linear model for principal component analysis of binary data. In: 9th international workshop artificial intelligence and statistics. Key West, FL
[12] Acar, E; Yener, B, Unsupervised multiway data analysis: a literature survey, IEEE Trans Knowl Data En, 21, 6-20, (2009)
[13] Wang H, Ahuja N (2005) Rank-r approximation of tensors: using image-as-matrix representation. Comput Vis Pattern Recognit 346-353
[14] Lathauwer, L; Moor, B; Vandewalle, J, A multilinear singular value decomposition, SIAM J Matrix Anal Appl, 21, 1253-1278, (2000) · Zbl 0962.15005
[15] Kofidis, E; Regalia, PA, On the best rank-1 approximation of higher-order supersymmetric tensors, SIAM J Matrix Anal Appl, 23, 863-884, (2001) · Zbl 1001.65035
[16] Wang H, Ahuja N (2004) Compact representation of multidimensional data using tensor rank-one decomposition. In: Proceedings 17th international conference pattern recognition. Cambridge, UK, pp 44-47 · Zbl 1210.68083
[17] Ye J, Janardan R, Li Q (2004) Gpca: an efficient dimension reduction scheme for image compression and retrieval .In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’04. New York, NY, USA, pp 354-363, ACM, 2004.
[18] Xu, D; Yan, S; Zhang, L; Lin, S; Zhang, H-J; Huang, T, Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis, IEEE Trans Circuits Syst Video Technol, 18, 36-47, (2008)
[19] Lu, H; Plataniotis, KN; Venetsanopoulos, AN, Uncorrelated multilinear principal component analysis for unsupervised multilinear subspace learning, IEEE Trans Neural Netw, 20, 1820-1836, (2009)
[20] Inoue K, Hara K, Urahama K (2009) Robust multilinear principal component analysis. In: 12th international conference on computer vision 2009, pp 591-597, 2-29 oct 2009
[21] Lu, H; Plataniotis, KN; Venetsanopoulos, AN, A survey of multilinear subspace learning for tensor data, Pattern Recognit, 44, 1540-1551, (2011) · Zbl 1210.68083
[22] Cortes C, Mohri M (2003) AUC optimization vs. error rate minimization. In: Advances in neural information processing systems, vol 16. Banff, AL, Canada, pp 313-320
[23] Li, X; Zeng, J; Yan, H, PCA-HPR: a principle component analysis model for human promoter recognition, Bioinformation, 2, 373-378, (2008)
[24] Sonnenburg, S; Zien, A; Philips, P; Ratsch, G, POIMs: positional oligomer importance matrices-understanding support vector machine-based signal detectors, Bioinformatics, 24, i6-i14, (2008)
[25] Baldi P, Brunak S (2001) Bioinformatics: the machine learning approach. The MIT Press, 2nd ed. · Zbl 0992.92024
[26] Isaev A (2007) Introduction to mathematical methods in Bioinformatics. Springer, Secaucus · Zbl 1053.92022
[27] Wakaguri H, Yamashita R, Sugano SYS, Nakai K (2008) DBTSS: database of transcription start sites. Nucleic Acids Res 36:97-101 (no. Database-Issue)
[28] Saxonov, S; Daizadeh, I; Fedorov, A; Gilbert, W, EID: the exon-intron database—an exhaustive database of protein-coding intron-containing genes, Nucleic Acids Res., 28, 185-190, (2000)
[29] Ron, D; Singer, Y; Tishby, N, The power of amnesia: learning probabilistic automata with variable memory length, Mach Learning, 25, 117-149, (1996) · Zbl 0869.68066
[30] Tino, P; Dorffner, G, Predicting the future of discrete sequences from fractal representations of the past, Mach Learning, 45, 187-217, (2001) · Zbl 1020.68075
[31] Cross, S; Clark, V; Bird, A, Isolation of cpg islands from large genomic clones, Nucleic Acids Res, 27, 2099-2107, (1999)
[32] Bodén, M; Bailey, TL, Associating transcription factor-binding site motifs with target GO terms and target genes, Nucleic Acids Res, 36, 4108-4117, (2007)
[33] Ashburner, M; Ball, CA; Blake, JA; Botstein, D; Butler, H; Cherry, JM; Davis, AP; Dolinski, K; Dwight, SS; Eppig, JT; Harris, MA; Hill, DP; Issel-Tarver, L; Kasarskis, A; Lewis, S; Matese, JC; Richardson, JE; Ringwald, M; Rubin, GM; Sherlock, G, Gene ontology: tool for the unification of biology. the gene ontology consortium, Nat Genet, 25, 25-29, (2000)
[34] Straussman, R; Nejman, D; Roberts, D; Steinfeld, I; Blum, B; Benvenisty, N; Simon, I; Yakhini, Z; Cedar, H, Developmental programming of cpg island methylation profiles in the human genome, Nat Struct Mol Biol, 16, 564-571, (2009)
[35] Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS (2009) Meme suite: tools for motif discovery and searching. Nucleic Acids Res 37:W202-W208 (Web Server issue)
[36] Gupta S, Stamatoyannopoulos J, Bailey T, Noble W (2007) Quantifying similarity between motifs. Genome Biol8(2): R24
[37] Globerson, A; Roweis, S, Metric learning by collapsing classes, Adv Neural Info Process Syst, 18, 451-458, (2006)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.