
Visualizing non-metric similarities in multiple maps. (English) Zbl 1238.68140

Summary: Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize “central” objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems that traditional multidimensional scaling techniques face when they are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word associations and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
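To illustrate the construction behind multiple maps t-SNE, the sketch below models the output similarity between two objects as a mixture of heavy-tailed Student-t kernels over several maps, with per-point importance weights that determine how strongly each object participates in each map. This is a minimal NumPy sketch of that similarity model as recalled from the paper, not a reference implementation; the function name multiple_maps_q and the arrays Y and W are illustrative placeholders.

```python
import numpy as np

def multiple_maps_q(Y, W):
    """Mixture-of-maps output similarities (illustrative sketch).

    Y : (M, N, d) array -- coordinates of N objects in each of M low-dimensional maps
    W : (M, N) array    -- unconstrained weights, turned into importance weights below
    Returns an (N, N) matrix Q of joint output similarities that sums to one.
    """
    M, N, _ = Y.shape

    # Importance weights pi_i^(m): a softmax over the M maps, so each object's
    # weights are non-negative and sum to one across maps.
    pi = np.exp(W - W.max(axis=0, keepdims=True))
    pi /= pi.sum(axis=0, keepdims=True)

    num = np.zeros((N, N))
    for m in range(M):
        # Pairwise squared Euclidean distances within map m.
        d2 = np.square(Y[m][:, None, :] - Y[m][None, :, :]).sum(axis=-1)
        # Heavy-tailed Student-t kernel, weighted by both objects' importance in map m.
        num += np.outer(pi[m], pi[m]) / (1.0 + d2)

    np.fill_diagonal(num, 0.0)   # self-similarities are excluded
    return num / num.sum()       # normalise over all pairs
```

In a t-SNE-style optimization, the resulting matrix Q would be compared to the input similarities P with a Kullback-Leibler divergence, and the map coordinates and weights adjusted to reduce it. The per-point weights are what allow, for instance, an intransitive triple of similarities to be represented: the shared object can appear with high weight in two different maps, close to a different neighbor in each.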

MSC:

68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Banerjee, A.; Krumpelman, C.; Basu, S.; Mooney, R.; Ghosh, J., Model-based overlapping clustering (2005)
[2] Belkin, M.; Niyogi, P., Laplacian Eigenmaps and spectral techniques for embedding and clustering, No. 14, 585-591 (2002)
[3] Belongie, S., Malik, J., & Puzicha, J. (2001). Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4), 509-522. · doi:10.1109/34.993558
[4] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022. · Zbl 1112.68379
[5] Blei, D. M.; Griffiths, T. L.; Jordan, M. I.; Tenenbaum, J. B.; Thrun, S. (ed.); Saul, L. (ed.); Schölkopf, B. (ed.), Hierarchical topic models and the nested Chinese restaurant process, No. 16, 17-24 (2004), Cambridge
[6] Borg, I., & Groenen, P. J. F. (2005). Modern multidimensional scaling (2nd ed.). New York: Springer. · Zbl 1085.62079
[7] Bostock, M., Ogievetsky, V., & Heer, J. (2011). D3: Data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12), 2301-2309. · doi:10.1109/TVCG.2011.185
[8] Breitkreutz, B.-J., Stark, C., & Tyers, M. (2003). Osprey: a network visualization system. Genome Biology, 4(3), R22.1-R22.4.
[9] Carreira-Perpiñán, M. Á., The elastic embedding algorithm for dimensionality reduction, 167-174 (2010)
[10] Cayton, L.; Dasgupta, S., Robust Euclidean embedding, 169-176 (2006)
[11] Collobert, R.; Weston, J., A unified architecture for natural language processing: deep neural networks with multitask learning, 160-167 (2008) · doi:10.1145/1390156.1390177
[12] Cook, J. A., Sutskever, I., Mnih, A., & Hinton, G. E. (2007). Visualizing similarity data with a mixture of maps. JMLR Workshop and Conference Proceedings, 2, 67-74.
[13] Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., & Bengio, S. (2010). Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, 625-660. · Zbl 1242.68219
[14] Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315, 972-976. · Zbl 1226.94027 · doi:10.1126/science.1136800
[15] Gashi, I.; Stankovic, V.; Leita, C.; Thonnard, O., An experimental study of diversity with off-the-shelf antivirus engines, 4-11 (2009)
[16] Globerson, A.; Roweis, S., Visualizing pairwise similarity via semidefinite programming, 139-146 (2007)
[17] Globerson, A., Chechik, G., Pereira, F., & Tishby, N. (2007). Euclidean embedding of co-occurrence data. Journal of Machine Learning Research, 8, 2265-2295. · Zbl 1222.68203
[18] Griffiths, T. L., Steyvers, M., & Tenenbaum, J. L. (2007). Topics in semantic representation. Psychological Review, 114(2), 211-244. · doi:10.1037/0033-295X.114.2.211
[19] Heller, K. A.; Ghahramani, Z., A nonparametric Bayesian approach to modeling overlapping clusters (2007)
[20] Hinton, G. E.; Roweis, S. T., Stochastic neighbor embedding, No. 15, 833-840 (2003)
[21] Hofmann, T., Probabilistic latent semantic indexing, 50-57 (1999), New York
[22] Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295-307. · doi:10.1016/0893-6080(88)90003-2
[23] Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2008). Similarity, kernels, and the triangle inequality. Journal of Mathematical Psychology, 52(2), 297-303. · Zbl 1152.91770 · doi:10.1016/j.jmp.2008.03.001
[24] Jamieson, A. R., Giger, M. L., Drukker, K., Li, H., Yuan, Y., & Bhooshan, N. (2010). Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian Eigenmaps and t-SNE. Medical Physics, 37(1), 339-351. · doi:10.1118/1.3267037
[25] Keim, D. A., Kohlhammer, J., Ellis, G., & Mansmann, F. (2010). Mastering the information age: solving problems with visual analytics. Eurographics Association.
[26] Klimt, B., & Yang, Y. (2004). Lecture notes in computer science: Vol.3201. The Enron corpus: a new dataset for email classification research (pp. 217-226). · Zbl 1132.68562
[27] Kruskal, J. B., & Wish, M. (1986). Multidimensional scaling. Beverly Hills: Sage.
[28] Lacoste-Julien, S.; Sha, F.; Jordan, M. I., DiscLDA: Discriminative learning for dimensionality reduction and classification, No. 21, 897-904 (2009)
[29] Lafon, S., & Lee, A. B. (2006). Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9), 1393-1403. · doi:10.1109/TPAMI.2006.184
[30] Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240. · doi:10.1037/0033-295X.104.2.211
[31] Laub, J., & Müller, K.-R. (2004). Feature discovery in non-metric pairwise data. Journal of Machine Learning Research, 5, 801-818. · Zbl 1222.68246
[32] Laub, J.; Macke, J.; Müller, K.-R.; Wichmann, F. A., Inducing metric violations in human similarity judgements, No. 19, 777-784 (2007)
[33] Lawrence, N. D., Spectral dimensionality reduction via maximum entropy, 51-59 (2011)
[34] Lund, K.; Burgess, C.; Atchley, R. A., Semantic and associative priming in high-dimensional semantic space, 660-665 (1995), Mahwah
[35] Mao, Y.; Balasubramanian, K.; Lebanon, G., Dimensionality reduction for text using domain knowledge, 801-809 (2010)
[36] McCallum, A., Multi-label text classification with a mixture model trained by EM (1999), New York
[37] McCallum, A., Corrada-Emmanuel, A., & Wang, X. (2004). The author-recipient-topic model for topic and role discovery in social networks: experiments with Enron and academic email (Technical Report UM-CS-2004-096). Department of Computer Science, University of Massachusetts, Amherst, MA.
[38] Mnih, A.; Hinton, G. E., A scalable hierarchical distributed language model, 1081-1088 (2009)
[39] Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (1998). The University of South Florida word association, rhyme, and word fragment norms.
[40] Pekalska, E., & Duin, R. P. W. (2005). The dissimilarity representation for pattern recognition: foundations and applications. Singapore: World Scientific. · Zbl 1095.68105 · doi:10.1142/9789812703170
[41] Plaisant, C., The challenge of information visualization evaluation (2004)
[42] Rosen-Zvi, M.; Griffiths, T.; Steyvers, M.; Smyth, P., The author-topic model for authors and documents (2004), Arlington
[43] Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by Locally Linear Embedding. Science, 290(5500), 2323-2326. · doi:10.1126/science.290.5500.2323
[44] Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18(5), 401-409. · doi:10.1109/T-C.1969.222678
[45] Schmidtlein, S., Zimmermann, P., Schüpferling, R., & Weiss, C. (2007). Mapping the floristic continuum: ordination space position estimated from imaging spectroscopy. Journal of Vegetation Science, 18, 131-140. · doi:10.1111/j.1654-1103.2007.tb02523.x
[46] Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge: MIT Press.
[47] Schölkopf, B., Smola, A. J., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299-1319. · doi:10.1162/089976698300017467
[48] Shaw, B.; Jebara, T., Structure preserving embedding, 937-944 (2009)
[49] Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: statistical analyses and a model of semantic growth. Cognitive Science, 29(1), 41-78. · doi:10.1207/s15516709cog2901_3
[50] Teh, Y.; Jordan, M. I.; Beal, M.; Blei, D. M., Hierarchical Dirichlet processes, No. 17, 1385-1392 (2004), Cambridge
[51] Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319-2323. · Zbl 0955.37025 · doi:10.1126/science.290.5500.2319
[52] Thomas, J. J., & Cook, K. A. (2005). Illuminating the path: the research and development agenda for visual analytics.
[53] Thonnard, O.; Mees, W.; Dacier, M., Addressing the attack attribution problem using knowledge discovery and multi-criteria fuzzy decision-making, 11-21 (2009) · doi:10.1145/1599272.1599277
[54] Torgerson, W. S. (1952). Multidimensional scaling I: theory and method. Psychometrika, 17, 401-419. · Zbl 0049.37603 · doi:10.1007/BF02288916
[55] Tversky, A., & Hutchinson, J. W. (1986). Nearest neighbor analysis of psychological spaces. Psychological Review, 93(1), 3-22. · doi:10.1037/0033-295X.93.1.3
[56] van der Maaten, L. J. P., Learning a parametric embedding by preserving local structure, No. 5, 384-391 (2009)
[57] van der Maaten, L. J. P., & Hinton, G. E. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605. · Zbl 1225.68219
[58] van der Maaten, L. J. P.; Postma, E. O., Texton-based analysis of paintings, No. 7798-16 (2010)
[59] Venna, J., Peltonen, J., Nybo, K., Aidos, H., & Kaski, S. (2010). Information retrieval perspective to nonlinear dimensionality reduction for data visualization. Journal of Machine Learning Research, 11, 451-490. · Zbl 1242.62006
[60] Villmann, T., & Haase, S. (2010). Mathematical foundations of the generalization of t-SNE and SNE for arbitrary divergences (Technical Report 02/2010). University of Applied Sciences Mittweida.
[61] von Luxburg, U. (2010). Clustering stability: an overview. Foundations and Trends in Machine Learning, 2(3), 235-274. · Zbl 1191.68615
[62] Weinberger, K. Q.; Packer, B. D.; Saul, L. K., Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization (2005), Barbados
[63] Yang, Z.; King, I.; Oja, E.; Xu, Z., Heavy-tailed symmetric stochastic neighbor embedding, No. 22 (2010), Cambridge