
Parametric UMAP embeddings for representation and semisupervised learning. (English) Zbl 1522.68480

Summary: UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial complex) and (2) through stochastic gradient descent, optimizing a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.
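The two steps, and the parametric variant of the second one, can be illustrated with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the authors' implementation. The bandwidth `sigma` is a crude per-point average, whereas UMAP calibrates it per point. The low-dimensional kernel is fixed to 1 / (1 + d^2). Training uses full-batch gradient descent rather than stochastic gradient descent with sampled edges. A single linear map `W` stands in for the neural network. All function and variable names are hypothetical.

```python
import numpy as np

def fuzzy_graph(X, k=5):
    """Step 1 (simplified): symmetric fuzzy kNN graph with weights in [0, 1]."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]           # k nearest neighbors per point
    rows = np.arange(n)[:, None]
    d_knn = D[rows, idx]                         # (n, k) neighbor distances
    rho = d_knn[:, :1]                           # distance to the nearest neighbor
    # ASSUMPTION: crude bandwidth; UMAP calibrates sigma for each point instead.
    sigma = d_knn.mean(axis=1, keepdims=True) + 1e-12
    A = np.zeros((n, n))
    A[rows, idx] = np.exp(-(d_knn - rho) / sigma)
    return A + A.T - A * A.T                     # fuzzy union of directed edges

def umap_loss_and_grad(Z, P, eps=1e-4):
    """Step 2: cross-entropy between graph weights P and q = 1 / (1 + d^2)."""
    n = Z.shape[0]
    diff = Z[:, None, :] - Z[None, :, :]         # pairwise differences z_i - z_j
    d2 = (diff ** 2).sum(axis=-1)
    off_diag = ~np.eye(n, dtype=bool)
    # -p*log(q) - (1-p)*log(1-q) simplifies to log(1+d2) - (1-p)*log(d2)
    ce = np.log1p(d2) - (1.0 - P) * np.log(d2 + eps)
    loss = ce[off_diag].mean()
    g = 1.0 / (1.0 + d2) - (1.0 - P) / (d2 + eps)  # derivative of ce w.r.t. d2
    g[~off_diag] = 0.0
    grad_Z = 4.0 * (g[:, :, None] * diff).sum(axis=1) / off_diag.sum()
    return loss, grad_Z

rng = np.random.default_rng(0)
# Toy data: two Gaussian clusters in 10 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (30, 10)), rng.normal(5.0, 1.0, (30, 10))])
P = fuzzy_graph(X, k=5)

# Parametric variant: optimize encoder weights W instead of free coordinates.
# ASSUMPTION: a linear encoder Z = X @ W replaces the neural network.
W = rng.normal(0.0, 0.1, (10, 2))
loss_start, _ = umap_loss_and_grad(X @ W, P)
for _ in range(200):
    _, grad_Z = umap_loss_and_grad(X @ W, P)
    grad_W = X.T @ grad_Z                        # chain rule through Z = X @ W
    W -= 0.05 * np.clip(grad_W, -4.0, 4.0)       # clipping keeps the steps bounded
loss_end, _ = umap_loss_and_grad(X @ W, P)

# New data are embedded with the learned map; no re-optimization is needed.
X_new = rng.normal(0.0, 1.0, (5, 10))
Z_new = X_new @ W
print(loss_start, loss_end, Z_new.shape)
```

Because the mapping is stored in `W` rather than in per-point coordinates, embedding unseen points costs a single forward pass, which is the "fast online embeddings for new data" benefit mentioned in the summary.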

MSC:

68T05 Learning and adaptive systems in artificial intelligence
