
Diffusion nets. (English) Zbl 1430.68278

Summary: Non-linear manifold learning enables high-dimensional data analysis, but it requires out-of-sample extension methods to process new data points. In this paper, we propose a manifold learning algorithm based on deep learning that creates an encoder, which maps a high-dimensional dataset to its low-dimensional embedding, and a decoder, which maps the embedded data back to the high-dimensional space. Stacking the encoder and decoder together yields an autoencoder, which we term a diffusion net, that performs both out-of-sample extension and outlier detection. We introduce new neural-network constraints for the encoder that preserve the local geometry of the points, and we prove rates of convergence for the encoder. Our approach is also efficient in both computational complexity and memory requirements, in contrast to previous methods, which must store all training points in both the high-dimensional and the low-dimensional spaces in order to compute the out-of-sample extension and the pre-image of new points.
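The pipeline the summary describes — a diffusion-maps embedding, an encoder mapping high-dimensional points to it, and a decoder mapping embedded points back — can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: fixed-random-feature least-squares maps stand in for the trained encoder/decoder networks, and the toy dataset, feature width, and ridge regularization are arbitrary choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy manifold: a noisy circle embedded in R^3 (intrinsic dimension 1).
t = rng.uniform(0.0, 2.0 * np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t), 0.05 * rng.standard_normal(300)])

# Diffusion-maps embedding: Gaussian kernel, Markov normalization,
# spectral decomposition via the symmetric conjugate matrix.
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / np.median(D2))
d = K.sum(axis=1)
A = K / np.sqrt(np.outer(d, d))            # symmetric conjugate of D^{-1} K
w, U = np.linalg.eigh(A)
top = np.argsort(-w)[1:3]                  # skip the trivial top eigenvector
Psi = (U[:, top] / np.sqrt(d)[:, None]) * w[top]   # 2-D diffusion coordinates

# Random tanh features with a least-squares readout stand in for the
# trained encoder/decoder networks (all names below are illustrative).
W_e, b_e = rng.standard_normal((3, 200)), rng.standard_normal(200)
W_d, b_d = rng.standard_normal((2, 200)), rng.standard_normal(200)
scale = np.abs(Psi).max(axis=0)            # put Psi on an O(1) scale

def feats(Z, W, b):
    return np.tanh(Z @ W + b)

def ridge_fit(H, Y, lam=1e-2):
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ Y)

C_enc = ridge_fit(feats(X, W_e, b_e), Psi / scale)
C_dec = ridge_fit(feats(Psi / scale, W_d, b_d), X)

def encode(Z):                             # high-dim point -> diffusion coords
    return feats(Z, W_e, b_e) @ C_enc * scale

def decode(P):                             # diffusion coords -> high-dim point
    return feats(P / scale, W_d, b_d) @ C_dec

# Out-of-sample extension: embed a new point without recomputing the kernel.
x_new = np.array([[np.cos(0.3), np.sin(0.3), 0.0]])
psi_new = encode(x_new)

# Outlier score: round-trip reconstruction error through the autoencoder.
x_far = np.array([[3.0, -2.0, 4.0]])       # far from the circle
score_new = np.linalg.norm(decode(psi_new) - x_new)
score_far = np.linalg.norm(decode(encode(x_far)) - x_far)
print(score_new, score_far)
```

The key point the sketch mirrors is that, once the encoder and decoder are fit, new points are processed by two cheap forward maps; neither the training points nor the kernel matrix is needed at test time, and the round-trip error serves as an outlier score.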

MSC:

68T07 Artificial neural networks and deep learning

Software:

MNIST; darch

References:

[1] Schölkopf, B.; Smola, A.; Müller, K.-R., Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput., 10, 5, 1299-1319 (1998)
[2] Tenenbaum, J. B.; de Silva, V.; Langford, J. C., A global geometric framework for nonlinear dimensionality reduction, Science, 290, 5500, 2319-2323 (Dec. 2000)
[3] Roweis, S. T.; Saul, L. K., Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 2323-2326 (2000)
[4] Belkin, M.; Niyogi, P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15, 6, 1373-1396 (2003) · Zbl 1085.68119
[5] Donoho, D. L.; Grimes, C., Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. USA, 100, 5591-5596 (2003) · Zbl 1130.62337
[6] Coifman, R. R.; Lafon, S., Diffusion maps, Appl. Comput. Harmon. Anal., 21, 1, 5-30 (July 2006)
[7] Bengio, Y.; Paiement, J.-F.; Vincent, P.; Delalleau, O.; Roux, N. L.; Ouimet, M., Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering, (Thrun, S.; Saul, L.; Schölkopf, B., Advances in Neural Information Processing Systems 16 (2004), MIT Press), 177-184
[8] Fowlkes, C.; Belongie, S.; Chung, F.; Malik, J., Spectral grouping using the Nyström method, IEEE Trans. Pattern Anal. Mach. Intell., 26, 2, 214-225 (Jan. 2004)
[9] Coifman, R. R.; Lafon, S., Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions, Appl. Comput. Harmon. Anal., 21, 31-52 (2006) · Zbl 1095.68095
[10] Lafon, S.; Keller, Y.; Coifman, R. R., Data fusion and multicue data matching by diffusion maps, IEEE Trans. Pattern Anal. Mach. Intell., 28, 11, 1784-1797 (Nov. 2006)
[11] Rabin, N.; Coifman, R. R., Heterogeneous datasets representation and learning using diffusion maps and Laplacian pyramids, (Proc. 12th SIAM International Conference on Data Mining (2012))
[12] Fernández-Pascual, A.; Rabin, N.; Dorronsoro, J. R., Auto-adaptative Laplacian pyramids for high-dimensional data analysis, (European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2016 (April 2016))
[13] Mousazadeh, S.; Cohen, I., Out-of-sample extension of band-limited functions on homogeneous manifolds using diffusion maps, Signal Process., 108, 521-529 (2015)
[14] Aizenbud, Y.; Bermanis, A.; Averbuch, A., PCA-based out-of-sample extension for dimensionality reduction (2015), preprint
[15] He, J.; Zhang, L.; Wang, Q.; Li, Z., Using diffusion geometric coordinates for hyperspectral imagery representation, IEEE Geosci. Remote Sens. Lett., 6, 4, 767-771 (Oct. 2009)
[16] Farbman, Z.; Fattal, R.; Lischinski, D., Diffusion maps for edge-aware image editing, ACM Trans. Graph., 29, 6, 145:1-145:10 (Dec. 2010)
[17] Hinton, G.; Osindero, S.; Teh, Y., A fast learning algorithm for deep belief nets, Neural Comput., 18, 7, 1527-1554 (July 2006)
[18] Hinton, G. E.; Salakhutdinov, R. R., Reducing the dimensionality of data with neural networks, Science, 313, 5786, 504-507 (2006) · Zbl 1226.68083
[19] Bengio, Y., Learning deep architectures for AI, Found. Trends Mach. Learn., 2, 1, 1-127 (Jan. 2009)
[20] Bengio, Y.; Courville, A.; Vincent, P., Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell., 35, 8, 1798-1828 (Aug. 2013)
[21] Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.-A., Extracting and composing robust features with denoising autoencoders, (Proceedings of the 25th International Conference on Machine Learning. Proceedings of the 25th International Conference on Machine Learning, ICML-08 (2008), ACM), 1096-1103
[22] Rifai, S.; Vincent, P.; Muller, X.; Glorot, X.; Bengio, Y., Contractive auto-encoders: explicit invariance during feature extraction, (Proceedings of the 28th International Conference on Machine Learning. Proceedings of the 28th International Conference on Machine Learning, ICML-11 (2011)), 833-840
[23] Jia, K.; Sun, L.; Gao, S.; Song, Z.; Shi, B. E., Laplacian auto-encoders: an explicit learning of nonlinear data manifold, Neurocomputing, 160, 250-260 (2015)
[24] Kwok, J. T.-Y.; Tsang, I. W.-H., The pre-image problem in kernel methods, IEEE Trans. Neural Netw., 15, 6, 1517-1525 (2004)
[25] Arias, P.; Randall, G.; Sapiro, G., Connecting the out-of-sample and pre-image problems in kernel methods, (IEEE Conference on Computer Vision and Pattern Recognition. IEEE Conference on Computer Vision and Pattern Recognition, CVPR’07 (2007), IEEE), 1-8
[26] Singer, A.; Shkolnisky, Y.; Nadler, B., Diffusion interpretation of nonlocal neighborhood filters for signal denoising, SIAM J. Imaging Sci., 2, 1, 118-139 (Jan. 2009)
[27] Talmon, R.; Cohen, I.; Gannot, S., Single-channel transient interference suppression with diffusion maps, IEEE/ACM Trans. Audio Speech Lang. Process., 21, 1, 130-142 (Apr. 2012)
[28] David, G.; Averbuch, A., Hierarchical data organization, clustering and denoising via localized diffusion folders, Appl. Comput. Harmon. Anal., 33, 1, 1-23 (2012) · Zbl 1239.68060
[29] Gepshtein, S.; Keller, Y., Image completion by diffusion maps and spectral relaxation, IEEE Trans. Image Process., 22, 8, 2839-2846 (2013)
[30] Mishne, G.; Cohen, I., Multiscale anomaly detection using diffusion maps, IEEE J. Sel. Top. Signal Process., 7, 111-123 (Feb. 2013)
[31] Haddad, A.; Kushnir, D.; Coifman, R. R., Texture separation via a reference set, Appl. Comput. Harmon. Anal., 36, 2, 335-347 (Mar. 2014)
[32] Coifman, R. R.; Hirn, M. J., Diffusion maps for changing data, Appl. Comput. Harmon. Anal., 36, 1, 79-107 (2014) · Zbl 1302.62122
[33] Zelnik-Manor, L.; Perona, P., Self-tuning spectral clustering, NIPS, 17, 1601-1608 (2005)
[34] Singer, A.; Coifman, R. R., Non-linear independent component analysis with diffusion maps, Appl. Comput. Harmon. Anal., 25, 2, 226-239 (2008) · Zbl 1144.62044
[35] Talmon, R.; Cohen, I.; Gannot, S.; Coifman, R. R., Supervised graph-based processing for sequential transient interference suppression, IEEE/ACM Trans. Audio Speech Lang. Process., 20, 9, 2528-2538 (2012)
[36] Talmon, R.; Coifman, R. R., Empirical intrinsic geometry for nonlinear modeling and time series filtering, Proc. Natl. Acad. Sci., 110, 31, 12535-12540 (2013)
[37] Mishne, G.; Talmon, R.; Cohen, I., Graph-based supervised automatic target detection, IEEE Trans. Geosci. Remote Sens., 53, 5, 2738-2754 (May 2015)
[38] Rojas, R., Neural Networks: A Systematic Introduction (1996), Springer Science & Business Media · Zbl 0861.68072
[39] Weiss, Y., Segmentation using eigenvectors: a unifying view, (The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, vol. 2 (1999), IEEE), 975-982
[40] Zhao, B.; Fei-Fei, L.; Xing, E., Online detection of unusual events in videos via dynamic sparse coding, (Proc. Computer Vision and Pattern Recognition. Proc. Computer Vision and Pattern Recognition, CVPR, 2011 (2011), IEEE), 3313-3320
[41] Le, Q. V.; Ngiam, J.; Coates, A.; Lahiri, A.; Prochnow, B.; Ng, A. Y., On optimization methods for deep learning, (Proceedings of the 28th International Conference on Machine Learning. Proceedings of the 28th International Conference on Machine Learning, ICML-11 (2011)), 265-272
[42] Wright, S. J.; Nocedal, J., Numerical Optimization, Vol. 2 (1999), Springer: Springer New York · Zbl 0930.65067
[43] Bentley, J. L., Multidimensional binary search trees used for associative searching, Commun. ACM, 18, 9, 509-517 (1975) · Zbl 0306.68061
[44] Jones, P. W.; Osipov, A.; Rokhlin, V., Randomized approximate nearest neighbors algorithm, Proc. Natl. Acad. Sci. (PNAS), 108, 38, 15679-15686 (Sep. 2011)
[45] Kawaguchi, K., Deep learning without poor local minima, (Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; Garnett, R., Advances in Neural Information Processing Systems 29 (2016), Curran Associates, Inc.), 586-594
[46] LeCun, Y.; Cortes, C.; Burges, C. J., The MNIST Database of Handwritten Digits (1998)
[47] Barron, A., Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inform. Theory, 39, 3, 930-945 (May 1993)