
Correlational neural networks. (English) Zbl 1414.68044

Summary: Common representation learning (CRL), wherein different descriptions (or views) of the data are embedded in a common subspace, has been receiving a lot of attention recently. Two popular paradigms here are canonical correlation analysis (CCA)-based approaches and autoencoder (AE)-based approaches. CCA-based approaches learn a joint representation by maximizing correlation of the views when projected to the common subspace. AE-based methods learn a common representation by minimizing the error of reconstructing the two views. Each of these approaches has its own advantages and disadvantages. For example, while CCA-based approaches outperform AE-based approaches for the task of transfer learning, they are not as scalable as the latter. In this work, we propose an AE-based approach, correlational neural network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace. Through a series of experiments, we demonstrate that the proposed CorrNet is better than AE and CCA with respect to its ability to learn correlated common representations. We employ CorrNet for several cross-language tasks and show that the representations learned using it perform better than the ones learned using other state-of-the-art approaches.
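To make the described objective concrete, the following NumPy sketch computes a CorrNet-style loss: reconstruction error of both views from a joint encoding and from each single-view encoding, minus a weighted correlation term between the two single-view encodings. The parameter names (W, V, b, W_out, V_out), the sigmoid activation, the squared-error reconstruction, and the weighting factor lambda_ are illustrative assumptions for exposition, not the authors' exact formulation (their implementation used Theano).

# Minimal NumPy sketch of a CorrNet-style objective (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

n, d_x, d_y, k = 100, 50, 40, 20           # samples, view dimensions, common dimension
X = rng.standard_normal((n, d_x))          # view 1
Y = rng.standard_normal((n, d_y))          # view 2, paired with X row-wise

# Encoder and decoder parameters (randomly initialised for the sketch).
W, V = rng.standard_normal((d_x, k)), rng.standard_normal((d_y, k))
b = np.zeros(k)
W_out, V_out = rng.standard_normal((k, d_x)), rng.standard_normal((k, d_y))

def encode(x_part, y_part):
    # Common representation h = sigmoid(x W + y V + b); either part may be all zeros.
    return 1.0 / (1.0 + np.exp(-(x_part @ W + y_part @ V + b)))

def corrnet_loss(X, Y, lambda_=1.0):
    h_xy = encode(X, Y)                    # encoding from both views
    h_x = encode(X, np.zeros_like(Y))      # encoding from view 1 alone
    h_y = encode(np.zeros_like(X), Y)      # encoding from view 2 alone

    # Reconstruct both views from each of the three encodings.
    recon = 0.0
    for h in (h_xy, h_x, h_y):
        recon += np.mean((h @ W_out - X) ** 2) + np.mean((h @ V_out - Y) ** 2)

    # Correlation term: sum of per-dimension correlations between h_x and h_y.
    hx_c = h_x - h_x.mean(axis=0)
    hy_c = h_y - h_y.mean(axis=0)
    corr = np.sum((hx_c * hy_c).sum(axis=0) /
                  (np.sqrt((hx_c ** 2).sum(axis=0) * (hy_c ** 2).sum(axis=0)) + 1e-8))

    # CorrNet minimises reconstruction error while maximising correlation.
    return recon - lambda_ * corr

print(corrnet_loss(X, Y))

In a real implementation this scalar loss would be minimised by gradient descent over W, V, b, W_out, and V_out; the key architectural point the summary makes is that the correlation of the two single-view projections enters the training objective explicitly, rather than being encouraged only implicitly through shared reconstruction.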

MSC:

68T05 Learning and adaptive systems in artificial intelligence
