
Synchronous bidirectional inference for neural sequence generation. (English) Zbl 1435.68336

Summary: In sequence-to-sequence generation tasks (e.g., machine translation and abstractive summarization), inference is generally performed in a left-to-right manner, producing the result token by token. Neural approaches such as LSTM and self-attention networks can make full use of all previously predicted (left-side) history during inference, but cannot access any future (right-side) information, and therefore tend to generate unbalanced outputs (e.g., in Chinese-English translation the left part of the output is usually much more accurate than the right part). In this work, we propose a synchronous bidirectional inference model that generates outputs using left-to-right and right-to-left decoding simultaneously and interactively. First, we introduce a novel beam search algorithm that facilitates synchronous bidirectional decoding. Then we present the core approach, which enables left-to-right and right-to-left decoding to interact with each other so that both history and future predictions are exploited simultaneously during inference. We apply the proposed model to both LSTM and self-attention networks. Furthermore, we propose a novel fine-tuning-based parameter optimization algorithm in addition to the simple two-pass strategy. Extensive experiments on machine translation and abstractive summarization demonstrate that our synchronous bidirectional inference model achieves remarkable improvements over strong baselines.

MSC:

68T50 Natural language processing
68T05 Learning and adaptive systems in artificial intelligence
68T07 Artificial neural networks and deep learning
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)

References:

[1] Sutskever, I.; Vinyals, O.; Le, Q. V., Sequence to sequence learning with neural networks, (Proceedings of NIPS (2014))
[2] Bahdanau, D.; Cho, K.; Bengio, Y., Neural machine translation by jointly learning to align and translate, (Proceedings of ICLR (2015))
[3] Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y. N., Convolutional sequence to sequence learning, (Proceedings of ICML (2017))
[4] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; Polosukhin, I., Attention is all you need, arXiv preprint
[5] Watanabe, T.; Sumita, E., Bidirectional decoding for statistical machine translation, (Proceedings of the 19th International Conference on Computational Linguistics, vol. 1 (2002)), 1-7
[6] Hochreiter, S.; Schmidhuber, J., Long short-term memory, Neural Comput., 9, 8, 1735-1780 (1997)
[7] Liu, L.; Finch, A. M.; Utiyama, M.; Sumita, E., Agreement on target-bidirectional LSTMs for sequence-to-sequence learning, (AAAI (2016)), 2630-2637
[8] Zhang, Z.; Wu, S.; Liu, S.; Li, M.; Zhou, M.; Chen, E., Regularizing neural machine translation by target-bidirectional agreement, preprint
[9] Liu, L.; Utiyama, M.; Finch, A.; Sumita, E., Agreement on target-bidirectional neural machine translation, (Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2016)), 411-416
[10] Wang, Y.; Cheng, S.; Jiang, L.; Yang, J.; Chen, W.; Li, M.; Shi, L.; Wang, Y.; Yang, H., Sogou neural machine translation systems for WMT17, (Proceedings of the Second Conference on Machine Translation (2017)), 410-415
[11] Zhang, X.; Su, J.; Qin, Y.; Liu, Y.; Ji, R.; Wang, H., Asynchronous bidirectional decoding for neural machine translation, (Proceedings of AAAI (2018))
[12] Zhou, L.; Zhang, J.; Zong, C., Synchronous bidirectional neural machine translation, (TACL (2019))
[13] Elbayad, M.; Besacier, L.; Verbeek, J., Pervasive attention: 2D convolutional neural networks for sequence-to-sequence prediction, (Proceedings of the 22nd Conference on Computational Natural Language Learning (2018)), 97-107
[14] Luong, M.-T.; Pham, H.; Manning, C. D., Effective approaches to attention-based neural machine translation, (Proceedings of EMNLP (2015))
[15] Sennrich, R.; Haddow, B.; Birch, A., Neural machine translation of rare words with subword units, (Proceedings of ACL (2016))
[16] Shen, S.; Cheng, Y.; He, Z.; He, W.; Wu, H.; Sun, M.; Liu, Y., Minimum risk training for neural machine translation, (Proceedings of ACL (2016))
[17] Zhou, J.; Cao, Y.; Wang, X.; Li, P.; Xu, W., Deep recurrent models with fast-forward connections for neural machine translation, arXiv preprint
[18] Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J., Bleu: a method for automatic evaluation of machine translation, (Proceedings of ACL (2002)), 311-318
[19] Koehn, P., Statistical significance tests for machine translation evaluation, (Proceedings of EMNLP (2004)), 388-395
[20] Wu, Y.; Schuster, M.; Chen, Z.; Le, Q. V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K., Google’s neural machine translation system: bridging the gap between human and machine translation, arXiv preprint
[21] Napoles, C.; Gormley, M.; Van Durme, B., Annotated gigaword, (Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (2012)), 95-100
[22] Rush, A. M.; Chopra, S.; Weston, J., A neural attention model for abstractive sentence summarization, (Proceedings of EMNLP (2015))
[23] Zhou, Q.; Yang, N.; Wei, F.; Zhou, M., Selective encoding for abstractive sentence summarization, (Proceedings of ACL (2017))
[24] Lin, C.-Y., Rouge: a package for automatic evaluation of summaries, (Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, vol. 8 (2004))
[25] Nallapati, R.; Zhou, B.; Gulcehre, C.; Xiang, B., Abstractive text summarization using sequence-to-sequence RNNs and beyond, (Computational Natural Language Learning (2016))
[26] Chen, M. X.; Firat, O.; Bapna, A.; Johnson, M.; Macherey, W.; Foster, G.; Jones, L.; Parmar, N.; Schuster, M.; Chen, Z., The best of both worlds: combining recent advances in neural machine translation, (Proceedings of ACL (2018)), 76-86
[27] Shaw, P.; Uszkoreit, J.; Vaswani, A., Self-attention with relative position representations, (Proceedings of NAACL (2018)), 464-468
[28] Wu, F.; Fan, A.; Baevski, A.; Dauphin, Y. N.; Auli, M., Pay less attention with lightweight and dynamic convolutions, arXiv preprint
[29] Toutanova, K.; Klein, D.; Manning, C. D.; Singer, Y., Feature-rich part-of-speech tagging with a cyclic dependency network, (Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1 (2003)), 173-180
[30] Tsuruoka, Y.; Tsujii, J., Bidirectional inference with the easiest-first strategy for tagging sequence data, (Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (2005)), 467-474
[31] Shen, L.; Satta, G.; Joshi, A., Guided learning for bidirectional sequence classification, (Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (2007)), 760-767
[32] Serdyuk, D.; Ke, N. R.; Sordoni, A.; Trischler, A.; Pal, C.; Bengio, Y., Twin networks: matching the future for sequence generation, (International Conference on Learning Representations (2018))
[33] Sennrich, R.; Haddow, B.; Birch, A., Edinburgh neural machine translation systems for WMT 16, (Proceedings of the First Conference on Machine Translation: vol. 2, Shared Task Papers (2016)), 371-376
[34] Sennrich, R.; Birch, A.; Currey, A.; Germann, U.; Haddow, B.; Heafield, K.; Barone, A. V.M.; Williams, P., The University of Edinburgh's neural MT systems for WMT17, (Proceedings of the Second Conference on Machine Translation: vol. 2, Shared Task Papers (2017)), 389-399
[35] Hoang, C. D.V.; Haffari, G.; Cohn, T., Towards decoding as continuous optimisation in neural machine translation, (Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017)), 146-156
[36] Tan, Z.; Wang, B.; Hu, J.; Chen, Y.; Shi, X., XMU neural machine translation systems for WMT 17, (Proceedings of the Second Conference on Machine Translation (2017)), 400-404
[37] Deng, Y.; Cheng, S.; Lu, J.; Song, K.; Wang, J.; Wu, S.; Yao, L.; Zhang, G.; Zhang, H.; Zhang, P., Alibaba’s neural machine translation systems for WMT18, (Proceedings of the Third Conference on Machine Translation: Shared Task Papers (2018)), 368-376
[38] Liu, Y.; Zhou, L.; Wang, Y.; Zhao, Y.; Zhang, J.; Zong, C., A comparable study on model averaging, ensembling and reranking in NMT, (CCF International Conference on Natural Language Processing and Chinese Computing (2018), Springer), 299-308
[39] Bahdanau, D.; Brakel, P.; Xu, K.; Goyal, A.; Lowe, R.; Pineau, J.; Courville, A.; Bengio, Y., An actor-critic algorithm for sequence prediction, (Proceedings of ICLR (2017))
[40] He, D.; Lu, H.; Xia, Y.; Qin, T.; Wang, L.; Liu, T.-Y., Decoding with value networks for neural machine translation, (Guyon, I.; Luxburg, U. V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; Garnett, R., Advances in Neural Information Processing Systems, vol. 30 (2017)), 178-187
[41] Li, J.; Monroe, W.; Jurafsky, D., Learning to decode for future success, preprint
[42] Xia, Y.; Tian, F.; Wu, L.; Lin, J.; Qin, T.; Yu, N.; Liu, T.-Y., Deliberation networks: sequence generation beyond one-pass decoding, (Advances in Neural Information Processing Systems (2017)), 1784-1794
[43] Zheng, Z.; Zhou, H.; Huang, S.; Mou, L.; Dai, X.; Chen, J.; Tu, Z., Modeling past and future for neural machine translation, (TACL, vol. 6 (2018)), 145-157
[44] Zhang, W.; Feng, Y.; Meng, F.; You, D.; Liu, Q., Bridging the gap between training and inference for neural machine translation, (Proceedings of ACL (2019)), 4334-4343