GNMT
swMATH ID: 26579
Software Authors: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean
Description: Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT’s use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google’s Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (“wordpieces”) for both input and output. This method provides a good balance between the flexibility of “character”-delimited models and the efficiency of “word”-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system.
Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT’14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system.
Homepage: https://arxiv.org/abs/1609.08144
Source Code: https://github.com/mcdavid109/Google-Neural-Machine-Translation-GNMT-
Keywords: Computation and Language; arXiv_cs.CL; Artificial Intelligence; arXiv_cs.AI; Machine Learning; arXiv_cs.LG; Neural Machine Translation; NMT
Related Software: ImageNet; Adam; AlexNet; BERT; DeepFace; TensorFlow; PyTorch; BLEU; Tensor2Tensor; CIFAR; AdaGrad; GLUE; RoBERTa; ALBERT; Moses; OpenNMT; Python; DGM; WaveNet; GitHub
Cited in: 26 Publications

Standard Articles
1 Publication describing the Software:
Title: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Authors: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean
Year: 2016

Cited by 96 Authors
2 Liang, Liang
2 Liu, Minliang
2 Sirignano, Justin A.
2 Spiliopoulos, Konstantinos V.
2 Sun, Wei
1 Alyahya, Khulood
1 Auli, Michael
1 Baines, Mandeep
1 Bartlett, Peter L.
1 Benigni, Lucas
1 Bhosale, Shruti
1 Birch, Tom
1 Butucea, Cristina
1 Byeon, Wonmin
1 Celebi, Onur
1 Chaudhary, Vishrav
1 Chen, Hengjie
1 Chen, Mengqiang
1 Cook, Diane J.
1 Cruz, Meenalosini Vimal
1 Dash, Tirtharaj
1 Daubechies, Ingrid Chantal
1 DeVore, Ronald A.
1 Ding, Man
1 Duarte, Diogo
1 Duarte, Victor
1 Duraisamy, Karthik
1 Durlofsky, Louis J.
1 Edunov, Sergey
1 El-Kishky, Ahmed
1 Fan, Angela
1 Fan, Jianqing
1 Fonseca, Julia
1 Foucart, Simon
1 Ghica, Dan R.
1 Ghods, Alireza
1 Gillingham, Matt
1 Goyal, Naman
1 Goyal, Siddharth
1 Guo, Binbin
1 Guo, Tiande
1 Han, Congying
1 Hanin, B.
1 Hu, Changwei
1 Hu, Yifan
1 Jagtap, Ameya D.
1 Joseph, Elizabeth
1 Joulin, Armand
1 Karniadakis, George Em
1 Kasturi, Tejaswi
1 Kharazmi, Ehsan
1 Kool, Wouter
1 Koumoutsakos, Petros D.
1 Kuang, Di
1 Lapata, Mirella
1 Li, Zhong
1 Liptchinsky, Vitaliy
1 Liu, Yimin
1 Ma, Cong
1 Ma, Zhiyi
1 Mei, Yuan
1 Montecinos, Alexis
1 Namburu, Anupama
1 P., Mangalraj
1 Péché, Sandrine
1 Perez-Beltrachini, Laura
1 Petrova, Guergana
1 Qiu, Kexin
1 R., Nandha Kumar
1 S., Sudhakar Ilango
1 Sapsis, Themistoklis P.
1 Schmidt-Hieber, Johannes
1 Schwenk, Holger
1 Sethuraman, Sibi Chakkaravarthy
1 Srinivasan, Ashwin
1 Sun, Chengjie
1 Tang, Meng
1 Tripathy, Jatin Karthik
1 van Hoof, Herke
1 Vijayakumar, Vaidehi
1 Vlachas, Pantelis R.
1 Wan, Zhong Yi
1 Wang, Baoxun
1 Welling, Max
1 Wenzek, Guillaume
1 Wu, Weigang
1 Xiao, Danyang
1 Xu, Jiayang
1 Xu, Zhen
1 Zhang, Deyuan
1 Zhang, Huan
1 Zhang, Jiajun
1 Zhao, Yang
1 Zhong, Yiqiao
1 Zhou, Long
1 Zong, Chengqing

Cited in 20 Serials
4 Computer Methods in Applied Mechanics and Engineering
3 Journal of Machine Learning Research (JMLR)
2 Data Mining and Knowledge Discovery
1 Artificial Intelligence
1 Journal of Computational Physics
1 Information Sciences
1 Mathematics of Operations Research
1 Constructive Approximation
1 Statistical Science
1 Journal of Economic Dynamics & Control
1 Machine Learning
1 SIAM Journal on Applied Mathematics
1 The Journal of Artificial Intelligence Research (JAIR)
1 Electronic Journal of Probability
1 International Journal of Wavelets, Multiresolution and Information Processing
1 Oberwolfach Reports
1 Discrete Mathematics, Algorithms and Applications
1 Computer Science Review
1 Journal of Logical and Algebraic Methods in Programming
1 Proceedings of the Royal Society of London. A. Mathematical, Physical and Engineering Sciences

Cited in 15 Fields
19 Computer science (68-XX)
7 Statistics (62-XX)
6 Biology and other natural sciences (92-XX)
3 Probability theory and stochastic processes (60-XX)
3 Fluid mechanics (76-XX)
3 Statistical mechanics, structure of matter (82-XX)
2 Approximations and expansions (41-XX)
2 Numerical analysis (65-XX)
2 Mechanics of deformable solids (74-XX)
2 Operations research, mathematical programming (90-XX)
1 General and overarching topics; collections (00-XX)
1 Linear and multilinear algebra; matrix theory (15-XX)
1 Geophysics (86-XX)
1 Game theory, economics, finance, and other social and behavioral sciences (91-XX)
1 Systems theory; control (93-XX)
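
Note on the description: the length normalization and coverage penalty used in the beam search can be illustrated with a short sketch. It assumes the scoring function given in the GNMT paper, s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y), with lp(Y) = ((5 + |Y|) / 6)^alpha and cp(X; Y) = beta * sum_i log(min(sum_j p_ij, 1)), where p_ij is the attention probability of target step j on source word i. The function names, the default values of alpha and beta, and the small floor that avoids log(0) are illustrative assumptions, not part of this record or of the linked source code.

```python
import math

def length_penalty(target_length, alpha=0.2):
    """lp(Y) = ((5 + |Y|) / 6) ** alpha; equals 1.0 for a one-token output."""
    return ((5.0 + target_length) / 6.0) ** alpha

def coverage_penalty(attention, beta=0.2, floor=1e-12):
    """cp(X; Y) = beta * sum_i log(min(sum_j p_ij, 1.0)).

    attention[j][i] is the attention probability of target step j on
    source word i. `floor` is an assumption added here so that a source
    word with zero total attention does not raise a math domain error.
    """
    n_source = len(attention[0])
    total = 0.0
    for i in range(n_source):
        covered = sum(step[i] for step in attention)
        total += math.log(max(min(covered, 1.0), floor))
    return beta * total

def rescore(log_prob, attention, alpha=0.2, beta=0.2):
    """Score a finished hypothesis; higher is better."""
    target_length = len(attention)  # one attention row per target token
    return (log_prob / length_penalty(target_length, alpha)
            + coverage_penalty(attention, beta))

if __name__ == "__main__":
    # Two target tokens that together fully attend to both source words.
    full = [[1.0, 0.0], [0.0, 1.0]]
    # One target token that only half-covers each source word.
    partial = [[0.5, 0.5]]
    print(rescore(-2.0, full))
    print(rescore(-2.0, partial))
```

With alpha = 0 and beta = 0 the score reduces to the plain log-probability, which favors shorter outputs. Raising alpha offsets that bias, and raising beta penalizes hypotheses that leave source words unattended.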