Neural Machine Translation by Jointly Learning to Align and Translate

Abstract

Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts explicitly as hard segments. With this new approach, we achieve translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
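The (soft-)search described above can be illustrated with a minimal sketch of an additive alignment model, assuming the form given in the full paper: each source annotation h_j is scored against the previous decoder state s_{i-1} as e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), the scores are normalized with a softmax into alignment weights alpha_ij, and the context vector c_i is the weighted sum of the annotations. The helper names (`matvec`, `softmax`, `additive_attention`) and all toy values below are hypothetical illustrations, not the authors' code or trained parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev: Vector, annotations: List[Vector],
                       W_a: Matrix, U_a: Matrix, v_a: Vector) -> Tuple[Vector, Vector]:
    """Score every annotation h_j against s_prev, turn the scores into soft
    alignment weights, and return (weights, context vector)."""
    Ws = matvec(W_a, s_prev)  # decoder-state projection, shared by all j
    scores = []
    for h in annotations:
        Uh = matvec(U_a, h)  # annotation projection
        hidden = [math.tanh(a + b) for a, b in zip(Ws, Uh)]
        scores.append(sum(v * z for v, z in zip(v_a, hidden)))  # e_ij
    weights = softmax(scores)  # alpha_ij, sums to 1 over j
    dim = len(annotations[0])
    # c_i: weighted sum of annotations, a soft rather than hard selection
    context = [sum(w * h[k] for w, h in zip(weights, annotations)) for k in range(dim)]
    return weights, context


# Hypothetical toy example: 2-dim decoder state and three 2-dim source annotations.
s_prev = [1.0, 0.0]
annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_a = [[1.0, 0.0], [0.0, 1.0]]
U_a = [[1.0, 0.0], [0.0, 1.0]]
v_a = [1.0, -1.0]

weights, context = additive_attention(s_prev, annotations, W_a, U_a, v_a)
print("alignment weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

With these toy values the first annotation receives the largest weight. Because the weights come from a softmax, they sum to one, so the context vector is a convex combination of all annotations, recomputed for every target word, rather than a single fixed-length summary of the whole source sentence.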

Keywords

Machine translation, Computer science, Transfer-based machine translation, Example-based machine translation, Sentence, Bottleneck, Artificial intelligence, Translation (biology), Artificial neural network, Natural language processing, Encoder, Phrase, Word (group theory), Speech recognition

Publication Info

Year: 2014
Type: preprint
Citations: 14564
Access: Closed

Citation Metrics

14564 citations (source: OpenAlex)

Cite This

APA Style

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1409.0473

Identifiers

DOI: 10.48550/arxiv.1409.0473