Abstract

The prevalent approach to sequence to sequence learning maps an input sequence to a variable-length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training, and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation, and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
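The gated linear unit mentioned in the abstract can be sketched in a few lines. This is an illustrative numpy implementation, not the authors' code: the input is split into two halves along the channel dimension, one half acts as a linear path and the other, passed through a sigmoid, gates it. The linear path carries no squashing non-linearity, which is what eases gradient propagation.

```python
import numpy as np

def glu(x, axis=-1):
    """Gated linear unit: split x into halves (a, b) along `axis`
    and return a * sigmoid(b). Halves the size of `axis`."""
    a, b = np.split(x, 2, axis=axis)
    return a * (1.0 / (1.0 + np.exp(-b)))

# Toy example: a length-5 "sequence" with 8 channels;
# the GLU output has 4 channels.
x = np.random.randn(5, 8)
y = glu(x)
print(y.shape)  # (5, 4)
```

In the paper's architecture this gating follows each convolution in both encoder and decoder, so the convolution produces twice the desired number of output channels and the GLU halves them back.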

Keywords

Sequence (biology), Computer science, Sequence learning, Artificial intelligence, Biology, Genetics

Publication Info

Year
2017
Type
preprint
Citations
1896
Access
Closed

Citation Metrics

1896 (OpenAlex)

Cite This

Jonas Gehring, Michael Auli, David Grangier et al. (2017). Convolutional Sequence to Sequence Learning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1705.03122

Identifiers

DOI
10.48550/arxiv.1705.03122