Abstract
Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation, since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
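The system summarised above combines two RNNs: a transcription (encoder) network that scans the input sequence and a prediction network that scans the output history, with their states joined at every (input step, output step) pair to give a distribution over the output labels plus a null symbol, so the alignment is learned jointly rather than supplied in advance. The snippet below is a minimal PyTorch sketch of that structure, offered only as an illustration; the class name, layer sizes, LSTM cells and the simple linear joint are assumptions, not the paper's reference implementation.

```python
# Minimal, illustrative RNN transducer-style model (PyTorch).
# Names, sizes and the linear joint are assumptions made for clarity.
import torch
import torch.nn as nn


class Transducer(nn.Module):
    def __init__(self, input_dim: int, vocab_size: int, hidden_dim: int = 128):
        super().__init__()
        # Transcription (encoder) network: reads the input sequence.
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Prediction network: reads the output history (label embeddings);
        # index 0 is reserved here for the null/blank symbol.
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.predictor = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Joint: scores every output label (plus blank) for each
        # (input step t, output step u) pair.
        self.joint = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, x: torch.Tensor, y_prev: torch.Tensor) -> torch.Tensor:
        # x:      (B, T, input_dim) input features
        # y_prev: (B, U) previous output labels, starting with the blank index
        f, _ = self.encoder(x)                     # (B, T, H)
        g, _ = self.predictor(self.embed(y_prev))  # (B, U, H)
        T, U = f.size(1), g.size(1)
        # Pair every input step with every output step.
        f = f.unsqueeze(2).expand(-1, -1, U, -1)   # (B, T, U, H)
        g = g.unsqueeze(1).expand(-1, T, -1, -1)   # (B, T, U, H)
        logits = self.joint(torch.cat([f, g], dim=-1))  # (B, T, U, V)
        return logits.log_softmax(dim=-1)


# Usage sketch: 40-dim input features, 61 TIMIT phoneme labels plus blank.
model = Transducer(input_dim=40, vocab_size=62)
log_probs = model(torch.randn(2, 100, 40), torch.zeros(2, 5, dtype=torch.long))
print(log_probs.shape)  # torch.Size([2, 100, 5, 62])
```

Training such a model marginalises over all alignments in the resulting (T, U) lattice with a forward-backward procedure, and decoding searches the same lattice (for example with beam search); both are omitted from this sketch.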
Related Publications
- A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures
  Recurrent neural networks (RNNs) have been widely adopted in research areas concerned with sequential data, such as text, audio, and video. However, RNNs consisting of sigma cel...
- An application of recurrent nets to phone probability estimation
  This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context ...
- Global optimization of a neural network-hidden Markov model hybrid
  An original method for integrating artificial neural networks (ANN) with hidden Markov models (HMM) is proposed. ANNs are suitable for performing phonetic classification, wherea...
- Sparse Multilayer Perceptron for Phoneme Recognition
  This paper introduces the sparse multilayer perceptron (SMLP) which jointly learns a sparse feature representation and nonlinear classifier boundaries to optimally discriminate ...
- Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
  Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well ...
Publication Info
- Year: 2012
- Type: preprint
- Citations: 1292
- Access: Closed
Identifiers
- DOI: 10.48550/arxiv.1211.3711