The problem of learning long-term dependencies in recurrent networks

Abstract

The authors seek to train recurrent neural networks in order to map input sequences to output sequences, for applications in sequence recognition or production. Results are presented showing that learning long-term dependencies in such recurrent networks using gradient descent is a very difficult task. It is shown how this difficulty arises when robustly latching bits of information with certain attractors. The derivatives of the output at time t with respect to the unit activations at time zero tend rapidly to zero as t increases for most input values. In such a situation, simple gradient descent techniques appear inappropriate. The consideration of alternative optimization methods and architectures is suggested.
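The vanishing derivative described in the abstract can be illustrated with a toy example. The sketch below is not the authors' experiment: it assumes a single self-connected tanh unit, x_t = tanh(w * x_{t-1}), with a hypothetical weight w = 2.0 chosen so that the unit has two stable attractors and can latch one bit. The function name and parameters are illustrative only. By the chain rule, the derivative of x_t with respect to x_0 is a product of per-step factors w * (1 - x_k^2), and near a stable attractor each factor is below 1 in magnitude, so the product shrinks roughly exponentially with t.

```python
import math


def latch_gradient(w=2.0, x0=0.5, steps=50):
    """Return [d x_t / d x_0 for t = 1..steps] for x_t = tanh(w * x_{t-1}).

    Assumption: a single self-connected tanh unit with no external input,
    used only as a toy stand-in for a latching recurrent network.
    """
    x = x0
    grad = 1.0
    grads = []
    for _ in range(steps):
        x = math.tanh(w * x)
        # Chain rule: d tanh(w * x_prev) / d x_prev = w * (1 - x_t ** 2)
        grad *= w * (1.0 - x * x)
        grads.append(grad)
    return grads


grads = latch_gradient()
for t in (1, 5, 10, 20, 50):
    # Print how sensitive the state at time t is to the state at time zero.
    print(f"t={t:2d}  d x_t / d x_0 = {grads[t - 1]:.3e}")
```

Under these assumptions the state settles quickly onto the positive attractor, which is what makes the stored bit robust, and the same contraction is what drives the derivative toward zero. A gradient-based learner therefore receives almost no signal about how the state at time zero should change.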

Keywords

Computer science, Term (time), Recurrent neural network, Gradient descent, Sequence (biology), Task (project management), Artificial intelligence, Zero (linguistics), Artificial neural network, Stochastic gradient descent, Simple (philosophy), Attractor, Mathematics, Engineering

Publication Info

Year: 2002
Type: article
Pages: 1183-1188
Citations: 232
Access: Closed

Cite This

APA Style

Yoshua Bengio, Paolo Frasconi, P. Simard (2002). The problem of learning long-term dependencies in recurrent networks. IEEE International Conference on Neural Networks, 1183-1188. https://doi.org/10.1109/icnn.1993.298725

Identifiers

DOI: 10.1109/icnn.1993.298725