Abstract

The authors seek to train recurrent neural networks in order to map input sequences to output sequences, for applications in sequence recognition or production. Results are presented showing that learning long-term dependencies in such recurrent networks using gradient descent is a very difficult task. It is shown how this difficulty arises when robustly latching bits of information with certain attractors. The derivatives of the output at time t with respect to the unit activations at time zero tend rapidly to zero as t increases for most input values. In such a situation, simple gradient descent techniques appear inappropriate. The consideration of alternative optimization methods and architectures is suggested.
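
A minimal numerical sketch, not from the paper, can make the vanishing-derivative claim above concrete. It assumes a simple tanh recurrent network with contracting recurrent weights (the regime the paper associates with robustly latching information); the network size, weight scale, and all variable names are illustrative assumptions.

```python
# Sketch: how d a_t / d a_0 shrinks with t in a small tanh RNN whose
# recurrent dynamics are contracting. All settings below are assumptions
# chosen for illustration, not details taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n = 10                                               # hidden units (assumed)
W = rng.normal(scale=0.8 / np.sqrt(n), size=(n, n))  # contracting recurrent weights
a = rng.normal(size=n)                               # initial activations a_0

jac_prod = np.eye(n)                                 # accumulates d a_t / d a_0
for t in range(1, 101):
    pre = W @ a
    a = np.tanh(pre)                                 # a_t = tanh(W a_{t-1})
    jac_t = (1.0 - a ** 2)[:, None] * W              # diag(tanh') @ W
    jac_prod = jac_t @ jac_prod                      # chain rule across time steps
    if t % 20 == 0:
        print(f"t = {t:3d}   ||d a_t / d a_0||_F = {np.linalg.norm(jac_prod):.2e}")
```

With these assumed settings the printed norms should fall by several orders of magnitude over 100 steps, mirroring the abstract's statement that derivatives with respect to the initial activations tend rapidly to zero as t increases.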

Keywords

Computer science, Recurrent neural network, Gradient descent, Sequence, Task, Artificial intelligence, Artificial neural network, Stochastic gradient descent, Attractor, Mathematics, Engineering

Related Publications

Long Short-Term Memory

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We brief...

1997 · Neural Computation · 90,535 citations

Publication Info

Year
1993
Type
article
Pages
1183-1188
Citations
232
Access
Closed

Citation Metrics

232 (OpenAlex)

Cite This

Yoshua Bengio, Paolo Frasconi, Patrice Simard (1993). The problem of learning long-term dependencies in recurrent networks. IEEE International Conference on Neural Networks, 1183-1188. https://doi.org/10.1109/icnn.1993.298725

Identifiers

DOI
10.1109/icnn.1993.298725