Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
Transformers have the potential to learn longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence.
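To make the fixed-length-context limitation concrete, below is a minimal, hypothetical PyTorch sketch of the segment-level recurrence idea: hidden states from the previous segment are cached and reused as extra attention context, with gradients stopped at the segment boundary. The class name `SegmentRecurrentAttention` and all hyperparameters are illustrative assumptions, not the paper's implementation, and the sketch omits the relative positional encodings and causal masking used in the full model.

```python
from typing import Optional

import torch
import torch.nn as nn


class SegmentRecurrentAttention(nn.Module):
    """Self-attention over the current segment plus a cached memory of
    hidden states from the previous segment, so the effective context
    can grow beyond a single fixed-length segment."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, mem_len: int = 32):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_len = mem_len

    def forward(self, x: torch.Tensor, memory: Optional[torch.Tensor] = None):
        # x: (batch, seg_len, d_model); memory: (batch, <=mem_len, d_model) or None.
        context = x if memory is None else torch.cat([memory, x], dim=1)
        # Queries come from the current segment only; keys/values also cover the memory.
        out, _ = self.attn(query=x, key=context, value=context)
        # Cache the most recent hidden states for the next segment, detached so
        # no gradient flows across the segment boundary.
        new_memory = context[:, -self.mem_len:].detach()
        return out, new_memory


if __name__ == "__main__":
    layer = SegmentRecurrentAttention()
    mem = None
    for _ in range(3):                     # process three consecutive segments
        segment = torch.randn(2, 16, 64)   # (batch, seg_len, d_model)
        out, mem = layer(segment, mem)
    print(out.shape, mem.shape)            # (2, 16, 64) and (2, 32, 64)
```

In this sketch, each new segment attends to up to `mem_len` cached states from earlier segments, so information can propagate further than the segment length even though each forward pass still processes a fixed-size chunk.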