Abstract

We present a letter-based encoding for words in continuous space language models. Instead of using a word index, we represent each word entirely by its letter n-grams, so that similar words automatically receive similar representations. We expect this encoding to generalize better to unknown or rare words and to capture morphological information. We evaluate its influence on machine translation using continuous space language models based on restricted Boltzmann machines, measuring both translation quality and training time on a German-to-English translation task of TED and university lectures, as well as on the news translation task from English to German. With our new approach, gains of up to 0.4 BLEU points are achieved.
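To illustrate the idea of the abstract, the following sketch extracts letter n-grams from a word and builds an indicator feature vector over a fixed n-gram vocabulary. The boundary markers (`<`, `>`), the choice of n = 3, and the `encode` helper are illustrative assumptions, not the paper's exact setup; the point is that orthographically similar words share features.

```python
def letter_ngrams(word, n=3):
    # Pad with boundary markers so prefixes/suffixes form distinct n-grams
    # (boundary symbols and n=3 are assumptions for illustration).
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def encode(word, vocab, n=3):
    # Hypothetical helper: bag-of-n-grams indicator vector over a fixed
    # n-gram vocabulary, as one possible network input representation.
    grams = set(letter_ngrams(word, n))
    return [1 if g in grams else 0 for g in vocab]

print(letter_ngrams("cat"))  # → ['<ca', 'cat', 'at>']
```

Because "cat" and "cats" share the n-grams `<ca` and `cat`, their encodings overlap, whereas distinct word indices would be entirely unrelated.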

Keywords

Computer science, Machine translation, Natural language processing, Encoding, Artificial intelligence, German, n-gram, Word, Language model, Translation, Representation, Speech recognition, Linguistics

Publication Info

Year
2013
Type
article
Pages
30-39
Citations
22
Access
Closed

Citation Metrics

22 (OpenAlex)

Cite This

Henning Sperr, Jan Niehues, Alex Waibel (2013). Letter N-Gram-based Input Encoding for Continuous Space Language Models, 30-39. https://doi.org/10.5445/ir/1000037718

Identifiers

DOI
10.5445/ir/1000037718