Abstract

Neural language representation models such as BERT, pre-trained on large-scale corpora, can capture rich semantic patterns from plain text and be fine-tuned to consistently improve performance on various NLP tasks. However, existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which takes full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results demonstrate that ERNIE achieves significant improvements on various knowledge-driven tasks, while remaining comparable with the state-of-the-art model BERT on other common NLP tasks. The code and datasets will be released in the future.
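To give a concrete picture of what "enhancing language representation with informative entities" can look like, below is a minimal, hypothetical PyTorch sketch that fuses contextual token embeddings with aligned KG entity embeddings. It illustrates the general idea only and is not the authors' released ERNIE code; all names (TokenEntityFusion, token_dim, entity_dim, entity_mask) are invented for this example.

    import torch
    import torch.nn as nn

    class TokenEntityFusion(nn.Module):
        """Illustrative sketch (not the paper's implementation) of fusing
        contextual token vectors with aligned KG entity embeddings."""

        def __init__(self, token_dim=768, entity_dim=100, hidden_dim=768):
            super().__init__()
            self.token_proj = nn.Linear(token_dim, hidden_dim)
            self.entity_proj = nn.Linear(entity_dim, hidden_dim)
            self.token_out = nn.Linear(hidden_dim, token_dim)

        def forward(self, token_states, entity_embeds, entity_mask):
            # token_states:  (batch, seq_len, token_dim)  contextual token vectors
            # entity_embeds: (batch, seq_len, entity_dim) pre-trained KG entity
            #                embeddings aligned to the tokens that mention them
            # entity_mask:   (batch, seq_len, 1) 1 where a token has an aligned entity
            fused = torch.tanh(self.token_proj(token_states)
                               + entity_mask * self.entity_proj(entity_embeds))
            return self.token_out(fused)

    # Toy usage with random tensors standing in for real model outputs.
    fusion = TokenEntityFusion()
    tokens = torch.randn(2, 16, 768)
    entities = torch.randn(2, 16, 100)
    mask = torch.randint(0, 2, (2, 16, 1)).float()
    print(fusion(tokens, entities, mask).shape)  # torch.Size([2, 16, 768])

In the model described by the paper, entity embeddings are pre-trained on Wikidata with TransE and injected through stacked aggregator layers; the sketch above only mirrors a single alignment-and-fusion step under the assumptions stated above.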

Keywords

Computer science, Natural language processing, Representation, Artificial intelligence, Language model, Language understanding, Knowledge graph


Publication Info

Year
2019
Type
preprint
Citations
1349
Access
Closed

Citation Metrics

1349 citations (OpenAlex)

Cite This

Zhengyan Zhang, Xu Han, Zhiyuan Liu et al. (2019). ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). https://doi.org/10.18653/v1/p19-1139

Identifiers

DOI
10.18653/v1/p19-1139