Abstract

Recent advances in language modeling using recurrent neural networks have made it viable to model language as distributions over characters. By learning to predict the next character on the basis of previous characters, such models have been shown to automatically internalize linguistic concepts such as words, sentences, subclauses and even sentiment. In this paper, we propose to leverage the internal states of a trained character language model to produce a novel type of word embedding which we refer to as contextual string embeddings. Our proposed embeddings have the distinct properties that they (a) are trained without any explicit notion of words and thus fundamentally model words as sequences of characters, and (b) are contextualized by their surrounding text, meaning that the same word will have different embeddings depending on its contextual use. We conduct a comparative evaluation against previous embeddings and find that our embeddings are highly useful for downstream tasks: across four classic sequence labeling tasks we consistently outperform the previous state-of-the-art. In particular, we significantly outperform previous work on English and German named entity recognition (NER), allowing us to report new state-of-the-art F1-scores on the CoNLL03 shared task. We release all code and pre-trained language models in a simple-to-use framework to the research community, to enable reproduction of these experiments and application of our proposed embeddings to other tasks: https://github.com/zalandoresearch/flair
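The core idea in the abstract — run a character-level language model over the full sentence, then read off a word's embedding from the hidden state at its character boundaries — can be illustrated with a toy sketch. The code below is not the paper's trained LSTM language model; it uses a tiny Elman-style recurrence with random fixed weights purely to show the extraction mechanism: the forward model's state after a word's last character is concatenated with the backward model's state before its first character. All function names here are illustrative, not part of the released Flair API.

```python
import math
import random

random.seed(0)

HIDDEN = 4  # toy hidden size; the paper's trained models use far larger states


def make_rnn():
    """Build a toy character RNN step function with random fixed weights.

    A real contextual string embedding would come from a character LM
    trained to predict the next character; this stand-in only demonstrates
    how states are read out at word boundaries.
    """
    charset = "abcdefghijklmnopqrstuvwxyz ."
    w_in = {c: [random.uniform(-1, 1) for _ in range(HIDDEN)] for c in charset}
    w_rec = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
             for _ in range(HIDDEN)]

    def step(h, c):
        # Elman-style update: new state from input character and previous state
        return [math.tanh(w_in[c][i] +
                          sum(w_rec[i][j] * h[j] for j in range(HIDDEN)))
                for i in range(HIDDEN)]

    return step


def char_states(step, text):
    """Hidden state of the RNN after reading each character of `text`."""
    h, states = [0.0] * HIDDEN, []
    for c in text:
        h = step(h, c)
        states.append(h)
    return states


def contextual_embeddings(text):
    """Return (word, vector) pairs for a space-tokenized lowercase sentence.

    Each word's vector concatenates the forward model's state at its last
    character with the backward model's state at its first character, so the
    same word gets different vectors in different contexts.
    """
    fwd = char_states(make_rnn(), text)
    bwd = char_states(make_rnn(), text[::-1])[::-1]  # separate backward model
    pairs, start = [], 0
    for word in text.split(" "):
        end = start + len(word) - 1
        pairs.append((word, fwd[end] + bwd[start]))  # list concatenation
        start = end + 2  # skip the following space
    return pairs


pairs = contextual_embeddings("george washington went to washington .")
# the two occurrences of "washington" receive different vectors,
# because the character context surrounding each occurrence differs
```

Because the word vector is assembled from character-level states of the whole sentence, no word vocabulary is needed and misspelled or rare words still receive embeddings — the property the abstract highlights.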

Keywords

Computer science, Natural language processing, Language model, Artificial intelligence, Embedding, Word embedding, Sequence labeling, Named-entity recognition, German, Linguistics

Related Publications

Universal Sentence Encoder

We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate pe...

2018 arXiv (Cornell University) 1289 citations

Neural Machine Translation of Rare Words with Subword Units

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-o...

Edinburgh Research Explorer (Universi... 6994 citations

Publication Info

Year: 2018
Type: Article
Pages: 1638-1649
Citations: 1003
Access: Closed

Citation Metrics

1003 (OpenAlex)

Cite This

Alan Akbik, Duncan A. J. Blythe, Roland Vollgraf (2018). Contextual String Embeddings for Sequence Labeling. International Conference on Computational Linguistics, 1638-1649.