Abstract
Research into corpus-based semantics has focused on the development of ad hoc models that treat single tasks, or sets of closely related tasks, as unrelated challenges to be tackled by extracting different kinds of distributional information from the corpus. As an alternative to this “one task, one model” approach, the Distributional Memory framework extracts distributional information once and for all from the corpus, in the form of a set of weighted word-link-word tuples arranged into a third-order tensor. Different matrices are then generated from the tensor, and their rows and columns constitute natural spaces to deal with different semantic problems. In this way, the same distributional information can be shared across tasks such as modeling word similarity judgments, discovering synonyms, concept categorization, predicting selectional preferences of verbs, solving analogy problems, classifying relations between word pairs, harvesting qualia structures with patterns or example pairs, predicting the typical properties of concepts, and classifying verbs into alternation classes. Extensive empirical testing in all these domains shows that a Distributional Memory implementation performs competitively against task-specific algorithms recently reported in the literature for the same tasks, and against our implementations of several state-of-the-art methods. The Distributional Memory approach is thus shown to be tenable despite the constraints imposed by its multi-purpose nature.
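The tensor construction described in the abstract lends itself to a compact illustration. Below is a minimal sketch (toy data; all tuples, weights, and variable names are invented for illustration and are not taken from the authors' implementation) of how a set of weighted word-link-word tuples can be arranged into a third-order tensor and then matricized into two of the spaces the framework uses: a word-by-link-word matrix for word similarity tasks, and a word-pair-by-link matrix for relation tasks.

```python
import numpy as np

# Toy distributional memory: weighted <word, link, word> tuples, as might be
# extracted from a dependency-parsed corpus (weights are illustrative).
tuples = {
    ("marine",  "use", "gun"):  40.0,
    ("soldier", "use", "gun"):  35.0,
    ("teacher", "use", "book"): 25.0,
    ("marine",  "own", "gun"):  18.0,
}

words = sorted({w for (w1, _, w2) in tuples for w in (w1, w2)})
links = sorted({l for (_, l, _) in tuples})
w_idx = {w: i for i, w in enumerate(words)}
l_idx = {l: i for i, l in enumerate(links)}

# Third-order tensor T[word1, link, word2].
T = np.zeros((len(words), len(links), len(words)))
for (w1, l, w2), weight in tuples.items():
    T[w_idx[w1], l_idx[l], w_idx[w2]] = weight

# Matricization 1: word-by-link-word matrix (rows = words, columns =
# <link, word> pairs), a natural space for similarity and synonym tasks.
W_by_LW = T.reshape(len(words), len(links) * len(words))

# Matricization 2: word-pair-by-link matrix (rows = <word, word> pairs,
# columns = links), a natural space for relation classification and analogy.
WW_by_L = T.transpose(0, 2, 1).reshape(len(words) * len(words), len(links))

def cosine(u, v):
    """Cosine similarity between two row vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# "marine" and "soldier" share the <use, gun> context, so they come out
# as similar in the word-by-link-word space.
print(cosine(W_by_LW[w_idx["marine"]], W_by_LW[w_idx["soldier"]]))
```

The point of the sketch is the "extract once, reuse everywhere" design: both matrices are views of the same tensor, so the distributional information is collected a single time and only rearranged per task.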
Related Publications
Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to l...
Composition in Distributional Models of Semantics
Vector-based models of word meaning have become increasingly popular in cognitive science. The appeal of these models lies in their ability to represent meaning simply ...
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
Semantic Textual Similarity (STS) measures the degree of semantic equivalence between two texts. This paper presents the results of the STS pilot task in SemEval. The training d...
Word Space
Representations for semantic information about words are necessary for many applications of neural networks in natural language processing. This paper describes an efficient, co...
GloVe: Global Vectors for Word Representation
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the o...
Publication Info
- Year: 2010
- Type: article
- Volume: 36
- Issue: 4
- Pages: 673-721
- Citations: 652
- Access: Closed
Identifiers
- DOI: 10.1162/coli_a_00016