Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus

Peter D. Turney; Michael L. Littman

doi:10.48550/arxiv.cs/0212012

Abstract

The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words -- the subset of the Web that is indexed by the chosen search engine. Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by Hatzivassiloglou and McKeown (1997), using a complex four-stage supervised learning algorithm that is restricted to determining the semantic orientation of adjectives.
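The abstract names the ingredients (search-engine queries and pointwise mutual information) but does not give the scoring formula, so the following is only a minimal sketch under stated assumptions. It scores a word by its PMI with a positive seed word minus its PMI with a negative seed word. The seed words "good" and "bad", the toy hit counts, the smoothing constant of 0.01, and the decision threshold at zero are all illustrative assumptions, not values taken from this record.

```python
import math

# Toy statistics standing in for search-engine hit counts.
# Every number here is invented for illustration only.
TOTAL_DOCS = 1_000_000
HITS = {"good": 50_000, "bad": 40_000, "honest": 2_000, "disturbing": 1_500}
NEAR_HITS = {  # documents in which the two words occur close together
    ("honest", "good"): 400,
    ("honest", "bad"): 60,
    ("disturbing", "good"): 50,
    ("disturbing", "bad"): 300,
}


def pmi(word, seed, smoothing=0.01):
    """Pointwise mutual information: log2(p(word, seed) / (p(word) * p(seed))).

    `smoothing` (an assumed value) keeps the logarithm defined when the
    two words never co-occur.
    """
    p_joint = (NEAR_HITS.get((word, seed), 0) + smoothing) / TOTAL_DOCS
    p_word = HITS[word] / TOTAL_DOCS
    p_seed = HITS[seed] / TOTAL_DOCS
    return math.log2(p_joint / (p_word * p_seed))


def semantic_orientation(word, pos_seeds=("good",), neg_seeds=("bad",)):
    """Association with positive seeds minus association with negative seeds."""
    positive = sum(pmi(word, seed) for seed in pos_seeds)
    negative = sum(pmi(word, seed) for seed in neg_seeds)
    return positive - negative


def classify(word):
    """Label a word by the sign of its semantic orientation (assumed threshold 0)."""
    return "positive" if semantic_orientation(word) > 0 else "negative"


if __name__ == "__main__":
    for w in ("honest", "disturbing"):
        print(w, round(semantic_orientation(w), 2), classify(w))
```

In a real setting the two dictionaries would be replaced by counts returned from search queries. With equally sized seed sets, the corpus size and the word's own hit count cancel in the difference, so only the co-occurrence counts and the seed counts affect the sign.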

Keywords

Pointwise mutual information; Pointwise; Natural language processing; Artificial intelligence; Computer science; Orientation (vector space); Word (group theory); Noun; Character (mathematics); Simple (philosophy); Information retrieval; Mutual information; Linguistics; Mathematics

Affiliated Institutions

National Research Council Canada CA

Related Publications

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney

This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review ...

2002 Meeting of the Association for Comput... 3653 citations

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney

This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review ...

2002 arXiv (Cornell University) 1580 citations

Learning Character-level Representations for Part-of-Speech Tagging

Cícero dos Santos, Bianca Zadrozny

Distributed word representations have recently been proven to be an invaluable resource for NLP. These representations are normally learned using neural networks and capture syn...

2014 555 citations

Word association norms, mutual information, and lexicography

Kenneth Church, Patrick Hanks

The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond quicker than normal to the word nurse if i...

1990 Computational Linguistics 3665 citations

Learning Subjective Adjectives from Corpora

Janyce Wiebe

Subjectivity tagging is distinguishing sentences used to present opinions and evaluations from sentences used to objectively present factual information. There are numerous appl...

2000 519 citations

Publication Info

Year: 2002
Type: preprint
Citations: 361
Access: Closed

Cite This

APA Style

Turney, P. D., & Littman, M. L. (2002). Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus. arXiv (Cornell University). https://doi.org/10.48550/arxiv.cs/0212012

Identifiers

DOI: 10.48550/arxiv.cs/0212012