Abstract

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

Keywords

Computer science
Naive Bayes classifier
Support vector machine
Categorization
Artificial intelligence
Sentiment analysis
Machine learning
Principle of maximum entropy
Entropy (arrow of time)
Natural language processing
Statistical classification

Publication Info

Year
2002
Type
article
Volume
10
Pages
79-86
Citations
6965
Access
Closed

Citation Metrics

OpenAlex
6965
Influential
813
CrossRef
3781

Cite This

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan (2002). Thumbs up? Proceedings of the ACL-02 conference on Empirical methods in natural language processing - EMNLP '02, 10, 79-86. https://doi.org/10.3115/1118693.1118704

Identifiers

DOI
10.3115/1118693.1118704
arXiv
cs/0205070
