Seeing stars | RDL Research Database

Abstract

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star". We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.
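
The abstract does not spell out the objective, so the following is only a minimal sketch of how a metric-labeling style adjustment might look, not the authors' implementation. It assumes that each item's cost for a label is the negated preference of the base classifier plus a weighted penalty for disagreeing with already-labeled neighbors, with the penalty scaled by item similarity and by the distance between labels. The function name `relabel`, the parameter `alpha`, the absolute-difference label distance, and the toy numbers are all hypothetical.

```python
from typing import Callable, Sequence, Tuple


def relabel(
    pref: Sequence[float],
    neighbors: Sequence[Tuple[int, float]],
    alpha: float = 1.0,
    label_distance: Callable[[int, int], float] = lambda a, b: abs(a - b),
) -> int:
    """Pick the label index with the lowest combined cost for one item.

    pref[l]   : base classifier's preference for label l (higher is better).
    neighbors : (neighbor_label, similarity) pairs for already-labeled items.
    alpha     : assumed trade-off weight between classifier and neighbors.
    """
    def cost(label: int) -> float:
        # Penalty grows with item similarity and with label distance,
        # so similar items are pushed toward similar labels.
        penalty = sum(sim * label_distance(label, n_label)
                      for n_label, sim in neighbors)
        return -pref[label] + alpha * penalty

    return min(range(len(pref)), key=cost)


# Toy example: four labels (indices 0..3, e.g. one to four stars).
pref = [0.1, 0.2, 0.4, 0.3]                # classifier slightly prefers label 2
neighbors = [(3, 0.9), (3, 0.8), (2, 0.5)]  # similar items mostly carry label 3

print(relabel(pref, neighbors, alpha=0.0))  # classifier alone: 2
print(relabel(pref, neighbors, alpha=0.5))  # neighbors pull the label to 3
```

With `alpha=0.0` the sketch reduces to the base classifier's choice; raising `alpha` gives the neighbors more say. Because the label distance is ordinal, a neighbor labeled "four stars" penalizes a "one star" guess more than a "three stars" guess, which reflects the label similarity the abstract describes.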

Keywords

Computer science, Artificial intelligence, Metric (unit), Classifier (UML), Inference, Similarity (geometry), Class (philosophy), Stars, Machine learning, Task (project management), Categorization, Image (mathematics)

Related Publications

Unsupervised Feature Learning via Non-parametric Instance Discrimination

Zhirong Wu, Yuanjun Xiong, Stella X. Yu +1 more

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether...

2018 3435 citations

Gaussian Process Priors with Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting

Agathe Girard, Carl Edward Rasmussen, Joaquin Quiñonero Candela +1 more

We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. k-step ahead forecasting of a discrete-time non-l...

2002 370 citations

Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study

Karen Nylund‐Gibson, Tihomir Asparouhov, Bengt Muthén

Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, ...

2007 Structural Equation Modeling A Multid... 10080 citations

Combining Possibly Related Estimation Problems

B. Efron, Carl N. Morris

We have two sets of parameters we wish to estimate, and wonder whether the James-Stein estimator should be applied separately to the two sets or once to the combined pro...

1973 Journal of the Royal Statistical Soci... 189 citations

MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package

Jarrod D. Hadfield

Generalized linear mixed models provide a flexible framework for modeling a range of data, although with non-Gaussian response variables the likelihood cannot be obtained in clo...

2010 Journal of Statistical Software 4603 citations

Publication Info

Year: 2005
Type: article
Pages: 115-124
Citations: 2121
Access: Closed

Cite This

APA Style

Bo Pang, Lillian Lee (2005). Seeing stars. 115-124. https://doi.org/10.3115/1219840.1219855

Identifiers

DOI: 10.3115/1219840.1219855