Abstract

Online reviews are often the primary factor in a customer’s decision to purchase a product or service, and are a valuable source of information for determining public opinion on these products or services. Because of their impact, manufacturers and retailers are highly concerned with customer feedback and reviews. This reliance on online reviews raises the concern that wrongdoers may create false reviews to artificially promote or devalue products and services. This practice is known as Opinion (Review) Spam, where spammers manipulate and poison reviews (i.e., write fake, untruthful, or deceptive reviews) for profit or gain. Since not all online reviews are truthful and trustworthy, it is important to develop techniques for detecting review spam. By extracting meaningful features from the text using Natural Language Processing (NLP), it is possible to conduct review spam detection using various machine learning techniques. Additionally, reviewer information, apart from the text itself, can be used to aid in this process. In this paper, we survey the prominent machine learning techniques that have been proposed to solve the problem of review spam detection and the performance of different approaches for classification and detection of review spam. The majority of current research has focused on supervised learning methods, which require labeled data, and labeled data is scarce for online review spam. Research on methods for Big Data is of interest, since there are millions of online reviews, with many more being generated daily. To date, we have not found any papers that study the effects of Big Data analytics for review spam detection. The primary goal of this paper is to provide a strong and comprehensive comparative study of current research on detecting review spam using various machine learning techniques and to devise a methodology for conducting further investigation.
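As a rough illustration of the supervised, text-feature approach the abstract describes, the sketch below extracts bag-of-words features and trains a multinomial Naive Bayes classifier with add-one smoothing. It is not taken from the paper: the function names (`tokenize`, `train`, `predict`), the labels, and the four toy reviews are all hypothetical and serve only to make the idea concrete.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Crude bag-of-words tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(labeled_reviews):
    """Count words per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training reviews
    vocab = set()
    for text, label in labeled_reviews:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior under Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                # Add-one (Laplace) smoothed log likelihood
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data; real labeled review spam is scarce, as the abstract notes.
TOY_REVIEWS = [
    ("best hotel ever amazing amazing must stay", "spam"),
    ("amazing amazing deal best price buy now", "spam"),
    ("room was clean but the shower leaked", "truthful"),
    ("staff were friendly and the breakfast was average", "truthful"),
]

model = train(TOY_REVIEWS)
print(predict(model, "amazing best deal"))
print(predict(model, "the shower was clean"))
```

A classifier like this depends entirely on labeled examples, which is the limitation of supervised methods that the abstract points out.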

Keywords

Computer science, Sentiment analysis, Spamming, Artificial intelligence, Scarcity, Process (computing), Machine learning, Spambot, Trustworthiness, Systematic review, Data science, The Internet, World Wide Web, Internet privacy

Publication Info

Year: 2015
Type: article
Volume: 2
Issue: 1
Citations: 467
Access: Closed

Cite This

Mike Crawford, Taghi M. Khoshgoftaar, Joseph D. Prusa et al. (2015). Survey of review spam detection using machine learning techniques. Journal of Big Data, 2(1). https://doi.org/10.1186/s40537-015-0029-9

Identifiers

DOI: 10.1186/s40537-015-0029-9