By: Ade Romadhony and Muhammad Ahsan Athallah
This post is based on our paper, Spam Detection on Indonesian Beauty Product Review, presented at ICOICT 2021, https://www.icoict.org
Nowadays, and especially during the pandemic, online transactions have increased rapidly. Movement restrictions have made it difficult for people to visit offline stores, so e-commerce has become one of the main ways to buy everything from groceries and clothes to products for particular hobbies.
Figure 1 An example of shop profile on Tokopedia
When a customer wants to buy a product, several sellers/shops may offer the same item, and the customer needs to choose the right seller. Customers tend to choose a seller/shop with an excellent rating (above 4 in a 5-star rating system). Tokopedia provides a review and rating mechanism, and sellers are given guidance on how to manage their rating. A shop/seller rating on Tokopedia is aggregated from all the reviews/ratings that customers give on each product the seller offers.
What is a spam review?
Although positive ratings might dominate a seller's overall rating, some customers are curious about the negative ratings, and as we already know, a "not good" review spreads quickly and widely. When one person says something bad about a product, most other people pay far more attention to that negative review, neglecting the positive ones. Therefore, sellers always try their best to avoid negative reviews.
Since a negative rating/review is so powerful in influencing customer decisions, it unfortunately also attracts manipulative rating/review activity intended to decrease or increase a seller's/shop's reputation. There are multiple reasons why someone would write a spam review; for example, some paid spammers write good or bad reviews of a product to steer the reader's opinion about it [3]. In other cases, users intentionally leave review sentiments that contradict their rating in order to drop or lift the average rating of a product.
According to Jindal and Liu [1], there are three types of spam review:
1. Reviews that are intended to mislead the reader, steering the reader's perception in the wrong direction.
2. Reviews that discuss external aspects of a product, such as the brand campaign or how easy it is to obtain the product, instead of examining the product itself.
3. Reviews that do not contain any opinion about the product; instead, they contain questions, tutorials, or something similar.
Below are some examples of 1-star reviews that can be considered spam reviews:
Figure 2 Review that discusses an external aspect of the product (type 2)
Figure 3 Review that contains no opinion (type 3)
Figure 4 Misleading review (type 1)
Figure 2 shows an example of a review that comments on the courier service rather than on the product or the seller, while the example in Figure 3 contains no meaningful comment at all. An example of a review that might mislead the reader is shown in Figure 4: the reviewer appears to accuse the seller of sending the wrong product, when in fact the name mentioned is the product's actual name.
Another study, by Peng and Zhong [2], revealed that spam reviews can be detected from an inconsistency between a review's sentiment and its rating. Figure 5 shows an example of a review text with positive sentiment, although the user gave a negative 1-star rating.
Figure 5 Inconsistent review sentiment and rating
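This inconsistency check can be sketched in a few lines of Python. Note that the word lists below are toy assumptions for illustration; a real system would need a full Indonesian sentiment lexicon or a trained sentiment classifier.

```python
# Toy sentiment lexicons (illustrative assumptions, not a real resource).
POSITIVE = {"bagus", "suka", "mantap", "recommended", "love"}
NEGATIVE = {"jelek", "kecewa", "buruk", "rusak"}

def sentiment_polarity(text):
    # Count positive vs. negative lexicon hits in the review text.
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def is_inconsistent(text, rating):
    # Flag reviews whose text sentiment contradicts the star rating,
    # as in the Peng and Zhong style of detection.
    polarity = sentiment_polarity(text)
    return (polarity == "positive" and rating <= 2) or \
           (polarity == "negative" and rating >= 4)
```

For instance, a review saying "bagus banget, suka!" ("really good, love it!") with a 1-star rating would be flagged, while the same text with a 5-star rating would not.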
How to detect spam reviews?
Considering the strong effect of seller/shop ratings, automated systems that can prevent spam reviews have been studied intensively. Generally, there are two approaches to detecting spam reviews: reviewer-centric and review-centric. The reviewer-centric approach focuses on analyzing user behavior using data that can identify a user, while review-centric research focuses on utilizing review-related data to perform spam detection.
Recently, we conducted a study on detecting spam in skincare and makeup reviews collected from the Female Daily review website, https://reviews.femaledaily.com. We used a supervised machine learning approach to identify spam reviews.
Figure 6 Process pipeline to build the spam review detection model
We built a classification model to detect spam reviews using the pipeline shown in Figure 6. Since we used the review-centric approach, we defined a set of features to extract from the review text, based on several prior works on spam review detection [1], [2], [3]. The features can be categorized into three types:
- Sentiment features: the number of positive words, the number of negative words, and the review text's sentiment/polarity
- Product-related features: the product's average rating, the rating difference/gap from the average rating of the same product, and the rating standard deviation
- Review-centric features: the review rating, whether the review text contains a question, whether it contains a link, the number of words (word count), whether it mentions the product brand, the number of brands mentioned, and whether it contains the product name
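The review-centric features above are straightforward to extract directly from the review text. Here is a minimal sketch of such an extractor; the function name, dictionary keys, and brand/product arguments are illustrative assumptions, not the paper's actual implementation:

```python
import re

def review_centric_features(text, rating, brands, product_name):
    """Extract simple review-centric features from a single review."""
    words = text.split()
    lowered = text.lower()
    mentioned = [b for b in brands if b.lower() in lowered]
    return {
        "rating": rating,
        "has_question": "?" in text,
        "has_link": bool(re.search(r"https?://", text)),
        "word_count": len(words),
        "has_brand": len(mentioned) > 0,
        "brand_count": len(mentioned),
        "has_product_name": product_name.lower() in lowered,
    }
```

A review like "Is this original? see http://x.co" would then yield `has_question=True`, `has_link=True`, and a word count of 5, giving the classifier a numeric view of the text.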
We analyzed the correlation of the review-centric features with the spam label and found that rating and word count have the highest correlation. Among the product-related features, rating difference and rating standard deviation have the highest correlation. In contrast to those two feature categories, the sentiment features showed low correlation with the spam label.
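The kind of correlation analysis described above can be reproduced with pandas. The snippet below uses a tiny made-up dataset (not the paper's data) purely to show the mechanics: spam reviews here are short, low-rated, and far from the product's average rating.

```python
import pandas as pd

# Toy data for illustration only; column names are assumptions.
df = pd.DataFrame({
    "rating":     [1, 1, 5, 4, 1, 5],          # review star rating
    "word_count": [2, 3, 25, 30, 1, 40],       # length of review text
    "rating_gap": [3.5, 3.2, 0.1, 0.4, 3.8, 0.2],  # |rating - product average|
    "is_spam":    [1, 1, 0, 0, 1, 0],          # binary spam label
})

# Pearson correlation of each feature column with the spam label.
correlations = df.drop(columns="is_spam").corrwith(df["is_spam"]).abs()
print(correlations.sort_values(ascending=False))
```

On real data, features whose absolute correlation with `is_spam` is highest (here: rating, word count, and rating gap) are the most promising inputs for the classifier.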
To build the classification model, we used the SVM algorithm. We conducted experiments with several feature settings: review-centric features only, product-related features only, sentiment features only, review-centric plus product-related features, review-centric plus sentiment features, product-related plus sentiment features, and all three categories combined. The experimental results show that the combination of review-centric and sentiment features gave the best classification accuracy, 81.77%.
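The experiment setup can be sketched with scikit-learn as below. The feature vectors here are synthetic stand-ins (the actual study used features extracted from Female Daily reviews), so this only demonstrates the train/evaluate loop for one feature setting, not the paper's results.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 200
# Synthetic feature columns: rating, word_count, sentiment polarity.
# Spam reviews: low ratings and very short texts; non-spam: the opposite.
X_spam = np.column_stack([rng.integers(1, 3, n),
                          rng.integers(1, 5, n),
                          rng.normal(0.5, 0.3, n)])
X_ham = np.column_stack([rng.integers(4, 6, n),
                         rng.integers(10, 50, n),
                         rng.normal(0.6, 0.3, n)])
X = np.vstack([X_spam, X_ham])
y = np.array([1] * n + [0] * n)  # 1 = spam, 0 = not spam

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

Repeating this loop once per feature subset (dropping the unused columns from `X`) reproduces the comparison across the seven feature settings.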
[1] N. Jindal and B. Liu, "Opinion spam and analysis," in Proc. First ACM Int. Conf. on Web Search and Data Mining (WSDM), 2008.
[2] Q. Peng and M. Zhong, "Detecting spam review through sentiment analysis," J. Softw., vol. 9, no. 8, pp. 2065–2072, 2014, doi: 10.4304/jsw.9.8.2065-2072.
[3] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, "Finding deceptive opinion spam by any stretch of the imagination," in Proc. 49th Annu. Meeting of the Assoc. for Computational Linguistics: Human Language Technologies (ACL-HLT), vol. 1, pp. 309–319, 2011.
Thank you very much to Bapak Arman Rizal and Bern Jonathan from Female Daily for providing the dataset and for their valuable feedback on our work.
Poster Paper Summary: https://lnkd.in/gkNFrNVx