By: Ade Romadhony and Muhammad Ahsan Athallah

This post is based on our paper, Spam Detection on Indonesian Beauty Product Review, presented at ICOICT 2021.

Nowadays, especially in this pandemic situation, online transactions keep increasing. Restrictions have made it difficult for people to visit offline stores, so e-commerce has become one of the main solutions when people need to buy anything, from groceries and clothes to products related to particular hobbies.


Figure 1 An example of a shop profile on Tokopedia

When a customer wants to buy a product, several sellers/shops may sell the same product, and the customer needs to choose the right one. Customers tend to choose a seller/shop with an excellent rating (above 4 in a 5-star rating system) [1]. Tokopedia provides a review and rating mechanism, and sellers are given guidance on how to manage their rating. A shop/seller rating on Tokopedia is computed from all the reviews/ratings given by customers on each product sold by that seller.

What is spam review?

Although positive ratings may dominate a seller's overall rating, customers are often curious about the negative ratings, and as we all know, a "not good" review spreads quickly and widely. When one person says something bad about a product, most people pay close attention to that negative review and neglect the positive ones. Therefore, sellers always try their best to avoid getting negative reviews.

Since a negative rating/review is very powerful in influencing customer decisions, it unfortunately also attracts manipulative rating/review activity intended to raise or lower a seller/shop's reputation. There are multiple reasons why someone writes a spam review; for example, some paid spammers write good or bad reviews of a product to steer the reader's opinion about it [1]. In other cases, users intentionally leave mismatched sentiments in reviews to drop or to raise the average rating of a product.

According to Jindal and Liu [1], there are three types of spam review: 1) reviews that are intended to mislead the reader, guiding the reader's perception in the wrong direction; 2) reviews that discuss external aspects of a product, such as the brand campaign or how easy it is to get the product, instead of examining the product itself; and 3) reviews that do not contain any opinion about the product, but instead contain questions, tutorials, or similar content.

Below are some examples of 1-star reviews that can be considered spam:

Figure 2 A review that discusses an external aspect of the product (type 2)

Figure 3 A review that contains no opinion

Figure 4 A misleading review

Figure 2 shows an example of a review that contains an opinion about the courier service rather than about the product or the seller, while the example in Figure 3 contains no meaningful comment at all. An example of a review that might mislead the reader is shown in Figure 4, where the reviewer gives the impression that the seller delivered the wrong product, although the name mentioned is actually the product's name.

Another study, by Peng and Zhong [2], revealed that an inconsistency between review sentiment and content can be used to detect spam reviews. Figure 5 shows an example of a review text with positive sentiment, where the user nevertheless gave a negative (1-star) rating.
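The idea behind this inconsistency check can be sketched in a few lines of Python. The tiny sentiment lexicon and the `flag_inconsistent` helper below are illustrative assumptions for this post, not Peng and Zhong's actual method:

```python
# Sketch of a sentiment-rating inconsistency check (illustrative only).
# The word lists are a toy lexicon, not a real sentiment resource.
POSITIVE = {"good", "great", "love", "nice", "fast"}
NEGATIVE = {"bad", "slow", "broken", "fake", "wrong"}

def text_sentiment(text: str) -> int:
    """Return +1 for positive, -1 for negative, 0 for neutral text."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def flag_inconsistent(text: str, rating: int) -> bool:
    """Flag a review whose text polarity contradicts its star rating."""
    polarity = text_sentiment(text)
    if polarity > 0 and rating <= 2:   # positive text, negative rating
        return True
    if polarity < 0 and rating >= 4:   # negative text, positive rating
        return True
    return False

print(flag_inconsistent("great product, love it", 1))  # True
print(flag_inconsistent("bad quality", 1))             # False
```

A real system would replace the toy lexicon with a proper sentiment classifier, but the decision rule stays the same: a mismatch between text polarity and star rating is a spam signal.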


Figure 5 Inconsistent review sentiment and rating

How to detect spam review?

Considering the important effect of seller/shop ratings, automated systems that can detect spam reviews have been studied intensively. Generally, there are two approaches to detecting spam reviews: reviewer-centric and review-centric. The reviewer-centric approach focuses on analyzing user behavior using data that can identify a user, while review-centric research focuses on utilizing review-related data to perform spam detection.

Recently, we conducted a study on detecting spam reviews in skincare and makeup reviews collected from the Female Daily review website. We used a supervised machine learning approach to identify spam reviews.


Figure 6 Process pipeline to build the spam review detection model

We built a classification model to detect spam reviews using the pipeline shown in Figure 6. As we used the review-centric approach, we defined a set of features to be extracted from the review text, based on several prior works on spam review detection [1][2][3]. The features can be categorized into three types:

  1. Sentiment features: the number of positive words, the number of negative words, and the review text sentiment/polarity
  2. Product-related features: the product's average rating, the rating difference/gap from the average rating of the same product, and the rating standard deviation
  3. Review-centric features: the review rating, whether the review text contains a question, whether it contains a link, the number of words (word count), whether it mentions the product brand, the number of brands mentioned, and whether it contains the product name.
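As an illustration, the review-centric features above could be extracted roughly as follows. The review dict shape, the `brands` set, and the link regular expression are assumptions made for this sketch, not our exact implementation:

```python
# Sketch of review-centric feature extraction (illustrative assumptions).
import re

def review_centric_features(review: dict, brands: set, product_name: str) -> dict:
    text = review["text"]
    lowered = text.lower()
    # Brands whose names appear anywhere in the review text.
    mentioned = {b for b in brands if b.lower() in lowered}
    return {
        "rating": review["rating"],
        "has_question": "?" in text,
        "has_link": bool(re.search(r"https?://|www\.", lowered)),
        "word_count": len(lowered.split()),
        "has_brand": bool(mentioned),
        "brand_count": len(mentioned),
        "has_product_name": product_name.lower() in lowered,
    }

feats = review_centric_features(
    {"text": "Is this original? see www.example.com", "rating": 1},
    brands={"Wardah"},
    product_name="Lip Cream",
)
print(feats["has_question"], feats["has_link"], feats["word_count"])  # True True 5
```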

We analyzed the correlation of the features with the spam label. Among the review-centric features, rating and word count have the highest correlation with spam reviews; among the product-related features, rating difference and rating standard deviation correlate most strongly. In contrast to those two feature categories, the sentiment features showed low correlation.
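This kind of analysis can be illustrated with a plain Pearson correlation between a numeric feature and the binary spam label (for a binary label this is equivalent to the point-biserial correlation). The toy data below is made up for illustration, not taken from our dataset:

```python
# Pearson correlation between a feature column and a binary spam label.
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ratings = [1, 1, 2, 5, 4, 5]   # review rating per sample (toy data)
is_spam = [1, 1, 1, 0, 0, 0]   # 1 = labeled spam
print(round(pearson(ratings, is_spam), 3))  # -0.962
```

The strong negative value on this toy data mirrors the intuition from our analysis: low ratings co-occur with spam labels, so the rating feature carries signal.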

To build the classification model, we used the SVM algorithm. We conducted experiments with several feature settings: review-centric features only, product-related features only, sentiment features only, a combination of review-centric and product-related features, a combination of review-centric and sentiment features, a combination of product-related and sentiment features, and a combination of all three. The experimental results show that the combination of review-centric and sentiment features gave the best classification accuracy, at 81.77%.
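A minimal sketch of this experimental setup, with scikit-learn's `SVC` and toy data standing in for our actual dataset, hyperparameters, and evaluation protocol:

```python
# Enumerate the seven feature settings and train an SVM on one of them.
# Toy data and default SVC settings; not the paper's actual configuration.
from itertools import combinations

from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

FEATURE_GROUPS = ["review_centric", "product_related", "sentiment"]

# Every non-empty combination of the three feature groups -> 7 settings.
settings = [c for r in range(1, 4) for c in combinations(FEATURE_GROUPS, r)]
print(len(settings))  # 7

# Toy feature matrix: [rating, word_count]; label 1 = spam.
X = [[1, 3], [1, 2], [2, 4], [5, 40], [4, 35], [5, 50]]
y = [1, 1, 1, 0, 0, 0]

clf = SVC(kernel="linear").fit(X, y)
acc = accuracy_score(y, clf.predict(X))
print(f"train accuracy: {acc:.2f}")
```

In the real experiment, each of the seven settings selects a different subset of feature columns, the model is evaluated on held-out data rather than the training set, and the accuracies are compared across settings.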


[1] N. Jindal and B. Liu, "Opinion spam and analysis," Proc. First ACM Int. Conf. Web Search and Data Mining (WSDM), 2008.

[2] Q. Peng and M. Zhong, "Detecting spam review through sentiment analysis," J. Softw., vol. 9, no. 8, pp. 2065–2072, 2014, doi: 10.4304/jsw.9.8.2065-2072.

[3] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, "Finding deceptive opinion spam by any stretch of the imagination," Proc. 49th Annu. Meet. Assoc. Comput. Linguist. Hum. Lang. Technol. (ACL-HLT), vol. 1, pp. 309–319, 2011.


Thank you very much to Bapak Arman Rizal and Bern Jonathan from Female Daily for providing the dataset and valuable feedback on our work.



August 11th, 2021

Spam Review Detection on E-commerce
