Amazon Review Analysis

The products analyzed below have been reviewed and rated by 881,668 Amazon customers, with an average (unweighted mean) rating of 4.3 out of 5. This histogram depicts the distribution of those reviews.

Can you tell if a product's Amazon reviews are legitimate just by looking at the rating chart?

This analysis aims to answer that question by examining a universe of 500 products that are largely devoid of suspicious or unhelpful reviews. It's a small dataset, yet it's a highly curated one. 

Where the Source Data Comes From

I publish a product review website called Good, Cheap and Fast. The site features products that have been screened using a blend of data analysis, psychology and investigative journalism.

Products with an average rating below 3.9 out of 5 are excluded. Paid, sponsored, unverified and otherwise suspicious reviews are filtered out. Additionally, unhelpful review behaviors (from verified customers) are discounted. Here is a list of those behaviors:

  1. Off-Label Usage - Customers rate a portable jump starter 5-stars, even though they have only used the product to charge their smartphones, not to jump start a vehicle.

  2. Self Validation - Customers rate a carbon monoxide detector 5-stars because they feel a sense of relief and validation that their purchase will protect their families.

  3. Customer Service Uprating - A 1-star rating is later updated to 4- or 5-stars because the manufacturer offers the customer a replacement product (and suggests altering the review).

  4. Misunderstanding - A customer leaves a negative review because he or she didn't read the product description carefully and is consequently disappointed with the product.

  5. Ideology or Spite - A positive review is paired with a negative rating because the customer disagrees with the business practices of the manufacturer (e.g. It's a great product, but Widget Corp. is a POLLUTER!).

  6. Wrong Model - A review for one variation of a product is lumped in with reviews of another version of the product. (Hard drive failure rates can differ by 900% depending on the size of the drive.)

  7. Wrong Product - A product page is repurposed by a seller, thereby mixing the reviews of one product with a completely different one. E.g. A page about a protective phone case contains reviews about a wireless charger.

  8. Shipping Issues - Customers leave negative reviews because their packages arrived late or damaged in a way that reflects negatively on the shipping carrier, not the manufacturer.

  9. Joke Reviews - A customer uses his or her review as a platform for comedy. Sexual wellness products, or those that are gender-based, seem to be disproportionately affected.

  10. Empathy or Pity - A customer has a bad experience with a product, yet he or she leaves a positive rating (typically, 4-stars) because "someone" might like the product.

The products that survive these filters (around 15% of them) constitute the dataset analyzed below.

The Analysis Takes a Surprising Turn

To answer the question I initially posed: no, it is not possible to tell if a product's Amazon reviews are legitimate by looking at a single chart, because too many charts look the same!

Combine the reviews of 10 or more products and the distribution of ratings will begin to converge on a common curve. Above: The source data is segmented by the number of reviews that the products had received.

After dividing an already-small dataset into sub-groups, the distribution of reviews falls into the same natural pattern. Perhaps that isn't surprising: All of the products were vetted in a similar way. Nevertheless, this pattern also appears in an analysis of Amazon's bestsellers, sponsored products, the most analyzed products on Fakespot and a larger analysis of Amazon reviews from 1995-2013.
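For readers who want to try this on their own data, here is a minimal sketch of what "combining the reviews" of several products could look like. It assumes each product is represented as a mapping from star value (1 to 5) to review count. The function name and the sample numbers are hypothetical and are not taken from the dataset described here.

```python
from collections import Counter
from typing import Dict, Iterable


def pooled_distribution(products: Iterable[Dict[int, int]]) -> Dict[int, float]:
    """Pool star counts across products and return each star's share of the total."""
    pooled = Counter()
    for counts in products:
        pooled.update(counts)  # adds the counts star by star

    total = sum(pooled.values())
    if total == 0:
        raise ValueError("no reviews to pool")

    return {star: pooled.get(star, 0) / total for star in range(1, 6)}


# Hypothetical star counts for two products (illustrative only).
sample_products = [
    {5: 6, 4: 2, 1: 2},
    {5: 4, 4: 4, 3: 2},
]
shares = pooled_distribution(sample_products)
```

Comparing the pooled shares of one group of products against another is one way to see whether the curves converge as more products are added.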

Insight Comes From the Fringes of the Data

Obviously fake reviews are easy to spot, but is there such a thing as obviously legit? Above: Fakespot eviscerates a product with near-perfect ratings, as well as its manufacturer.

Key Takeaways

A deeper analysis of this data provides useful shortcuts for spotting good (or great) products with legitimate reviews. Take the following tips with a grain of salt, especially for products with fewer than 100 reviews:

  • 5-star reviews should account for at least 45% of total reviews, but not more than 95%;

  • The combination of 4-star and 1-star reviews should amount to at least 5% of total reviews; and

  • When 3-star reviews outnumber 2-star reviews, or 2-star reviews outnumber 1-star reviews, it may imply that a product's acceptability is contingent upon comfort, user skill or a subjective factor, like personal taste. (In cases like these, it's a good idea to double-check the product's return policy).
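
The tips above can be expressed as a few simple checks. What follows is a rough sketch, not a definitive test of legitimacy. It assumes a product's rating chart is available as a mapping from star value (1 to 5) to review count. The function name, the result keys and the example counts are hypothetical. The thresholds are the ones listed above, applied exactly as written.

```python
from typing import Dict


def check_rating_chart(counts: Dict[int, int]) -> Dict[str, bool]:
    """Apply the rules of thumb above to one product's star counts.

    `counts` maps a star value (1-5) to the number of reviews with that rating.
    """
    stars = {s: counts.get(s, 0) for s in range(1, 6)}
    total = sum(stars.values())
    if total == 0:
        raise ValueError("product has no reviews")

    five_share = stars[5] / total
    four_plus_one_share = (stars[4] + stars[1]) / total

    return {
        # Fewer than 100 reviews: take the other results with a grain of salt.
        "small_sample": total < 100,
        # 5-star reviews should be at least 45% but not more than 95% of the total.
        "five_star_share_ok": 0.45 <= five_share <= 0.95,
        # 4-star plus 1-star reviews should be at least 5% of the total.
        "four_plus_one_share_ok": four_plus_one_share >= 0.05,
        # 3-star > 2-star, or 2-star > 1-star, hints at comfort, skill or taste.
        "may_be_subjective": stars[3] > stars[2] or stars[2] > stars[1],
    }


# Hypothetical rating chart (illustrative only).
example = {5: 600, 4: 200, 3: 80, 2: 50, 1: 70}
report = check_rating_chart(example)
```

A passing result only means that the chart falls inside the ranges seen in this curated dataset. It does not prove that the reviews are legitimate.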