Giter Site home page Giter Site logo

amazon_vine_analysis's Introduction

Amazon_Vine_Analysis

Resources

  • Python 3.9.0,
  • Anaconda Navigator 1.9.12,
  • Jupyter notebook 6.0.3,
  • Google Colab notebooks,
  • Amazon Web Services -AWS-
  • PgAdmin 4
  • Pandas, Pyspark, postgres, AWS-RDS, AWS-S3

Overview

This project analyzes Amazon reviews and measures how many reviews are written by members of the paid Amazon Vine program. In order to complete this analysis, we used a random subset of data that retained Amazon books reviews details. then performed an ETL on this data by using PySpark, AWS, Google Collaboratory, PostgreSQL, and Pandas. In this specific example there were no paid reviews on this dataset, and the process to be sure about that was as follows:

Analysis

After grabbing the data from the S3 amazon bucket, we then created an RDS instance that we connected to Postgres and loaded the data into using Pyspark. our database instance looks like:

picture

After the ETL process, we used pandas to make perform our analysis, we wanted to count how many reviews we had and how many of those reviews were paid for by the Vine program and the percentage of 5* reviews for each category.

  • First, we load our Data into Pandas:

picture

  • after a little data cleaning to keep only the useful reviews, we then try to split the data to either vine reviews or non-vine reviews:

when we look for the paid reviews, Pandas returns an empty Dataset:

picture

the reviews that aren’t part of the vine program:

picture

Results

We use pandas to perform additional analysis on this subset:

picture

  • We can see that this Dataset is made of 403807 reviews.
  • the number of 5 star votes is : 242889.
  • there is no Vine program reviews in this dataset.

Last we calculate the percentage of 5 star reviews in the data.

picture

we can see that the 5* reviews make up 60.15% of this dataset, and that 0% of them are paid for.

Summary

  • For this specific example there was no Positive bias, because the data is made entirely of organic reviews.

  • for additional analysis, we could look at the verified_purchase and perform a similar analysis to what we did here to see what confirmed buyers think of the books compared to other reviewers.

amazon_vine_analysis's People

Contributors

mlachha avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.