
gr8505 / big_data


This is a Big Data project that uses AWS, PySpark, PySpark SQL and Google Colaboratory to determine whether there is any bias in the reviews written by vine and non-vine reviewers on Amazon.

aws-s3 pyspark-sql google-colaboratory pyspark


Big Data Analysis: Amazon Vine vs Non-Vine Customers

Video Games Category



Executive Overview


Who are vine customers? Amazon Vine invites the most trusted reviewers on Amazon to post opinions about new and pre-release items to help their fellow customers make informed purchase decisions. Customers are invited to become Vine Voices based on their reviewer rank. The program was created to provide customers with more honest and unbiased feedback from some of Amazon's most trusted reviewers.


Key Findings

  1. In the video games category, there are just 4,290 vine customers compared to 1,781,596 non-vine customers.
  2. Non-vine customers recorded the higher number of both Total Votes and Helpful Votes.
  3. Vine and non-vine reviews received similar average ratings of 4.07 and 4.06, respectively.
  4. However, the number of Helpful Votes per customer is slightly higher for vine customers (2.35) than for non-vine customers (2.26).
  5. Furthermore, non-vine customers gave the lion's share of five-star ratings (1,025,249) and had the higher ratio of five-star ratings per customer (0.58), compared to vine reviewers (1,607 and 0.37).
  6. Nevertheless, vine customers were less likely to give ratings at the lower end of the spectrum: the ratio of vine customers who gave one-star ratings was 0.01, compared to 0.11 for non-vine customers.
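The five-star ratios in finding 5 follow directly from the totals listed above (five-star ratings divided by the number of customers in each group). A minimal Python sketch that reproduces them, using only numbers stated in the findings; the variable and function names are illustrative, not taken from the project:

```python
# Totals reported in the Key Findings (video games category).
vine = {"customers": 4_290, "five_star": 1_607}
non_vine = {"customers": 1_781_596, "five_star": 1_025_249}


def five_star_ratio(group: dict) -> float:
    """Five-star ratings divided by the number of customers in the group."""
    return group["five_star"] / group["customers"]


vine_ratio = round(five_star_ratio(vine), 2)
non_vine_ratio = round(five_star_ratio(non_vine), 2)

# Share of all customers in the category who are vine customers.
vine_share = vine["customers"] / (vine["customers"] + non_vine["customers"])

print(f"vine five-star ratio:     {vine_ratio}")
print(f"non-vine five-star ratio: {non_vine_ratio}")
print(f"vine share of customers:  {vine_share:.2%}")
```

The last line makes the imbalance in finding 1 concrete: vine customers are roughly a quarter of one percent of the category, which is worth keeping in mind when comparing absolute counts such as Total Votes.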

See the following link for the full text file and an Appendix with tables:

Resources


  • PySpark
  • PySpark SQL
  • Amazon Web Services (AWS)
  • Google Colaboratory

Data


The data was obtained from a file stored in an AWS S3 bucket.
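The README does not give the S3 path or the file layout. A minimal plain-Python sketch of the parsing step, assuming the file is tab-separated and follows the public Amazon Customer Reviews layout with columns such as `customer_id`, `review_id`, `star_rating`, `helpful_votes`, `total_votes` and `vine` (encoded as "Y"/"N"); both the layout and the sample rows are assumptions. In PySpark the equivalent read would be along the lines of `spark.read.csv(path, sep="\t", header=True)`, with `path` pointing at the S3 object.

```python
import csv
import io

# Hypothetical excerpt in the assumed tab-separated layout; the real file
# has more columns (product, review text, dates, etc.).
SAMPLE_TSV = (
    "customer_id\treview_id\tstar_rating\thelpful_votes\ttotal_votes\tvine\n"
    "111\tR1\t5\t3\t4\tY\n"
    "222\tR2\t1\t0\t2\tN\n"
)


def load_reviews(text: str) -> list[dict]:
    """Parse tab-separated review rows and cast numeric and flag columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for raw in reader:
        rows.append({
            "customer_id": raw["customer_id"],
            "review_id": raw["review_id"],
            "star_rating": int(raw["star_rating"]),
            "helpful_votes": int(raw["helpful_votes"]),
            "total_votes": int(raw["total_votes"]),
            # Assumed encoding: "Y" marks a vine review, "N" a non-vine one.
            "vine": raw["vine"] == "Y",
        })
    return rows


reviews = load_reviews(SAMPLE_TSV)
print(reviews)
```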

ETL and EDA Process


Preprocessing and exploratory data analysis (EDA) were performed in Google Colaboratory. The following link documents all the steps used to obtain the results above.
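The metrics in Key Findings amount to splitting reviews by the vine flag and aggregating each group. The notebook itself is not reproduced here, so the following is a plain-Python stand-in for what a PySpark `df.groupBy("vine").agg(...)` with `pyspark.sql.functions` such as `count`, `avg` and `sum` would compute. The sample rows are hypothetical, and counting each review row as one "customer" is an assumption about how the README's customer totals were derived.

```python
from collections import defaultdict

# Hypothetical sample rows; in the project these would come from the S3 file.
reviews = [
    {"star_rating": 5, "helpful_votes": 3, "total_votes": 4, "vine": True},
    {"star_rating": 4, "helpful_votes": 1, "total_votes": 1, "vine": True},
    {"star_rating": 5, "helpful_votes": 0, "total_votes": 2, "vine": False},
    {"star_rating": 1, "helpful_votes": 2, "total_votes": 5, "vine": False},
]


def summarize(rows: list[dict]) -> dict:
    """Per-group metrics analogous to those in Key Findings.

    Each row counts as one "customer" (an assumption, see lead-in).
    """
    groups = defaultdict(list)
    for row in rows:
        groups["vine" if row["vine"] else "non-vine"].append(row)

    summary = {}
    for name, members in groups.items():
        n = len(members)
        summary[name] = {
            "customers": n,
            "avg_rating": sum(r["star_rating"] for r in members) / n,
            "total_votes": sum(r["total_votes"] for r in members),
            "helpful_per_customer": sum(r["helpful_votes"] for r in members) / n,
            # Booleans sum as 0/1, so these are fractions of the group.
            "five_star_ratio": sum(r["star_rating"] == 5 for r in members) / n,
            "one_star_ratio": sum(r["star_rating"] == 1 for r in members) / n,
        }
    return summary


stats = summarize(reviews)
for group, metrics in stats.items():
    print(group, metrics)
```

Dividing by the group size, rather than comparing raw counts, is what makes the two very unequal groups comparable in findings 4 to 6.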


