
1.0-amazon-fine-food-reviews-analysis's Introduction

📈 Predict rating given product reviews on Amazon

This project analyzes the Amazon Fine Food Reviews dataset, which consists of reviews of fine foods from Amazon. With 568,454 reviews from 256,059 users on 74,258 products, this dataset covers a timespan of 13 years, from Oct 1999 to Oct 2012.

🔎 EDA: Take a look at the beautiful visualization of this dataset on this blog: https://nycdatascience.com/blog/student-works/amazon-fine-foods-visualization/

🎯 Objective: The goal of this project is to determine whether a review is positive or negative. A rating of 4 or 5 is considered positive, while a rating of 1 or 2 is considered negative. Reviews with a rating of 3 are ignored.

🤔 How to determine if a review is positive or negative? The Score (rating) of a review serves as a proxy for its polarity. This is only an approximation: a star rating does not always match the sentiment expressed in the review text.
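As a minimal sketch of the labeling rule described above, the following maps a Score to a polarity label and drops 3-star reviews. The function name `score_to_polarity`, the `Polarity` key, and the toy rows are illustrative assumptions, not taken from the project's notebook.

```python
def score_to_polarity(score):
    """Map a 1-5 star Score to a polarity label; 3-star reviews return None."""
    if score in (4, 5):
        return "positive"
    if score in (1, 2):
        return "negative"
    return None  # rating of 3: treated as neutral and ignored


# Toy rows standing in for the dataset (hypothetical values; only two columns shown).
reviews = [
    {"Score": 5, "Text": "Great taste, will buy again"},
    {"Score": 3, "Text": "It was okay"},
    {"Score": 1, "Text": "Stale and bland"},
]

# Keep only reviews with a clear polarity and attach the label.
labeled = [
    {**row, "Polarity": score_to_polarity(row["Score"])}
    for row in reviews
    if score_to_polarity(row["Score"]) is not None
]
```

Here `labeled` keeps the 5-star and 1-star rows and discards the 3-star row.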

💻 Data Source: You can find the dataset on Kaggle at: https://www.kaggle.com/snap/amazon-fine-food-reviews

📊 Attributes:

  • Id
  • ProductId - unique identifier for the product
  • UserId - unique identifier for the user
  • ProfileName
  • HelpfulnessNumerator - number of users who found the review helpful
  • HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
  • Score - rating between 1 and 5
  • Time - timestamp for the review
  • Summary - brief summary of the review
  • Text - text of the review

🔍 Real world problem: Predict rating given product reviews on Amazon.

📊 Steps:

1️⃣ Dataset overview: Explore the Amazon Fine Food Reviews dataset with EDA. 📈

2️⃣ Data Cleaning: Remove duplicates from the dataset. 🧹
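The README does not say what counts as a duplicate. The sketch below assumes that two rows are duplicates when they share the same UserId, ProfileName, Time, and Text, and keeps the first occurrence. The helper name `drop_duplicates`, the key fields, and the toy rows are assumptions for illustration only.

```python
def drop_duplicates(rows, key_fields=("UserId", "ProfileName", "Time", "Text")):
    """Keep the first row for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows: the same review posted under two different product ids.
rows = [
    {"ProductId": "P1", "UserId": "U1", "ProfileName": "ann", "Time": 100, "Text": "Good tea"},
    {"ProductId": "P2", "UserId": "U1", "ProfileName": "ann", "Time": 100, "Text": "Good tea"},
    {"ProductId": "P3", "UserId": "U2", "ProfileName": "bob", "Time": 200, "Text": "Too sweet"},
]

deduplicated = drop_duplicates(rows)
```

With a different definition of "duplicate", only `key_fields` would need to change.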

3️⃣ Why convert text to a vector? Machine learning algorithms operate on numbers, so text data must first be converted to a numerical form. 🔢

4️⃣ Bag of Words (BoW): A common method to convert text to a vector is BoW, which represents each review by the counts of the words it contains. 🛍️

5️⃣ Text Preprocessing: Text needs to be preprocessed before applying BoW. Steps include tokenization, stop-word removal, and stemming or lemmatization. 📝

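6️⃣ uni-gram, bi-gram, n-grams: N-grams treat runs of n consecutive words as single features, which captures local word order that single words miss (for example, "not good"). 🔠

7️⃣ tf-idf (term frequency-inverse document frequency): Another method to convert text to a vector is tf-idf, which weights a word by how often it appears in a document and how rare it is across the whole corpus. 📈🔠

8️⃣ Why use the log in IDF? Without the log, the ratio of total documents to documents containing a word becomes very large for rare words and swamps the term-frequency factor. The log compresses that range. 📉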
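To make the effect of the log concrete, here is a small sketch that compares the raw ratio with its logarithm. The corpus size is the 568,454 reviews quoted above; the two document frequencies are made-up values chosen to represent a rare and a very common word, and the natural log is an arbitrary choice of base.

```python
import math


def idf(n_docs, doc_freq, use_log=True):
    """Inverse document frequency, with or without the log."""
    ratio = n_docs / doc_freq
    return math.log(ratio) if use_log else ratio


N_DOCS = 568_454        # number of reviews in the dataset
RARE_DF = 5             # hypothetical: word appears in 5 reviews
COMMON_DF = 500_000     # hypothetical: word appears in most reviews

raw_rare = idf(N_DOCS, RARE_DF, use_log=False)      # ratio in the hundreds of thousands
raw_common = idf(N_DOCS, COMMON_DF, use_log=False)  # ratio just above 1
log_rare = idf(N_DOCS, RARE_DF)                     # log brings this down to roughly 11-12
log_common = idf(N_DOCS, COMMON_DF)                 # close to 0
```

Without the log, the rare word's weight is about five orders of magnitude larger than the common word's. With the log, the ordering is preserved but the values stay on a scale comparable to term frequencies.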

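9️⃣ Word2Vec: Word2Vec is a neural network-based approach that learns a dense vector for each word. 🧠

🔟 Avg-Word2Vec, tf-idf weighted Word2Vec: Word2Vec produces one vector per word, so a review needs its word vectors combined into a single vector. Two ways to do this are a plain average (avg-Word2Vec) and a tf-idf weighted average (tf-idf weighted Word2Vec). 🧮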

1️⃣1️⃣ Bag of Words (code sample) 💻
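The project's own code sample is not reproduced on this page. As a stand-in, here is a from-scratch sketch of Bag of Words using only the standard library; the function names and the two toy reviews are assumptions. Libraries such as scikit-learn provide the same idea through `CountVectorizer`.

```python
from collections import Counter


def build_vocabulary(docs):
    """Assign each distinct whitespace-separated token a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bow_vector(doc, vocab):
    """Count how often each vocabulary word occurs in doc."""
    vector = [0] * len(vocab)
    for tok, count in Counter(doc.split()).items():
        if tok in vocab:  # words outside the vocabulary are ignored
            vector[vocab[tok]] = count
    return vector


docs = ["tasty pasta tasty sauce", "pasta is not tasty"]
vocab = build_vocabulary(docs)   # columns: is, not, pasta, sauce, tasty
vectors = [bow_vector(doc, vocab) for doc in docs]
```

Each review becomes a fixed-length count vector, and word order is discarded, which is the limitation that n-grams partly address.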

1️⃣2️⃣ Text Preprocessing (code sample) 💻
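A minimal preprocessing sketch, assuming the review text may contain HTML tags and using a deliberately tiny, hypothetical stop-word list. It covers tag stripping, lowercasing, tokenization, and stop-word removal. Stemming and lemmatization are left out because they need a third-party library such as NLTK.

```python
import re

# Hypothetical stop-word list for illustration; "not" is kept on purpose,
# since negations carry sentiment.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to", "i"}


def preprocess(text):
    """Return the lowercase alphabetic tokens of text, minus stop words."""
    text = re.sub(r"<[^>]+>", " ", text)      # replace HTML tags with a space
    tokens = re.findall(r"[a-z]+", text.lower())  # tokenize; drops digits and punctuation
    return [tok for tok in tokens if tok not in STOP_WORDS]


cleaned = preprocess("This is the BEST pasta!<br />I love it.")
```

Here `cleaned` is `["best", "pasta", "love"]`.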

1️⃣3️⃣ Bi-Grams and n-grams (code sample) 💻
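A short sketch of n-gram extraction over an already tokenized review. The helper name `ngrams` and the example tokens are assumptions.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["pasta", "is", "not", "tasty"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

The bigram `"not tasty"` keeps the negation attached to the word it modifies, which unigram Bag of Words cannot do. The cost is a much larger vocabulary as n grows.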

1️⃣4️⃣ TF-IDF (code sample) 💻
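The sketch below implements the plain textbook formula, tf = count / document length and idf = ln(N / df), on three toy reviews. It is an illustration under those assumptions, not the project's code. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its numbers would differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each token appear?
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            tok: (count / len(toks)) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        })
    return weights


docs = ["tasty pasta", "stale pasta", "tasty tasty sauce"]
weights = tf_idf(docs)
```

In the second review, "stale" appears in only one document and so gets a higher weight than "pasta", which appears in two. A word that occurs in every document gets weight 0.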

1️⃣5️⃣ Word2Vec (code sample) 💻

1️⃣6️⃣ Avg-Word2Vec and TFIDF-Word2Vec (code sample) 💻
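To illustrate the two review-level vectors, the sketch below uses a tiny hand-written embedding table in place of a trained Word2Vec model. The 2-dimensional vectors and the tf-idf weights are made-up values, and both function names are assumptions.

```python
def avg_word2vec(tokens, vectors):
    """Plain average of the vectors of all tokens that have an embedding."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return None  # no token has an embedding
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def tfidf_word2vec(tokens, vectors, weights):
    """Average of token vectors, each weighted by its tf-idf value."""
    total = None
    weight_sum = 0.0
    for tok in tokens:
        if tok in vectors and tok in weights:
            scaled = [weights[tok] * x for x in vectors[tok]]
            total = scaled if total is None else [a + b for a, b in zip(total, scaled)]
            weight_sum += weights[tok]
    if total is None or weight_sum == 0:
        return None
    return [x / weight_sum for x in total]


# Hypothetical stand-ins for trained word vectors and tf-idf weights.
toy_vectors = {"tasty": [1.0, 0.0], "pasta": [0.0, 1.0]}
toy_weights = {"tasty": 3.0, "pasta": 1.0}

review = ["tasty", "pasta"]
plain = avg_word2vec(review, toy_vectors)                     # [0.5, 0.5]
weighted = tfidf_word2vec(review, toy_vectors, toy_weights)   # [0.75, 0.25]
```

The weighted version pulls the review vector toward the words tf-idf considers more informative, here "tasty".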

Thank you for checking out this project! 🙏

Note: This case study was covered in the Applied AI course.


Contributors

rohan7958
