
amazon-review-scraper's Introduction

scraper

An Amazon review scraper built in Python.

Installation

Prerequisites

Create and activate a virtual environment by running pipenv shell inside the repository.

Install requirements

pip install -r requirements.txt

Execute program

python amazon_reviews.py

Output

Running the script creates a data.json file with the following structure:

[
  {
    "reviews": [
      {
        "review_header": "",
        "review_text": "",
        "review_comment_count": "",
        "review_posted_date": "",
        "review_rating": "",
        "review_author": ""
      }
    ],
    "ratings": {
      "2 star": "4%",
      "1 star": "10%",
      "4 star": "17%",
      "3 star": "5%",
      "5 star": "64%"
    },
    "price": "",
    "url": "",
    "name": ""
  },
  ...
]

Example of a JSON file with scraped data:

[
  {
    "reviews": [
      {
        "review_header": "Best phone I have owned ever",
        "review_text": "I love this iPhone X",
        "review_comment_count": "",
        "review_posted_date": "19 Jul 2017",
        "review_rating": "5.0",
        "review_author": "Mart"
      }
    ],
    "ratings": {
      "2 star": "4%",
      "1 star": "10%",
      "4 star": "17%",
      "3 star": "5%",
      "5 star": "64%"
    },
    "price": "",
    "url": "http://www.amazon.com/dp/B01ETPUQ6E",
    "name": "iPhone X - No Contract Phone - White - (AT&T)(Carrier locked phone)"
  },
  ...
]
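Once data.json exists, it can be read back with the standard json module. The sketch below is not part of the scraper; load_products and average_rating are hypothetical helper names. It assumes each entry in "reviews" is an object with the fields listed above, and that "review_rating" is a numeric string such as "5.0" or an empty string.

```python
import json
from pathlib import Path


def load_products(path="data.json"):
    """Read the scraper output and return the parsed list of products."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def average_rating(product):
    """Return the mean review rating of one product, or None if unavailable.

    Assumption: review_rating holds a numeric string; empty or malformed
    values are skipped rather than treated as zero.
    """
    ratings = []
    for review in product.get("reviews", []):
        try:
            ratings.append(float(review.get("review_rating", "")))
        except (TypeError, ValueError):
            continue
    return sum(ratings) / len(ratings) if ratings else None
```

To use it, call load_products() from the directory that contains data.json and pass each returned product to average_rating.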

How to find the ASIN product number

An Amazon product URL might look like this:

https://www.amazon.com/Apple-iPhone-Silver-Certified-Refurbished/dp/B07D6TQP6F/ref=

The product ASIN is the string that follows dp/ in the URL. In the example above, it is B07D6TQP6F.
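The same lookup can be done in code. The sketch below is not taken from the scraper; extract_asin is a hypothetical helper. It assumes the ASIN is a run of 10 uppercase letters or digits directly after /dp/, which matches the URLs shown in this README but may not cover every Amazon URL form.

```python
import re
from typing import Optional

# Assumption: an ASIN is exactly 10 uppercase alphanumeric characters and
# is followed by "/", "?" or the end of the URL.
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found after /dp/ in the URL, or None if there is none."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None
```

For the example URL above, extract_asin returns B07D6TQP6F. A URL without a /dp/ segment returns None.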
