yelp_scraping

Phase 1

We start by fetching restaurant data for LA County and comparing it with the results from the Yelp educational datasets.

A useful Git repo with Yelp dataset examples and conversion scripts: https://github.com/Yelp/dataset-examples

To access the Yelp API, I registered an app on the Yelp developer support website and received an API key with a fixed usage limit.
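As a rough illustration of how the API key is used, the sketch below builds (but does not send) an authenticated request to the Yelp Fusion business search endpoint. The function name `build_search_request` and its defaults are assumptions made for this example and are not taken from the repository.

```python
import urllib.parse
import urllib.request

# Yelp Fusion business search endpoint.
YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, latitude, longitude, radius_m=1000, limit=50, offset=0):
    """Build a GET request for restaurants around a point (hypothetical helper)."""
    params = {
        "term": "restaurants",
        "latitude": latitude,
        "longitude": longitude,
        "radius": radius_m,  # search radius in meters
        "limit": limit,      # results per page
        "offset": offset,    # used to page through the results
    }
    url = f"{YELP_SEARCH_URL}?{urllib.parse.urlencode(params)}"
    # The API key is passed as a bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
```

Passing the returned object to `urllib.request.urlopen` would perform the call; each call counts against the usage limit of the key.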

App name on Yelp

Yelp-Scraping-USC-2022

Api Usage

Coverage of the current boundary for LA

Installation

Step 1

pip install -r requirements.txt

Step 2

python main.py --businesses --reviews
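The two flags above might be wired up roughly as follows. This is a minimal sketch with `argparse`, assuming both flags are simple on/off switches; the actual `main.py` may parse them differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags shown above (sketch, not the real main.py)."""
    parser = argparse.ArgumentParser(description="Yelp scraping entry point")
    # Each flag enables one stage of the pipeline.
    parser.add_argument("--businesses", action="store_true", help="fetch businesses")
    parser.add_argument("--reviews", action="store_true", help="fetch reviews")
    return parser.parse_args(argv)
```

With this layout, running with only `--businesses` would skip the review stage.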

Phase 2

The script has three parts (or three scripts):

  1. get_business.py fetches the businesses in the specified area, using either the latitude and longitude bounding box of the city or the zip codes of the city (stored in file here).

  2. get_reviews.py fetches the reviews, the users who left them, and the owners' responses for the businesses gathered in the previous step.

  3. get_image_backlog.py
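One way to cover a latitude and longitude bounding box is to split it into a grid and query around the center of each cell. The sketch below shows only that tiling step; the function `grid_centers` and the grid approach itself are assumptions for illustration, not necessarily what get_business.py does.

```python
def grid_centers(south, west, north, east, rows, cols):
    """Return the (latitude, longitude) center of every cell in a rows x cols grid."""
    lat_step = (north - south) / rows
    lng_step = (east - west) / cols
    centers = []
    for r in range(rows):
        for c in range(cols):
            # Offset by half a step to land in the middle of the cell.
            centers.append((south + (r + 0.5) * lat_step, west + (c + 0.5) * lng_step))
    return centers
```

Each center could then be used as the search point of one API query, with a radius large enough to cover its cell.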

Phase 3

  1. Rerunning the get_reviews script to update all the reviews. We use the attribute flag in the yelp_business table to track this:

    Null - reviews not updated
    1 - reviews updated

    We also had to change the sorting order of reviews in the constants file.

  2. Fixing all the missing or invalid data. The fixes are listed below.
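The Null/1 flag on yelp_business makes the rerun resumable: only businesses whose flag is still Null need to be visited. The sketch below uses `sqlite3` as a stand-in database, and the column names `business_id` and `flag` are assumptions; the real schema and database engine may differ.

```python
import sqlite3


def pending_businesses(conn):
    """Return ids of businesses whose reviews have not been updated yet (flag is NULL)."""
    rows = conn.execute(
        "SELECT business_id FROM yelp_business WHERE flag IS NULL ORDER BY business_id"
    ).fetchall()
    return [row[0] for row in rows]


def mark_reviews_updated(conn, business_id):
    """Set the flag to 1 once the reviews of a business have been refreshed."""
    conn.execute("UPDATE yelp_business SET flag = 1 WHERE business_id = ?", (business_id,))
    conn.commit()
```

If the script is interrupted, calling `pending_businesses` again returns only the businesses that still need work.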

Fixes

  1. Some reviews were not populated after the HTML encodings were removed. The check_in flag in the yelp_reviews table was set to 2 for all these new reviews. The function populate_missed_reviews is used to populate them.

python get_reviews.py --missed_reviews

  2. During the initial run of the script, the owners' responses to reviews were not stored in the DB as a separate entity. All these entries have the check_in flag set to 1.

python get_reviews.py --owner_response

  3. All new reviews collected in the next iteration have been marked with flag 3. We will use this to fetch their photos and then update the flag to 4.
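The flag values used in these fixes can be summarized as a small enumeration, together with the 3 to 4 transition for photos. This is only a sketch: the member names are my reading of the fixes above, `review_id` is a hypothetical column, `sqlite3` stands in for the real database, and it assumes flag 3 and 4 live in the same check_in column as 1 and 2.

```python
import sqlite3
from enum import IntEnum


class CheckIn(IntEnum):
    """Values of the check_in flag in yelp_reviews (names are illustrative)."""
    OWNER_RESPONSE_PENDING = 1  # from the initial run, owner response not stored separately
    MISSED_REVIEW = 2           # populated after the HTML encodings were removed
    NEW_REVIEW = 3              # collected in the next iteration, photos not fetched
    PHOTOS_FETCHED = 4          # photos fetched for a new review


def mark_photos_fetched(conn, review_id):
    """Move a review from flag 3 to flag 4; return True if a row was changed."""
    cur = conn.execute(
        "UPDATE yelp_reviews SET check_in = ? WHERE review_id = ? AND check_in = ?",
        (int(CheckIn.PHOTOS_FETCHED), review_id, int(CheckIn.NEW_REVIEW)),
    )
    conn.commit()
    return cur.rowcount == 1
```

Guarding the update with the current flag value keeps reviews in other states (1 or 2) from being promoted by mistake.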

yelp-scraping's People

Contributors

ksinghparth
