
restaurant-closure-prediction's Introduction

Restaurant Closure Prediction Engine (2019)

Final Project for Big Data Science, Spring 2019

Cole Smith

Undergraduate

Running

Python Version

This project was written in Python 3.7, and it is recommended to set up the virtual environment with that version. If your system defaults to Python 2, an interpreter can be specified with the --python flag to virtualenv.

Set up virtualenv

To set up the environment, run:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Running Clustering

The clustering output can be viewed in doc/. It can also be regenerated by uncommenting the code labelled as such in main.py.

Running Predictions

The predictions can be run directly by executing: python main.py

For clarity, the prediction output for the Regression is the total number of restaurant closures (hard and soft, see below) for a given month, given a number of factors. Each row is a zip code in a given month.

The output for the Classification is of soft closures (see below). This is done using the Restaurant Inspections Data Set. Each row is a restaurant as of the present day.
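The two row granularities described above can be pictured with a small sketch. The class and field names here are hypothetical, chosen only for illustration; they are not taken from main.py, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionRow:
    """One row per zip code per month (hypothetical field names)."""
    zip_code: str
    year: int
    month: int
    total_closures: int  # hard + soft closures for that zip code and month


@dataclass(frozen=True)
class ClassificationRow:
    """One row per restaurant as of the present day (hypothetical field names)."""
    restaurant: str    # illustrative label only; the data sets share no universal ID
    zip_code: str
    soft_closed: bool  # target: did an inspection warrant a closure?


# Made-up sample rows, not real data.
reg_row = RegressionRow(zip_code="10003", year=2018, month=5, total_closures=3)
clf_row = ClassificationRow(restaurant="Example Diner", zip_code="10003", soft_closed=False)
```

The Regression target is a count, while the Classification target is a yes/no label per restaurant.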

Hard Closures vs Soft Closures

Since different datasets cannot reliably be joined, the closure information is broken out into Hard and Soft closures.

Hard Closures

Hard Closures are those in which a restaurant did not renew its DCA license and thus cannot legally operate in New York City. These restaurants are assumed not to re-open, since the closure was presumably voluntary.

Soft Closures

Soft Closures are those in which the health inspection results warrant a complete closure. These offer a richer set of supporting features since they originate from the Restaurant Inspection Data Set. However, there are generally far fewer soft closures than hard closures.

These closures are assumed to be involuntary, and restaurants may re-open upon a second inspection.

Merging Soft and Hard Closures

Since there is no given unique, universal identifier for a restaurant in these data sets, the only information that can be used to merge tables is the zip code and the date (Month and Year).

However, since it is assumed that Soft and Hard Closures are drawn from the same distribution (all restaurants must be inspected and must hold a DCA license), the master data set also includes information from the Restaurant Inspection Data Set aggregated to a monthly time-scale.

The total closures are therefore the sum of soft and hard closures for a given month and zip code.
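As a minimal sketch of this aggregation, assuming each closure record reduces to a zip code and a date: the function names and sample records below are hypothetical and are not the code in main.py. It uses only the standard library.

```python
from collections import Counter
from datetime import date

# Made-up sample records as (zip code, closure date); not real data.
hard_closures = [
    ("10003", date(2018, 5, 2)),
    ("10003", date(2018, 5, 20)),
    ("11201", date(2018, 6, 1)),
]
soft_closures = [
    ("10003", date(2018, 5, 11)),
    ("11201", date(2018, 7, 9)),
]


def monthly_counts(records):
    """Count closures per (zip code, year, month) key."""
    return Counter((zip_code, d.year, d.month) for zip_code, d in records)


def total_closures(hard, soft):
    """Add hard and soft counts key by key; a key missing on one side counts as 0."""
    return monthly_counts(hard) + monthly_counts(soft)


totals = total_closures(hard_closures, soft_closures)
for (zip_code, year, month), count in sorted(totals.items()):
    print(zip_code, year, month, count)
```

Keying on (zip code, year, month) reflects the only fields the two sources are stated to share, so no restaurant-level matching is attempted.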
