chicago / food-inspections-evaluation

This repository contains the code to generate predictions of critical violations at food establishments in Chicago. It also contains the results of an evaluation of the effectiveness of those predictions.

Home Page: http://chicago.github.io/food-inspections-evaluation/

License: Other

R 3.88% HTML 95.85% TeX 0.28%
chicago data-science open-data open-science food-poisoning cdph public-health

food-inspections-evaluation's Issues

Reproducible documentation that summarizes evaluation findings

The evaluation findings should be summarized in a reproducible document (e.g., knitr).

Let's use checkboxes to flag items that should be in the final document. Here are some of my items (revised on 2/3):

  • Reference model source in knitr document
  • Update percentage difference found in the first period between BAU and DDM
  • table of variables [2]
  • additional narrative on how we masked food inspectors from public model [3.1]
  • output of regression results (significant variables) [3.1]
  • Graph of "days sooner that a critical violation would have been discovered" [4.1]
  • "Critical violations found on a daily basis as a percent of total inspections performed" [4.1]
  • "Bar plot of finding violations in the first period by group (BAU v model)" [4.1]
  • "Cumulative Critical Violations Found BAU Versus Model" [4.1]
  • "Cumulative difference between violations found (BAU and model) and best-case-scenario" [4.1]

Supersedes issue #5

Incorporate data.table and new project structure

Refactor code to

  • Incorporate data.table throughout project for more efficient data management and handling
  • Incorporate convention of putting functions in separate files
  • Incorporate convention of creating a fresh initialization for each script
  • Incorporate convention of ordering scripts according to process
  • Move process scripts into functions
  • Match process output of new process to "DATA/recreated_training_data_20141103v02.Rdata"

[Bug] Fix factor handling

We need to track the factor levels and make sure that they are consistent across different data partitions and with out-of-sample data. Right now this isn't a bug, but it will be in production.

I actually ran into a problem last week that I thought was data.table, but it was actually just general "factor handling": Rdatatable/data.table#967

Here's a general example:
I run into these problems a lot, especially when predicting on out-of-sample data. For example, if a model had "AZ", "AR", "CA", and "CO" in a column called "US_STATE", but the prediction data only has "CA" in "US_STATE", then CA will be coded to a factor of 1 if you don't also set the levels. In the model, the factor level of 1 gets treated as AZ!
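The coding problem described above can be reproduced in a few lines of base R. This is only a sketch: the vector names (`train_levels`, `new_state`) are made up for illustration, and the state values are the ones from the example.

```r
# Levels seen when the model was fit (values taken from the example above).
train_levels <- c("AZ", "AR", "CA", "CO")

# Hypothetical out-of-sample data that only contains "CA".
new_state <- c("CA", "CA")

# Without explicit levels, "CA" is the only level, so it is coded as 1.
naive <- factor(new_state)
as.integer(naive)

# With the training levels supplied, "CA" keeps its position in train_levels.
aligned <- factor(new_state, levels = train_levels)
as.integer(aligned)
```

Passing `levels =` explicitly also means that any value not present in the training levels becomes `NA`, which makes unseen categories easy to spot before predicting.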

Rewrite functions to generate data

Right now, several R files source several other R files which download and organize data. The dates are hardcoded, which makes it difficult to generate data over different dates. It would be more useful to change this into a function which uses user-defined parameters for the date range. For instance:

foodInspect <- liveReadInFoodInspections(start.date="%Y-%m-%d", end.date="%Y-%m-%d")

Several functions need to be written so we pass a date parameter to retrieve data.

  • recreate_training_data.R
  • create_out-of-sample_data.R
  • food-inspections-evaluation.R
  • addWeather.R
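One possible shape for such a date-parameterized step is sketched below in base R. The helper name `filter_by_date_range`, the column names, and the rows are all hypothetical; this is not the project's `liveReadInFoodInspections`, only an illustration of replacing hardcoded dates with arguments.

```r
# Hypothetical helper: keep rows whose date falls inside [start_date, end_date].
filter_by_date_range <- function(dat, date_col, start_date, end_date) {
  start_date <- as.Date(start_date, format = "%Y-%m-%d")
  end_date <- as.Date(end_date, format = "%Y-%m-%d")
  # Fail early on unparseable or reversed dates.
  stopifnot(!is.na(start_date), !is.na(end_date), start_date <= end_date)
  dates <- as.Date(dat[[date_col]])
  dat[!is.na(dates) & dates >= start_date & dates <= end_date, , drop = FALSE]
}

# Made-up rows for illustration only.
inspections <- data.frame(
  Inspection_ID = 1:4,
  Inspection_Date = as.Date(c("2014-08-30", "2014-09-02", "2014-10-15", "2014-11-01"))
)

subset_dat <- filter_by_date_range(
  inspections, "Inspection_Date",
  start_date = "2014-09-01", end_date = "2014-10-31"
)
```

Each script in the list could then accept the same two arguments and pass them through, so a single pair of dates drives the whole data-generation run.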

This supersedes #3

Fonts on different computers on gh-pages

The League Spartan font does not appear to render correctly on all computers. Need to further test and fix this.

(I think it's just a problem with how the font is being referenced in the CSS).

[Request] Break out logic in 12_Merge step

The calculated variables should be put into separate functions or scripts, and the merge should just merge in the output from a function or the precalculated information based on the script.

Basis for data: Food inspections

Likely candidates for new scripts:

  • Food inspection history
  • Associated business characteristics
  • Weather data
  • Heat data variables
  • Inspector information

Change liveReadInFoodInspections.R to a function

Right now, the code references the source code, which conducts a number of operations. It would be more useful to change this into a function which uses user-defined parameters for the date range. For instance:

foodInspect <- liveReadInFoodInspections(start.date="%Y-%m-%d", end.date="%Y-%m-%d")

Banner image does not display on iOS devices

Confirmed on iPad and iPhone. Appears to be a problem with the type of effect I'm using for parallax. Some documentation points out the use of jQuery is better, so will explore those options.

Test dev branch before merge to master

Code should be run from beginning to end to ensure it can run with no errors. The following components should be tested:

  • Generate training data
  • Generate out-of-sample data
  • Execute GLMnet code

Proposed edits to technical document

There are several changes and updates that I would like to propose for the document, so I am creating a new issue for that work (this branch will be a descendant of issue 47).

  • Reword the summary to focus on "days sooner" rather than percent improvement.
  • Add mentions of the model and software to the introduction
  • Add subsections
  • Fix typos
  • Update formulas
  • Experiment with citation representation
  • Add citation for mass package
  • Describe the KDE method for heat variables

Correct typos in gh-pages

Fix the following issues:

  • Remove references to facility type and risk variables
  • Update bar chart with revised percentages

Create CONTRIBUTING document

Create a CONTRIBUTING.md document that describes the minimum guidelines for contributors. Some thoughts on potential content:

  • Explain that GitHub issues are the best way to submit an issue.
  • Perhaps outline the parameters we seek if someone is attempting to improve the model, such as describing how to show that their model is effective.

Any other thoughts?

gh-pages nav bar taller than defined

Unsure why the navbar is appearing taller than defined in the CSS. See below showing the difference between what is defined in the CSS and what is in the browser.
[screenshot: 2015-02-08 at 2:56:52 pm]

Clean up facility type

Use this code to convert facility type to something that is more like the original (and is more manageable than the original)

## Tabulate the raw facility types, most frequent first
dat[, .N, Facility_Type][order(-N)]
dat[, .N, Facility_Type][order(-N), Facility_Type]

## Collapse the raw values into three groups
dat[grep("restaurant", Facility_Type, ignore.case = TRUE), Facility_Type_Clean := "Restaurant"]
dat[grep("grocery", Facility_Type, ignore.case = TRUE), Facility_Type_Clean := "Grocery Store"]
dat[which(is.na(Facility_Type_Clean)), Facility_Type_Clean := "Other"]
## Check the counts for the cleaned variable
dat[, .N, Facility_Type_Clean][order(-N)]

Remove unnecessary terminal outputs from 12_Merge.R

Fix sloppy weather code in 12 merge

The weather data is being read in at the start of the file, but then overwritten later on with a combination of weather and weather_new.

Although there is not a problem with the functionality, this is bad practice and should be fixed. The right thing would be to keep the objects separate (e.g. use weather_old and weather_new then combine them into weather), or at least combine them right after reading them in.
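A minimal sketch of the "keep the objects separate, then combine" option is shown below in base R. The column names and values are invented, and the rule that `weather_new` wins when dates overlap is an assumption, not something stated in the issue.

```r
# Made-up stand-ins for the two weather sources (hypothetical columns).
weather_old <- data.frame(
  date = as.Date(c("2014-01-01", "2014-01-02")),
  temp_max = c(30, 28)
)
weather_new <- data.frame(
  date = as.Date(c("2014-01-02", "2014-01-03")),
  temp_max = c(29, 35)
)

# Combine right after reading, into a third object, so neither input is overwritten.
weather <- rbind(weather_old, weather_new)

# Assumption: when a date appears in both, keep the row from weather_new.
weather <- weather[!duplicated(weather$date, fromLast = TRUE), ]
weather <- weather[order(weather$date), ]
```

Because `weather_old` and `weather_new` are never reassigned, either one can still be inspected later when debugging the merge.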

Location of technical article?

This is annoyingly late in the game, but I'm proposing a conversation on where to place the article in the repo. Right now it's placed in the root. However, there are a number of files/artifacts for this article (as opposed to REPORTS).

.
├── (other files)
├── REPORTS/
├── article
│   ├── forecasting-restaurants-with-critical-violations-in-Chicago.Rmd (knitr doc)
│   ├── forecasting-restaurants-with-critical-violations-in-Chicago.pdf (PDF)
│   ├── forecasting-restaurants-with-critical-violations-in-Chicago.doc (Word)
│   ├── forecasting-restaurants-with-critical-violations-in-Chicago.html (Webpage)
│   ├── forecasting-restaurants-with-critical-violations-in-Chicago.bib (BibTeX of references)
│   └── forecasting-restaurants-with-critical-violations-in-Chicago-citation.bib (PROPOSED: BibTeX of citation for what we write)

As you can see, I am pointing out the inclusion of the BibTeX file needed to generate the article's bibliography. I'm also proposing a BibTeX file that shows how to cite the article we have prepared.

Thoughts?

Fix penalty in glmnet model

Currently the penalty in the model is ifelse(grepl("^Inspector.Assigned", colnames(mm)), 1, 0) however it should be ifelse(grepl("^Inspector.Assigned", colnames(mm)), 1, 0) because the Inspector.Assigned variable name was changed some time ago before entering the model matrix.
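To illustrate why a renamed variable matters for this kind of penalty vector, here is a small base R sketch. The column names are hypothetical stand-ins for `colnames(mm)`, and the "renamed" spelling is invented purely to show a non-match. A vector like this is typically what gets passed to glmnet's `penalty.factor` argument, though the sketch does not call glmnet.

```r
pattern <- "^Inspector.Assigned"

# Hypothetical column names that match the pattern.
cols_matching <- c("Inspector.AssignedA", "Inspector.AssignedB", "pastCritical")
penalty_ok <- ifelse(grepl(pattern, cols_matching), 1, 0)

# Hypothetical renamed columns that no longer match the pattern.
cols_renamed <- c("InspectorA", "InspectorB", "pastCritical")
penalty_missed <- ifelse(grepl(pattern, cols_renamed), 1, 0)

# Guard: report whether the pattern matched at least one column.
pattern_matches <- function(penalty) any(penalty == 1)
pattern_matches(penalty_ok)
pattern_matches(penalty_missed)
```

When the pattern matches nothing, the vector is silently all zeros, so a check like `pattern_matches()` before fitting would surface a stale variable name. Note also that the unescaped `.` in the pattern matches any single character, not only a literal dot.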

Cannot execute sqldf code on server

The recreate_training_data.R file uses sqldf to perform some SQL operations (joining and filtering). Since we're using R-3.0.2, the latest version of sqldf is incompatible (it requires R >= 3.1). I added scripts to install it manually in 5b52d79.

However, it appears we have some driver issues, which may be related to the nature of the install. Specifically, I receive this error message:

Error in validObject(.Object) : 
  invalid class “SQLiteDriver” object: invalid object for slot "Id" in class "SQLiteDriver": got class "integer", should be or extend class "externalptr"
Error in !dbPreExists : invalid argument type

I'll be switching to a Mac to do the intermediate work. If we cannot get sqldf() to work, we can use the merge() function.
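As a rough idea of what the merge() fallback could look like, the sketch below performs a join and a filter in base R. The tables, column names, and values are invented for illustration and do not reflect the actual queries in recreate_training_data.R.

```r
# Made-up tables; column names are hypothetical.
inspections <- data.frame(
  License = c(1, 2, 3),
  Results = c("Pass", "Fail", "Pass")
)
businesses <- data.frame(
  License = c(1, 2, 4),
  Name = c("A", "B", "D")
)

# Inner join on License, comparable to
#   SELECT * FROM inspections JOIN businesses USING (License)
joined <- merge(inspections, businesses, by = "License")

# Filter afterwards, comparable to a WHERE clause.
failed <- joined[joined$Results == "Fail", ]
```

`merge()` defaults to an inner join; `all.x = TRUE` or `all = TRUE` would give left or full outer joins if the SQL uses them.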

Model diagnostics

Develop ability to test new models, and see diagnostic results in a GUI (shiny)

Create graphs summarizing findings

A few graphs that should be created:

  • % of critical violations found vs % of inspections conducted (for both the actual and simulated results)
  • Number of critical violations by month for actual and simulated results
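The first graph needs two cumulative series, which can be computed as sketched below. The `found` vector is made-up data, and treating its order as the order in which inspections were performed is an assumption.

```r
# Made-up outcomes in the order inspections were performed
# (1 = critical violation found, 0 = none found).
found <- c(1, 0, 1, 0, 0, 1, 0, 0)

# Share of inspections conducted so far, in percent.
pct_inspections <- seq_along(found) / length(found) * 100

# Share of all critical violations found so far, in percent.
pct_violations <- cumsum(found) / sum(found) * 100

curve_data <- data.frame(pct_inspections, pct_violations)
```

Computing one such data frame for the actual ordering and one for the simulated ordering, then drawing both with `plot()` and `lines()`, would give the comparison described in the first bullet.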

Convert Latitude to numeric in the import step, rather than in the merge step

These lines belong in the import step:

## (THIS SHOULD HAPPEN IN THE 10 IMPORT STEP)
foodInspect[ , Latitude := as.numeric(Latitude)]
## FIX SANITATION LATITUDE
## (THIS SHOULD HAPPEN IN THE 10 IMPORT STEP)
sanitationComplaints[ , Latitude := as.numeric(Latitude)]
## FIX GARBAGE CART LATITUDE
## (THIS SHOULD HAPPEN IN THE 10 IMPORT STEP)
garbageCarts[ , Latitude := as.numeric(Latitude)]

Update README file

Need to update this to include the relevant files, license (MIT), and execution instructions.

Mask food inspector identity

Find a way to represent the food inspector information without being individually identifiable, and without dramatically impacting model quality.

Update README.md

Check that the README.md reflects the most recent steps, and is complete with the current process.

Streamline and package "data update" step

Right now this process assumes a manually created, single point-in-time download.

We need a way to refresh / add new data. This should be accomplished by adding function(s) that encompass the entire data step.

It makes sense to tackle some other issues in this step:

  • Fix Weather import process (issue 32)
  • Convert latitude to numeric during import (issue 31)
  • Add "Cleaned" facility type to data (issue 30)
  • Merge logic (issue 22). That issue is almost a repeat of the current issue (38).
  • Remove old data files and put into top level DATA folder (new issue)
