fbi-api's Introduction

Florida Business Inspections (FBI) API

Get started through our wiki here: https://github.com/Code-for-Miami/fbi-api/wiki

What is this?

This is a RESTful API that scrapes restaurant inspections in Miami-Dade County. The goal is to provide an easier way for developers to use local restaurant inspection data for their own personal projects.

You can check the original requirement here: https://github.com/Code-for-Miami/tasks/issues/50

Data Source

We are using CSV data from the State of Florida: http://www.myfloridalicense.com/dbpr/sto/file_download/public-records-food-service.html

Technical Background

This is powered by Clojure and uses a MySQL database. Installation instructions are here: https://github.com/Code-for-Miami/fbi-api/wiki/How-to-Install%3F

You can check the demo app here: http://138.197.90.94/

License

The code for this repository has been released by Code for Miami under the MIT License.

Contributors

This project was kickstarted by Leo Ribeiro and is now maintained by Joel Quiles. Take a look at all contributors here.

Using Docker:

Docker has been set up to easily create a development environment inside a container. We do not create a production environment yet, although that is certainly possible.

Using script

Just run start-docker.sh

which uses the Docker Compose YAML file to set up all necessary services.

Or, doing things manually...

Building an image

sudo docker build -t=fbiapidev .

This will download the Ubuntu image, if necessary, and then build our custom image.

Start a container just for dev server

sudo docker run -d -v "$(pwd)":/code -p 8080:8080 fbiapidev lein run

Check running containers

docker ps -a to display both running and stopped containers

Connect to any running container to inspect services, filesystem, etc

sudo docker exec -it <container name or id> /bin/bash

Cheatsheet

run -d = detach instead of following output
run -v = link a local volume to a Docker one
run -p = publish a container's port(s) to the host

fbi-api's People

Contributors

ccjoel, cristinasolana, ernieatlyd, leordev, marcoslhc, yamilethmedina

fbi-api's Issues

Add Travis CI to continually build PRs and master on each push

Requires admin permissions on this repo (to activate on travis website). Free for open source projects.

As an example of how useful this would be: I pushed a change to build the jar file in a different way, but broke lein repl by using _ instead of - in the main class name. Travis would have caught this. Instead, it stayed broken for 3 days until I noticed that I couldn't start a REPL.

The Travis job would run lein uberjar (maybe on multiple Java environments), lein test and lein repl (if we can).

Move `start-date` and `end-date` from verbs to arguments

Right now start-date and end-date are used as path segments:

/district/:start-date/:end-date/:id - Select by District
/county/:start-date/:end-date/:id - Select by County Number
/name/:start-date/:end-date/:name - Select by Business Name
/location/:start-date/:end-date/:zips - Select by Location Zip Codes
/name/:start-date/:end-date/:name/:zips - Select by Business Name and Location Zip Codes

Consider using key-value pairs instead, with defaults if they're not included?

/district/:id/?start-date=:start-date&end-date=:end-date - Select by District
/county/:id/?start-date=:start-date&end-date=:end-date - Select by County Number
/name/:name?start-date=:start-date&end-date=:end-date - Select by Business Name
/location/:zips/?start-date=:start-date&end-date=:end-date - Select by Location Zip Codes
/name/:name/:zips/?start-date=:start-date&end-date=:end-date - Select by Business Name and Location Zip Codes

Add created-on and modified-on dates to inspection objects

When the cron jobs run and scrape the CSV files for new data: if an object is new, set created-on; if it is being modified or updated, update modified-on. This serves multiple purposes, including letting us check how regularly entities are updated and whether they are updating successfully.
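A minimal sketch of that insert-or-update rule. It is written in Python with an in-memory SQLite database rather than the project's Clojure and MySQL, and the table layout, column names and the upsert_inspection helper are all assumptions made for illustration:

```python
import sqlite3
from datetime import datetime, timezone


def make_db():
    """Create a hypothetical, trimmed-down inspections table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inspections ("
        " inspection_visit_id INTEGER PRIMARY KEY,"
        " disposition TEXT,"
        " created_on TEXT NOT NULL,"
        " modified_on TEXT)"
    )
    return conn


def upsert_inspection(conn, visit_id, disposition, now=None):
    """Stamp created_on for new rows and modified_on for changed rows."""
    now = now or datetime.now(timezone.utc).isoformat()
    row = conn.execute(
        "SELECT disposition FROM inspections WHERE inspection_visit_id = ?",
        (visit_id,),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO inspections (inspection_visit_id, disposition, created_on)"
            " VALUES (?, ?, ?)",
            (visit_id, disposition, now),
        )
        return "created"
    if row[0] != disposition:
        conn.execute(
            "UPDATE inspections SET disposition = ?, modified_on = ?"
            " WHERE inspection_visit_id = ?",
            (disposition, now, visit_id),
        )
        return "modified"
    return "unchanged"  # same data as last scrape: leave both timestamps alone
```

Rows that come back "unchanged" keep their old modified-on value, so a stale modified-on across the whole table would hint that the scrape is no longer picking up new data.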

Add compojure middleware to ignore trailing slashes in routes

This should be simple. I tried out some recommendations from compojure and Stack Overflow and they didn't work. With inspections as an example, the /inspections endpoint works, but /inspections/ yields 404. We could fix this with a middleware (that still allows for /inspections/:id, for example, of course).
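To illustrate the idea only, here is a sketch of such a middleware as a Python WSGI wrapper. This is not Compojure or Ring code; demo_app and status_for are hypothetical helpers that stand in for the real routes:

```python
def strip_trailing_slash(app):
    """WSGI middleware: route /inspections/ the same way as /inspections."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if len(path) > 1 and path.endswith("/"):
            # Copy the environ so the caller's dict is left untouched.
            environ = dict(environ, PATH_INFO=path.rstrip("/") or "/")
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Hypothetical app that only knows /inspections and /inspections/<id>."""
    parts = environ["PATH_INFO"].split("/")
    known = parts[:2] == ["", "inspections"] and len(parts) in (2, 3) and all(parts[1:])
    status = "200 OK" if known else "404 Not Found"
    start_response(status, [("Content-Type", "text/plain")])
    return [status.encode()]


def status_for(app, path):
    """Call a WSGI app directly and return the status line it produced."""
    seen = []
    app({"PATH_INFO": path}, lambda status, headers: seen.append(status))
    return seen[0]
```

Because the path is rewritten before routing, /inspections/:id keeps working and /inspections/:id/ starts working too. The same shape should carry over to a Ring middleware, which would rewrite the request's :uri before handing it to the routes.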

Add filters to other endpoints

Add filters to businesses and counties

It could look like:
http://45.55.191.140/businesses?zipCodes=32615

Also the ability to query just one business: http://45.55.191.140/businesses/:id or http://45.55.191.140/businesses/:slugged_name and just one county: http://45.55.191.140/county/:id

This will make the queries easier and faster in the front end.

Who is paying for http://45.55.191.140 nowadays?

Hi @leordev - are you still paying for the Digital Ocean droplet? We'd love to still have you at the Brigade meetings of course but if you can't make it we understand; if that's the case, would you want us to start paying for these going forward?

Allow cron job to crawl hotel inspections as well

This would change the Restaurant Inspections API to the Health Inspections API.
If the model (db columns) is the same, the first step could be as simple as adding more CSV file targets to the cron task (the URL for the hotel CSV data).

I would create a separate cron job for hotels, just so that the restaurant data crawling can finish and then we can scan the hotel data.

Also, it would be great to rename the project (Health Inspections API? something else?) and rename the namespaces as well (to something shorter, even; right now it's restaurant-inspections-api, which is super long) when this change lands.

Lastly, one of the database tables is called "restaurants", whereas we expose the concept of "businesses". After we figure out whether this information actually describes the "license" or the "business", we should rename the restaurant model to business or license.

See http://www.myfloridalicense.com/dbpr/sto/file_download/public-records-lodging.html

Add total count (number of objects) that the api holds of the requested type to meta, add resource types, separate id from attributes

Add a count to the metadata returned in the JSON.
Maybe add id and type to each data object, then add an attributes object that holds most of the attributes we put directly under the objects in data today.

Also, consider adding a "type" field for each resource type returned, and separating id from the rest of the attributes.

Example:

GET /inspections

{
   meta: {
       parameters: { ... },
       count: 49643             // total inspection resources available
   } ,
   data: [
      {
         type: "inspection",
         id: 4535656,
         attributes: {
             business_name: "Mc Happiness",
             license_number: ...,
             inspection_class: ...,
             ...
         }
      }
   ]
}
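A minimal sketch of how flat rows could be reshaped into that envelope. It is in Python rather than the project's Clojure, and to_resource, envelope and the id_field parameter are hypothetical names, not part of the current API:

```python
def to_resource(resource_type, row, id_field="id"):
    """Split a flat row into type, id and the remaining attributes."""
    attributes = {key: value for key, value in row.items() if key != id_field}
    return {"type": resource_type, "id": row[id_field], "attributes": attributes}


def envelope(resource_type, rows, total_count, parameters=None):
    """Wrap one page of rows with meta.count, the total number of resources available."""
    return {
        "meta": {"parameters": parameters or {}, "count": total_count},
        "data": [to_resource(resource_type, row) for row in rows],
    }
```

Note that count is passed in separately because it describes the whole collection, not just the rows on the current page.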

Use swagger-ui library instead of compojure-api subset to generate swagger docs

I'm thinking this is low priority right now. We're not actually using compojure-api to its full potential. Right now it's kind of bloatware, since we import a whole lib just to generate swagger docs, while compojure-api includes everything you need to make a RESTful API.

We didn't know of this library and were working with compojure by itself, with liberator for resources and HTTP statuses. I do like liberator because it makes a lot of decisions for us, so that's fine.

When we tried to use swagger-ui it kept crashing because of dependencies. So we went with compojure-api to generate these.

Gracefully handle server errors that would result in a 500 and stacktrace

Currently, if MySQL is down, or any exception is thrown somewhere in the code, the server responds to the client with ugly text containing the raw error message from the source.

We should add a handler at the server that returns a 500, and maybe a JSON body or some better message (or none?) instead of this raw message.
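One way such a handler could look, sketched as a Python WSGI middleware rather than Ring or liberator code. The json_errors name and the shape of the JSON body are assumptions:

```python
import json
import logging
import sys


def json_errors(app):
    """WSGI middleware: log the real error, send the client a generic JSON 500."""
    def middleware(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # The stack trace goes to the server log only, never to the client.
            logging.exception("Unhandled error while serving %s", environ.get("PATH_INFO"))
            body = json.dumps({"error": "Internal Server Error"}).encode()
            headers = [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ]
            start_response("500 Internal Server Error", headers, sys.exc_info())
            return [body]
    return middleware
```

This sketch only catches exceptions raised while the app is called; errors raised later, while the response body is being iterated, would need separate handling.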

Hide user/password info in start

If I do

export DATABASE_URL="jdbc:mysql://localhost:3306/codeformia_restaurant_inspections?user=myuser&password=mypassword"

executing ./start.sh will log:

ago 14, 2016 6:49:53 P.M. clojure.tools.logging$eval1$fn__5 invoke
INFORMACIÓN: Environment variable DATABASE_URL detected:  jdbc:mysql://localhost:3306/codeformia_restaurant_inspections?user=myuser&password=mypassword

This exposes sensitive information about the database. It is not a concern when running locally, but it definitely is with CI services that show logs.
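A small sketch of masking the credentials before the URL is logged, written in Python for illustration. The mask_credentials helper is hypothetical, and it only covers the ?user=...&password=... query form shown above, not credentials embedded as user:password@host:

```python
import re

# Matches user=... and password=... query parameters, case-insensitively.
_SECRET_PARAMS = re.compile(r"(?i)\b(user|password)=[^&\s]*")


def mask_credentials(url):
    """Replace the values of user and password parameters with asterisks."""
    return _SECRET_PARAMS.sub(lambda match: match.group(1) + "=****", url)
```

The startup script would then log mask_credentials(url) instead of the raw DATABASE_URL value.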

Strange Duplicated Inspections

The data specs and layout say that the field INSPECTION_VISIT_ID is unique, but I found 3 duplicated IDs. The duplicated rows have different dates, and one pair also has a different violations result. That's nothing compared to a database of 500k+ rows, but I'm keeping track of it here.

Raw data:

Fields order:
district county_number county_name license_type_code license_number business_name location_address location_city location_zipcode inspection_number visit_number inspection_class inspection_type inspection_disposition inspection_date critical_violations_before_2013 noncritical_violations_before_2013 total_violations high_priority_violations intermediate_violations basic_violations pda_status violation_01 violation_02 violation_03 violation_04 violation_05 violation_06 violation_07 violation_08 violation_09 violation_10 violation_11 violation_12 violation_13 violation_14 violation_15 violation_16 violation_17 violation_18 violation_19 violation_20 violation_21 violation_22 violation_23 violation_24 violation_25 violation_26 violation_27 violation_28 violation_29 violation_30 violation_31 violation_32 violation_33 violation_34 violation_35 violation_36 violation_37 violation_38 violation_39 violation_40 violation_41 violation_42 violation_43 violation_44 violation_45 violation_46 violation_47 violation_48 violation_49 violation_50 violation_51 violation_52 violation_53 violation_54 violation_55 violation_56 violation_57 violation_58 license_id inspection_visit_id

'D2','60','Palm Beach','2010','6012903','CRAZY BUFFET','2030 PALM BEACH LAKES BLVD','WEST PALM BEACH','33409','2261347','2','Food','Complaint Full','Call Back - Admin. complaint recommended','2013-03-12',NULL,NULL,'20','3','7','10','1','0','2','3','0','0','0','0','0','0','0','0','1','0','0','0','0','0','0','0','0','0','2','0','0','0','0','0','0','2','0','3','2','1','0','0','3','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','5458444','4853712'

'D2','60','Palm Beach','2010','6012903','CRAZY BUFFET','2030 PALM BEACH LAKES BLVD','WEST PALM BEACH','33409','2261347','2','Food','Complaint Full','Admin. Complaint Callback Not Complied','2014-03-11',NULL,NULL,'28','6','8','14','1','0','2','3','0','0','0','0','2','0','3','0','1','0','0','0','0','0','0','0','0','0','4','0','1','0','0','0','0','2','0','3','2','1','0','0','3','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','5458444','4853712'

'D2','16','Broward','2010','1601657','BONAVENTURE COUNTRY CLUB','200 BONAVENTURE BLVD','WESTON','33326','2264876','2','Food','Complaint Full','Call Back - Admin. complaint recommended','2013-05-24',NULL,NULL,'11','0','3','8','1','0','0','0','0','0','0','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','2','1','0','0','0','1','0','0','0','1','0','0','0','0','4','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','2137340','4885503'
'D2','16','Broward','2010','1601657','BONAVENTURE COUNTRY CLUB','200 BONAVENTURE BLVD','WESTON','33326','2264876','2','Food','Complaint Full','Call Back - Complied','2013-12-04',NULL,NULL,'11','0','3','8','1','0','0','0','0','0','0','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','2','1','0','0','0','1','0','0','0','1','0','0','0','0','4','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','2137340','4885503'
'D5','65','St. Johns','2010','6500464','SCARLETT'S ST AUGGY & DOS GATOS ST AUGGY','70 HYPOLITA ST','SAINT AUGUSTINE','32084','2477216','2','Food','Routine - Food','Call Back - Admin. complaint recommended','2015-06-15',NULL,NULL,'1','0','1','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0','0','0','0','0','6032700','5557390'

'D5','65','St. Johns','2010','6500464','SCARLETT'S ST AUGGY & DOS GATOS ST AUGGY','70 HYPOLITA ST','SAINT AUGUSTINE','32084','2477216','2','Food','Routine - Food','Call Back - Complied','2016-01-20',NULL,NULL,'1','0','1','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0','0','0','0','0','6032700','5557390'
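A quick way to find such rows, sketched in Python. It assumes each row has already been parsed into a dict keyed by the field names listed above; find_duplicate_visits is a hypothetical helper:

```python
from collections import defaultdict


def find_duplicate_visits(rows):
    """Group rows by inspection_visit_id and keep only ids seen more than once."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row["inspection_visit_id"]].append(row)
    return {visit_id: group for visit_id, group in by_id.items() if len(group) > 1}
```

Running this after each scrape would show whether the number of duplicated IDs grows over time.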

Simplify application startup

There are multiple things to consider here:

  • Adding :aot and :main to project.clj will make the run command simpler (no need to specify the class path or the main class).
  • Maybe a short bash script that does the whole thing, instead of telling the user to run commands to compile. It would also pass the environment variables to the run, etc.
  • As an overkill option, create a simple docker instance that boots up with MySQL and runs the server in a container. FOR LATER.

Rename /get/ to a noun (inspection? detail?)

/get/:id - Full detailed inspection info for the given Inspection Id
http://198.199.73.168/fra/get/5888722
Check the Violation Codes here - go to the table Food Service Violations Numbered 1-58, and check extractions after 2013.

REST API design conventions suggest that endpoints should be nouns rather than verbs. Because of this, I think /get/ should be renamed to something like /inspection/ or /inspection-detail/ or /detail/.

Sort and/or filter Inspections on /inspection/:id response

I tested on the production app and got:

violation19: 0,
violation30: 0,
violation43: 0,
violation11: 0,
violation31: 0,
violation12: 0,
violation07: 0,
violation32: 0,
violation41: 0,
violation46: 0,
violation03: 1,
violation45: 0,
violation01: 1,
violation33: 0,
violation15: 0,
violation10: 0,
violation21: 0,
violation53: 0,
violation55: 0,
violation51: 0,
violation36: 0,
violation38: 0,
violation18: 0,
violation49: 0,
violation27: 0,
violation56: 0,
violation28: 0,
violation47: 0,
violation50: 0,
violation39: 0,
violation04: 0,
violation37: 0,
violation34: 0,
violation16: 0,
violation42: 0,
violation40: 0,
violation54: 0,
violation05: 0,
violation52: 0,
violation26: 0,
violation57: 0,
violation23: 0,
violation20: 0,
violation58: 0,
violation13: 0,
violation24: 0,
violation08: 0,
violation09: 0,
violation35: 0,
violation14: 0,
violation06: 0,
violation29: 0,
violation25: 0,
violation44: 0,
violation02: 1,
violation22: 2,
violation17: 0,
violation48: 0

This is not sorted, and is very confusing. Would it make sense to filter the ones that have 0 value, or at least sort from 01 to 58?

Also, would it make sense to add a link to the violation description, maybe as part of another api endpoint?
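A minimal sketch of the filter-and-sort option, in Python for illustration. It assumes the response has been parsed into a dict with the violationNN keys shown above; nonzero_violations is a hypothetical helper:

```python
import re


def nonzero_violations(inspection):
    """Return (violation number, count) pairs with count > 0, sorted from 01 to 58."""
    found = []
    for key, count in inspection.items():
        match = re.fullmatch(r"violation(\d{2})", key)
        if match and count > 0:
            found.append((int(match.group(1)), count))
    return sorted(found)
```

For the response above, this would keep only violations 01, 02, 03 and 22.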

Self documenting API

The / root API path should return all available API endpoints. Currently / navigates to this project's wiki. Maybe on the '/' call we can return a JSON document that describes all endpoints and has meta with links to the GitHub page, docs, and wiki.

Add CORS to the endpoints

When trying to access the endpoints, Chrome throws:

XMLHttpRequest cannot load http://198.199.73.168/fra/list-counties.
 No 'Access-Control-Allow-Origin' header is present on the requested resource.
 Origin 'http://localhost:3000' is therefore not allowed access.

While this could be fixed with a reverse proxy server, it would be easier for developers if the API server included the Access-Control-Allow-Origin: * header.
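To show the shape of the fix, here is a sketch of a Python WSGI middleware that adds the header to every response. It is not Ring code, the allow_any_origin name is an assumption, and it does not handle preflight OPTIONS requests:

```python
def allow_any_origin(app):
    """WSGI middleware: add Access-Control-Allow-Origin: * to every response."""
    def middleware(environ, start_response):
        def cors_start_response(status, headers, exc_info=None):
            # Drop any existing value so the header is not sent twice.
            headers = [h for h in headers if h[0].lower() != "access-control-allow-origin"]
            headers.append(("Access-Control-Allow-Origin", "*"))
            return start_response(status, headers, exc_info)
        return app(environ, cors_start_response)
    return middleware
```

A wildcard origin is reasonable here because the API serves public, read-only data.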

Environmental vars might be set on build, not on run. Audit this.

Set up the app with env vars. Build. Push to a different environment server. Run and check the env vars. Are they the env vars of the build env, or of the running env?

This might have been introduced in an attempt to log these env vars only once. It used to be the case that these were logged when defined for the build, when testing, when starting the server, and upon responding to requests, which resulted in a lot of noise in the logs.

Improve the database

Right now the database is only a faithful copy of the CSV file: one big table with all the inspections.

However we should separate the tables to avoid data repetition like:
Table for Restaurants
Table for Districts
Table for Counties
Table for Violations
and finally a slim table for the inspections.

It would also be good to have the coordinates of the restaurants in the restaurants table; right now we only have the addresses. Mapping apps would benefit from pulling the lat/lon of the restaurants directly and skipping one more step on the frontend.

Make startDate and endDate optional fields rather than required

To do that, we'd have to decide on the following:

  • Decide on what the default endDate would be. I suggest the current date.
  • Decide on what the default startDate would be. You have 10 days; whatever you think works the best!
  • What happens if either endDate or startDate has an invalid value? Does the API fail gracefully?
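One possible answer to the questions above, sketched in Python. The default endDate is the current date as suggested; the 30-day default window for startDate is purely an assumption for illustration, and resolve_date_range is a hypothetical helper:

```python
from datetime import date, timedelta

DEFAULT_WINDOW_DAYS = 30  # assumption: the issue leaves the default startDate open


def resolve_date_range(params, today=None):
    """Return (start, end) dates, filling in defaults for missing parameters."""
    today = today or date.today()
    try:
        end = date.fromisoformat(params["endDate"]) if "endDate" in params else today
        if "startDate" in params:
            start = date.fromisoformat(params["startDate"])
        else:
            start = end - timedelta(days=DEFAULT_WINDOW_DAYS)
    except ValueError as exc:
        # Invalid input becomes one clear error the API could map to a 400.
        raise ValueError(f"dates must look like YYYY-MM-DD: {exc}") from exc
    if start > end:
        raise ValueError("startDate must not be after endDate")
    return start, end
```

Failing with a single, well-defined error type lets the server respond with a 400 and a readable message instead of a stack trace.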

Verify that cron job code runs successfully on deployed/remote instance

Check that the cron job is working. It might want to use more than one processor, and might break if only one is available. Does it assume it has more than 4 cores, one per file (which would work locally but not on the server)? Or does it degrade gracefully, as in "oh, I have only one processor (in my droplet, for example), so I'll just process the files one by one"?
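The graceful fallback described above could look roughly like this. It is a Python sketch, not the project's Clojure code; worker_count and process_all are hypothetical helpers:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(n_files, available=None):
    """Use at most one worker per file, and never more than the cores we have."""
    available = available or os.cpu_count() or 1
    return max(1, min(n_files, available))


def process_all(files, process_one, available=None):
    """Process every file; with a single core the files simply run one by one."""
    workers = worker_count(len(files), available)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_one, files))
```

With this shape the job does not need to assume any particular number of cores: on a one-core droplet it degrades to sequential processing instead of failing.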

Also maybe we can set up a temporary endpoint in which if we post to the server it will start scanning the csv files. Just for debugging purposes for now?

Add other useful query params. Like query inspections by totalViolations, etc

Useful to query/search/filter:

  • All inspections that resulted in totalViolations > threshold
  • Inspections with high, basic, intermediate violations count
  • visitnumber = params.number
  • location address like
  • filter Inspection type
  • Sort by ... ?

Also, there's some business/license data mixed into the inspections results endpoint. We should fix these queries so that data flows more naturally. Ideal use case: "I want to find a business with this license number. OK, I've got the business; now I want all inspections for it".

Right now inspections have a bunch of data on other models and search is driven by inspections.
