
st215 / jailjawn

Build a web scraper for the daily Philadelphia Prison Census and make that data beautiful and useful for citizens and academic research.

Python 100.00%
python firebase prison inmates codeforphilly

jailjawn's Introduction

Jail Jawn

Data Source: (http://www.phila.gov/prisons/page.htm)

What is Jail Jawn and Why?

This is the repository for the JailJawn.com scraper code written in Python. This started as a project to learn Python and Serverless deployment.

The code in this repository scrapes the static Census page published by the City of Philadelphia Department of Prisons (http://www.phila.gov/prisons/page.htm). The page appears to be generated internally, possibly by a person at infrequent intervals, using Excel's export to HTML. That export does not produce clean tables for scraping, so a custom parsing solution has been implemented.
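The fetch-and-parse step described above could look roughly like the following Python 3 sketch using Requests and lxml. It assumes the census figures sit in ordinary `<tr>`/`<td>` table rows; the helper names `fetch_census_html` and `parse_rows` are hypothetical and are not taken from the repository's actual code.

```python
import requests
import lxml.html

CENSUS_URL = "http://www.phila.gov/prisons/page.htm"


def fetch_census_html(url: str = CENSUS_URL) -> str:
    """Download the raw census page (hypothetical helper)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def parse_rows(html: str) -> list[list[str]]:
    """Return every table row as a list of stripped cell strings.

    Excel's HTML export pads empty cells with non-breaking spaces,
    so those are replaced with ordinary spaces before stripping.
    """
    tree = lxml.html.fromstring(html)
    rows = []
    for tr in tree.xpath("//tr"):
        cells = [
            cell.text_content().replace("\xa0", " ").strip()
            for cell in tr.xpath("./td|./th")
        ]
        rows.append(cells)
    return rows
```

Typical usage would be `rows = parse_rows(fetch_census_html())`, after which the empty spacer rows still need to be filtered out.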

The Python code is deployed to Amazon Web Services (AWS) Lambda and runs once a day on a cron schedule. Once the data is scraped, it is pushed to our Google Firebase instance for permanent storage.
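A minimal Python 3 sketch of how a daily Lambda function might hand its results to Firebase, assuming the Firebase Realtime Database REST API and database rules that permit the write without an auth token. `FIREBASE_DB_URL`, the `census` path, the record layout, and the `scrape_rows` stub are all hypothetical placeholders, and the schedule expression in the comment is only one example of an EventBridge cron rule. It uses the standard library's `urllib` so the Lambda package needs no extra HTTP dependency.

```python
import json
import urllib.request
from datetime import date

# Hypothetical database URL; substitute your own Firebase project's URL.
FIREBASE_DB_URL = "https://example-project.firebaseio.com"


def scrape_rows() -> list:
    """Hypothetical stub standing in for the real scraper."""
    return []


def build_record(rows: list, scraped_on: date) -> dict:
    """Bundle one day's scraped rows with the date they were collected."""
    return {"date": scraped_on.isoformat(), "rows": rows}


def push_to_firebase(record: dict, path: str = "census") -> dict:
    """POST the record to the Realtime Database REST API.

    A POST to <path>.json stores the record under a newly generated
    child key; the JSON response looks like {"name": "<generated key>"}.
    Assumes the database rules allow this write without an auth token.
    """
    request = urllib.request.Request(
        f"{FIREBASE_DB_URL}/{path}.json",
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def lambda_handler(event, context):
    """Lambda entry point, invoked once a day by a scheduled rule such as
    the EventBridge expression cron(0 12 * * ? *) (12:00 UTC daily)."""
    record = build_record(scrape_rows(), date.today())
    result = push_to_firebase(record)
    return {"statusCode": 200, "body": json.dumps(result)}
```

Storing each day as a separate dated child, rather than overwriting one node, is what keeps the historical record needed for trend charts.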

From the Google Firebase instance, an API hosted on Heroku serves the data to the web, where JavaScript renders the charts on the client side.

The repositories for those projects are located here:

API: https://github.com/JailJawn/JailJawnAPI

WebApp / Site: https://github.com/JailJawn/jailjawnapp

If you have any questions, I can be found at:

Website: http://www.StanleyGriggs.com/

Twitter: http://www.twitter.com/ST215

Feel free to make issue tickets and suggestions.

Goal

Historical inmate data, beautiful charts, and the ability to see trends over time.

Tech:

Python Requests (http://docs.python-requests.org/en/latest/)

Python lxml (http://lxml.de/)

Steps to run on Windows

Download Python

1. https://www.python.org/downloads/

Set up Python Path

1. Open Control Panel
2. Go to System and Security
3. Go to System
4. Open Advanced system settings
5. On the "Advanced" tab, click Environment Variables
6. Scroll down to "Path" under System variables and double-click it
7. Add the location of your Python installation to the Variable value field (for example: C:\Python27)
	- If there are other paths in the field, separate them with a semicolon (for example: C:\Java_lib;C:\Python27)
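To confirm the PATH change took effect, a small Python 3 check like the one below can be run from a newly opened Command Prompt (windows that were already open keep the old PATH). This is a sketch; the function name `python_dir_on_path` is hypothetical. It splits PATH on `os.pathsep`, which is the semicolon on Windows.

```python
import os
import shutil
import sys


def _normalise(path: str) -> str:
    """Normalise case, separators and trailing slashes for comparison."""
    return os.path.normcase(os.path.normpath(path))


def python_dir_on_path() -> bool:
    """Return True if the running interpreter's directory is listed on PATH."""
    python_dir = _normalise(os.path.dirname(sys.executable))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(_normalise(entry) == python_dir for entry in entries if entry)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Interpreter directory on PATH:", python_dir_on_path())
    print("'python' resolves to:", shutil.which("python"))
```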

Download Requests

1. Clone the Requests repository: git clone git://github.com/kennethreitz/requests.git
2. Open a terminal in the cloned directory and run python setup.py install

Download lxml

1. https://pypi.python.org/pypi/lxml/3.2.3
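Once both libraries are installed, a quick Python 3 check can confirm they are importable before running the scraper. This is a sketch using the standard `importlib.util.find_spec`; the function name `check_dependencies` is hypothetical.

```python
import importlib.util


def check_dependencies(modules=("requests", "lxml")) -> dict:
    """Map each module name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```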

jailjawn's People

Contributors

mauricej9, st215

jailjawn's Issues

HTML Output

Try using BeautifulSoup as your HTML parser. It does a good job of cleaning up messy markup.
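A minimal Python 3 sketch of that suggestion, assuming the `beautifulsoup4` package is installed; `clean_rows` is a hypothetical helper name. It uses Python's built-in `"html.parser"` backend; `"lxml"` or `"html5lib"` can be passed instead if installed, and they may repair badly nested markup differently.

```python
from bs4 import BeautifulSoup


def clean_rows(html: str) -> list[list[str]]:
    """Parse Excel-exported HTML with BeautifulSoup and return each
    table row as a list of whitespace-stripped cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # get_text(strip=True) also strips non-breaking spaces, so
        # padded empty cells come back as "".
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["td", "th"])]
        rows.append(cells)
    return rows
```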

Dynamically Discover Valid Rows

Write a piece of code that goes through every row of the tree and reports whether the first column contains only a whitespace character, for example '\xa0' (a non-breaking space).

Print the row number and True or False for each row.

This finds the valid rows!
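One way the issue above might be approached, sketched in Python 3 with lxml. The function name `valid_row_flags` is hypothetical, and it reports `True` for a valid row (first cell has non-whitespace content) rather than for a whitespace-only cell; flip the boolean if the opposite convention is preferred.

```python
import lxml.html


def valid_row_flags(html: str) -> list[tuple[int, bool]]:
    r"""Return (row number, is_valid) for every <tr> in the page.

    A row counts as valid when its first cell holds something other than
    whitespace; cells containing only a non-breaking space ('\xa0', which
    Python's str.strip() treats as whitespace) mark spacer rows.
    """
    tree = lxml.html.fromstring(html)
    flags = []
    for number, tr in enumerate(tree.xpath("//tr"), start=1):
        cells = tr.xpath("./td|./th")
        first = cells[0].text_content() if cells else ""
        flags.append((number, bool(first.strip())))
    return flags


if __name__ == "__main__":
    # Hypothetical sample markup, not real census data.
    sample = (
        "<table><tr><td>Facility A</td><td>10</td></tr>"
        "<tr><td>&nbsp;</td><td>&nbsp;</td></tr></table>"
    )
    for number, valid in valid_row_flags(sample):
        print(number, valid)  # prints "1 True" then "2 False"
```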
