
FBI Crime Data Explorer Scraper

This is a simple Python script that scrapes U.S. arrest data by state and by agency from the Federal Bureau of Investigation's Crime Data Explorer (CDE) API. I originally wrote it for work, to benchmark the FBI's Uniform Crime Reporting (UCR) data against the data we have acquired at the Criminal Justice Administrative Records System (CJARS) at the University of Michigan (for current data holdings, see here). There may be similar code out there, but here is another one in case someone is looking for U.S. arrest data by offense type. Please use it responsibly! 😉

Output

The run.py file will save 3 different types of .xlsx files (~100 files altogether):

  • ucr_ori_crosswalk.xlsx: Crosswalk of Agency ORI
    • API Endpoint: 'sapi/api/agencies'
  • arrest_by_agency_*.xlsx: Agency-level arrest data for each state by offense type
    • API Endpoint: 'sapi/api/data/arrest/agencies/offense/{ori}/all/{min_yr}/{MAX_YEAR}'
  • arrest_by_state_*.xlsx: State-level arrest data by offense type
    • API Endpoint: 'sapi/api/data/arrest/states/offense/{state}/all/{min_yr}/{MAX_YEAR}'
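A small helper can fill in the endpoint templates above and turn them into request URLs. This is a minimal sketch, not the repository's code. The base URL https://api.usa.gov/crime/fbi/ and the api_key query parameter are assumptions that this README does not confirm. The names build_url, fetch_json, and ENDPOINTS are hypothetical. Check the current CDE API documentation before relying on any of them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL for the CDE API; not confirmed by this README.
BASE_URL = 'https://api.usa.gov/crime/fbi/'

# Endpoint templates copied from the Output list above.
ENDPOINTS = {
    'agencies': 'sapi/api/agencies',
    'arrest_by_agency': 'sapi/api/data/arrest/agencies/offense/{ori}/all/{min_yr}/{MAX_YEAR}',
    'arrest_by_state': 'sapi/api/data/arrest/states/offense/{state}/all/{min_yr}/{MAX_YEAR}',
}


def build_url(name, api_key, **params):
    """Fill in an endpoint template and append the API key as a query string.

    The query parameter name 'api_key' is an assumption.
    """
    # URL-quote each path parameter so that no value can inject extra path segments.
    safe = {key: quote(str(value), safe='') for key, value in params.items()}
    path = ENDPOINTS[name].format(**safe)
    return BASE_URL + path + '?' + urlencode({'api_key': api_key})


def fetch_json(url, timeout=30):
    """Download one endpoint and parse the JSON body (standard library only)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_url('arrest_by_state', 'MY_KEY', state='MI', min_yr=1985, MAX_YEAR=2018)
print(url)
# https://api.usa.gov/crime/fbi/sapi/api/data/arrest/states/offense/MI/all/1985/2018?api_key=MY_KEY
```

fetch_json uses only the standard library, so it needs no extra packages. The repository itself may use a different HTTP client.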

Install

First, clone the repository:

$ git clone https://github.com/jaycatsby/ucr_scraper.git

Make sure you have all of the required packages (in virtualenv preferably):

$ pip install -r requirements.txt

Run

Register

If you haven't done so already, sign up for an API key: https://api.data.gov/signup/

Edit settings.py

  • Set API_KEY on line 3 to the key you received in the registration email, e.g.: API_KEY = 'AGKQGIJPQEOJH!LNHPIJh31-9ujpfkn-h9h'

  • (Optional) Set RAW_PATH: By default, all of the data is saved as .xlsx files in the raw folder of the current directory.

  • (Optional) Set MIN_YEAR: By default, scraping starts from 1985. I initially set this to 1975 to check for differences in coverage, but at first glance most of the data seem to start in 1985.

  • (Optional) Set MAX_YEAR: Currently, data up to 2018 are available. Edit as you see fit.

  • (Optional) Set MAX_WORKERS: Please be responsible! By default, 2 processes are used.
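For reference, a settings.py along these lines would match the options described above. This is a hypothetical sketch. Only the setting names and defaults come from this README. check_settings is an illustrative helper and is not part of the repository.

```python
# settings.py (hypothetical sketch; the repository's real file may differ)
import os
API_KEY = 'YOUR_API_KEY'  # paste the key from the api.data.gov registration email

RAW_PATH = os.path.join(os.getcwd(), 'raw')  # .xlsx output folder, ./raw by default
MIN_YEAR = 1985    # most data appear to start in 1985
MAX_YEAR = 2018    # latest year available when this README was written
MAX_WORKERS = 2    # parallel workers; keep this small to go easy on the API


def check_settings():
    """Fail early on obviously invalid settings (hypothetical helper)."""
    if not API_KEY or API_KEY == 'YOUR_API_KEY':
        raise ValueError('Set API_KEY to the key from your registration email')
    if MIN_YEAR > MAX_YEAR:
        raise ValueError('MIN_YEAR must not be later than MAX_YEAR')
    if MAX_WORKERS < 1:
        raise ValueError('MAX_WORKERS must be at least 1')
    # Create the output folder only after every check has passed.
    os.makedirs(RAW_PATH, exist_ok=True)
```

Running the checks before any request is sent means a placeholder key or a reversed year range fails immediately. Without them, the error would only surface after the first API call.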

Scrape

After editing settings.py, run run.py:

$ python run.py
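The README does not show how run.py distributes its work, but one plausible shape is a worker pool capped at MAX_WORKERS. In this sketch, scrape_state is a hypothetical placeholder that only returns the filename it would write. It assumes the * in arrest_by_state_*.xlsx is a state abbreviation, and the state list is a hypothetical subset. The sketch uses a thread pool rather than the processes described in the settings, because the work is mostly waiting on HTTP responses. concurrent.futures.ProcessPoolExecutor can be swapped in with the same submit() and as_completed() calls, provided the worker function is defined at module level so it can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 2  # mirrors the settings.py default

STATES = ['AL', 'AK', 'AZ']  # hypothetical subset of state abbreviations


def scrape_state(state):
    """Hypothetical placeholder for the per-state job.

    A real version would fetch the state's arrest data and save it as an .xlsx
    file; this one only returns the filename it would write.
    """
    return f'arrest_by_state_{state}.xlsx'


def scrape_all(states, worker=scrape_state, max_workers=MAX_WORKERS):
    """Run worker(state) for every state with at most max_workers in flight."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, state): state for state in states}
        for future in as_completed(futures):
            state = futures[future]
            try:
                results[state] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[state] = exc
    return results


if __name__ == '__main__':
    for state, outcome in sorted(scrape_all(STATES).items()):
        print(state, outcome)
```

Failures are collected per state instead of propagating the first one. That way, one bad response does not throw away the rest of a long scrape.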

Features

  • Stata Support: After scraping, run the clean_arrest.do file to generate *.dta versions of the arrest files in ./raw.
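To confirm that the Stata step covered every arrest file, a short check can list the arrest_*.xlsx files in raw that have no matching .dta. This sketch assumes clean_arrest.do writes each .dta next to its source file with the same base name, which this README does not state. missing_dta is a hypothetical helper.

```python
from pathlib import Path


def missing_dta(raw_dir='raw'):
    """Return names of arrest_*.xlsx files in raw_dir with no same-named .dta.

    Assumes (hypothetically) that each .dta sits next to its .xlsx source.
    """
    raw = Path(raw_dir)
    return sorted(
        path.name
        for path in raw.glob('arrest_*.xlsx')
        if not path.with_suffix('.dta').exists()
    )


if __name__ == '__main__':
    leftovers = missing_dta()
    print('All arrest files converted' if not leftovers else leftovers)
```

The glob pattern deliberately skips ucr_ori_crosswalk.xlsx, which is not an arrest file.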

Contributors

jaycatsby