gfbarbieri / dc-criminalistics

Cohort 14 Capstone Project for the Certificate of Data Science at Georgetown University School of Continuing Studies.

License: MIT License

Python 0.36% Jupyter Notebook 99.64%

DC-Criminalistics

Cohort 14
Georgetown University
School of Continuing Studies
Data Science Certificate Capstone Project

Team Members:

Greg Barbieri - @gfbarbieri
Dan Schorer - @danschorer
Tara Brosnan - @tarabrosnan

Folder Organization

ingest-data-progs: Programs used to ingest Census, weather, WMATA, and CaBi data. Crime data was pulled directly from DC's Crime Cards application and exported as CSV. The Census, weather, WMATA, and CaBi programs store their data as SQLite database tables.
wrangle-data-progs: Programs used to wrangle, merge, and generate features and targets.
data: Data resulting from ingestion and wrangling.
notebooks: Notebook versions of the ingestion and wrangling processes, including target generation; notebooks on exploratory data analysis (EDA), feature standardization, feature selection, machine learning models, and output; and notebooks to reproduce graphs for the technical report.
report: Files and external data used in the technical report.
model: Output of fitted encoding, standardization, and model parameters for prediction and the data product.
demo: Programs used for demonstrating the data product.
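
As a rough illustration of the storage pattern described for ingest-data-progs, ingested records can be written to a SQLite table with Python's standard `sqlite3` module. The table name `weather`, its columns, and the sample rows below are hypothetical and are not taken from the repository.

```python
import sqlite3


def store_rows(conn, rows):
    """Create a (hypothetical) weather table if needed and insert rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        "obs_date TEXT, uv_index REAL, avg_temp REAL)"
    )
    conn.executemany(
        "INSERT INTO weather (obs_date, uv_index, avg_temp) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# In-memory database for illustration; a real run would pass a file path.
conn = sqlite3.connect(":memory:")
store_rows(conn, [("2018-01-01", 2.0, 28.5), ("2018-01-02", 1.0, 31.0)])
row_count = conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
print(row_count)
```

Keeping each source in its own table would let the wrangling step join them later with ordinary SQL, though the actual schema used by the project may differ.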

Deliverables

  1. Presentation
  2. Final Report

Abstract

The goal of the project is to use machine learning to predict crime rates. Regardless of the data available, it is difficult, if not impractical, to predict whether a particular individual will be a victim of violent or non-violent crime in an area. The team hypothesized that it was possible to predict crime rates by block group in Washington, DC using features such as weather, time of day, and location. The team ingested data from the US Census Bureau, the DC Metropolitan Police Department, and the Dark Sky website. After wrangling, feature generation, and target rescaling, the team had about 180,000 instances and 26 features. Feature evaluation narrowed the candidates from 26 to 11, and the team selected 6 features to model crime rates. The team used classification models to predict crime rate buckets of low, low-medium, medium, medium-high, and high. A bagging classification model with a decision tree estimator outperformed the other models tested, such as K-Nearest Neighbors and Random Forest. Overall model accuracy was 82 percent. All models, including those with economic and demographic data, had trouble accurately classifying crime rates in the high category, as measured by false negatives and visualized in the confusion matrix, and were more certain when predicting crime rates in the low-medium, medium, and medium-high categories.
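
The abstract evaluates the models by per-bucket false negatives read off a confusion matrix. A minimal sketch of that bookkeeping, using only the standard library, might look like the following. The labels are made up for illustration and are not results from the project; the function names are hypothetical.

```python
from collections import Counter

BUCKETS = ["low", "low-medium", "medium", "medium-high", "high"]


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def false_negatives(counts, label):
    """Instances whose true bucket is `label` but that were predicted as another bucket."""
    return sum(
        n for (true, pred), n in counts.items() if true == label and pred != label
    )


# Made-up labels for illustration only; not results from the project.
y_true = ["high", "high", "high", "medium", "medium", "low"]
y_pred = ["medium-high", "high", "medium-high", "medium", "medium", "low"]

counts = confusion_counts(y_true, y_pred)
fn_by_bucket = {bucket: false_negatives(counts, bucket) for bucket in BUCKETS}
accuracy = sum(n for (t, p), n in counts.items() if t == p) / len(y_true)
print(fn_by_bucket, accuracy)
```

In this toy data the high bucket collects all of the false negatives even though overall accuracy looks reasonable, which is the kind of pattern the abstract describes.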

Project Overview

Purpose: Predict crime rates in Washington, DC and return recommended transportation options in the user-defined geographic area.
Target: Crime rate per 100,000 people by Census block group and time of day.
Features: Block group, day of the month, day of the week, time of day, UV index, average temperature.
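
The target above can be sketched in a few lines: a rate per 100,000 people, then a mapping of that rate to one of the five buckets named in the abstract. The README does not say how the bucket boundaries were chosen, so the quintile cut points below are an assumption, and the function names and sample rates are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

BUCKETS = ["low", "low-medium", "medium", "medium-high", "high"]


def crime_rate_per_100k(crime_count, population):
    """Crimes per 100,000 residents for one block group and time slot."""
    if population <= 0:
        raise ValueError("population must be positive")
    return crime_count * 100_000 / population


def make_bucketer(rates):
    """Return a function mapping a rate to one of five buckets.

    Assumption: cut points are quintiles of the observed rates.
    """
    cuts = quantiles(rates, n=5)  # four cut points give five buckets
    return lambda rate: BUCKETS[bisect_right(cuts, rate)]


rate = crime_rate_per_100k(crime_count=3, population=1_500)

# Made-up rates for illustration only.
sample_rates = [50, 120, 200, 310, 480, 650, 900, 1200, 1500, 2100]
to_bucket = make_bucketer(sample_rates)
print(rate, to_bucket(rate))
```

Turning the continuous rate into buckets is what lets the problem be treated as classification rather than regression.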

Architecture
[Architecture diagram]

Data Sources

  1. American Community Survey (ACS) and the ACS API
  2. DC MPD Crime Cards
  3. Dark Sky and the Dark Sky API

Methodology

Ingestion: Download data from each source through its API or directly from its website.
Wrangling: Generate features and targets, and standardize features.
Machine Learning: Employ supervised machine learning methods to select features and fit models.
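
The last two stages above can be sketched with scikit-learn: standardize the features, then fit a bagging classifier over decision trees, the model family named in the abstract. This is a minimal sketch on synthetic data, not the project's actual pipeline; the feature values, the labeling rule, and the hyperparameters are all assumptions.

```python
import random

from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

random.seed(0)

# Synthetic stand-ins for two numeric features (for example UV index and
# average temperature); the labels follow an arbitrary made-up rule.
X = [[random.uniform(0, 11), random.uniform(20, 95)] for _ in range(300)]
y = ["high" if uv + temp / 10 > 11 else "low" for uv, temp in X]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The decision tree is passed positionally as the base estimator because its
# keyword name differs between scikit-learn versions.
model = make_pipeline(
    StandardScaler(),
    BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline means the scaling parameters are learned from the training split only, which avoids leaking information from the held-out data.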

Contributors

gfbarbieri, danschorer, tarabrosnan, kbelita
