my-machine-learning-projects-ct / kaggle-crime-master

[Data Science] An end-to-end project to explore and visualize crime data and predict the category of crime in San Francisco.

License: MIT License

Jupyter Notebook 100.00%
data-science sf-crime-prediction kaggle kaggle-competition machine-learning feature-engineering data-mining

kaggle-crime-master's Introduction

Kaggle San Francisco Crime Classification

An end-to-end project to explore, visualize, and analyze San Francisco crime data and predict the category of a crime from temporal and spatial features.

Installation

All Python packages required for this project are listed in requirements.txt and can be installed with the command below.

$ pip install -r requirements.txt

Files

Data Exploration & Visualization

Data Mining & Machine Learning

  • kaggle-sf-crime-prediction.ipynb - Jupyter notebook with the end-to-end data science workflow: data preprocessing, feature engineering, building baseline models, model selection, hyperparameter tuning, and Kaggle submission.

Folders

Data Visualizations

  • visualizations - folder containing visualizations of the data (barplots, scatterplots, heatmaps, maps, etc.)

Hyperparameter Tuning Results

  • cv_results - folder containing the hyperparameter tuning results (mean and standard deviation of the CV scores, plus the hyperparameters) at each iteration of Bayesian optimization.
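As a rough illustration of how per-iteration results like these can be summarized, the sketch below computes the mean and standard deviation of the fold scores for each iteration and picks the iteration with the lowest mean log loss. The record layout, the `max_depth` hyperparameter, and all score values are hypothetical and made up for the example; they are not taken from the cv_results folder, and the sketch does not perform Bayesian optimization itself.

```python
from statistics import mean, stdev

# Hypothetical records (made-up values): one per optimization iteration,
# each holding the hyperparameters tried and the per-fold log loss.
iterations = [
    {"params": {"max_depth": 4}, "fold_scores": [2.41, 2.39, 2.40]},
    {"params": {"max_depth": 8}, "fold_scores": [2.31, 2.35, 2.33]},
]


def summarize(record):
    """Return the hyperparameters with the mean and sample std of the fold scores."""
    scores = record["fold_scores"]
    return {"params": record["params"], "mean": mean(scores), "std": stdev(scores)}


summaries = [summarize(r) for r in iterations]

# Log loss is minimized, so the best iteration has the smallest mean score.
best = min(summaries, key=lambda s: s["mean"])
print(best["params"])
```

The standard deviation is worth keeping next to the mean: two iterations with similar means but very different spreads are not equally trustworthy.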

Dataset

The dataset contains incidents derived from the SFPD Crime Incident Reporting system. The data ranges from 1/1/2003 to 5/13/2015 (about 12 years' worth of data). The training set and test set alternate every week: weeks 1, 3, 5, 7, ... belong to the test set, and weeks 2, 4, 6, 8, ... belong to the training set.
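The alternating-week rule can be sketched as follows. This is only an illustration: it assumes that week 1 starts on the first day of the data (2003-01-01) and that timestamps look like `YYYY-MM-DD HH:MM:SS`. The exact week boundaries used by Kaggle are not stated here, and the helper names `week_number` and `which_split` are hypothetical.

```python
from datetime import date, datetime

# Assumption: week 1 begins on the first day covered by the dataset.
FIRST_DAY = date(2003, 1, 1)


def week_number(timestamp: str) -> int:
    """Return the 1-based week index of a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
    return (day - FIRST_DAY).days // 7 + 1


def which_split(timestamp: str) -> str:
    """Odd-numbered weeks map to the test set, even-numbered weeks to the training set."""
    return "test" if week_number(timestamp) % 2 == 1 else "train"


print(which_split("2003-01-08 12:00:00"))
```

Because the two sets interleave in time, a validation scheme that also holds out whole weeks will resemble the real train/test relationship more closely than a purely random split.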

Data Fields

  • Dates - timestamp of the crime incident
  • Category - category of the crime incident (only in train.csv). This is the target variable you are going to predict.
  • Descript - detailed description of the crime incident (only in train.csv)
  • DayOfWeek - the day of the week
  • PdDistrict - name of the Police Department District
  • Resolution - how the crime incident was resolved (only in train.csv)
  • Address - the approximate street address of the crime incident
  • X - Longitude
  • Y - Latitude
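A minimal sketch of turning these fields into temporal and spatial features, using only the standard library. The sample row is made up for illustration (it is not real data), the timestamp format `YYYY-MM-DD HH:MM:SS` is an assumption, and `add_time_features` is a hypothetical helper, not a function from the notebook.

```python
import csv
import io
from datetime import datetime

# Made-up sample row using the column names listed above (not real data).
SAMPLE_CSV = """Dates,DayOfWeek,PdDistrict,Address,X,Y
2015-05-10 23:59:00,Sunday,BAYVIEW,EXAMPLE ST / SAMPLE AV,-122.40,37.73
"""


def add_time_features(row: dict) -> dict:
    """Return a copy of the row with Year, Month, Hour parsed from Dates and X/Y as floats."""
    ts = datetime.strptime(row["Dates"], "%Y-%m-%d %H:%M:%S")
    out = dict(row)
    out.update(
        Year=ts.year,
        Month=ts.month,
        Hour=ts.hour,
        X=float(row["X"]),  # longitude
        Y=float(row["Y"]),  # latitude
    )
    return out


rows = [add_time_features(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(rows[0]["Year"], rows[0]["Month"], rows[0]["Hour"])
```

Descript and Resolution are absent from the test set, so they cannot be used as model inputs at prediction time.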

The source of the dataset can be found in the following links:

Visualizations

Some visualizations of the spatial and temporal features along with category of crime.

Heatmap of SF police district given category of crime

Lineplot of year given category of Crime

Kaggle Submission

Achieved a multi-class log loss of 2.25674, which would rank #136 out of 2,335 teams on the public leaderboard (top 6%, or the 94th percentile). More details can be found in this notebook: kaggle-sf-crime-prediction.ipynb
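For reference, multi-class log loss is the mean negative log of the probability assigned to the true class. The sketch below is a plain-Python illustration of that definition together with the leaderboard arithmetic quoted above. The function name and the clipping constant `eps` are assumptions for the example; this is not the competition's official scoring code.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of true class indices, one per sample.
    y_prob: list of per-class probability lists, one per sample.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip so that a predicted probability of exactly 0 or 1 stays finite.
        p = min(max(probs[label], eps), 1 - eps)
        total += math.log(p)
    return -total / len(y_true)


# Leaderboard position quoted above: rank 136 of 2,335 teams.
rank_fraction = 136 / 2335  # roughly 0.058, i.e. about the top 6%
```

A useful baseline: predicting a uniform distribution over K classes gives a log loss of ln(K), so any useful model should score below that.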

License

See the LICENSE file for license rights and limitations (MIT).

kaggle-crime-master's People

Contributors

aditmodi
