UK crime data

UK Police Data Ingestion

The aim of the project is to ingest data from the UK Police Data API into Google Cloud Storage, load it into BigQuery, and visualise it in Data Studio (API -> Google Cloud Storage -> BigQuery -> Data Studio).
More precisely, the data comes from the API method for stop and searches by force:
https://data.police.uk/docs/method/stops-force/

Data from the 'metropolitan' police force is used.
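
As an illustration of the request behind the ingestion step, here is a minimal sketch using only the Python standard library. The `force` and `date` query parameters are the ones documented for the stops-force method; the function names `build_stops_url` and `fetch_stops` are hypothetical and are not taken from this project's code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the stops-force method described in the API docs linked above.
API_URL = "https://data.police.uk/api/stops-force"


def build_stops_url(force: str, month: str) -> str:
    """Return the request URL for one police force and one month (YYYY-MM)."""
    return f"{API_URL}?{urlencode({'force': force, 'date': month})}"


def fetch_stops(force: str, month: str) -> list:
    """Download one month of stop-and-search records as a list of dicts."""
    # Performs a real HTTP request when called.
    with urlopen(build_stops_url(force, month), timeout=30) as response:
        return json.load(response)
```

Calling `fetch_stops("metropolitan", "2021-01")` would request one month of Metropolitan Police records; the month shown is only an example value.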

Project Architecture



The following technologies are used in this project:

  • Python
  • Docker Compose
  • Airflow
  • Google Cloud Storage
  • Google Cloud BigQuery
  • Google Cloud Data Studio
  • Terraform
  • PySpark
  • Jupyter Notebooks

DAGs

The DAG tasks that make up the pipeline are shown here:



Set up Google Cloud project

Set up your account as described here

Update Files

Update the project variable in the Terraform variables.tf file:

variable "project" {
  description = "Your GCP Project ID"
}

Update the docker-compose.yaml file with your project ID and Google Cloud Storage bucket:

GCP_PROJECT_ID: 'de-bootcamp-339509'
GCP_GCS_BUCKET: 'dtc_data_lake_de-bootcamp-339509'

Then create the .env file and start Airflow:

echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
docker-compose up airflow-init
docker-compose up

Terraform

gcloud auth application-default login 
terraform init
terraform plan
terraform apply

Airflow

There are two DAGs in Airflow:
The data_ingestion DAG runs monthly to collect the data.
The gcs_to_bq_dag DAG runs once, after the data_ingestion DAG has finished.
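
A monthly ingestion run needs to know which month it is responsible for. The sketch below assumes each run requests the month of its own logical date and writes one file per month. Both helper names and the `raw/stop_and_search/` path layout are hypothetical; the project's DAG code may organise this differently.

```python
from datetime import date


def month_param(logical_date: date) -> str:
    """Format a run's logical date as the API's YYYY-MM date parameter."""
    return logical_date.strftime("%Y-%m")


def object_name(logical_date: date) -> str:
    """Hypothetical GCS object path for the file produced by one monthly run."""
    return f"raw/stop_and_search/{month_param(logical_date)}.json"
```

Deriving both the request parameter and the object path from the same date keeps reruns of a given month idempotent: the same run always overwrites the same object.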

BigQuery

A partitioned table is created in BigQuery.
The table is partitioned on the datetime column to optimise performance for queries that filter on datetime.
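
One way such a table could be created is with a `CREATE TABLE ... PARTITION BY` statement. The helper below only builds the SQL text; it is a sketch that assumes the `datetime` column has a DATETIME or TIMESTAMP type and that an unpartitioned source table already exists. The function name and the source table name in the usage note are hypothetical.

```python
def partitioned_table_ddl(target: str, source: str, column: str = "datetime") -> str:
    """Build a CREATE TABLE statement that partitions `target` by day on `column`."""
    return (
        f"CREATE OR REPLACE TABLE `{target}`\n"
        f"PARTITION BY DATE({column}) AS\n"
        f"SELECT * FROM `{source}`"
    )
```

For example, `partitioned_table_ddl("de-bootcamp-339509.stop_and_search.stop_and_search_partitioned_table", "de-bootcamp-339509.stop_and_search.external_table")` yields a statement that can be run in the BigQuery console. Queries that filter on `datetime` can then skip partitions outside the requested range.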

Queries

This is the query used to produce the dashboard:

SELECT
  IFNULL(NULLIF(age_range, ''), 'N/A') AS age_range,
  IFNULL(NULLIF(outcome, ''), 'N/A') AS outcome,
  IFNULL(NULLIF(gender, ''), 'N/A') AS gender,
  datetime,
  IFNULL(NULLIF(officer_defined_ethnicity, ''), 'N/A') AS officer_defined_ethnicity,
  IFNULL(NULLIF(type, ''), 'N/A') AS type,
  IFNULL(NULLIF(object_of_search, ''), 'N/A') AS object_of_search,
  latitude,
  longitude,
  lat_long
FROM `de-bootcamp-339509.stop_and_search.stop_and_search_partitioned_table`

More queries are in the file queries.sql.
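
The dashboard query wraps most columns in `IFNULL(NULLIF(col, ''), 'N/A')`: `NULLIF` turns an empty string into NULL, and `IFNULL` then replaces NULL with 'N/A'. The same rule for a single value can be sketched in Python as follows; the function name `na_if_blank` is illustrative only.

```python
from typing import Optional


def na_if_blank(value: Optional[str]) -> str:
    """Mirror IFNULL(NULLIF(value, ''), 'N/A') for a single string value."""
    # Only NULL/None and the exactly empty string are replaced;
    # whitespace-only strings pass through unchanged, as in the SQL.
    if value is None or value == "":
        return "N/A"
    return value
```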

Dashboard

Link to Data Studio report



Notebooks

PySpark Jupyter Notebooks Analysis

An analysis of sample data is presented in the Jupyter notebook search-force.ipynb.

Read from Google Cloud Storage with Spark and Jupyter

gcs-search-force.ipynb
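
Reading from the bucket with Spark might look roughly like the sketch below. It assumes the files are stored as Parquet and that the SparkSession passed in is already configured with the Google Cloud Storage connector and credentials; neither detail is confirmed here, and the helper names are hypothetical. See the notebook for the actual code.

```python
def gcs_uri(bucket: str, path: str) -> str:
    """Build a gs:// URI from a bucket name and an object path."""
    return f"gs://{bucket}/{path.lstrip('/')}"


def read_stops(spark, uri: str):
    """Read Parquet files at `uri` into a DataFrame using an existing SparkSession."""
    # Assumes Parquet; other formats would use a different spark.read method.
    return spark.read.parquet(uri)
```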
