shikhar97 / spatial-hotspot-analysis

Spatial Hotspot Analysis on Geo-Spatial Data using Apache Spark and Scala

License: MIT License

Dockerfile 15.18% Scala 83.40% Shell 1.41%
docker nyc-taxi-dataset scala spatial-analysis sbt-assembly spark-submit bigdata geospatial-data getis-ord hot-cell-analysis

Spatial Hotspot Analysis on Geo-Spatial Data using Getis-Ord Statistic

A major peer-to-peer taxi cab firm has hired your team to develop and run multiple spatial queries on their large database that contains geographic data as well as real-time location data of their customers. A spatial query is a special type of query supported by geodatabases and spatial databases. The queries differ from traditional SQL queries in that they allow for the use of points, lines, and polygons. The spatial queries also consider the relationship between these geometries. Since the database is large and mostly unstructured, your client wants you to use a popular Big Data software application, SparkSQL. The goal of the project is to extract data from this database that will be used by your client for operational (day-to-day) and strategic level (long term) decisions.

Description

This task will focus on applying spatial statistics to spatio-temporal big data in order to identify statistically significant spatial hot spots using Apache Spark.
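For reference, the Getis-Ord statistic named in the title is usually written as below. Here x_j is the number of pickups in cell j, w_{i,j} is the spatial weight between cells i and j, and n is the total number of cells. How the weights are chosen is not stated in this README; a common choice, assumed here, is 1 for a cell and its immediate neighbours in space and time and 0 otherwise.

```latex
G_i^* = \frac{\sum_{j=1}^{n} w_{i,j}\, x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}
             {S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}},
\qquad
\bar{X} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2 - \bar{X}^2}
```

A large positive G_i^* marks a cell whose neighbourhood has far more pickups than the overall mean would predict, which is what "hot" means in this task.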

To Get Started

Install Apache Spark and SparkSQL on your computer

You will be using Apache Spark and SparkSQL in this project. Apache Spark is a sophisticated Big Data software application. Each team member needs to install Apache Spark and SparkSQL on their own computer by carefully following the instructions at https://spark.apache.org/docs/latest/

To get started, team members will need to do some research about Apache SparkSQL and spatial queries.

Required Resource:

https://www.tutorialspoint.com/spark_sql/spark_sql_quick_guide.htm

Coding template specification

Input parameters

  1. Output path (Mandatory)
  2. Task name: "hotzoneanalysis" or "hotcellanalysis"
  3. Task parameters:
     (1) Hot zone (2 parameters): NYC taxi data path, zone path
     (2) Hot cell (1 parameter): NYC taxi data path

Example

test/output hotzoneanalysis src/resources/point-hotzone.csv src/resources/zone-hotzone.csv hotcellanalysis src/resources/yellow_trip_sample_100000.csv
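The argument layout above (an output path, then each task name followed by its own parameters) could be parsed roughly as follows. This is a minimal sketch in plain Scala, not the template's actual entry point; the names TaskArgs, Task, and parse are hypothetical.

```scala
object TaskArgs {
  // One task to run: its name plus the input paths it needs.
  final case class Task(name: String, params: List[String])

  // Number of parameters each task expects, per the specification above.
  private val paramCount = Map("hotzoneanalysis" -> 2, "hotcellanalysis" -> 1)

  // Splits the argument list into the output path and the tasks that follow it.
  def parse(args: Seq[String]): (String, List[Task]) = {
    require(args.nonEmpty, "the output path is mandatory")

    @scala.annotation.tailrec
    def loop(rest: List[String], acc: List[Task]): List[Task] = rest match {
      case Nil => acc.reverse
      case name :: tail =>
        val key = name.toLowerCase
        val n = paramCount.getOrElse(
          key, throw new IllegalArgumentException(s"unknown task: $name"))
        require(tail.length >= n, s"$name expects $n parameter(s)")
        val (params, remaining) = tail.splitAt(n)
        loop(remaining, Task(key, params) :: acc)
    }

    (args.head, loop(args.tail.toList, Nil))
  }
}
```

Applied to the example line above, parse would return the output path test/output and two tasks: hotzoneanalysis with two paths and hotcellanalysis with one.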

Input data format

Hot zone analysis

The input point data can be any small subset of NYC taxi dataset.

Hot cell analysis

The input point data is a monthly NYC taxi trip dataset (2009-2012), such as "yellow_tripdata_2009-01_point.csv".
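Before any statistic can be computed, each pickup has to be assigned to a space-time cell. The sketch below shows one way to do that. It assumes a cell size of 0.01 degrees in longitude and latitude and a time step of one day (the day of the month), and it assumes pickup times formatted as "yyyy-MM-dd HH:mm:ss"; none of these are stated in this README, and the names CellBinning and toCell are hypothetical.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object CellBinning {
  // Assumed cell size in degrees; not stated in the README.
  val CellSize = 0.01

  // Assumed timestamp layout of the pickup time column.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Maps a pickup location and time to an (x, y, z) cell:
  // x and y are the floored coordinates in units of CellSize,
  // z is the day of the month.
  def toCell(lon: Double, lat: Double, pickupTime: String): (Int, Int, Int) = {
    val x = math.floor(lon / CellSize).toInt
    val y = math.floor(lat / CellSize).toInt
    val z = LocalDateTime.parse(pickupTime, fmt).getDayOfMonth
    (x, y, z)
  }
}
```

Under these assumptions, a pickup at longitude -73.985 and latitude 40.752 on 2009-01-15 falls into the cell (-7399, 4075, 15).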

Output data format

Hot zone analysis

All zones with their counts, sorted by the "rectangle" string in ascending order.

"-73.795658,40.743334,-73.753772,40.779114",1
"-73.797297,40.738291,-73.775740,40.770411",1
"-73.832707,40.620010,-73.746541,40.665414",20
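The hot zone count comes down to a point-in-rectangle test followed by a count per zone. The sketch below shows that logic on plain Scala collections rather than SparkSQL. It assumes rectangles are given as "x1,y1,x2,y2" and points as "x,y", and that points on the boundary count as inside; the names HotZone, contains, and countByZone are hypothetical.

```scala
object HotZone {
  // True when the point "x,y" lies inside or on the boundary of the
  // rectangle "x1,y1,x2,y2"; the two corners may be given in either order.
  def contains(rectangle: String, point: String): Boolean = {
    val Array(x1, y1, x2, y2) = rectangle.split(",").map(_.trim.toDouble)
    val Array(px, py) = point.split(",").map(_.trim.toDouble)
    px >= math.min(x1, x2) && px <= math.max(x1, x2) &&
    py >= math.min(y1, y2) && py <= math.max(y1, y2)
  }

  // Counts the points in each zone and sorts by the rectangle string,
  // ascending. Zones with no points are kept with a count of 0, which
  // is an assumption; a join-based query would drop them instead.
  def countByZone(zones: Seq[String], points: Seq[String]): Seq[(String, Int)] =
    zones.map(z => z -> points.count(p => contains(z, p))).sortBy(_._1)
}
```

In a Spark job the same predicate would typically be registered as a user-defined function and used as the join condition between the zone and point tables.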

Hot cell analysis

The coordinates of the top 50 hottest cells, sorted by their G score in descending order.

-7399,4075,15
-7399,4075,29
-7399,4075,22
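The G score behind this ranking can be sketched in plain Scala as below. This is a minimal illustration and not the repository's Spark implementation. It assumes binary weights (1 for a cell and its up to 26 neighbours in x, y, and z; 0 otherwise) and that the caller supplies the inclusive bounds of the cell grid, so that n is the number of cells in that bounding cube. The names GetisOrd, gScores, and hottest are hypothetical.

```scala
object GetisOrd {
  type Cell = (Int, Int, Int)

  // Computes the Getis-Ord G* score for every cell present in `counts`.
  // counts: pickups per non-empty cell (all keys are expected to be
  //         inside the bounds); lo and hi: inclusive grid bounds.
  // The score is undefined (NaN or infinite) if all cells have the same
  // count or if a neighbourhood covers the whole grid.
  def gScores(counts: Map[Cell, Long], lo: Cell, hi: Cell): Map[Cell, Double] = {
    // Total number of cells in the grid, empty ones included.
    val n = (hi._1 - lo._1 + 1).toDouble * (hi._2 - lo._2 + 1) * (hi._3 - lo._3 + 1)
    val mean = counts.values.sum / n
    val sd = math.sqrt(counts.values.map(c => c.toDouble * c).sum / n - mean * mean)

    def inBounds(c: Cell): Boolean =
      c._1 >= lo._1 && c._1 <= hi._1 &&
      c._2 >= lo._2 && c._2 <= hi._2 &&
      c._3 >= lo._3 && c._3 <= hi._3

    counts.keys.map { case cell @ (x, y, z) =>
      // The cell itself plus its neighbours that fall inside the grid.
      val neighbours = for {
        dx <- -1 to 1
        dy <- -1 to 1
        dz <- -1 to 1
        c = (x + dx, y + dy, z + dz)
        if inBounds(c)
      } yield c
      // With binary weights, sum(w) and sum(w^2) both equal the neighbour count.
      val w = neighbours.size.toDouble
      val sumX = neighbours.map(c => counts.getOrElse(c, 0L)).sum.toDouble
      val denom = sd * math.sqrt((n * w - w * w) / (n - 1))
      (cell, (sumX - mean * w) / denom)
    }.toMap
  }

  // Returns the k cells with the highest G score, hottest first.
  def hottest(scores: Map[Cell, Double], k: Int): Seq[Cell] =
    scores.toSeq.sortBy { case (_, g) => -g }.take(k).map(_._1)
}
```

Calling hottest(scores, 50) on the result would give the 50 cells to write out. Cells on the edge of the grid have fewer neighbours, which is why the neighbour count is computed per cell instead of fixed at 27.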

A Dockerfile is provided, so you can download the image and run the project with the following steps:

  1. Run docker pull shikharg1997/group6-project1-phase2-bonus:v0 to download image
  2. Run docker run -it --rm shikharg1997/group6-project1-phase2-bonus:v0 to start the container.
  3. Run sbt assembly
  4. Run spark-submit target/scala-2.12/CSE511-Hotspot-Analysis-assembly-0.1.0.jar result/output hotzoneanalysis src/resources/point-hotzone.csv src/resources/zone-hotzone.csv hotcellanalysis src/resources/yellow_tripdata_2009-01_point.csv

Docker Image: docker pull shikharg1997/group6-project1-phase2-bonus:v0
