varshavjm / acm-sigspatial-cup-2016

This project forked from ajkulkarni/acm-sigspatial-cup-2016

Applying spatial statistics to spatio-temporal big data in order to identify statistically significant hot spots using Apache Spark & Java 8

Java 24.46% Ruby 44.18% Shell 5.81% HTML 6.51% JavaScript 19.05%

acm-sigspatial-cup-2016's Introduction

ACM Sigspatial Cup 2016

Distributed computation on Geo-spatial data


This repository contains an implementation of the ACM SIGSPATIAL Cup 2016 problem statement: calculating the top 50 hotspots for New York City taxi cabs, subject to the following constraints.

  • The location considered for this problem was the pickup location
  • The cell attribute value was the number of journeys instead of the number of passengers
  • The time step size was 1 day
  • The input data was restricted to January 2015
  • All neighbor weights were equal and fixed at 1

The metric for calculation of the hotspots was the Getis-Ord score and the implementation was done on Apache Spark.
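
For reference, the Getis-Ord statistic is commonly written as below. The notation is the standard textbook one and is an assumption here, not taken from this repository: x_j is the attribute value of cell j, w_{i,j} is the weight between cells i and j (1 for neighbors under the constraints above, 0 otherwise), and n is the total number of cells.

```latex
G_i^* = \frac{\sum_{j=1}^{n} w_{i,j}\, x_j \;-\; \bar{X} \sum_{j=1}^{n} w_{i,j}}
             {S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}},
\qquad
\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n},
\qquad
S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}
```

With all weights equal to 1, both weight sums reduce to the number of neighbors of cell i, so the score depends only on the sum of the neighboring attribute values, the neighbor count, and the global mean and standard deviation.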

Algorithm

The algorithm we developed allowed complete parallelization of the computation of the Getis-Ord score using two MapReduce phases:

  1. MapReduce Phase 1: Creation of cells
       • Map step: Parse each row of the input file and output a cell object whose ID is the X (Latitude), Y (Longitude) and Z (Day) coordinates of the journey
       • Filter out the cells that fall outside a fixed geo-envelope
       • Reduce step: Add up all the journeys in a cell to obtain the attribute value for the cell
  2. Intermediate calculations
       • Calculate the mean and standard deviation from the cell information
       • Calculate the total number of cells in the cube
  3. MapReduce Phase 2: Calculating the Getis-Ord score
       • The key step in the calculation of the Getis-Ord score was finding the neighboring cells and obtaining their attribute values
       • Map step: For each cell, calculate the IDs of its neighbors and send its attribute value to each of them
       • Reduce step: By this point every neighbor of a cell has sent its attribute value to that cell, so all the data required for the Getis-Ord score is available and the score is computed locally for each cell
  4. Retrieving the top 50 hotspot cells
       • The top 50 hotspots were retrieved using the top function of Apache Spark, since this was an efficient way to look up and retrieve the top-k values from an RDD
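
The steps above can be illustrated with a minimal single-machine sketch. It is not the repository's Spark code: plain Java collections stand in for RDDs, and the class name `HotspotSketch`, its method names, the integer cell indices, and the 3×3×3 neighborhood that includes the cell itself are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain collections replace Spark RDDs.
final class HotspotSketch {

    // True if the cell index lies inside an nx * ny * nz cube (the "geo-envelope").
    static boolean inside(int x, int y, int z, int nx, int ny, int nz) {
        return x >= 0 && x < nx && y >= 0 && y < ny && z >= 0 && z < nz;
    }

    // Phase 1: map each pickup {x, y, day} to a cell ID, drop out-of-envelope
    // pickups, and reduce by summing one journey per pickup.
    static Map<List<Integer>, Long> countJourneys(List<int[]> pickups, int nx, int ny, int nz) {
        Map<List<Integer>, Long> counts = new HashMap<>();
        for (int[] p : pickups) {
            if (!inside(p[0], p[1], p[2], nx, ny, nz)) continue;
            counts.merge(Arrays.asList(p[0], p[1], p[2]), 1L, Long::sum);
        }
        return counts;
    }

    // Number of in-bounds positions among {i - 1, i, i + 1} along one axis.
    static int span(int i, int size) {
        return 1 + (i > 0 ? 1 : 0) + (i < size - 1 ? 1 : 0);
    }

    // Intermediate calculations plus Phase 2. Returns NaN or infinite scores
    // if every cell in the cube has the same value (standard deviation of 0).
    static Map<List<Integer>, Double> scores(Map<List<Integer>, Long> counts, int nx, int ny, int nz) {
        double n = (double) nx * ny * nz;           // total cells, empty ones included
        double sum = 0, sumSq = 0;
        for (long v : counts.values()) { sum += v; sumSq += (double) v * v; }
        double mean = sum / n;
        double sd = Math.sqrt(sumSq / n - mean * mean);

        // Map step: every cell sends its value to each in-bounds neighbor
        // (itself included). Reduce step: sum the values received per cell.
        Map<List<Integer>, Long> received = new HashMap<>();
        for (Map.Entry<List<Integer>, Long> e : counts.entrySet()) {
            List<Integer> c = e.getKey();
            for (int dx = -1; dx <= 1; dx++)
                for (int dy = -1; dy <= 1; dy++)
                    for (int dz = -1; dz <= 1; dz++) {
                        int x = c.get(0) + dx, y = c.get(1) + dy, z = c.get(2) + dz;
                        if (inside(x, y, z, nx, ny, nz))
                            received.merge(Arrays.asList(x, y, z), e.getValue(), Long::sum);
                    }
        }

        // Local score per cell; with unit weights the weight sums equal the neighbor count w.
        Map<List<Integer>, Double> result = new HashMap<>();
        for (Map.Entry<List<Integer>, Long> e : received.entrySet()) {
            List<Integer> c = e.getKey();
            double w = span(c.get(0), nx) * span(c.get(1), ny) * span(c.get(2), nz);
            double denom = sd * Math.sqrt((n * w - w * w) / (n - 1));
            result.put(c, (e.getValue() - mean * w) / denom);
        }
        return result;
    }

    // Top-k cells by descending score (a full sort stands in for Spark's top).
    static List<Map.Entry<List<Integer>, Double>> topK(Map<List<Integer>, Double> scores, int k) {
        List<Map.Entry<List<Integer>, Double>> list = new ArrayList<>(scores.entrySet());
        list.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(list.subList(0, Math.min(k, list.size())));
    }
}
```

Calling `countJourneys`, then `scores`, then `topK(scores, 50)` mirrors the four steps in order. In a distributed setting the two `merge` loops would correspond to the shuffle-and-reduce stages, which is why a cell never needs to look up its neighbors directly.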

acm-sigspatial-cup-2016's People

Contributors

ajkulkarni, ajinxpatil, omkarkaptan, dhanashreea
