Covid19 Data Downloader

Deploy master branch

This is a lightweight data-downloading pipeline. We grab Covid-19 data from the New York Times' GitHub repo. The project uses the Serverless Framework to deploy a Lambda Function and a CloudWatch Event that triggers the Lambda Function every 4 hours.

Architecture

[Figure: architecture diagram]

The data that we want to retrieve lives in a GitHub repository owned by the New York Times. I chose this data because it's relatively well maintained and needs very little transformation to be usable (at least for my simple needs). The data is pulled down by a Lambda Function, which then uploads it to an S3 bucket. The Lambda Function is triggered every 4 hours by a CloudWatch Event. I used AWS Glue to create a table and detect the schema of the data. Because the data is clean and simple, Glue was able to infer the schema perfectly. Once the Glue job completes, I have a usable table in Amazon Athena, which I can explore with standard SQL queries. The final steps in this process are to create a dataset in AWS QuickSight and then build some visualizations. QuickSight automatically refreshes its data once a day.
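The download-and-upload step described above could look roughly like the sketch below. This is not the project's actual handler: the source URL, bucket name, object key, and function names are all assumptions made for illustration, and the S3 client is passed in so the copy logic can be exercised without AWS credentials.

```python
import urllib.request

# Assumed source file; the README only says the data comes from the
# New York Times' Covid-19 GitHub repository.
SOURCE_URL = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"
BUCKET = "my-covid-data-bucket"  # hypothetical bucket name
KEY = "nyt/us-states.csv"        # hypothetical object key


def fetch(url):
    """Download the file at `url` and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def copy_to_s3(s3_client, fetch_fn=fetch, url=SOURCE_URL, bucket=BUCKET, key=KEY):
    """Fetch the source file and store it unchanged in S3."""
    body = fetch_fn(url)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return {"bucket": bucket, "key": key, "bytes": len(body)}


def handler(event, context):
    """Lambda entry point, invoked by the scheduled CloudWatch Event."""
    import boto3  # imported lazily; provided by the AWS Lambda Python runtime
    return copy_to_s3(boto3.client("s3"))
```

Overwriting the same key on every run keeps the Glue table pointed at a single, always-current file, which suits a source that publishes cumulative figures. Whether the real pipeline does this or writes dated keys is not stated here.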

Deployment

For this project I'm using the Serverless Framework. When deployed, the Serverless Framework generates the required CloudFormation templates for the resources defined in serverless.yml and then deploys them. Serverless also packages up the Lambda Function and uploads the code to an S3 bucket, which CloudFormation uses to create the Function.

The final part of the deployment process uses GitHub Actions to do the deployment. The team that created the Serverless Framework has also published a GitHub Action that can be used to deploy Serverless projects. GitHub Actions creates build badges to show the status of the build (see above).

Example visualizations from QuickSight

This visualization shows the daily cumulative cases and deaths, aggregated across all states.

[Figure: bar chart showing daily cumulative cases and deaths for all states]
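The aggregation behind a chart like this can be sketched in plain Python. The example assumes the column layout of the New York Times state-level file (date, state, fips, cases, deaths); the sample rows are made-up placeholder values, not real figures, and `national_totals` is a hypothetical helper rather than part of this project.

```python
import csv
import io
from collections import defaultdict


def national_totals(csv_text):
    """Sum cumulative cases and deaths across all states for each date."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["date"]]["cases"] += int(row["cases"])
        totals[row["date"]]["deaths"] += int(row["deaths"])
    # ISO-formatted dates sort chronologically as plain strings.
    return [(date, t["cases"], t["deaths"]) for date, t in sorted(totals.items())]


# Placeholder rows for illustration only; these are not real figures.
sample = """date,state,fips,cases,deaths
2020-03-01,StateA,01,10,1
2020-03-01,StateB,02,5,0
2020-03-02,StateA,01,20,2
2020-03-02,StateB,02,8,1
"""

for date, cases, deaths in national_totals(sample):
    print(date, cases, deaths)
```

In the pipeline itself the equivalent grouping would more likely be done in Athena or directly in QuickSight; the Python version is only meant to make the per-date sum explicit.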

This visualization shows cumulative cases by county.

[Figure: map of cumulative cases for counties]

This visualization compares the escalation of the virus by state.

[Figure: comparative escalation analysis]

To Do

  • Refactor serverless.yml. It's a bit of a mess right now.

Contributors

  • kepstein
