
covid-county-trend-scraper's Introduction

DSHS county trend scraper

Purpose

To gather daily new-case totals for each county in the state and calculate a set of 14 seven-day rolling averages, which are used to visualize how each county is trending.
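As a rough illustration of that calculation, here is a minimal sketch for a single county. It assumes the input is a plain list of daily new-case counts, oldest first. The function name `rolling_averages` and its parameters are hypothetical and are not taken from scraper.py.

```python
from typing import List


def rolling_averages(daily_cases: List[float], window: int = 7, count: int = 14) -> List[float]:
    """Return the most recent `count` rolling means, each over `window` days.

    Hypothetical helper: `daily_cases` is assumed to be ordered oldest to newest.
    """
    # 14 seven-day windows that each end on a different day span 20 days in total.
    needed = window + count - 1
    if len(daily_cases) < needed:
        raise ValueError(f"need at least {needed} days of data, got {len(daily_cases)}")

    recent = daily_cases[-needed:]
    # Slide the window one day at a time across the most recent days.
    return [sum(recent[i:i + window]) / window for i in range(count)]
```

With the defaults, 20 days of input are required: 14 overlapping windows of 7 days each.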

Procedure

This scraper hits two DSHS files each day. The first is DSHS's daily feed of cases by county, found here. The second is a general configuration file for DSHS that contains the last update date. If the update date is later than the last update within our trend file, the first file is used to update our file of 7-day averages.
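The update check described above might look something like the sketch below. Everything about the data shapes is an assumption made for illustration: dates are taken to be ISO strings (`YYYY-MM-DD`), and the trend file is modeled as a dict with hypothetical `last_updated` and `averages` keys. The real DSHS files and scraper.py may differ.

```python
from datetime import date
from typing import Dict


def needs_update(source_updated: str, trend_updated: str) -> bool:
    """True when the DSHS update date is later than the trend file's last update."""
    return date.fromisoformat(source_updated) > date.fromisoformat(trend_updated)


def apply_daily_update(trend: dict, source_updated: str, new_averages: Dict[str, float]) -> dict:
    """Append today's average per county, keeping only the latest 14 values.

    `trend` is a hypothetical structure:
    {"last_updated": "2020-07-01", "averages": {"Dallas": [..14 floats..]}}
    """
    if not needs_update(source_updated, trend["last_updated"]):
        return trend  # nothing new from DSHS, leave the trend file untouched

    averages = {
        county: (trend["averages"].get(county, []) + [value])[-14:]
        for county, value in new_averages.items()
    }
    return {"last_updated": source_updated, "averages": averages}
```

Returning a new dict instead of mutating the old one keeps the previous trend data intact if a later step, such as the upload, fails.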

What's here

  • scraper.py: Set of functions to complete the daily update and repair files should a daily update be missed.
  • service.py: File run on AWS Lambda. Runs the daily update and uploads the resulting data file back to AWS S3.
  • utils.py: Simple function that handles the uploading of files to S3.
  • zappa_settings.json: Zappa configuration file containing the project name, description, runtime environment, and, most importantly, the schedule on which the scraper runs.
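For orientation, an S3 upload helper like the one in utils.py could be sketched as follows. This is not the repository's actual code: the function name, parameters, and the optional `client` argument are assumptions. The only real API used is boto3's S3 client method `upload_file(Filename, Bucket, Key)`.

```python
def upload_file_to_s3(local_path: str, bucket: str, key: str, client=None) -> None:
    """Upload a local file to s3://<bucket>/<key>.

    Hypothetical helper. `client` lets callers (or tests) inject their own
    S3 client; when omitted, a default boto3 client is created.
    """
    if client is None:
        # boto3 is the third-party AWS SDK; it reads credentials from the
        # environment or the usual AWS configuration files.
        import boto3

        client = boto3.client("s3")

    client.upload_file(local_path, bucket, key)
```

Accepting the client as a parameter makes it possible to exercise the function locally without touching a real bucket.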

Developing locally

Download the repository and run $ pipenv install --dev.

Copy the .env-example file and rename the copy to .env. Add your own AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY. For messaging to Slack, you'll also need our SLACK_TOKEN.


Editing the scraper

This project uses Zappa to upload and schedule the scraper on our AWS Lambda. After making changes to the scraper, run pipenv run zappa update to push those changes to Lambda. Scheduling is handled via the zappa_settings.json file. The events key is an array of event objects. The service.handler object has an expression key that takes a schedule either in cron format or as a rate (rate(12 hours)).
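To make the shape of that events array concrete, the sketch below builds one event object in Python and prints it as JSON. The stage name `production` and the helper `schedule_event` are hypothetical. The `function` and `expression` keys follow Zappa's documented event format, and AWS schedule expressions are written as either `rate(...)` or `cron(...)`.

```python
import json


def schedule_event(expression: str, function: str = "service.handler") -> dict:
    """Build one entry for the "events" array in zappa_settings.json."""
    # AWS schedule expressions are wrapped as rate(...) or cron(...).
    if not expression.startswith(("rate(", "cron(")) or not expression.endswith(")"):
        raise ValueError(f"not a rate() or cron() expression: {expression!r}")
    return {"function": function, "expression": expression}


# "production" is a placeholder stage name, not taken from this project.
settings = {"production": {"events": [schedule_event("rate(12 hours)")]}}
print(json.dumps(settings, indent=4))
```

A cron-style schedule would be passed the same way, for example `schedule_event("cron(0 12 * * ? *)")` for a daily run at 12:00 UTC.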


Running the scraper locally

To run the scraper locally, run the following command in the command line:

$ pipenv run python service.py

Note that this simulates a scheduled scraper run, so any files it generates will be uploaded to S3. To run or test just the scraper locally, run:

$ pipenv run python scraper.py

covid-county-trend-scraper's People

Contributors

johndhancock
