data-for-good-concepts / serverless-scraping-service

A concept implementation of a serverless scraping web service

License: Creative Commons Zero v1.0 Universal

Languages: R 97.29%, Dockerfile 2.71%
Topics: cloudbuild, cloudrun, docker, github-actions, plumber-api, rselenium

serverless-scraping-service's People

Contributors

jeniffen

serverless-scraping-service's Issues

Setup domain restricted sharing (DRS) for public web service

Summary

Deploy a public web service without unintentionally granting external users permissions on Google Cloud projects.

Intended Outcome

Only services tagged allUserIngress:True accept unauthenticated requests.

How will it work?

Resource Manager tags and a conditional DRS policy control which services may be invoked without authentication. An example can be found here.

Read request body options when creating scraping job

  • Use the POST request payload to enable flexible customization of datasets.
  • Re-evaluate the currently hardcoded values and parameters and replace them with meaningful options.
  • Make sure the download is clicked only after the table has rendered.
  • Provide download options.
  • Complete the documentation.
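
The first two items above can be sketched as merging the request body over a set of defaults. This is a minimal illustration in Python, not the service's actual R/plumber code; the option names in DEFAULT_OPTIONS and the function parse_job_options are hypothetical, since the issue does not list the real parameters.

```python
import json

# Hypothetical defaults standing in for the currently hardcoded values;
# the issue does not name the real options.
DEFAULT_OPTIONS = {
    "dataset": "example_dataset",
    "format": "csv",
    "timeout_seconds": 30,
}


def parse_job_options(raw_body: bytes) -> dict:
    """Merge a JSON request body over the defaults, rejecting unknown keys."""
    # An empty body means "use the defaults".
    payload = json.loads(raw_body) if raw_body.strip() else {}
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    unknown = set(payload) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    # Values from the payload override the defaults key by key.
    return {**DEFAULT_OPTIONS, **payload}
```

Rejecting unknown keys, rather than silently ignoring them, is one possible design choice; it surfaces typos in option names early.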

Setup static scraping workflow

Summary

Scrape one particular dataset from Eurostat.

Intended Outcome

The service returns a JSON response with the scraped data. The outcome is currently fixed and cannot be adjusted by parameters.

How will it work?

When a POST request is made to the service, the scraper navigates to the specified data, reads the values and sends back a JSON response containing the data.
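
The last step, turning scraped values into a JSON response, might look roughly like the sketch below. It is written in Python for illustration only (the service itself is built with R); the function name table_to_json, the "data" wrapper key and the column names in the usage note are assumptions, not taken from the repository.

```python
import json


def table_to_json(header, rows):
    """Serialize a scraped table (column names plus rows of cells) as JSON records."""
    records = []
    for row in rows:
        # A row with missing or extra cells signals a scraping problem.
        if len(row) != len(header):
            raise ValueError("row length does not match header length")
        records.append(dict(zip(header, row)))
    # Hypothetical response shape: one object per table row under "data".
    return json.dumps({"data": records})
```

For example, table_to_json(["geo", "value"], [["DE", "1.0"]]) produces a JSON object whose "data" list holds one record with the keys "geo" and "value".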

Bootstrap web service skeleton

Summary

A basic web service that can handle and respond to API requests.

Intended Outcome

A bare-bones web service that responds to a GET request on localhost as well as on a public URL (Google Cloud Run).

How will it work?

Bootstrap the web service using renv, plumber and Docker so that it can be deployed on Google Cloud Run. Users can send a GET request to a /health/status endpoint and will receive a response indicating that the service is online.
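
The behaviour of the /health/status endpoint can be sketched with Python's standard library. This is not the plumber implementation from the repository; the response body {"status": "online"}, the class HealthHandler and the helper make_server are assumptions chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health/status; every other path gets a 404."""

    def do_GET(self):
        if self.path == "/health/status":
            # Assumed response body; the issue only says "service is online".
            self._send_json(200, {"status": "online"})
        else:
            self._send_json(404, {"error": "not found"})

    def _send_json(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def make_server(port=8080, host="0.0.0.0"):
    """Create (but do not start) the server; call serve_forever() to run it."""
    return HTTPServer((host, port), HealthHandler)
```

Calling make_server(8080).serve_forever() starts the service locally. On Cloud Run the container is expected to listen on the port given in the PORT environment variable, so a deployed version would read the port from there.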
