
etl-on-cloud1's Introduction

ETL on cloud with Google Cloud Platform

Case 1

How to pull data from Cloud Storage, load it into BigQuery, find the most-searched keyword of each day, and schedule the pipeline to run daily with Cloud Composer. Tools: Cloud Storage, BigQuery, Cloud Composer, and Dataflow.
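
The "most-searched keyword of the day" step can be illustrated in plain Python before it is expressed as a BigQuery query or Dataflow job. This is only a minimal sketch: the row shape `(day, keyword)`, the function name `top_keyword_per_day`, and the sample values are assumptions for illustration, not names taken from this repository.

```python
from collections import Counter, defaultdict


def top_keyword_per_day(rows):
    """Return {day: (keyword, count)} for the most-searched keyword of each day.

    `rows` is an iterable of (day, keyword) pairs. Both field names are
    assumptions; adapt them to the columns in your own search data.
    """
    counts = defaultdict(Counter)
    for day, keyword in rows:
        # Normalize case and whitespace so "Laptop " and "laptop" count together.
        counts[day][keyword.strip().lower()] += 1
    return {day: counter.most_common(1)[0] for day, counter in counts.items()}


# Hypothetical sample rows, for illustration only.
sample = [
    ("2021-01-01", "laptop"),
    ("2021-01-01", "Laptop "),
    ("2021-01-01", "phone"),
    ("2021-01-02", "phone"),
]
print(top_keyword_per_day(sample))
# {'2021-01-01': ('laptop', 2), '2021-01-02': ('phone', 1)}
```

In the pipeline itself the same grouping would more likely be done with a GROUP BY query in BigQuery; the sketch only shows the logic.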

Case 2

How to pull data from a BigQuery table in another account using a credential file, select only the fields you need, store them in your own BigQuery table, and schedule the pipeline with Cloud Composer to run every 3 days. Tools: BigQuery, Cloud Composer.
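
As a rough sketch of the two ideas in this case, the snippet below builds a query that reads only selected fields and shows what a 3-day interval looks like as a `timedelta`, which Airflow accepts as a DAG's `schedule_interval`. The table name, the field names, and the helpers `build_select_query` and `run_dates` are hypothetical; the DAGs in this repository may be written differently.

```python
from datetime import datetime, timedelta

# A 3-day interval; a value like this can be passed to an Airflow DAG
# as schedule_interval.
SCHEDULE_INTERVAL = timedelta(days=3)


def build_select_query(table, fields):
    """Build a query that reads only the needed fields from the source table."""
    columns = ", ".join("`{}`".format(field) for field in fields)
    return "SELECT {} FROM `{}`".format(columns, table)


def run_dates(start, count, interval=SCHEDULE_INTERVAL):
    """List `count` dates spaced `interval` apart, starting at `start`."""
    return [start + i * interval for i in range(count)]


# Hypothetical source table and fields, for illustration only.
query = build_select_query(
    "other-project.source_dataset.source_table", ["user_id", "event_date"]
)
print(query)
print(run_dates(datetime(2021, 1, 1), 3))
```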

Installation

Use git to clone this repository

git clone https://github.com/fdhanh/etl-on-cloud1.git

Prerequisite

Python 3.7.3

To run the scripts in this repository, install the required libraries listed in requirements.txt:

pip install -r requirements.txt

Usage

  1. Enable the APIs for Cloud Dataflow and Cloud Composer.
  2. Create a service account with the Owner role.
  3. Create the bucket and the BigQuery table, or run this script in a shell: ./etl-on-cloud1/main.sh
  4. Create a Cloud Composer environment with the following settings:
    • location: us-central1
    • node count: 3
    • zone: us-central1-a
    • machine type: n1-standard-1
    • disk size: 20
    • service account: the service account you just created
  5. Once the Composer environment has been created, open the Airflow UI and create the variables on the Admin page. You can upload the file bucket/variables.json from this repo; don't forget to adjust the project ID and the other values to match your setup.
  6. Upload the DAG files to the dags/ folder, or run this command in a shell: gsutil cp etl-on-cloud1/dags/* gs://<your composer bucket>/dags
    Then upload your JSON credential files to the data/ folder.
  7. Check your Airflow tasks.
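
The Composer settings in step 4 can also be passed on the command line instead of entered in the console. The sketch below only assembles and prints a `gcloud composer environments create` command; it does not run it. The environment name and service account email are placeholders, the disk size is assumed to be in GB, and these node-level flags are assumed to apply to the Composer version this README targets (newer Composer versions may not accept them).

```python
import shlex


def composer_create_command(env_name, service_account):
    """Assemble a gcloud command matching the environment settings in step 4."""
    return [
        "gcloud", "composer", "environments", "create", env_name,
        "--location", "us-central1",
        "--zone", "us-central1-a",
        "--node-count", "3",
        "--machine-type", "n1-standard-1",
        "--disk-size", "20GB",  # the README says 20; GB is an assumption
        "--service-account", service_account,
    ]


# Placeholder names; replace them with your own environment and account.
cmd = composer_create_command(
    "etl-composer", "etl-owner@my-project.iam.gserviceaccount.com"
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command first makes it easy to review the values before pasting it into Cloud Shell.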
