blockchain-etl / iotex-etl

ETL (extract, transform and load) tools for ingesting IoTeX blockchain data to Google BigQuery and Pub/Sub

Home Page: https://console.cloud.google.com/bigquery?page=dataset&d=crypto_iotex&p=public-data-finance

License: MIT License

iotex-etl's Introduction

IoTeX ETL

Overview

IoTeX ETL allows you to set up an ETL pipeline in Google Cloud Platform for ingesting IoTeX blockchain data into BigQuery and Pub/Sub. It comes with CLI tools for exporting IoTeX data into newline-delimited JSON files partitioned by day.
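To illustrate what "newline-delimited JSON files partitioned by day" means, here is a minimal sketch using only the Python standard library. It is not the project's CLI: the function names, the `blocks_<day>.json` file naming, and the `height`/`timestamp` record fields are assumptions made for the example.

```python
import json
import os
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(records, timestamp_field="timestamp"):
    """Group records by the UTC date of their Unix timestamp (in seconds).

    The field name is an assumption for this sketch, not the real schema.
    """
    partitions = defaultdict(list)
    for record in records:
        day = datetime.fromtimestamp(record[timestamp_field], tz=timezone.utc).date()
        partitions[day.isoformat()].append(record)
    return dict(partitions)


def write_partitions(partitions, output_dir):
    """Write one newline-delimited JSON file per day and return the paths."""
    paths = []
    for day, day_records in sorted(partitions.items()):
        # Hypothetical file naming; one JSON object per line.
        path = os.path.join(output_dir, f"blocks_{day}.json")
        with open(path, "w", encoding="utf-8") as f:
            for record in day_records:
                f.write(json.dumps(record, sort_keys=True) + "\n")
        paths.append(path)
    return paths
```

Files in this shape (one JSON object per line, one file per day) can be loaded into a date-partitioned BigQuery table one day at a time.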

Data is available for you to query right away in Google BigQuery.

Architecture

[Figure: iotex_etl_architecture.svg. A Google Slides version is also available.]

  1. The nodes are run in a Kubernetes cluster. Refer to IoTeX Node in Kubernetes for deployment instructions.

  2. Airflow DAGs export and load IoTeX data to BigQuery daily. Refer to IoTeX ETL Airflow for deployment instructions.

  3. IoTeX data is polled periodically from the nodes and pushed to Google Pub/Sub. Refer to IoTeX ETL Streaming for deployment instructions.

  4. IoTeX data is pulled from Pub/Sub, transformed and streamed to BigQuery. Refer to IoTeX ETL Dataflow for deployment instructions.
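The polling behaviour in step 3 can be sketched roughly as follows. This is a simplified illustration, not the Streamer's actual code: `sync_once` and its callback parameters (`get_latest_height`, `get_block`, `publish`) are hypothetical stand-ins for the node client and the Pub/Sub publisher.

```python
def sync_once(get_latest_height, get_block, publish, last_synced_height, batch_size=100):
    """Publish the blocks that follow last_synced_height and return the new height.

    All three callables are hypothetical stand-ins supplied by the caller:
      get_latest_height() -> int   latest block height known to the node
      get_block(height) -> dict    block data for the given height
      publish(block) -> None       pushes the block to a message queue
    """
    latest = get_latest_height()
    # Never sync more than batch_size blocks in a single pass.
    target = min(latest, last_synced_height + batch_size)
    for height in range(last_synced_height + 1, target + 1):
        publish(get_block(height))
    # If the node is behind the stored height, keep the stored height.
    return max(target, last_synced_height)
```

Calling `sync_once` periodically and persisting the returned height between calls gives a resumable loop: after a restart, polling continues from the last block that was published.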

Setting Up

  1. Follow the instructions in IoTeX Node in Kubernetes to deploy an IoTeX node in GKE. Wait until it's fully synced. Make note of the Load Balancer IP from the node deployment; it will be used in the Airflow and Streamer components below.

  2. Follow the instructions in IoTeX ETL Airflow to deploy a Cloud Composer cluster for exporting and loading historical IoTeX data. It may take several hours for the export DAG to catch up. During this time, the "load" and "verify_streaming" DAGs will fail.

  3. Follow the instructions in IoTeX ETL Streaming to deploy the Streamer component. For the value in last_synced_block.txt, specify the last block number of the previous day. You can query it in BigQuery: SELECT height FROM crypto_iotex.blocks ORDER BY height DESC LIMIT 1.

  4. Follow the instructions in IoTeX ETL Dataflow to deploy the Dataflow component. Monitor the "verify_streaming" DAG in the Airflow console; once the Dataflow job catches up to the latest block, the DAG will succeed.
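For step 3, the starting value for last_synced_block.txt could be derived along these lines. This is a sketch under stated assumptions: the input is a list of `(height, unix_timestamp_seconds)` pairs (for example, rows fetched from the blocks table), days are taken in UTC, and the file is assumed to hold just the block number as text. Both function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def last_block_of_previous_day(blocks, today=None):
    """Return the highest block height dated on the day before `today` (UTC).

    `blocks` is an iterable of (height, unix_timestamp_seconds) pairs; this
    shape is an assumption for the sketch. Returns None if no block matches.
    """
    if today is None:
        today = datetime.now(timezone.utc).date()
    previous_day = today - timedelta(days=1)
    heights = [
        height
        for height, ts in blocks
        if datetime.fromtimestamp(ts, tz=timezone.utc).date() == previous_day
    ]
    return max(heights) if heights else None


def write_last_synced_block(path, height):
    """Write the height as plain text (assumed file format)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{height}\n")
```

Starting the Streamer from the last block of the previous day means it begins where the daily Airflow load left off, so the two paths should not leave a gap between them.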

iotex-etl's People

Contributors

medvedev1088, oun

iotex-etl's Issues

Create Source Repository to hold Dataflow chainConfig.json file

Also add a section to the README similar to https://github.com/blockchain-etl/iotex-etl/tree/master/airflow#creating-a-cloud-source-repository-for-airflow-variables.

Replace "iotex-etl-dev" with <your_project> in chainConfigIotexDev.json


For automation, we need to do it in two steps (see https://github.com/blockchain-etl/hedera-etl#running-on-gcp-dataflow):

  1. The first Cloud Build step uploads the Dataflow template to a GCS bucket.

  2. The second step creates a job using that template and a parameters file.
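The project ID replacement described in this issue could be scripted before the build steps run. The sketch below is one possible approach, not something the repository provides: `replace_project` and `rewrite_config` are hypothetical helpers, and nothing is assumed about the layout of chainConfigIotexDev.json beyond it being valid JSON.

```python
import json


def replace_project(value, old, new):
    """Recursively replace `old` with `new` in every string value.

    Dictionary keys are left untouched; only values are rewritten.
    """
    if isinstance(value, str):
        return value.replace(old, new)
    if isinstance(value, list):
        return [replace_project(item, old, new) for item in value]
    if isinstance(value, dict):
        return {key: replace_project(item, old, new) for key, item in value.items()}
    return value


def rewrite_config(src_path, dst_path, new_project, old_project="iotex-etl-dev"):
    """Load a JSON config, swap the project ID, and write the result."""
    with open(src_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(replace_project(config, old_project, new_project), f, indent=2)
```

Working on the parsed JSON rather than the raw text keeps the output well-formed, at the cost of reformatting the file's whitespace.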
