as10896 / ethereum-etl-airflow

This project forked from blockchain-etl/ethereum-etl-airflow

Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. See also "How to get any Ethereum smart contract into BigQuery": https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

License: MIT License

Ethereum ETL Airflow

Read this article: https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset

Setting up Airflow DAGs using Google Cloud Composer

Create BigQuery Datasets

Create Google Cloud Storage bucket

Create Google Cloud Composer (version 2) environment

Create a new Cloud Composer environment:

export ENVIRONMENT_NAME=ethereum-etl-0

AIRFLOW_CONFIGS_ARR=(
    "celery-worker_concurrency=8"
    "scheduler-dag_dir_list_interval=300"
    "scheduler-min_file_process_interval=120"
)
export AIRFLOW_CONFIGS=$(IFS=, ; echo "${AIRFLOW_CONFIGS_ARR[*]}")

gcloud composer environments create \
    $ENVIRONMENT_NAME \
    --location=us-central1 \
    --image-version=composer-2.1.14-airflow-2.5.1 \
    --environment-size=medium \
    --scheduler-cpu=2 \
    --scheduler-memory=13 \
    --scheduler-storage=1 \
    --scheduler-count=1 \
    --web-server-cpu=1 \
    --web-server-memory=2 \
    --web-server-storage=512MB \
    --worker-cpu=2 \
    --worker-memory=13 \
    --worker-storage=10 \
    --min-workers=1 \
    --max-workers=8 \
    --airflow-configs=$AIRFLOW_CONFIGS

gcloud composer environments update \
    $ENVIRONMENT_NAME \
    --location=us-central1 \
    --update-pypi-packages-from-file=requirements_airflow.txt
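The AIRFLOW_CONFIGS line above uses a bash idiom: with IFS set to a comma, "${AIRFLOW_CONFIGS_ARR[*]}" expands to the array elements joined by commas, which gives the single section-key=value string passed to --airflow-configs. As a rough illustration only, the same join can be written in Python. The helper name build_airflow_configs is hypothetical and is not part of this repository.

```python
def build_airflow_configs(overrides):
    """Join 'section-key=value' overrides into one comma-separated string.

    Hypothetical helper for illustration; it mirrors the bash array join
    used for AIRFLOW_CONFIGS and does nothing Composer-specific.
    """
    for item in overrides:
        # A comma inside an item would be read as a separator after joining.
        if "=" not in item or "," in item:
            raise ValueError(f"unexpected override format: {item!r}")
    return ",".join(overrides)


overrides = [
    "celery-worker_concurrency=8",
    "scheduler-dag_dir_list_interval=300",
    "scheduler-min_file_process_interval=120",
]
airflow_configs = build_airflow_configs(overrides)
print(airflow_configs)
```

Keeping the overrides in a list, as the shell script does with an array, makes it easier to add or remove a setting without editing one long string.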

Create variables in Airflow (Admin > Variables in the UI):

Variable                                  Description
ethereum_output_bucket                    GCS bucket to store exported files
ethereum_provider_uris                    Comma-separated URIs of Ethereum nodes
ethereum_destination_dataset_project_id   Project ID of BigQuery datasets
notification_emails                       Email for notifications

Check other variables in dags/ethereumetl_airflow/variables.py.
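Since ethereum_provider_uris holds several node URIs in one comma-separated string, code that consumes it has to split the value first. The following is a minimal sketch of that step, assuming whitespace around commas should be ignored. The function name parse_provider_uris and the example.com endpoints are placeholders, and the actual handling in dags/ethereumetl_airflow/variables.py may differ.

```python
def parse_provider_uris(raw):
    """Split a comma-separated string into a list of trimmed, non-empty URIs."""
    return [uri.strip() for uri in raw.split(",") if uri.strip()]


# Placeholder endpoints, for illustration only.
raw_value = "https://node-1.example.com, https://node-2.example.com"
provider_uris = parse_provider_uris(raw_value)
print(provider_uris)
```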

Updating package requirements

Suggested package requirements for Composer are stored in requirements_airflow.txt.

You can update the Composer environment using the following script:

ENVIRONMENT_NAME="ethereum-etl-0"
LOCAL_REQUIREMENTS_PATH="$(mktemp)"

# grep pattern removes comments and whitespace:
cat "./requirements_airflow.txt" | grep -o '^[^#| ]*' > "$LOCAL_REQUIREMENTS_PATH"

gcloud composer environments update \
  "$ENVIRONMENT_NAME" \
  --location="us-central1" \
  --update-pypi-packages-from-file="$LOCAL_REQUIREMENTS_PATH"
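To make the grep step above easier to reason about: grep -o '^[^#| ]*' prints, for each line, only the leading run of characters that are not '#', '|', or a space, and prints nothing when that run is empty. A rough Python equivalent is sketched below. The function name strip_requirements and the package names in the sample are made up for illustration and are not taken from requirements_airflow.txt.

```python
import re

# Same character class as the grep pattern: stop at '#', '|', or a space.
LEADING_TOKEN = re.compile(r"^[^#| ]*")


def strip_requirements(text):
    """Keep the leading token of each line, dropping lines where it is empty."""
    kept = []
    for line in text.splitlines():
        token = LEADING_TOKEN.match(line).group(0)
        if token:  # comment-only and blank lines yield an empty token
            kept.append(token)
    return kept


# Illustrative input only; these are not real pins from the repository.
sample = """\
# a full-line comment
example-package==1.2.3  # trailing comment
another-package>=2.0

third-package
"""
cleaned = strip_requirements(sample)
print("\n".join(cleaned))
```

One consequence of this pattern is that a requirement written with spaces around the version specifier (for example "name >= 1.0") would be cut off at the first space, so specifiers should be written without spaces.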

Note: Composer can be very pedantic about conflicts in additional packages. You may have to fix dependency conflicts where you had no issues testing locally (when updating dependencies, Composer does something "cleverer" than just pip install -r requirements.txt). This is why eth-hash is currently pinned in requirements_airflow.txt. Typically we have found that pinning eth-hash and/or eth-rlp may make things work, though Your Mileage May Vary.

See this issue for further ideas on how to unblock problems you may encounter.

Upload DAGs

> ./upload_dags.sh <airflow_bucket>

Running Tests

pip install \
    -r requirements_test.txt \
    -r requirements_local.txt \
    -r requirements_airflow.txt
pytest -vv -s

Running locally

A Docker Compose definition is provided to spin up a local Airflow instance.

To build the required image:

docker compose build

To start Airflow:

docker compose up airflow

The instance requires the CLOUDSDK_CORE_PROJECT environment variable to be set in most cases. Airflow Variables can be defined in variables.json.
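As a hedged sketch of what such a file could contain, the snippet below builds a flat JSON object from the four variables listed earlier. All values are placeholders, and the assumption that the local setup reads a flat name-to-value mapping (the shape Airflow's variable import accepts) should be checked against the variables.json shipped with the repository.

```python
import json

# Placeholder values for illustration; replace with your own settings.
variables = {
    "ethereum_output_bucket": "your-gcs-bucket",
    "ethereum_provider_uris": "https://node-1.example.com,https://node-2.example.com",
    "ethereum_destination_dataset_project_id": "your-gcp-project",
    "notification_emails": "alerts@example.com",
}

# Serialize as a flat JSON object of variable names to string values.
variables_json = json.dumps(variables, indent=2)
print(variables_json)

# To save it, write the string to a file, for example:
# with open("variables.json", "w") as fh:
#     fh.write(variables_json)
```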

Creating Table Definition Files for Parsing Events and Function Calls

Read this article: https://medium.com/@medvedev1088/query-ens-and-0x-events-with-sql-in-google-bigquery-4d197206e644

More Information

You can follow the instructions for the Polygon DAGs here: https://github.com/blockchain-etl/polygon-etl. The architecture there is very similar to Ethereum, so in most cases replacing polygon with ethereum will work. Contributions to this README that port documentation from Polygon to Ethereum are welcome.
