How to pull data from Cloud Storage, load it into BigQuery, find the most searched keyword of the day, and schedule the pipeline to run daily with Cloud Composer. Tools: Cloud Storage, BigQuery, Cloud Composer, and Dataflow.
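As a rough illustration only (not the DAG shipped in this repo), here is a minimal Airflow sketch of that daily flow, assuming the Airflow Google provider operators; the repo's actual DAGs may use different operators, and the Dataflow step listed above is omitted. All bucket, project, dataset, table, and column names below are placeholders.

```python
# Illustrative sketch only -- not the DAG shipped in this repo.
# Bucket, project, dataset, table, and column names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_keyword_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # 1) Load the day's raw search logs from Cloud Storage into a staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="gcs_to_bq",
        bucket="your-bucket",
        source_objects=["search_logs/{{ ds }}.csv"],
        destination_project_dataset_table="your-project.your_dataset.search_logs",
        source_format="CSV",
        skip_leading_rows=1,
        autodetect=True,
        write_disposition="WRITE_APPEND",
    )

    # 2) Find the most searched keyword of the day and append it to a result table.
    top_keyword = BigQueryInsertJobOperator(
        task_id="top_keyword_of_day",
        configuration={
            "query": {
                "query": (
                    "SELECT '{{ ds }}' AS day, keyword, COUNT(*) AS searches "
                    "FROM `your-project.your_dataset.search_logs` "
                    "WHERE DATE(search_time) = '{{ ds }}' "
                    "GROUP BY keyword ORDER BY searches DESC LIMIT 1"
                ),
                "useLegacySql": False,
                "destinationTable": {
                    "projectId": "your-project",
                    "datasetId": "your_dataset",
                    "tableId": "top_keyword_daily",
                },
                "writeDisposition": "WRITE_APPEND",
            }
        },
    )

    load_raw >> top_keyword
```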
How to pull data from a BigQuery table that belongs to another account using its credentials, take only the fields you need, store them into your own BigQuery table, and schedule the pipeline to run on a 3-day interval with Cloud Composer. Tools: BigQuery, Cloud Composer.
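Again as an illustration only, here is a minimal Python sketch of the cross-account copy using the google-cloud-bigquery client; the key-file path, projects, datasets, tables, and field names are placeholders, not the repo's actual values. On Composer, logic like this would sit inside a DAG with `schedule_interval=timedelta(days=3)`.

```python
# Illustrative sketch only -- not the script shipped in this repo.
# Key file, project, dataset, table, and field names are placeholders.
from google.cloud import bigquery
from google.oauth2 import service_account

# Client authorized against the other account's project via its JSON key.
source_credentials = service_account.Credentials.from_service_account_file(
    "data/other-account-key.json"
)
source_client = bigquery.Client(credentials=source_credentials, project="other-project")

# Take only the fields you need from the other account's table.
rows = source_client.query(
    "SELECT user_id, keyword FROM `other-project.other_dataset.search_logs`"
).result()

# Client for your own project (default credentials) handles the write side.
dest_client = bigquery.Client(project="your-project")
load_job = dest_client.load_table_from_json(
    [dict(row.items()) for row in rows],
    "your-project.your_dataset.search_logs_copy",
    job_config=bigquery.LoadJobConfig(
        autodetect=True, write_disposition="WRITE_TRUNCATE"
    ),
)
load_job.result()  # wait for the load job to finish
```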
Use git to clone this repository:
```
git clone https://github.com/fdhanh/etl-on-cloud1.git
```
This project uses Python 3.7.3. To run the scripts in this repository, install the prerequisite libraries from requirements.txt:
```
pip install -r requirements.txt
```
- Enable the APIs (for Cloud Dataflow and Cloud Composer).
- Create a service account with the Owner role.
- Create the Cloud Storage bucket and BigQuery table, or run this script in your shell:
  ```
  ./etl-on-cloud1/main.sh
  ```
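  If you would rather do this step from Python, a minimal sketch with the google-cloud-storage and google-cloud-bigquery clients looks like the following; the bucket name, dataset, table, and schema are placeholders and not necessarily what `main.sh` actually creates.

  ```python
  # Hypothetical Python equivalent of the bucket/table creation step;
  # names and schema are placeholders, not necessarily what main.sh creates.
  from google.cloud import bigquery, storage

  # Bucket for the raw files.
  storage.Client().create_bucket("your-etl-bucket", location="us-central1")

  # Dataset and table for the loaded data.
  bq = bigquery.Client()
  bq.create_dataset("your_dataset", exists_ok=True)
  bq.create_table(
      bigquery.Table(
          "your-project.your_dataset.search_logs",
          schema=[
              bigquery.SchemaField("keyword", "STRING"),
              bigquery.SchemaField("search_time", "TIMESTAMP"),
          ],
      ),
      exists_ok=True,
  )
  ```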
- Create a Cloud Composer environment with the following settings:
  - location: us-central1
  - node count: 3
  - zone: us-central1-a
  - machine type: n1-standard-1
  - disk size (GB): 20
  - service account: the service account you just created
- Once the Composer environment is created, go to the Airflow UI and create the variables on the Admin page. You can import the file `bucket/variables.json` from this repo; don't forget to adjust the project ID, etc., to your own values (see the sketch after this list for how the DAGs pick these up).
- Upload the DAG files to the `dags/` folder of the Composer bucket, or run this in your shell:
  ```
  gsutil cp etl-on-cloud1/dags/* gs://<your composer bucket>/dags
  ```
- Upload your JSON credential files to the `data/` folder.
- Check your Airflow tasks.
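For reference, a minimal sketch of how a DAG can read the Admin-page variables and the credential file uploaded to the `data/` folder; the variable keys and file name here are placeholders, not necessarily the ones in `bucket/variables.json`.

```python
# Hypothetical illustration of reading Airflow variables and the data/ credential.
# Variable keys and the key-file name are placeholders.
from airflow.models import Variable
from google.oauth2 import service_account

project_id = Variable.get("project_id")   # set under Admin > Variables
bq_dataset = Variable.get("bq_dataset")

# On Cloud Composer, gs://<composer bucket>/data/ is available to workers at
# /home/airflow/gcs/data/, so a key uploaded there can be read like a local file.
credentials = service_account.Credentials.from_service_account_file(
    "/home/airflow/gcs/data/other-account-key.json"
)
```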