This project demonstrates using Apache Airflow to classify house prices based on predefined percentiles. It assumes you have Docker installed and configured. Houses are classified by Suburb, Rooms, Type, and YearBuilt, and the essential columns are Price, Suburb, and Type.
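The exact classification logic lives in the DAG's tasks; the sketch below only illustrates the percentile idea with pandas. The column names come from the dataset, while the sample rows, the 33rd/66th percentile cutoffs, and the class labels are illustrative assumptions:

```python
import pandas as pd

# Tiny illustrative sample; the real data comes from melb_data.csv.
df = pd.DataFrame({
    "Suburb": ["Abbotsford", "Abbotsford", "Richmond", "Richmond"],
    "Type": ["h", "u", "h", "u"],
    "Price": [1_000_000, 500_000, 1_500_000, 700_000],
})

# Label each row by where its price falls relative to two percentile cutoffs
# (33rd and 66th here; the DAG's actual thresholds may differ).
low, high = df["Price"].quantile([0.33, 0.66])
df["price_class"] = pd.cut(
    df["Price"],
    bins=[-float("inf"), low, high, float("inf")],
    labels=["cheap", "average", "expensive"],
)
print(df[["Suburb", "Type", "Price", "price_class"]])
```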
- Clone the repository:
git clone https://github.com/diegolupi93/abstract.git
cd abstract
- Create the required folders:
mkdir config logs plugins
- Build and run the Docker containers:
docker-compose up airflow-init
docker-compose up
- Access the Airflow UI:
Open http://localhost:8080 in your browser. Use the default Airflow credentials to log in:
username: airflow
password: airflow
- Configure PostgreSQL connection:
Go to Admin -> Connections in Airflow UI. Create a new connection with the following details:
Connection Id: postgres_localhost
Host: postgres
Database: airflow
Login: airflow
Password: airflow
Port: 5432
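For reference, the fields above correspond to the database URI assembled below. Airflow stores these fields for you under the postgres_localhost connection ID, so this snippet is purely illustrative plain string formatting, not something the DAG needs:

```python
# The same connection details entered in Admin -> Connections.
conn = {
    "host": "postgres",
    "database": "airflow",
    "login": "airflow",
    "password": "airflow",
    "port": 5432,
}

# Equivalent SQLAlchemy-style URI for the connection above.
uri = (
    f"postgresql://{conn['login']}:{conn['password']}"
    f"@{conn['host']}:{conn['port']}/{conn['database']}"
)
print(uri)
```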
- Define the config variables: Go to Admin -> Variables in the Airflow UI and create a variable with the following key and value:
Key: dag_config
Value:
{
"URL":"https://s3.amazonaws.com/external.abstractapi.com/challenges/melb_data.csv",
"download_path":"melb_data.csv",
"essential_fields": ["Price", "Suburb", "Type"]
}
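Inside a task this variable would typically be read with `Variable.get("dag_config", deserialize_json=True)`. The sketch below parses the same JSON with the standard library only to show the resulting structure; no Airflow import is needed here:

```python
import json

# The dag_config value exactly as stored in Admin -> Variables.
raw = """
{
  "URL": "https://s3.amazonaws.com/external.abstractapi.com/challenges/melb_data.csv",
  "download_path": "melb_data.csv",
  "essential_fields": ["Price", "Suburb", "Type"]
}
"""

config = json.loads(raw)
print(config["URL"])
print(config["essential_fields"])
```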
- Linux modification: If you are a Linux user, map the postgres hostname to localhost by editing your hosts file:
sudo nano /etc/hosts
and adding the following line:
127.0.0.1 postgres
- Enable the DAG:
- Go to the Airflow UI.
- Click the toggle button to enable the melbourne_dag.
- Trigger the DAG:
Once enabled, the DAG will run according to its schedule_interval. You can also trigger melbourne_dag manually from the Airflow UI to run it immediately.
To check that the data was stored, you can run the SQLquery.ipynb notebook.