This project is a collaboration between the New Construction Office, Public Works Department, Taipei City Government and the Taipei Urban Intelligence Center. By utilizing open data, the project aims to analyze and discuss potential causes of hazardous sinkholes, allowing for early prevention.
Using machine learning models, the project conducts time series analysis on a monthly basis to identify potential causes of sinkholes. Various open data sources, such as tidal information, groundwater levels, and earthquake occurrences, are integrated into the research.
The trained models predict the number of sinkholes, and the feature importances they report for the different causes feed back into practical discussions and prevention strategies.
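One common way to frame this kind of monthly prediction is to pair each month's case count with the environmental readings from an earlier month. The sketch below illustrates that idea only; the `make_lagged_rows` helper, the one-month lag, and all values are assumptions made for illustration and are not taken from this repository.

```python
def make_lagged_rows(months, features, counts, lag=1):
    """Pair each month's case count with the features observed `lag` months earlier."""
    if not (len(months) == len(features) == len(counts)):
        raise ValueError("months, features, and counts must have the same length")
    rows = []
    for i in range(lag, len(months)):
        rows.append(
            {
                "month": months[i],
                "features": features[i - lag],  # readings from the earlier month
                "target": counts[i],            # case count to be predicted
            }
        )
    return rows


# Made-up values, for illustration only.
months = ["2023-01", "2023-02", "2023-03"]
features = [{"rain_mm": 80.0}, {"rain_mm": 120.0}, {"rain_mm": 60.0}]
counts = [4, 6, 3]

lagged = make_lagged_rows(months, features, counts, lag=1)
for row in lagged:
    print(row["month"], row["features"], row["target"])
```

The first `lag` months are dropped because no earlier readings exist for them.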
Records of road repair-related cases, including details reported by the public, case coordinates, and the status of construction and dispatch.
Records of water pipeline (such as tap water pipelines) repair cases, including the construction sections, report dates, and repair purposes.
For details on the open data used, refer to the list here. The analysis covers both time-series and spatial components; for now, the repository releases only the time-series module.
- Install Python 3.X and Anaconda.
- Create the environment using Anaconda and the YAML file.
Run the following command to set up the Conda environment:
conda env create -f /path/to/environment.yml
- If you want to obtain end-to-end results, you can execute `main.py` after setting up the Conda virtual environment. This file runs the entire pipeline: data downloading, preprocessing, modeling, and visualization.
- Additionally, you can execute individual Python files to complete specific tasks in stages, especially `preprocess_time_series.py`. During the data preprocessing stage, significant effort has been made to consolidate various city-related datasets. You may want to start by running `preprocess_time_series.py`, then check the `time_series_table_M.csv` file in the `processed_data` directory to verify the preprocessed data.
- Finally, you can execute `train_time_series.py` to obtain the machine learning model results. This model aims to predict case counts for specific future periods based on factors such as precipitation, tide levels, and groundwater levels.
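To verify the preprocessed table after running `preprocess_time_series.py`, a quick summary of the CSV can help. The following is a minimal sketch using only the standard library; the `summarize_csv` helper is hypothetical, and it assumes the file is UTF-8 encoded with a header row. The actual column names are not documented here, so the script only reports what it finds.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count, columns_with_blank_cells) for a CSV file."""
    # "utf-8-sig" reads plain UTF-8 and also tolerates a leading byte-order mark.
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        row_count = 0
        blanks = set()
        for row in reader:
            row_count += 1
            for name in columns:
                value = row.get(name)
                if value is None or value.strip() == "":
                    blanks.add(name)
    return columns, row_count, sorted(blanks)


if __name__ == "__main__":
    # Path taken from the steps above; adjust if your checkout differs.
    table = Path("processed_data") / "time_series_table_M.csv"
    if table.exists():
        columns, row_count, blanks = summarize_csv(table)
        print(f"{row_count} rows, {len(columns)} columns")
        print("columns:", columns)
        print("columns with blank cells:", blanks or "none")
    else:
        print(f"{table} not found; run preprocess_time_series.py first")
```

Blank cells are not necessarily errors, but they are worth checking before training.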
- Open data ETL | Integrated 23 datasets from 4 open data platforms into training features and analyses. Dataset formats include CSV, XML, SHP, GeoJSON, and GeoPackage.
- Data preprocessing | Processed time-series training features at different time scales (e.g., month or week).
- Model training | Used XGBoost to identify important features.
- Visualization | Used Tableau, QGIS, and matplotlib to present key findings.
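As an illustration of the month and week time scales mentioned in the preprocessing step, the sketch below counts reported cases per period using only the standard library. The `period_key` and `count_by_period` helpers, the `"M"`/`"W"` labels, and the dates are assumptions for illustration; the repository's own preprocessing may be implemented differently.

```python
from collections import Counter
from datetime import date


def period_key(day, scale):
    """Map a date to a month key ('2023-01') or an ISO week key ('2023-W01')."""
    if scale == "M":
        return f"{day.year:04d}-{day.month:02d}"
    if scale == "W":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year:04d}-W{iso_week:02d}"
    raise ValueError(f"unsupported scale: {scale!r}")


def count_by_period(days, scale):
    """Count how many dates fall into each month or ISO week."""
    return Counter(period_key(day, scale) for day in days)


# Made-up report dates, for illustration only.
reports = [date(2023, 1, 2), date(2023, 1, 5), date(2023, 1, 30), date(2023, 2, 1)]

monthly = count_by_period(reports, "M")
weekly = count_by_period(reports, "W")
print(dict(monthly))
print(dict(weekly))
```

ISO weeks can straddle month boundaries, so weekly and monthly totals group the same cases differently.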
This project is licensed under the Apache 2.0 License. For more details, please refer to the LICENSE file.
We thank the New Construction Office, Public Works Department, Taipei City Government for their invaluable collaboration throughout the project, from its initiation, through ongoing communication, to its successful execution. They played a crucial role in this endeavor. We would also like to express our gratitude to the Road Excavation Administration Center, Public Works Department, Taipei City Government, and the Water Resources Agency, MOEA, for providing the data that enabled us to complete this project.