Sinkhole Cause Analysis

Description

This project is a collaboration between the New Construction Office, Public Works Department, Taipei City Government and the Taipei Urban Intelligence Center. By utilizing open data, the project aims to analyze and discuss potential causes of hazardous sinkholes, allowing for early prevention.

Using machine learning models, the project conducts time series analysis on a monthly basis to identify potential causes of sinkholes. Various open data sources, such as tidal information, groundwater levels, and earthquake occurrences, are integrated into the research.
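As a rough illustration of the monthly setup described above, the sketch below counts reported cases per calendar month and joins them with monthly means of a daily measurement. The function name `build_monthly_table`, the column name `precipitation_mm`, and the sample dates are hypothetical and chosen only for this example; the repository's own preprocessing may work differently.

```python
import pandas as pd


def build_monthly_table(case_dates, daily_features):
    """Join monthly case counts with monthly means of daily features.

    case_dates: iterable of date strings, one per reported case (hypothetical input).
    daily_features: DataFrame indexed by a daily DatetimeIndex.
    """
    # Map every case to its calendar month and count cases per month.
    case_months = pd.to_datetime(pd.Series(case_dates)).dt.to_period("M")
    case_count = case_months.value_counts().rename("case_count")

    # Average the daily measurements within each calendar month.
    monthly = daily_features.groupby(daily_features.index.to_period("M")).mean()

    # Months without any reported case get a count of zero.
    table = monthly.join(case_count, how="left")
    table["case_count"] = table["case_count"].fillna(0).astype(int)
    return table


# Made-up sample data: three cases and three months of daily values.
days = pd.date_range("2023-01-01", "2023-03-31", freq="D")
daily_features = pd.DataFrame(
    {"precipitation_mm": days.day.to_numpy(dtype=float)}, index=days
)
case_dates = ["2023-01-05", "2023-01-20", "2023-03-02"]

monthly_table = build_monthly_table(case_dates, daily_features)
print(monthly_table)
```

Grouping by calendar period rather than by a fixed number of days keeps each row aligned with a real month, which makes it easier to join sources that are published at different frequencies.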

The trained models predict the number of sinkholes, and the feature importances they assign to the different potential causes provide feedback for practical discussions and strategies.

Dataset

Road Repair Case Data

Records of road repair-related cases, including details reported by the public, case coordinates, and the status of construction and dispatch.

Water Pipeline Repair Case Data

Records of water pipeline (such as tap water pipelines) repair cases, including the construction sections, report dates, and repair purposes.

Open data

For the open data used in this project, you can refer to the list here. The data supports both time-series and spatial analysis. For now, however, the repository releases only the time-series module.

Quick Start

Install Python 3.X and Anaconda.
Create the environment with Anaconda and the YAML file. Run the following command to set up the conda environment.
```
conda env create -f /path/to/environment.yml
```
For end-to-end results, run main.py after setting up the conda environment. It runs the entire pipeline: data downloading, preprocessing, modeling, and visualization.
You can also run individual Python files to complete the tasks in stages. Much of the effort in this project went into the preprocessing stage, which consolidates various city datasets, so a good starting point is to run preprocess_time_series.py and then check time_series_table_M.csv in the processed_data directory to verify the preprocessed data.
Finally, run train_time_series.py to obtain the machine learning model results. The model aims to predict case counts for specific future periods based on factors such as precipitation, tide levels, and groundwater levels.
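One way to verify the preprocessed table mentioned above is to load it with pandas and look at its shape, column types, and missing values. This is a minimal sketch: the helper `summarize_table` is hypothetical, and nothing is assumed about the actual columns in time_series_table_M.csv.

```python
from pathlib import Path

import pandas as pd


def summarize_table(csv_source):
    """Load a CSV and return it with a per-column dtype / missing-value summary."""
    table = pd.read_csv(csv_source)
    summary = pd.DataFrame(
        {
            "dtype": table.dtypes.astype(str),
            "n_missing": table.isna().sum(),
        }
    )
    return table, summary


if __name__ == "__main__":
    # Path taken from the Quick Start text; the file exists only after
    # preprocess_time_series.py (or main.py) has been run.
    path = Path("processed_data") / "time_series_table_M.csv"
    if path.exists():
        table, summary = summarize_table(path)
        print(f"rows x columns: {table.shape}")
        print(summary)
    else:
        print(f"{path} not found; run preprocess_time_series.py first")
```

Columns with many missing values are worth checking against the source datasets before training.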

Features

Open data ETL | Integrates 23 datasets from 4 open data platforms into training features and analyses. Dataset formats include CSV, XML, SHP, GeoJSON, and GeoPackage.
Data preprocessing | Builds time-series training features at different time scales (e.g., monthly or weekly).
Model training | Uses XGBoost to identify important features.
Visualization | Uses Tableau, QGIS, and matplotlib to present important findings.

License

This project is licensed under the Apache 2.0 License. For more details, please refer to the LICENSE file.

Acknowledgements

We thank the New Construction Office, Public Works Department, Taipei City Government for their invaluable collaboration throughout the project, from its initiation, through ongoing communication, to its successful execution. They played a crucial role in this endeavor. We would also like to express our gratitude to the Road Excavation Administration Center, Public Works Department, Taipei City Government, and the Water Resources Agency, MOEA for providing the data that enabled us to complete this project.
