Miscellaneous Python scripts
Either fork this repo and then clone to your working directory or download a *.zip file of the code. Create a virtual environment and then run pip
to install package dependencies listed in the requirements.txt
file.
$ cd path/to/SI664/scripts
$ source venv/bin/activate
(venv) $ pip3 install -r requirements.txt
> cd path/to/SI664/scripts
> venv\Scripts\activate
(venv) >
Next, install the mysqlclient
package. You must utilize Christoph Gohlke's
collection of [Unoffical Windows Binaries for Python Extension Packages](https://www.lfd.uci
.edu/~gohlke/pythonlibs/) to install the mysqlclient
package. Download the appropriate the mysqlclient the wheel (*.whl) file. For Python 3.7 click on "mysqlclient‑1.3.13‑cp37‑cp37m‑win_amd64.whl" and it will download to your machine. Then perform a manual install of the package via pip
:
(venv) > pip install C:\Users\someuser\Downloads\mysqlclient-1.3.13-cp37-cp37m-win_amd64.whl
Processing c:\users\someuser\downloads\mysqlclient-1.3.13-cp37-cp37m-win_amd64.whl
Installing collected packages: mysqlclient
Successfully installed mysqlclient-1.3.13
After the mysqlclient
package is installed manually run pip install
using requirements.txt
to ensure that the remaining required packages are installed.
(venv) > pip install -r requirements.txt
This python script is designed to process MySQL scripts. The run_mysql_script.py
script requires a valid database connection provided via local *.yaml config file. After opening a connection and creating a cursor, the script creates a list of SQL statements after splitting the SQL script on each semi-colon encountered (;). The script then loops through the statements, attempting to execute each. If successful, the script commits the changes, closes the cursor and then closes the connection. Otherwise, it rolls back the transaction and reports the error encountered.
Create a .yaml configuration file. The run_mysql_script.py
script reads this file in order to retrieve the database connection settings. Add the following database connection settings to your .yaml file. Make sure you set the user
and passwd
variables to the correct values.
mysql:
host: localhost
port: [yer port number, typically 3306]
user: [yer MySQL user]
passwd: [yer MySQL user password]
db: unesco_heritage_sites
local_infile: True
The run_mysql_script.py
features the following arguments:
- -h, --help (show this help message and exit)
- -c, --config (path to config file)
- -p, --path (path to script)
Run run_mysql_script.py
as follows, tailoring the *.yaml and *.sql file paths as necessary:
(venv) $ python3 run_mysql_script.py -c ./path/to/config/file/*.yaml -p ./path/to/sql/script/*.sql
(venv) > python run_mysql_script.py -c ./path/to/config/file/*.yaml -p ./path/to/sql/script/*.sql
Run inspect_un_data_sets.py
to "inspect" two UN data sets included in the project /input
directory:
- un_area_country_codes-m49.csv
- unesco_heritage_sites.csv
The script utilizes the Pandas library to peruse the data sets,
generating a set of column-based *.csv files that contain distinct column values (duplicate
values and NaN values are filtered out) sorted in ascending order. The files are stored in the
project /output
directory.
(venv) $ python3 inspect_un_data_sets.py
INFO: Source file read /absolute/path/to/input/un_area_country_codes-m49.csv
INFO: UNSD M49 regions written to file /absolute/path/to/output/unsd_region.csv
INFO: UNSD M49 sub-regions written to file /absolute/path/to/output/unsd_sub_region.csv
INFO: UNSD M49 intermediate regions written to file /absolute/path/to/output/unsd_intermed_region.csv
INFO: UNSD M49 countries and areas written to file /absolute/path/to/output/unsd_country_area.csv
INFO: UNSD M49 development status written to file /absolute/path/to/output/unsd_dev_status.csv
INFO: Source file read /absolute/path/to/input/unesco_heritage_sites.csv
INFO: UNESCO heritage site countries/areas written to file /absolute/path/to/output/unesco_heritage_site_country_area.csv
INFO: UNESCO heritage site categories written to file /absolute/path/to/output/unesco_heritage_site_category.csv
INFO: UNESCO heritage site regions written to file /absolute/path/to/output/unesco_heritage_site_region.csv
INFO: UNESCO heritage site transboundary values written to file /absolute/path/to/output/unesco_heritage_site_transboundary.csv
(venv) > python inspect_un_data_sets.py
INFO: Source file read C:\path\to\input\un_area_country_codes-m49.csv
INFO: UNSD M49 regions written to file C:\path\to\output\unsd_region.csv
INFO: UNSD M49 sub-regions written to file C:\path\to\output\unsd_sub_region.csv
INFO: UNSD M49 intermediate regions written to file C:\path\to\output\unsd_intermed_region.csv
INFO: UNSD M49 countries and areas written to file C:\path\to\output\unsd_country_area.csv
INFO: UNSD M49 development status written to file C:\path\to\output\unsd_dev_status.csv
INFO: Source file read C:\path\to\input\unesco_heritage_sites.csv
INFO: UNESCO heritage site countries/areas written to file C:\path\to\output\unesco_heritage_site_country_area.csv
INFO: UNESCO heritage site categories written to file C:\path\to\output\unesco_heritage_site_category.csv
INFO: UNESCO heritage site regions written to file C:\path\to\output\unesco_heritage_site_region.csv
INFO: UNESCO heritage site transboundary values written to file C:\path\to\output\unesco_heritage_site_transboundary.csv