Table of contents generated with markdown-toc
Multinational Retail Data contains data of the following:
- card details.
- date times.
- users.
- products.
- store details.
The orders table is the single source of truth in the star schema, connecting to the tables above.
In this project, the full ETL is performed.
- https://regexr.com/
- https://www.asciitable.com/
- place-api-key-in-headers-or-url
- To access data that requires an API Key, run:
curl -H "X-API-KEY: <Example API Key 6fa741de1bdd1d91830ba>" <Example End Point
https://api.mydomain.com/v1/users>
- To access data that requires an API Key, run:
- making-python-imports-more-structured
- standard libraries alphabetical with 'imports' before 'from' like this:
import csv
import logging
from collections import defaultdict
from datetime import date
#(blank line) - third party packages alphabetical with imports first:
import numpy as np
import pandas as pd
from statsmodels import api as sm
#(blank line again) - your own stuff (also alphabetical):
from my_other_folder import my_other_file
- standard libraries alphabetical with 'imports' before 'from' like this:
- postgres-change-column-from-type-text-to-uuid
- LTRIM, RTRIM, BTRIM trim functions
- INSERT INTO + CASE STATEMENT
- INSERT INTO / UPDATE + CASE STATEMENT
- Use double quotation marks when the column name is:.
- All uppercase
SELECT "EAN" FROM dim_products;
- Complex
SELECT example_col AS "percentage(%)"
- All uppercase
- alter-column-text-to-boolean
- Alter empty string to date type
- Add primary key
- Add foreign key
- OVER () function
- Cast as timestamp
- AGE() function to calculate the difference of timestamps
- Set up the environment.
- Extract and clean the data from the data sources.
- Create the database schema.
- Querying the data.
- Python3
- Pandas
- Run
pip install pandas
- Run
- boto3
- Run
pip install boto3
- Run
- Create
db_creds.yaml
in the parent directory to have credentials as per theDatabaseConnector
class.
- Run
git clone https://github.com/pchan2/multinational-retail-data-centralisation.git
- Run
cd multinational-retail-data-centralisation
if the present directory is not already multinational-retail-data-centralisation. - Run
sudo --login --user=postgres
and enter<password>
to connect to Postgres in the terminal. - Run
psql
. - Run
\l
to list databases (optional). - Run
\c sales_data
to connect to the sales_data database. - Run
\dt
to list tables of the database. - Run
d orders_table
to see the orders_table table description.
multinational_retail_data_centralisation
│
├── checks
│ ├── card_details.ipynb
│ ├── date_times.ipynb
│ ├── legacy_store.ipynb
│ ├── legacy_users.ipynb
│ ├── orders_table.ipynb
│ ├── products.ipynb
│ └── store_details.ipynb
│
├── ETL
│ ├── card_details.ipynb
│ ├── date_times.ipynb
│ ├── legacy_users.ipynb
│ ├── orders_table.ipynb
│ ├── products.ipynb
│ └── store_details.ipynb
│
├── notes
│ └── steps_to_clean_data.ipynb
│
├── queries_to_db
│ ├── 1_total_no_stores.sql
│ ├── 2_locations_with_most_stores.sql
│ ├── 3_months_with_highest_sales.sql
│ ├── 4_online_vs_offline_sales.sql
│ ├── 5_sales_percentage_by_store_type.sql
│ ├── 6_month_in_year_with_highest_sales.sql
│ ├── 7_staff_headcount.sql
│ ├── 8_german_store_types_with_most_sales.sql
│ └── 9_sales_lead_time.sql
│
├── sales_data_db_sql
│ ├── 1_orders_table.sql
│ ├── 2_dim_users.sql
│ ├── 3_dim_store.sql
│ ├── 4_dim_products.sql
│ ├── 5_dim_products.sql
│ ├── 6_dim_date_times.sql
│ ├── 7_dim_cards_details.sql
│ ├── 8_add_pk_to_dim_tables.sql
│ └── 9_add_fk_to_orders_table.sql
│
├── data_cleaning.py
├── data_extraction.py
├── data_transforms.py
├── database_utils.py
└── README.md
- The checks directory is for preliminary checks on individual methods and notes.
- The ETL directory is the refactored version of checks.
- The sales_data_db_sql directory has SQL query statements.
- The notes directory has study notes on data cleaning.
The MIT License (MIT)
Copyright (c) 2023 Patrick Chan
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.