Cairo Address Verification is an HTTP API that takes an address name and checks whether it is located in Cairo, using an LSTM deep-learning model.
- Open Docker Desktop on your computer
- Open your terminal and navigate to the project directory
- Build the image with the following Docker command:
docker build -t <your-image-name> .
- Or use the following command:
docker-compose build
- Then, use the following command to run it:
docker-compose up

The dataset is built by combining Cairo.txt, which lists the districts located in Cairo, with data scraped from dowwer.com.

You can find both the Cairo.txt file and the generated dataset in the datasets directory.

It consists of two columns, the district name and the label (1 if the district is in Cairo, 0 otherwise), and contains 413 districts from all Egyptian governorates.
You can open it in a properly formatted view in Excel with the following steps:
1- Open Excel and, from the top bar, navigate to Data
2- Click on From Text/CSV
3- Choose the dataset named district_data.csv in the datasets directory from its location on your computer and click Import
4- Click on Load

Here is a sample of the dataset as it should appear:
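As an alternative to Excel, the CSV can be inspected with Python's standard library. The snippet below is only a sketch: the header names (`district`, `label`) and the sample rows are assumptions based on the description above, not the real file contents.

```python
import csv
import io

# Toy stand-in for datasets/district_data.csv; header names are guesses.
sample = io.StringIO(
    "district,label\n"
    "مدينة نصر,1\n"      # Nasr City, in Cairo
    "المعادي,1\n"        # Maadi, in Cairo
    "سيدي جابر,0\n"      # Sidi Gaber, not in Cairo
)

rows = list(csv.DictReader(sample))
in_cairo = [r["district"] for r in rows if r["label"] == "1"]
print(len(rows), in_cairo)  # → 3 ['مدينة نصر', 'المعادي']
```

To read the real file, pass `open("datasets/district_data.csv", encoding="utf-8")` to `csv.DictReader` instead of the in-memory sample.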
After running the command:
docker-compose up

you can use the test_script.py file, which is in the scripts directory.

Open the project folder in any IDE, then navigate to scripts/test_script.py.

Type the address you want to test here (remember to write it in Arabic):

Open the terminal in your IDE and run the script with the following command:
python scripts/test_script.py
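For reference, a request to the API might be built along these lines. This is a sketch only: the POST method and the JSON body shape (an `address` field) are assumptions, since the actual request format is defined in app.py and test_script.py.

```python
import json
import urllib.request

# Endpoint from the README; the payload shape below is an assumption.
API_URL = "http://127.0.0.1:8080/verify_cairo_address"

def build_request(address: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Arabic address."""
    payload = json.dumps({"address": address}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("مدينة نصر")  # "Nasr City", a district in Cairo
# Once the container is up, sending would be: urllib.request.urlopen(req)
```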
- app:
  - app.py: code for the HTTP API
  - requirements.txt: contains all libraries needed to run app.py
- core model architecture:
  - nlp_task.ipynb: notebook containing all the model-generation stages
  - onnx_model_creation.py: converts the saved model to an ONNX model
- datasets:
  - Cairo.txt: contains district names in Cairo
  - dataset_sources.txt: contains links to the dataset and the word-embedding model
  - district_data.csv: contains district names and labels
  - word_embeddings.csv: contains each word's embedding vector
  - word_embeddings_for_ordinary_classifiers.csv: contains each word's embedding vector and label
- extra models:
  - birnn_model.h5: bidirectional RNN model
  - gru_model.h5: GRU model
- model:
  - lstm_model.h5: LSTM model
  - model.onnx: ONNX model generated from the LSTM model and used for prediction in the API
  - word_index.json: JSON file that maps each word to its corresponding index
- scripts:
  - test_script.py: script used to test the HTTP API
  - verfiy_model_results.py: script used to verify the model results
  - verfiy_results_requirements.txt: text file listing all libraries that must be installed for the verification script
  - data: contains X and y data as .npy files, used by the verification script
- Dockerfile: used to build the Docker image
- docker-compose.yml: used to run our Docker image
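To illustrate the role of word_index.json: before prediction, each word of the address is presumably mapped to an integer index that the ONNX model consumes. The sketch below uses a toy stand-in dictionary; the out-of-vocabulary index, padding scheme, and sequence length are all assumptions, not taken from the project.

```python
import json  # in the real project: json.load(open("model/word_index.json"))

# Toy stand-in for word_index.json; real indices come from the notebook.
word_index = {"مدينة": 1, "نصر": 2, "المعادي": 3}

def encode(address: str, maxlen: int = 5, oov: int = 0) -> list[int]:
    """Map each word to its index (unknown words -> oov) and pad to maxlen."""
    ids = [word_index.get(w, oov) for w in address.split()]
    return (ids + [oov] * maxlen)[:maxlen]

print(encode("مدينة نصر"))  # → [1, 2, 0, 0, 0]
```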
| Model | Training Precision | Training Recall | Testing Precision | Testing Recall | ROC Score |
|---|---|---|---|---|---|
| LSTM | 0.72 | 0.90 | 0.53 | 0.62 | 0.93 |
| GRU | 0.71 | 0.87 | 0.50 | 0.69 | 0.91 |
| biRNN | 0.85 | 0.87 | 0.50 | 0.62 | 0.93 |
- Note that the reported numbers are rounded to two decimal places
- Note that the prediction time decreased from 76 ms to 8 ms, an optimization of approximately 90%
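The 90% figure checks out: going from 76 ms to 8 ms is a reduction of (76 − 8) / 76 ≈ 89.5%.

```python
# Verifying the stated latency improvement: 76 ms -> 8 ms.
before_ms, after_ms = 76, 8
speedup = (before_ms - after_ms) / before_ms
print(f"{speedup:.1%}")  # → 89.5%, i.e. roughly a 90% reduction
```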

You can verify the model results by running verfiy_model_results.py with this command:
python scripts/verfiy_model_results.py

Make sure you have Python 3.10, and run this command to install all the required libraries:
pip install -r scripts/verfiy_results_requirements.txt
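The verification script presumably compares the model's predictions against the stored y labels. For reference, precision and recall as reported in the table above are defined like this; the toy predictions and labels below stand in for the real X.npy / y.npy data.

```python
# Precision = TP / (TP + FP); Recall = TP / (TP + FN), with label 1 = "in Cairo".
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

p, r = precision_recall([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
print(round(p, 2), round(r, 2))  # → 0.67 0.67
```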

You can verify the other models as well; just change the model path. You will find all the other models in the extra models directory.

If you have any problems, please check the notebook.

Also, if you see a different number for any metric, you can check the notebook, load the model, uncomment the last two cells, and run them.

Here is a screenshot of the notebook results:
- I struggled a bit with the HERE API, so due to the shortage of time I web-scraped data to work with, but given enough time I would have figured out how to make the HERE API work
- I tried hard-coded stemming (e.g., removing 'ال'), but it made no difference and, if anything, made the model perform worse
- If you want to run the code locally, please install the requirements.txt file in the app directory and make sure you have Python 3.10 installed
- If you face any problem running the HTTP API, try changing the URL from
http://127.0.0.1:8080/verify_cairo_address
to
http://localhost:8080/verify_cairo_address
- If any image doesn't appear in high quality, please click on it to open it in a new tab
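The hard-coded stemming experiment mentioned above (removing the Arabic definite article 'ال') can be sketched roughly as follows; the exact rule the author used is not stated, so the length guard here is an assumption to avoid mangling very short words.

```python
# Strip a leading definite article "ال" from a word; the len > 3 guard is an
# assumption to keep short words (e.g. "الى") intact.
def strip_al(word: str) -> str:
    return word[2:] if word.startswith("ال") and len(word) > 3 else word

print(strip_al("المعادي"))  # → معادي
print(strip_al("نصر"))      # → نصر (unchanged)
```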