Cairo Address Verification is an HTTP API that takes an address name and checks whether it is located in Cairo, using an LSTM deep-learning model.
- Open Docker Desktop on your computer
- Open your terminal and navigate to the project directory
- Build the image with the following Docker command:
docker build -t <your-image-name> .
- Or use the following command:
docker-compose build
- Then, use the following command to run it:
docker-compose up

The dataset is built by combining Cairo.txt, which lists the districts located in Cairo, with data scraped from dowwer.com.

You can find both the Cairo.txt file and the generated dataset in the datasets directory.

It consists of two columns, the district name and the label (1 if the district is in Cairo, 0 otherwise), and contains 413 districts from all Egyptian governorates.
You can open it in a properly formatted view in Excel with the following steps:
1- Open Excel and, from the top bar, navigate to Data
2- Click on From Text/CSV
3- Choose the dataset named district_data.csv in the datasets directory from its location on your computer and click Import
4- Click on Load

Here is a sample of the dataset as it should appear:
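As an alternative to Excel, the CSV can be inspected with Python's standard library. The snippet below is only a sketch: the header names (`district`, `label`) and the sample rows are assumptions based on the description above, not the real file contents.

```python
import csv
import io

# Toy stand-in for datasets/district_data.csv; header names are guesses.
sample = io.StringIO(
    "district,label\n"
    "مدينة نصر,1\n"      # Nasr City, in Cairo
    "المعادي,1\n"        # Maadi, in Cairo
    "سيدي جابر,0\n"      # Sidi Gaber, not in Cairo
)

rows = list(csv.DictReader(sample))
in_cairo = [r["district"] for r in rows if r["label"] == "1"]
print(len(rows), in_cairo)  # → 3 ['مدينة نصر', 'المعادي']
```

To read the real file, pass `open("datasets/district_data.csv", encoding="utf-8")` to `csv.DictReader` instead of the in-memory sample.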
After running the command:
docker-compose up

you can use the test_script.py file, which is in the scripts directory.

Open the project folder in any IDE, then navigate to scripts/test_script.py.

Type the address you want to test here (remember to write it in Arabic):

Open the terminal in your IDE and run the script with the following command:
python scripts/test_script.py
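For reference, a request to the API might be built along these lines. This is a sketch only: the POST method and the JSON body shape (an `address` field) are assumptions, since the actual request format is defined in app.py and test_script.py.

```python
import json
import urllib.request

# Endpoint from the README; the payload shape below is an assumption.
API_URL = "http://127.0.0.1:8080/verify_cairo_address"

def build_request(address: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Arabic address."""
    payload = json.dumps({"address": address}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("مدينة نصر")  # "Nasr City", a district in Cairo
# Once the container is up, sending would be: urllib.request.urlopen(req)
```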
- app:
  - app.py: code for the HTTP API
  - requirements.txt: contains all libraries needed to run app.py
- core model architecture:
  - nlp_task.ipynb: notebook containing all the model-generation stages
  - onnx_model_creation.py: converts the saved model to an ONNX model
- datasets:
  - Cairo.txt: contains district names in Cairo
  - dataset_sources.txt: contains links to the dataset and the word-embedding model
  - district_data.csv: contains district names and labels
  - word_embeddings.csv: contains each word's embedding vector
  - word_embeddings_for_ordinary_classifiers.csv: contains each word's embedding vector and label
- extra models:
  - birnn_model.h5: bidirectional RNN model
  - gru_model.h5: GRU model
- model:
  - lstm_model.h5: LSTM model
  - model.onnx: ONNX model generated from the LSTM model and used for prediction in the API
  - word_index.json: JSON file that maps each word to its corresponding index
- scripts:
  - test_script.py: script used to test the HTTP API
  - verfiy_model_results.py: script used to verify the model results
  - verfiy_results_requirements.txt: text file listing all libraries that must be installed for the verification script
  - data: contains X and y data as .npy files, used by the verification script
- Dockerfile: used to build the Docker image
- docker-compose.yml: used to run our Docker image
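To illustrate the role of word_index.json: before prediction, each word of the address is presumably mapped to an integer index that the ONNX model consumes. The sketch below uses a toy stand-in dictionary; the out-of-vocabulary index, padding scheme, and sequence length are all assumptions, not taken from the project.

```python
import json  # in the real project: json.load(open("model/word_index.json"))

# Toy stand-in for word_index.json; real indices come from the notebook.
word_index = {"مدينة": 1, "نصر": 2, "المعادي": 3}

def encode(address: str, maxlen: int = 5, oov: int = 0) -> list[int]:
    """Map each word to its index (unknown words -> oov) and pad to maxlen."""
    ids = [word_index.get(w, oov) for w in address.split()]
    return (ids + [oov] * maxlen)[:maxlen]

print(encode("مدينة نصر"))  # → [1, 2, 0, 0, 0]
```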
| Model | Training Precision | Training Recall | Testing Precision | Testing Recall | ROC Score |
|---|---|---|---|---|---|
| LSTM | 0.72 | 0.90 | 0.53 | 0.62 | 0.93 |
| GRU | 0.71 | 0.87 | 0.50 | 0.69 | 0.91 |
| biRNN | 0.85 | 0.87 | 0.50 | 0.62 | 0.93 |
- Note that the reported numbers are rounded to two decimal places
- Note that the prediction time decreased from 76 ms to 8 ms, an optimization of approximately 90%
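The 90% figure checks out: going from 76 ms to 8 ms is a reduction of (76 − 8) / 76 ≈ 89.5%.

```python
# Verifying the stated latency improvement: 76 ms -> 8 ms.
before_ms, after_ms = 76, 8
speedup = (before_ms - after_ms) / before_ms
print(f"{speedup:.1%}")  # → 89.5%, i.e. roughly a 90% reduction
```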

You can verify the model results by running verfiy_model_results.py with this command:
python scripts/verfiy_model_results.py

Make sure you have Python 3.10, and run this command to install all the required libraries:
pip install -r scripts/verfiy_results_requirements.txt
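The verification script presumably compares the model's predictions against the stored y labels. For reference, precision and recall as reported in the table above are defined like this; the toy predictions and labels below stand in for the real X.npy / y.npy data.

```python
# Precision = TP / (TP + FP); Recall = TP / (TP + FN), with label 1 = "in Cairo".
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

p, r = precision_recall([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
print(round(p, 2), round(r, 2))  # → 0.67 0.67
```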

You can verify the other models as well; just change the model path. You will find all the other models in the extra models directory.

If you have any problems, please check the notebook.

Also, if you see a different number for any metric, you can check the notebook, load the model, uncomment the last two cells, and run them.

Here is a screenshot of the notebook results:
- I struggled a bit with the HERE API, so due to the shortage of time I web-scraped data to work with, but given enough time I would have figured out how to make the HERE API work
- I tried hard-coded stemming (e.g., removing 'ال'), but it made no difference and, if anything, made the model perform worse
- If you want to run the code locally, please install the requirements.txt file in the app directory and make sure you have Python 3.10 installed
- If you face any problem running the HTTP API, try changing the URL from
http://127.0.0.1:8080/verify_cairo_address
to
http://localhost:8080/verify_cairo_address
- If any image doesn't appear in high quality, please click on it to open it in a new tab
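The hard-coded stemming experiment mentioned above (removing the Arabic definite article 'ال') can be sketched roughly as follows; the exact rule the author used is not stated, so the length guard here is an assumption to avoid mangling very short words.

```python
# Strip a leading definite article "ال" from a word; the len > 3 guard is an
# assumption to keep short words (e.g. "الى") intact.
def strip_al(word: str) -> str:
    return word[2:] if word.startswith("ال") and len(word) > 3 else word

print(strip_al("المعادي"))  # → معادي
print(strip_al("نصر"))      # → نصر (unchanged)
```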