
omarkhaled646 / cairo-address-verification


An API that uses an NLP model to detect whether a given address is in Cairo.

License: MIT License

Dockerfile 0.03% Python 1.31% Jupyter Notebook 98.65%


Cairo Address Verification

Overview

Cairo Address Verification is an HTTP API that takes an address and uses an LSTM deep learning model to predict whether it is in Cairo.

Building the Docker Image

  • Open Docker Desktop on your computer.
  • Open a command prompt and navigate to the project directory.
  • Build the image with the following Docker command:
docker build -t (your-image-name) .
  • Or build it with Docker Compose:
docker-compose build
  • Then start the API with:
docker-compose up

Dataset

  • The dataset combines Cairo.txt, which lists the districts that are in Cairo, with data collected from dowwer.com by web scraping.

  • You can find both the Cairo.txt file and the generated dataset in the datasets directory.

  • It has two columns, the district name and the label (1 if the district is in Cairo, otherwise 0), and contains 413 districts from all Egyptian governorates.

  • To open it properly formatted in Excel, follow these steps:

    1- Open Excel and, from the top bar, go to the Data tab

    2- Click on From Text/CSV

    3- Select district_data.csv from the datasets directory on your computer and click Import

    4- Click on Load

  • Here is a sample of how the dataset should appear:

    [Screenshot: sample of the dataset opened in Excel]
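
As an alternative to Excel, the two-column layout described above can be read with Python's standard csv module. This is only a minimal sketch: the header row, the column order (district first, label second), and the sample rows are assumptions made for illustration and are not taken from district_data.csv.

```python
import csv
import io
from collections import Counter


def load_districts(csv_file):
    """Read (district, label) pairs from an open CSV file object."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    return [(row[0].strip(), int(row[1])) for row in reader if row]


# Illustrative rows only, NOT copied from district_data.csv.
sample = io.StringIO("district,label\nمدينة نصر,1\nالمنصورة,0\nالمعادي,1\n")
rows = load_districts(sample)

# Count how many districts carry each label (1 = in Cairo, 0 = not in Cairo).
counts = Counter(label for _, label in rows)
print(counts[1], "in Cairo,", counts[0], "outside Cairo")
```

To try it on the real file, something like open("datasets/district_data.csv", encoding="utf-8-sig", newline="") could be passed to load_districts; the encoding is an assumption and may need adjusting.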

Test Script

  • After starting the API with:

    docker-compose up

  • You can test it with the test_script.py file in the scripts directory

  • Open the project folder in any IDE

  • Then open scripts/test_script.py

  • Set the address you want to test in the script (remember to write it in Arabic):

    [Screenshot: where to set the address in test_script.py]

  • Open the terminal in your IDE and run the script with the following command:

    python scripts/test_script.py
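
For readers who want to call the API without the bundled script, a request could look roughly like the sketch below. It is not the content of test_script.py. The URL is the one given in the Notes section; the POST method, the JSON body, and the "address" key are assumptions, so check app/app.py for the format the API actually expects.

```python
import json
import urllib.request

# URL as given in the Notes section of this README.
API_URL = "http://127.0.0.1:8080/verify_cairo_address"


def build_request(address, url=API_URL):
    """Build a POST request carrying the address as UTF-8 encoded JSON.

    The {"address": ...} body shape is an assumption, not the documented API.
    """
    payload = json.dumps({"address": address}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def verify(address, url=API_URL, timeout=5.0):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(address, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (requires the container started with docker-compose up):
# print(verify("مدينة نصر"))
```

Encoding the body explicitly as UTF-8 matters here because the address is written in Arabic.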

Project Structure

app:
  app.py: code for HTTP API
  requirements.txt: contains all libraries needed to run app.py
core model architecture:
  nlp_task.ipynb: notebook containing all the model generation stages
  onnx_model_creation.py: converts the saved model to an ONNX model
datasets:
  Cairo.txt: contains district names in Cairo
  dataset_sources.txt: contains links to the dataset and the word embedding model
  district_data.csv: contains district names and labels
  word_embeddings.csv: contains the embedding vector of each word
  word_embeddings_for_ordinary_classifiers.csv: contains the embedding vector and label of each word
extra models:
  birnn_model.h5: bidirectional RNN model
  gru_model.h5: GRU model
model:
  lstm_model.h5: LSTM model
  model.onnx: ONNX model that is generated from the LSTM model and used for prediction in API
  word_index.json: JSON file that maps each word to its corresponding index
scripts:
  test_script.py: script used to test the HTTP API
  verfiy_model_results.py: script used to verify the model results
  verfiy_results_requirements.txt: text file listing the libraries that must be installed for the verification script
  data: contains the X and y data as .npy files used by the verification script
Dockerfile: used to build the Docker Image
docker-compose.yml: used to run our Docker Image
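
To illustrate how a word-to-index mapping such as model/word_index.json is typically used, the sketch below turns an address into a fixed-length list of indices. Everything in it is hypothetical: the whitespace tokenization, the sequence length, the use of 0 for unknown words and padding, and the padding side are assumptions, and the two-word mapping is made up. The notebook remains the reference for the project's real preprocessing.

```python
import json


def encode_address(address, word_index, max_len=10, unknown_index=0, pad_index=0):
    """Map each whitespace-separated word to its index, then pad or truncate.

    Words missing from word_index are mapped to unknown_index (an assumption).
    Padding is appended at the end (also an assumption).
    """
    ids = [word_index.get(token, unknown_index) for token in address.split()]
    ids = ids[:max_len]
    return ids + [pad_index] * (max_len - len(ids))


# Hypothetical mapping, NOT the content of model/word_index.json.
word_index = json.loads('{"مدينة": 1, "نصر": 2}')

encoded = encode_address("مدينة نصر القاهرة", word_index, max_len=5)
print(encoded)
```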

Results

metrics:

Model   Training Precision   Training Recall   Testing Precision   Testing Recall   ROC Score
LSTM    0.72                 0.9               0.53                0.62             0.93
GRU     0.71                 0.87              0.5                 0.69             0.91
biRNN   0.85                 0.87              0.5                 0.62             0.93
  • Note that the reported numbers are rounded to two decimal places
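
For reference, precision and recall for the positive class (label 1, "in Cairo") follow the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below computes them on made-up labels; it is not the project's evaluation code, and the numbers are unrelated to the table above.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class, labeled 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```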

Graph of ROC and Precision vs Recall Comparison

[Figure: ROC and precision vs. recall comparison]

Time comparison between the .h5 model and the .onnx model

[Screenshots: timing of the .h5 model and of the .onnx model]
  • Note that the time decreased from 76 ms to 8 ms, a reduction of approximately 90%
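
The arithmetic behind that figure can be checked from the two reported timings. The variable names below are illustrative.

```python
h5_ms = 76.0   # time reported for the .h5 model
onnx_ms = 8.0  # time reported for the .onnx model

reduction = (h5_ms - onnx_ms) / h5_ms  # fraction of the original time saved
speedup = h5_ms / onnx_ms              # how many times faster the ONNX model is

print(f"reduction: {reduction:.1%}, speedup: {speedup:.1f}x")
# prints: reduction: 89.5%, speedup: 9.5x
```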

Testing with Postman using docker-compose.yml

[Screenshots: four Postman requests and their responses]

Testing using the test script file

[Screenshots: two sample results from the test script]

Verifying Results

  • You can verify the model results by running verfiy_model_results.py with this command:

    python scripts/verfiy_model_results.py

  • Make sure you have Python 3.10, and run this command to install the required libraries:

    pip install -r scripts/verfiy_results_requirements.txt

  • You can verify the other models as well by changing the model path; the other models are in the extra models directory

  • If you have any problems, please check the notebook

  • Also, if you see a different number for any metric, you can open the notebook, load the model, uncomment the last two cells, and run them.

  • Here is a screenshot of the notebook results:

    [Screenshot: results in nlp_task.ipynb]

Notes

  • I struggled a bit with the HERE API, so due to the shortage of time I web-scraped the data instead; given enough time, I would have figured out how to make the HERE API work
  • I tried hard-coded stemming (e.g., removing 'ال'), but it didn't make a difference, and I think it makes the model perform worse
  • If you want to run the code locally, please install the libraries listed in the requirements.txt file in the app directory and make sure you have Python 3.10 installed
  • If you face any problem when calling the HTTP API, try changing the URL from http://127.0.0.1:8080/verify_cairo_address to http://localhost:8080/verify_cairo_address
  • If any image doesn't appear in high quality, please click on it to open it in a new tab.
