grintaking19 / meshakkelaty.ai

This project forked from omar-al-sharif/meshakkelaty.ai

A neural and statistical engine for accurately adding diacritics (Tashkeel) to Arabic text. First-place winner on Kaggle 🥇

Python 0.07% Jupyter Notebook 99.93%

meshakkelaty.ai's Introduction

AI Arabic Diacritization Engine - مُشَكِّلاتي.ai

Meshakkelaty

Overview

Welcome to مُشَكِّلاتي.ai, an Arabic text diacritization (Tashkeel) engine built on neural and statistical techniques. It predicts the diacritics missing from Arabic text and adds them back, improving readability and comprehension. The مُشَكِّلاتي.ai model won first place in a Kaggle competition 🥇
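Tashkeel prediction is commonly framed as per-character classification: each base letter receives a label made of zero or more diacritic marks. The sketch below illustrates that framing under stated assumptions and is not code from this repository; `split_diacritics` and `strip_diacritics` are hypothetical helpers, and the diacritic set is limited to the eight harakat U+064B through U+0652.

```python
# Assumed diacritic set: tanween (U+064B-U+064D), fatha, damma, kasra
# (U+064E-U+0650), shadda (U+0651) and sukun (U+0652).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Return the bare text a diacritization model would receive as input."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_diacritics(text: str) -> list[tuple[str, str]]:
    """Pair every base character with the (possibly empty) marks that follow it.

    A mark with no preceding character is kept as its own base entry.
    """
    pairs: list[tuple[str, str]] = []
    for ch in text:
        if ch in DIACRITICS and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)  # attach mark to previous letter
        else:
            pairs.append((ch, ""))  # new base character, no marks yet
    return pairs
```

Stripping a diacritized training sentence yields the model input, and the per-character marks yield the targets; joining each base character with its marks reconstructs the original text.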

Dual-Model Architecture

The مُشَكِّلاتي.ai diacritization system employs a dual-model architecture that consists of:

  1. A neural Bidirectional Stacked Long Short-Term Memory (BiLSTM) model that captures sequential dependencies and contextual information within the Arabic text, inspired by this research paper, but on steroids!
  2. A statistical post-processing model that operates on the neural model's output to further refine the diacritization results, inspired by this research paper.
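The README does not describe how the statistical post-processor works. One simple form such a stage could take, shown here purely as a hypothetical illustration, is a word-level lookup: count every diacritized form of each bare word in the training text, then replace the neural model's prediction for a known word with its most frequent form. `WordFrequencyPostProcessor` is an invented name, not part of the project.

```python
from collections import Counter, defaultdict

# Assumed diacritic set (U+064B-U+0652), as in a per-character labeling setup.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(word: str) -> str:
    return "".join(ch for ch in word if ch not in DIACRITICS)


class WordFrequencyPostProcessor:
    """Hypothetical post-processor: bare word -> most frequent diacritized form."""

    def __init__(self) -> None:
        # Maps each bare spelling to a Counter of its diacritized forms.
        self.forms = defaultdict(Counter)

    def fit(self, diacritized_lines):
        for line in diacritized_lines:
            for word in line.split():
                self.forms[strip_diacritics(word)][word] += 1
        return self

    def refine(self, predicted_line: str) -> str:
        out = []
        for word in predicted_line.split():
            # .get() avoids inserting empty entries into the defaultdict.
            counts = self.forms.get(strip_diacritics(word))
            # Words never seen in training keep the neural prediction unchanged.
            out.append(counts.most_common(1)[0][0] if counts else word)
        return " ".join(out)
```

Splitting on whitespace ignores punctuation attached to words, and always preferring the most frequent form discards context the BiLSTM may have used correctly, so a real post-processor would likely be more selective about when it overrides the neural output.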

Usage

To use مُشَكِّلاتي.ai, follow these steps:

  1. Clone the repository
    • git clone https://github.com/Omar-Al-Sharif/Meshakkelaty.ai.git
  2. Install the necessary dependencies:
    • pip install -r Meshakkelaty.ai/requirements.txt
  3. Acquire your data and place it in the Meshakkelaty.ai/data directory under the names train.txt and val.txt
  4. Change to the scripts directory:
    • cd Meshakkelaty.ai/scripts
  5. Prepare your data by running the following command:
    • python tokenize_dataset.py
  6. Train the neural model on your data:
    • python train_neural_model.py
  7. Train the statistical model on your data:
    • python train_statistical_model.py
  8. Put the text you want to diacritize in:
    • ../data/test_input.txt
  9. Diacritize the input text by running:
    • python predict.py
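Steps 5 to 9 can be chained in a single driver script. The sketch below is hypothetical (`run_pipeline`, `missing_inputs`, `PIPELINE` and `REQUIRED_INPUTS` are not part of the repository); it assumes the repository was cloned to Meshakkelaty.ai, that each script runs without command-line arguments, and that test_input.txt is placed in data/ before the pipeline starts rather than after training.

```python
import subprocess
import sys
from pathlib import Path

# Script names from the usage steps, in the order they must run.
PIPELINE = [
    "tokenize_dataset.py",         # step 5: prepare the data
    "train_neural_model.py",       # step 6: train the BiLSTM model
    "train_statistical_model.py",  # step 7: train the statistical model
    "predict.py",                  # step 9: diacritize ../data/test_input.txt
]
# Assumption: all inputs are checked up front so a long run fails early.
REQUIRED_INPUTS = ["train.txt", "val.txt", "test_input.txt"]


def missing_inputs(repo_root: Path) -> list[str]:
    """Return the required data files that are absent from repo_root/data."""
    data_dir = repo_root / "data"
    return [name for name in REQUIRED_INPUTS if not (data_dir / name).is_file()]


def run_pipeline(repo_root: Path, dry_run: bool = False) -> list[list[str]]:
    """Run each script from repo_root/scripts and return the commands used."""
    missing = missing_inputs(repo_root)
    if missing:
        raise FileNotFoundError(f"missing in data/: {', '.join(missing)}")
    scripts_dir = repo_root / "scripts"
    commands = [[sys.executable, script] for script in PIPELINE]
    if not dry_run:
        for cmd in commands:
            # cwd=scripts_dir mirrors step 4, so relative paths such as
            # ../data/test_input.txt resolve as they do in the manual steps.
            subprocess.run(cmd, cwd=scripts_dir, check=True)
    return commands
```

Calling `run_pipeline(Path("Meshakkelaty.ai"))` would run all four scripts in order and stop at the first one that exits with a non-zero status; `dry_run=True` only validates the data files and returns the commands.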

Contributors

omar-al-sharif, youssefhassan01
