

Patuli Bisindo Sign Language Model

This repository contains the code and resources for building a real-time Indonesian sign language (Bisindo) object detection model specifically designed for a mobile app. The model utilizes transfer learning techniques with SSD MobileNet V2 FPNLite architecture to accurately detect and localize sign language gestures in real-time video streams.


Description

This real-time Bisindo sign language object detection model is built using transfer learning, leveraging the pre-trained weights of SSD MobileNet V2 FPNLite 320x320 on a large-scale image recognition dataset. By fine-tuning the model on a custom dataset of sign language images, it has been trained to recognize and classify sign language gestures in real-time.
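
In the TensorFlow Object Detection API, fine-tuning of this kind is driven by a pipeline.config file. The excerpt below is an illustrative sketch only: the paths, batch size, and step count are our assumptions, not the exact values used for these models.

```
model {
  ssd {
    num_classes: 26  # e.g. the abjad model: one class per letter
    # ... feature extractor, anchor, and loss settings omitted
  }
}
train_config {
  batch_size: 16
  num_steps: 40000
  fine_tune_checkpoint: "pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"  # reuse detection weights, retrain the heads
}
```

Setting `fine_tune_checkpoint_type` to `"detection"` is what makes this transfer learning: the pre-trained backbone weights are restored, and only the classification heads start from scratch.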

More information about the model: here

The project consists of three separate models, each trained to detect a specific category:

  • Abjad (Alphabet): Detects and classifies various alphabet signs (26 classes).
  • Angka (Number): Detects and classifies various number signs (11 classes).
  • Kata (Word): Detects and classifies various word signs (23 classes).
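
The class counts above can be summarized in code. The alphabet and number label names here are our assumptions (A-Z and 0-10); the 23 word-sign names are not listed in this README, so only their count is recorded.

```python
import string

# Illustrative label sets; the authoritative class names live in each
# model's label map, so these are assumptions for the sketch.
ABJAD_LABELS = list(string.ascii_uppercase)   # 26 alphabet signs, A-Z
ANGKA_LABELS = [str(n) for n in range(11)]    # 11 number signs, assumed 0-10
NUM_KATA_CLASSES = 23                         # 23 word signs (names not listed here)

print(len(ABJAD_LABELS), len(ANGKA_LABELS), NUM_KATA_CLASSES)  # 26 11 23
```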

The trained models have been converted into the optimized .tflite format for efficient deployment on mobile devices. They are also quantized to reduce model size while maintaining accuracy.
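
Quantization stores weights as 8-bit integers plus a scale and zero-point instead of 32-bit floats, roughly a 4x size reduction. A minimal, TensorFlow-independent sketch of the affine scheme that post-training quantization is based on:

```python
def quantize(values, num_bits=8):
    """Affine (asymmetric) quantization: map floats into [0, 2^bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # guard against a constant tensor
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, s, z = quantize(weights)
restored = dequantize(q, s, z)  # close to the originals, within one scale step
```

In TFLite the same idea is applied per tensor (or per channel) automatically by the converter; this sketch only shows why accuracy survives: the round-trip error is bounded by the scale step.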

Model Performance

We trained the models three times, with each version using an improved dataset and different training parameters. The following links point to the TensorBoard visualizations of the training process for each model:

  • Model V1

    V1 is our experimental model, trained with varying step counts on a dataset that was not yet optimized.

  • Model V2 (the version used in the application)

    V2 is our first production model, trained on the optimized dataset for 40k steps.

  • Model V3

    V3 is our second production model, trained on the same dataset as V2 but with fewer training steps (20k).

Our Dataset

The training dataset used for this project consists of a large collection of annotated sign language images. The dataset includes diverse samples of different sign language gestures, captured under various lighting conditions, backgrounds, and hand orientations. The annotations provide bounding box coordinates and corresponding labels for each sign language gesture.
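
The annotations are exported in Pascal VOC format (one XML file per image, as used in the training steps below). A minimal sketch of reading a bounding box and label from one, using a made-up example annotation:

```python
import xml.etree.ElementTree as ET

# A made-up VOC annotation; real files come from the exported dataset.
VOC_XML = """
<annotation>
  <filename>abjad_A_001.jpg</filename>
  <size><width>320</width><height>320</height></size>
  <object>
    <name>A</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>210</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return a list of {label, box} dicts from one VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "box": tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")),
        })
    return boxes

print(parse_voc(VOC_XML))  # [{'label': 'A', 'box': (48, 60, 210, 250)}]
```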

Link to the dataset:

Training the Model

You can train the model locally or on Google Colab. In our case, we trained the model locally on a machine with a CUDA-enabled GPU in a Linux environment using WSL2 (Ubuntu 20 LTS). A Linux environment is required because some of the commands used in the training process are Linux-specific.

To train the model locally, you can follow the steps below:

  1. Ensure that CUDA and cuDNN are installed on your machine. You can follow the steps here to install CUDA and here to install cuDNN. We followed this guide to install CUDA on our WSL2 Ubuntu.
  2. Install the WSL extension in VS Code.
  3. Prepare a virtual environment for training. You can follow the steps here to create one.
  4. Get your API key from your Roboflow account settings and store it in a .env file. Alternatively, you can download the dataset manually: visit the dataset links above, export in VOC format, and move the files into the images folder of the project directory. Doing this lets you skip the dataset download step in the notebook.
  5. Clone the repository and open the notebooks in their own directory or as the root directory.
  6. Follow the steps in the notebook to train the model.
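
Step 4 stores the Roboflow API key in a .env file; notebooks typically load it with a package such as python-dotenv. A dependency-free sketch of the same idea (the variable name ROBOFLOW_API_KEY and the file layout are our assumptions):

```python
import os

def load_dotenv(path=".env"):
    """Tiny stand-in for python-dotenv: read KEY=VALUE lines into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

# Example: write a throwaway .env, then load it.
with open(".env", "w") as fh:
    fh.write("ROBOFLOW_API_KEY=your-key-here\n")
load_dotenv()
print(os.environ["ROBOFLOW_API_KEY"])  # your-key-here
```

Keeping the key in .env (and out of version control) means the notebook can read it without the key ever appearing in the repository.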

To train the model on Google Colab, you will need to set the Python version to 3.8.10. You can follow the steps below:

  1. Create a new notebook on Google Colab.
  2. Import the notebook from this repository.
  3. Get your API key from your Roboflow account settings and store it in a .env file. Alternatively, you can download the dataset manually: visit the dataset links above, export in VOC format, and move the files into the images folder of the project directory. Doing this lets you skip the dataset download step in the notebook.
  4. Follow the steps in the notebook to train the model.
  5. After training finishes, you can download the model from the notebook.

Contributors

ammarsufyan, dadangdut33, dizzyme09

