nur988 / csvc-net

This project forked from space-urchin/csvc-net

Bangla-English Code-Switched Voice Command Detection using Deep CNN-LSTM Networks

License: MIT License

Python 99.02% Dockerfile 0.98%

CSVC-Net: Code-Switched Voice Command Classification using Deep CNN-LSTM Network

Colloquial Bengali has adopted many English words due to colonial influence. In conversational Bengali, it is quite common to speak in a mixture of English and Bengali, a phenomenon termed code-switching (CS). To build a voice command classifier in an era when the use of CS is ever-increasing, it is often necessary to map a single base command to its many variants, spoken in different mixtures of the two languages. Prior work on Bengali speech has focused primarily on single-word classification and is largely unable to capture the complex semantic relationships found in sentences. This paper proposes 'CSVC-Net', a CNN-LSTM based architecture for classifying spoken commands containing code-switching between Bengali and English. To reflect this scenario, it also presents a newly curated dataset named "Banglish" containing 3,840 audio files of spoken computer commands belonging to 11 classes and 64 variations in total. The proposed pipeline passes the input audio signal through a series of transformation and augmentation steps, enabling the model to achieve an accuracy of 92.08% on the curated dataset. Furthermore, the robustness of the proposed model is evaluated by comparing it with different architectures and testing it under different noise levels, with promising accuracy that indicates the model's applicability in real-life scenarios.

Dataset Preparation

All audio samples are converted to the '.wav' format with a single (mono) channel, sampled at 16 kHz with a bitrate of 512 kbps. The audio samples then pass through the following feature extraction and data augmentation steps:

  • Extract Sound Envelope
  • Waveform Transforms
    • Pitch Shift
    • Time Stretch
    • Add Gaussian Noise
  • Extract MFCC features
  • Spectrogram Transform
    • Frequency Mask
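
Three of the steps listed above can be illustrated with a dependency-free sketch. This is not the repository's implementation: the function names, window size, noise level, and mask width are hypothetical, and the spectrogram is assumed to be stored as rows of frequency bins. A real pipeline would more likely use an array or audio library.

```python
import math
import random


def sound_envelope(signal, window):
    """Rolling maximum of |signal| over a centered window (a simple envelope)."""
    half = window // 2
    magnitudes = [abs(s) for s in signal]
    return [
        max(magnitudes[max(0, i - half): i + half + 1])
        for i in range(len(magnitudes))
    ]


def add_gaussian_noise(signal, std, rng):
    """Add zero-mean Gaussian noise with standard deviation `std` to each sample."""
    return [s + rng.gauss(0.0, std) for s in signal]


def frequency_mask(spectrogram, max_width, rng):
    """Zero a random band of consecutive frequency bins across all time frames.

    Assumes spectrogram[bin][frame]; returns a masked copy and the (start, width) used.
    """
    n_bins = len(spectrogram)
    width = rng.randint(0, min(max_width, n_bins))
    start = rng.randint(0, n_bins - width)
    masked = [row[:] for row in spectrogram]
    for b in range(start, start + width):
        masked[b] = [0.0] * len(masked[b])
    return masked, (start, width)


# Hypothetical input: 0.1 s of a 440 Hz tone at 16 kHz, padded with silence.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
signal = [0.0] * 400 + tone + [0.0] * 400

envelope = sound_envelope(signal, window=160)
noisy = add_gaussian_noise(signal, std=0.005, rng=rng)

# Stand-in feature matrix: 13 frequency bins by 50 frames, all ones.
features = [[1.0] * 50 for _ in range(13)]
masked, (start, width) = frequency_mask(features, max_width=4, rng=rng)
```

The envelope stays at zero over the silent padding and close to 1 over the tone, which is what makes it usable for detecting where speech is present. Pitch Shift and Time Stretch are left out because they require resampling, which is better delegated to an audio library.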

Our Model Architecture

Our proposed architecture is a CNN-LSTM model. It consists of three main components: a CNN block, an LSTM block, and Time Distributed Dense layers. To enhance the performance of the model, we also use Dropout and Batch Normalization layers in different parts. A detailed diagram of our model is as follows: [Figure: CSVC-Net]
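
To make the data flow between the three components concrete, the sketch below traces tensor shapes through a CNN block, an LSTM block, and a time-distributed dense layer. It is only a shape walkthrough under stated assumptions, not the actual CSVC-Net configuration. The input size, filter counts, unit counts, 'same' padding, and 2x2 pooling are all hypothetical; only the 11 output classes come from the dataset description.

```python
def conv_same(shape, filters):
    """Convolution with 'same' padding and stride 1: only the channel count changes."""
    time, freq, _ = shape
    return (time, freq, filters)


def max_pool(shape, pool=(2, 2)):
    """Non-overlapping max pooling: floor-divide each spatial dimension."""
    time, freq, channels = shape
    return (time // pool[0], freq // pool[1], channels)


def to_sequence(shape):
    """Merge frequency and channel axes so each time step is one feature vector."""
    time, freq, channels = shape
    return (time, freq * channels)


def lstm_sequences(shape, units):
    """LSTM that returns its full output sequence: one `units`-vector per step."""
    time, _ = shape
    return (time, units)


def time_distributed_dense(shape, units):
    """The same dense layer applied independently at every time step."""
    time, _ = shape
    return (time, units)


def trace_shapes(input_shape, conv_filters, lstm_units, td_units, n_classes):
    """Return (stage name, output shape) pairs for the whole pipeline."""
    trace = [("input", input_shape)]
    shape = input_shape
    # Dropout and Batch Normalization leave shapes unchanged, so they are omitted.
    for i, filters in enumerate(conv_filters, start=1):
        shape = max_pool(conv_same(shape, filters))
        trace.append((f"cnn_block_{i}", shape))
    shape = to_sequence(shape)
    trace.append(("to_sequence", shape))
    shape = lstm_sequences(shape, lstm_units)
    trace.append(("lstm", shape))
    shape = time_distributed_dense(shape, td_units)
    trace.append(("time_distributed_dense", shape))
    trace.append(("flatten", (shape[0] * shape[1],)))
    trace.append(("softmax", (n_classes,)))
    return trace


# Hypothetical sizes: 100 frames x 40 coefficients x 1 channel.
trace = trace_shapes(
    input_shape=(100, 40, 1),
    conv_filters=[16, 32],
    lstm_units=64,
    td_units=32,
    n_classes=11,  # number of command classes in the Banglish dataset
)
for name, shape in trace:
    print(f"{name:24s} {shape}")
```

The point to notice is the `to_sequence` step: the CNN output keeps its time axis, so the LSTM receives one feature vector per remaining time step, and the time-distributed dense layer then transforms each step with shared weights.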

Run Code

Note: This was developed on Ubuntu 20.04 running Python 3.8. In case you face any dependency issues on your local machine, run via Docker to replicate our local environment.

With Docker

Step 1: Install Docker https://docs.docker.com/get-docker/

Step 2: Build Docker Image

cd Banglish
docker build -t banglish .

Step 3: Run

Train

docker run banglish python src/train.py

Test

docker run banglish python src/test.py

Without Docker

Step 1: Get Python 3.8 https://www.python.org/downloads/

Step 2: Install dependencies

cd Banglish
pip install --no-cache-dir -r requirements.txt

Step 3: Run

Train

python src/train.py

Test

python src/test.py

Contributors

space-urchin, arowayasmeen, nur988
