csikasote / spoken-language-identification

This project forked from nipunmanral/spoken-language-identification

Implement a GRU/LSTM model using Keras, and train it to classify the languages using MFCC features

Python 100.00%

Spoken Language Identification

Objective

Spoken Language Identification (LID) is broadly defined as recognizing the language of a given speech utterance. It has numerous applications in automated language and speech recognition, multilingual machine translation, speech-to-speech translation, and emergency call routing. In this project, we try to classify three languages (English, Hindi and Mandarin) from crowd-sourced spoken utterances. We implement a GRU/LSTM model in Keras and train it to classify the languages. We use MFCC features because they are widely employed in speech processing applications, including LID.

Environment Setup

Download the codebase and open a terminal in the root directory. Make sure Python 3.6 is installed in the current environment, then execute:

pip install -r requirements.txt

This should install all the necessary packages for the code to run.

Dataset

The dataset consists of WAV files and a JSON file containing the labels. The WAV file names are anonymized, and class labels are provided as integers. Training is done with the provided integer class labels. The following mapping is used to convert language IDs to integer labels: mapping = {'english': 0, 'hindi': 1, 'mandarin': 2}

I have not uploaded the audio files here due to a size constraint. The train_files.json file maps each audio file to the language spoken in it.
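As a rough illustration of the label handling described above, the sketch below converts language names to integer labels and back. The JSON layout (a flat object mapping file name to language name), the file names, and the helper name encode_labels are assumptions made for illustration; the actual structure of train_files.json is not shown here.

```python
import json

# Mapping from the README: language name -> integer class label.
MAPPING = {"english": 0, "hindi": 1, "mandarin": 2}
# Inverse mapping, handy for turning predicted integers back into names.
INVERSE_MAPPING = {label: name for name, label in MAPPING.items()}


def encode_labels(file_to_language):
    """Convert {file name: language name} into {file name: integer label}."""
    return {
        name: MAPPING[language.strip().lower()]
        for name, language in file_to_language.items()
    }


# Hypothetical stand-in for the contents of train_files.json.
example_json = '{"a1b2.wav": "english", "c3d4.wav": "mandarin", "e5f6.wav": "hindi"}'
labels = encode_labels(json.loads(example_json))
print(labels)
```

The inverse mapping is useful at inference time, when the model outputs an integer class and a readable language name is wanted.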

Sample length

The full audio files are about 10 minutes long, which may be too long for training an RNN. Multiple 10-second samples are therefore created from every utterance, and each sample is assigned the same label as the original utterance. The sequence length can be changed to experiment with samples of different lengths.
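The splitting step can be sketched as follows. This is a minimal example under stated assumptions, not the repository's code: it works on raw audio samples (the project may instead split MFCC frames), uses non-overlapping windows, and drops any trailing remainder. The function names are hypothetical.

```python
import numpy as np


def split_into_segments(samples, sample_rate=16000, segment_seconds=10):
    """Split a 1-D audio array into non-overlapping fixed-length segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    segment_len = sample_rate * segment_seconds
    n_segments = len(samples) // segment_len
    trimmed = samples[: n_segments * segment_len]
    return trimmed.reshape(n_segments, segment_len)


def label_segments(segments, label):
    """Give every segment the label of the utterance it came from."""
    return np.full(len(segments), label, dtype=np.int64)


# Synthetic stand-in for a 25-second, 16 kHz utterance labelled 1 (hindi).
utterance = np.zeros(25 * 16000, dtype=np.int16)
segments = split_into_segments(utterance)
segment_labels = label_segments(segments, label=1)
print(segments.shape, segment_labels)
```

Changing segment_seconds is the knob for experimenting with different sample lengths.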

Audio Format

The WAV files have a 16 kHz sampling rate, a single channel, and 16-bit signed integer PCM encoding.
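Before extracting features, it can be worth checking that a file really has this format. The sketch below uses Python's standard wave module; the helper name check_wav_format and the temporary example file are illustrative assumptions, not part of the repository.

```python
import os
import tempfile
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width_bytes=2):
    """Return True if the WAV file matches the expected PCM format."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width_bytes
        )


# Write one second of silence in the expected format, then verify it.
tmp_path = os.path.join(tempfile.mkdtemp(), "example.wav")
with wave.open(tmp_path, "wb") as wav:
    wav.setnchannels(1)       # single channel
    wav.setsampwidth(2)       # 2 bytes = 16-bit samples
    wav.setframerate(16000)   # 16 kHz
    wav.writeframes(b"\x00\x00" * 16000)

format_ok = check_wav_format(tmp_path)
print(format_ok)
```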

Notes about the code

The code is divided into 6 blocks. Refer to the following notes to comment or uncomment the blocks as needed:

  • The code in Block 1 extracts the MFCC features and writes them to the dataset file “mfcc_dataset.hdf5”. This block can be commented out if the HDF5 file already exists.

  • The code in Block 2 is used to read the “mfcc_dataset.hdf5” dataset. Do not comment it out.

  • The code in Block 3 is used to train the model. Comment it out after the model has been trained and saved under the name “sld.hdf5”.

  • The code in Block 4 sets up the inference mode.

  • The code in Block 5 runs the streaming model in inference mode by predicting the label for a single random sequence from the validation dataset.

  • The code in Block 6 runs the streaming model in inference mode by predicting the labels for all the sequences in the validation dataset. Comment this out since it can take a long time to run.
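As an alternative to commenting Block 1 in and out by hand, the same effect could be obtained with a file-existence check. The sketch below is only an illustration of that idea: ensure_dataset and fake_extract_features are hypothetical names, and the placeholder writer stands in for the real MFCC extraction, which is not reproduced here.

```python
import os
import tempfile


def ensure_dataset(path, build_fn):
    """Run build_fn(path) only when the dataset file is missing.

    Returns True if the dataset was built, False if an existing file was reused.
    """
    if os.path.exists(path):
        return False
    build_fn(path)
    return True


def fake_extract_features(path):
    # Hypothetical placeholder for Block 1; the real code would compute
    # MFCC features and write them to an HDF5 file.
    with open(path, "wb") as f:
        f.write(b"placeholder")


dataset_path = os.path.join(tempfile.mkdtemp(), "mfcc_dataset.hdf5")
first_run = ensure_dataset(dataset_path, fake_extract_features)   # builds the file
second_run = ensure_dataset(dataset_path, fake_extract_features)  # reuses it
print(first_run, second_run)
```

The same pattern could guard the training block by checking for the saved model file “sld.hdf5”.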

Contributors

nipunmanral
