Speech-Emotion-Recognition

Audio emotion classification is a challenging task, and we built this deep neural network project to tackle it. The network was trained to detect emotion using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, which was obtained from Kaggle.

Dataset

We have used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) which contains 7356 files (total size: 24.8 GB). The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and the song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. 
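RAVDESS encodes its labels in each filename as seven hyphen-separated two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). This README does not say how the labels were read, so the following is only a minimal sketch, assuming they are parsed from the filenames; the function name `parse_ravdess_name` is hypothetical.

```python
from pathlib import Path

# Emotion and intensity codes as documented by the RAVDESS naming convention.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}


def parse_ravdess_name(path):
    """Extract label information from a RAVDESS filename (hypothetical helper)."""
    parts = Path(path).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"unexpected RAVDESS filename: {path}")
    _modality, channel, emotion, intensity, _statement, _repetition, actor = parts
    actor_id = int(actor)
    return {
        "emotion": EMOTIONS[emotion],
        "intensity": INTENSITY[intensity],
        "vocal_channel": "speech" if channel == "01" else "song",
        "actor": actor_id,
        # Odd-numbered actors are male, even-numbered actors are female.
        "actor_sex": "male" if actor_id % 2 == 1 else "female",
    }
```

For example, `parse_ravdess_name("03-01-06-01-02-01-12.wav")` describes a fearful, normal-intensity speech clip from actor 12.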

Feature Extraction

To train the model, we converted the audio signals into several kinds of spectrograms. Check out this amazing playlist for an in-depth understanding.

1. SPECTROGRAM

 A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data are represented in a 3D plot they may be called waterfalls.
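To make the definition concrete, here is a minimal sketch of a power spectrogram computed with a short-time Fourier transform in NumPy. It is not the project's actual preprocessing code; the function name and the `n_fft` and `hop` defaults are illustrative assumptions.

```python
import numpy as np


def spectrogram(signal, n_fft=512, hop=128):
    """Return a power spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One FFT per frame; keep only the non-negative frequencies.
    spectrum = np.fft.rfft(frames, axis=1)
    return (np.abs(spectrum) ** 2).T  # rows: frequency bins, columns: time
```

With a 1 kHz sine sampled at 16 kHz and `n_fft=512`, the bin spacing is 31.25 Hz, so the energy concentrates in frequency bin 32 of every frame.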

2. MEL-SPECTROGRAM

 Mel spectrograms are spectrograms whose frequency axis is mapped onto the mel scale, a perceptual scale of pitch, instead of a linear frequency (Hz) axis.
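As a rough illustration of the mel scale, the sketch below uses one common conversion formula, m = 2595 log10(1 + f / 700). Other variants exist, and which one this project's tooling used is not stated here. The helper `mel_band_edges` is a hypothetical name.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (one common formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Edge frequencies in Hz for n_bands bands equally spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

Bands that are equally wide in mels become progressively wider in Hz, which is why a mel spectrogram devotes more resolution to low frequencies.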

3. MEL-FREQUENCY CEPSTRAL COEFFICIENTS

 Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").
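The "spectrum-of-a-spectrum" idea can be sketched as follows: take the logarithm of the mel-band energies of each frame, then apply a discrete cosine transform (DCT-II) and keep the first few coefficients. This is a simplified illustration, assuming a precomputed mel power matrix of shape (frames, mel bands). The function names and the choice of 13 coefficients are assumptions, and audio libraries typically apply additional scaling.

```python
import numpy as np


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    x = np.asarray(x, dtype=np.float64)
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]  # coefficient index
    m = np.arange(n)[None, :]         # input sample index
    basis = np.cos(np.pi / n * (m + 0.5) * k)  # shape (n_coeffs, n)
    return x @ basis.T


def mfcc_from_mel_power(mel_power, n_mfcc=13, eps=1e-10):
    """MFCC-like coefficients from mel-band energies of shape (frames, mel bands)."""
    # eps avoids log(0) for silent bands.
    log_mel = np.log(np.asarray(mel_power, dtype=np.float64) + eps)
    return dct_ii(log_mel, n_mfcc)
```

A frame whose mel-band energies are all equal has a flat log spectrum, so every coefficient above the zeroth comes out as (numerically) zero.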

Model

In this project we used a pretrained ResNet18 network for prediction, replacing its first and last layers. Even though ResNet18 was not pretrained on audio data, it still produced decent results after training.

We used a one-cycle learning rate policy, varying the learning rate between a lower bound and an upper bound over the complete run. Conventionally, the learning rate is only decreased as training starts to converge. A higher learning rate, however, may help the optimizer get out of saddle points: if a saddle point sits on an elaborate plateau, lower learning rates might not be able to move the gradient out of it. - Paper link
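A simplified version of such a schedule is sketched below: the learning rate rises linearly from the lower bound to the upper bound, then falls linearly back. The bounds, the warm-up fraction, and the linear shape are all illustrative assumptions; the values and the exact annealing used in this project are not stated here.

```python
def one_cycle_lr(step, total_steps, lr_min=1e-4, lr_max=1e-2, warmup_frac=0.3):
    """Learning rate at `step` (0-based) for a linear one-cycle schedule."""
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    last = total_steps - 1
    # Step index at which the learning rate reaches its upper bound.
    peak = max(1, int(warmup_frac * last))
    if step <= peak:
        # Warm-up phase: climb from lr_min to lr_max.
        return lr_min + (step / peak) * (lr_max - lr_min)
    # Annealing phase: descend from lr_max back to lr_min.
    return lr_max - ((step - peak) / (last - peak)) * (lr_max - lr_min)
```

In a training loop, the returned value would be assigned to the optimizer's learning rate before each step. Deep learning frameworks ship their own one-cycle schedulers, which typically use smoother annealing curves than this linear sketch.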

Results

We started with raw audio data, applied preprocessing to convert it into a format suitable for classification, and then built a deep learning model that predicts the emotion in an audio file given as input, with an accuracy of 76%.

Contributors

anuranjanpandey
