Giter Site home page Giter Site logo

moodmeter's Introduction

Emotion Detection from Audio Files

Overview

This project is designed to detect emotions from audio files in .wav format. The project leverages machine learning techniques to analyze speech and predict the emotion conveyed. The system is built using Python for the backend processing and TensorFlow/Keras for the machine learning model. The frontend is a simple web interface built with Express.js and EJS to upload audio files and display the predicted emotion.

Dataset

The dataset used for this project is the RAVDESS dataset.It has a total of 1440 audio files recorded by 24 actors(12 male and 12 female).Each of the 1440 files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 03-01-06-01-02-01-12.wav). Filename identifiers:

  • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
  • Vocal channel (01 = speech, 02 = song).
  • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
  • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
  • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
  • Repetition (01 = 1st repetition, 02 = 2nd repetition).
  • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

Project Structure

  • MoodMeter.ipynb: A Jupyter Notebook that preprocesses the audio data, trains a Multi-Layer Perceptron (MLP) classifier, and saves the trained model.
  • predict.py: A Python script that loads the trained model and predicts the emotion from a given audio file.
  • server.js: An Express.js server that handles file uploads, runs the prediction script, and returns the result to the client.
  • index.ejs: An EJS template that serves as the frontend, allowing users to upload audio files and see the predicted emotion.
  • style.css: A CSS file that adds to the visual appearance of the website.

Features

  • Emotion Detection: Detects emotions such as calm, happy, fearful, and disgust from audio files.
  • Preprocessing: Cleans and preprocesses audio data using librosa and other libraries.
  • Model Training: Trains an MLP classifier using extracted audio features like MFCC, chroma, and mel-spectrogram.
  • Web Interface: Allows users to upload .wav files and view the predicted emotion through a simple web interface.

Dependencies

The project requires the following Python libraries:

  • numpy
  • pandas
  • librosa
  • matplotlib
  • tensorflow
  • keras
  • scipy
  • soundfile
  • tqdm
  • scikit-learn
  • python-speech-features
  • pydub
  • noisereduce
  • pickle
  • express
  • express-fileupload
  • ejs
  • child_process

Usage

1. Training the Model

Run the MoodMeter.ipynb notebook to preprocess the audio data, train the MLP classifier, and save the model as Emotion_Voice_Detection_Model.pkl.

2. Running the Server

Start the server using:

node server.js

3. Making Predictions

You can also directly predict the emotion of an audio file by running the predict.py script: (The audio file is required to be atmost 1 second long.(85KB))

python predict.py path_to_audio_file.wav

4. Frontend Interface

The index.ejs file provides a simple interface for uploading audio files and displaying the prediction result.

Observed Emotions

The project currently observes the following emotions:

  • Calm
  • Happy
  • Fearful
  • Disgust

You can modify the observed_emotions dictionary in the code to change or add more emotions.

Challenges Faced

  • Noise and Distortions: Handling background noise and distortions in audio files required advanced preprocessing techniques like noise reduction and filtering.

  • Dimensionality and Complexity: Extracting meaningful features from audio, such as MFCC, chroma, and mel-spectrograms, involved managing high-dimensional data and ensuring computational efficiency.

  • Overfitting: Ensuring that the model generalizes well to unseen data without overfitting on the training set required careful tuning of hyperparameters and regularization techniques.

  • File Handling: Managing file uploads and ensuring correct file paths for processing in the web application required careful handling.

  • Emotion Classification: Achieving high accuracy in classifying subtle emotional differences in speech can be challenging, given the variability in human speech patterns and the complexity of emotions.

Future Scope and Improvements

  • Deep Learning Models: Explore advanced deep learning architectures like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for more effective feature learning and classification.

  • Additional Emotions: Expand the dataset to include a wider range of emotions or more nuanced emotional states for a more comprehensive model.

  • Cross-Language and Cross-Cultural Data: Incorporate data from different languages and cultures to make the model more versatile and applicable globally.

  • Latency Reduction: Optimize the system for real-time processing to enable immediate emotion detection in live applications.

  • Video, Text, and Image Analysis: Extend the system to include analysis of video, text, and images alongside audio for a more holistic approach to emotion detection and understanding.

moodmeter's People

Contributors

subikshaa22 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.