shreeshan / alcoaudio

Detect alcohol induced intoxication level from a voice sample

License: MIT License

Python 99.53% Shell 0.47%
intoxication-detection convolutional-neural-networks filter-banks mfcc-features audio-classification

AlcoAudio Research

Detection of Alcohol induced intoxication through voice using Neural Networks

Dataset

The Alcohol Language Corpus is a curated collection of audio samples from 162 speakers. Audio samples are first recorded while the speaker is sober. The speakers are then given a chosen amount of alcohol to reach a particular intoxication state, and audio samples are recorded again.

Audio samples are split into 8-second segments. Below is a plot of a raw signal:

Raw Signal
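The 8-second split described above can be sketched as follows. This is a minimal illustration, not the project's own code: the function name `split_into_segments`, the 16 kHz sample rate, and the choice to drop a trailing partial segment are all assumptions.

```python
def split_into_segments(signal, sample_rate, segment_seconds=8):
    """Split a 1-D signal into consecutive fixed-length segments.

    Works on any sliceable sequence (list or NumPy array).
    Trailing samples that do not fill a whole segment are dropped,
    which is an assumption; the project may pad or keep them instead.
    """
    segment_length = int(sample_rate * segment_seconds)
    n_segments = len(signal) // segment_length
    return [
        signal[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments)
    ]


# Placeholder input: 20 seconds of silence at an assumed 16 kHz sample rate.
sample_rate = 16000
signal = [0.0] * (20 * sample_rate)

segments = split_into_segments(signal, sample_rate)
print(len(segments), len(segments[0]))
```

With 20 seconds of input, two full 8-second segments are produced and the remaining 4 seconds are discarded.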

These raw audio signals are converted into Mel filter bank features using librosa. Below is how they look:

FBank

Architectures

Below are the architectures tried. All the files are under the networks folder.

Networks                                  Log Loss   UAR (Unweighted Average Recall, %)
Convolutional Neural Networks (convnet)   0.89       66.28
LSTM (lstm)                               1.59       58.12
Conv LSTMs (crnn)                         1.17       62.27
One class Neural Networks (ocnn)          1.81       55
Conv Auto Encoders (cae)                  0.92       65.53
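For readers unfamiliar with UAR, it is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The sketch below is a plain-Python illustration; the labels (0 for sober, 1 for intoxicated) and the toy predictions are assumptions, not results from this project.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class has equal weight."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: six sober (0) samples and two intoxicated (1) samples.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

Here recall is 1.0 for class 0 and 0.5 for class 1, so UAR is 0.75 while plain accuracy is 0.875. On imbalanced data UAR penalises a model that neglects the minority class.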

Setup

  1. Download requirements.txt and install all the dependencies:

    pip install -r requirements.txt
    
  2. Create a config file of your own

  3. Install openSMILE and set the environment variable OPENSMILE_CONFIG_DIR to point to the config directory of the openSMILE installation.

Usage

Data generation

Run data_processor.py to generate the data required for training the model. It reads the raw audio samples, splits them into n-second segments and generates Mel filters, also called Filter Banks (the fbank parameter in the config file). Other available audio features are mfcc and gaf.

python3 data_processor.py --config_file <config_filepath>

Training the network

Use main.py to train any of the architectures listed in the section above.

python3 main.py --config_file <config_filepath> --network convnet

Inference

The trained model can be used for inference. The best model is saved under the best_model folder.

python3 main.py --config_file <config_filepath> --test_net True --network convnet --datapath <data filepath>

Remember to generate Mel filters from the raw audio data first, and pass the generated .npy file as the datapath parameter.

Future work: TODO

Improve on Data Representations

  • Work on frequency variance in voice
  • Recurrence plots
  • Extract features using Praat and Opensmile
  • Normalise audio sample based on average amplitude

Try new architectures

  • Conditional Variational AutoEncoder
  • Convolutional One class Neural Network

Known Issues

  1. As training progresses, the test and validation log losses increase: the confidence with which the network mispredicts grows. The graph below depicts this behaviour: CF plot

  2. Mel filters and MFCCs are not the best representation for this use case: they fail to capture variance in amplitude and are instead designed to approximate human auditory perception. Data 2d plot
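The first issue above can be illustrated with a small, self-contained sketch: log loss rises when the same wrong predictions are made with higher confidence, even though the number of errors does not change. The binary labels and the probabilities below are made-up values for illustration, not outputs of this project's models.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred holds predicted P(class == 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred)) / len(y_true)


y_true = [1, 1, 0, 0]
early = [0.70, 0.40, 0.30, 0.60]  # two mistakes, made with low confidence
late = [0.95, 0.05, 0.05, 0.95]   # the same two mistakes, made with high confidence

print(accuracy(y_true, early), binary_log_loss(y_true, early))
print(accuracy(y_true, late), binary_log_loss(y_true, late))
```

Both sets of predictions have the same accuracy, but the confident set has a much higher log loss, which matches the pattern described in the first issue.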

Acknowledgements

Our team would like to thank Professor Emmanuel O. Agu for guiding the team throughout. I would like to thank team members Pratik, Arjun and Mitesh for all their contributions.
