menardai / chromecast_vad

chromecast_vad

A Keras implementation of an RNN voice activity detector (VAD) that controls the volume of a Chromecast device.

The model, a two-layer bidirectional LSTM followed by a dense layer, takes a spectrogram as input and outputs a single value (speech or no speech).

The client app connects to the Chromecast device of your choice, waits for music to start playing on that device, and then starts listening on the host computer. It records 2 seconds of audio at a time and sends each recording to a Flask web app for analysis. The web app passes the received audio to the neural network for inference. Based on the network's prediction, the client app lowers the Chromecast device's volume when speech is detected and raises it back once the speech stops.

Dependencies

Preprocessing

python run_preprocessing.py
Converts mp3/wav files of various lengths into 2-second wav files (audio sampled at 44100 Hz, mono).
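The README does not show how run_preprocessing.py performs the split. A minimal sketch, assuming the input has already been decoded to 16-bit mono WAV at 44100 Hz (mp3 decoding and resampling are left out), can use the standard-library wave module. split_wav is a hypothetical helper, and dropping the last partial clip is an assumption.

```python
import io
import wave

SAMPLE_RATE = 44100  # sample rate stated in the README
CLIP_SECONDS = 2     # clip length stated in the README


def split_wav(src, clip_seconds=CLIP_SECONDS):
    """Split a mono 44100 Hz WAV (path or file object) into fixed-length WAV clips.

    Returns a list of bytes objects, each a complete WAV file. A trailing
    remainder shorter than clip_seconds is dropped (an assumption).
    """
    with wave.open(src, "rb") as reader:
        if reader.getnchannels() != 1 or reader.getframerate() != SAMPLE_RATE:
            raise ValueError("expected mono audio at 44100 Hz")
        sampwidth = reader.getsampwidth()
        frames_per_clip = SAMPLE_RATE * clip_seconds
        clips = []
        while True:
            data = reader.readframes(frames_per_clip)
            if len(data) < frames_per_clip * sampwidth:
                break  # incomplete final clip: discard it
            buf = io.BytesIO()
            with wave.open(buf, "wb") as writer:
                writer.setnchannels(1)
                writer.setsampwidth(sampwidth)
                writer.setframerate(SAMPLE_RATE)
                writer.writeframes(data)
            clips.append(buf.getvalue())
    return clips
```

Each returned item is a complete WAV file, so it can be written to disk as-is.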

Training Set

python dataset.py
Converts the preprocessed audio files into 2-second samples and the corresponding X and Y numpy arrays.

  • X is a numpy array holding a spectrogram with 101 frequency bins.
  • Y is a 0/1 numpy array (speech or no speech).
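101 frequency bins is consistent with a 200-point FFT (200 / 2 + 1 = 101), so a plausible setup is a short-time Fourier transform with a 200-sample window. The sketch below assumes that window size, a Hann window and a 100-sample hop; none of these are confirmed by the README. It uses a naive DFT to stay dependency-free, which is far too slow for full 2-second clips; numpy or scipy would be used in practice.

```python
import cmath
import math

N_FFT = 200  # assumed: a 200-point FFT yields 200 // 2 + 1 = 101 frequency bins
HOP = 100    # assumed hop length (50% overlap)


def spectrogram(samples, n_fft=N_FFT, hop=HOP):
    """Return a list of frames, each a list of n_fft // 2 + 1 power values.

    Applies a Hann window to each frame and computes a naive DFT, keeping only
    the non-negative frequency bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        segment = [samples[start + n] * window[n] for n in range(n_fft)]
        row = []
        for k in range(n_bins):
            # DFT coefficient k of the windowed frame
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n, x in enumerate(segment))
            row.append(abs(acc) ** 2)  # power spectrum
        frames.append(row)
    return frames
```

Converting the result with numpy.array would give a (frames, 101) array; whether the original X stores frequencies on the first or second axis is not stated.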

Model Training

python run_experiments.py

Running

Build the Docker image of the Flask web service (app.py).

This web service takes a 2-second audio file as input, uses the RNN model to predict whether the audio contains speech, and returns a true/false prediction.
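The README does not document the web service's route, port or payload format. As a hedged sketch of the client side, the snippet below packs a recorded clip into an in-memory WAV file and builds a POST request with the standard library. The URL http://localhost:5000/predict (5000 being Flask's default port), the audio/wav content type and the {"speech": true} JSON response are all assumptions, as are the helper names.

```python
import io
import json
import struct
import urllib.request
import wave

# Hypothetical endpoint: the README does not give app.py's route or port.
PREDICT_URL = "http://localhost:5000/predict"


def encode_wav(samples, rate=44100):
    """Pack a list of int16 samples into an in-memory mono WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 16-bit PCM
        writer.setframerate(rate)
        writer.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def build_predict_request(samples, url=PREDICT_URL):
    """Build (but do not send) a POST request carrying the recorded clip."""
    return urllib.request.Request(
        url,
        data=encode_wav(samples),
        headers={"Content-Type": "audio/wav"},  # assumed content type
        method="POST",
    )


def parse_prediction(body):
    """Assumed response format: JSON such as {"speech": true}."""
    return bool(json.loads(body)["speech"])
```

Sending it would look like `with urllib.request.urlopen(build_predict_request(clip)) as resp: is_speech = parse_prediction(resp.read())`, where clip is a list of 88200 int16 samples (2 s at 44100 Hz).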

docker_build_image.sh
Creates a Docker image with app.py on top of ufoym/deepo:keras-py36-cpu (a Python 3.6 image with Keras on CPU).

docker_run_webapp.sh
Starts the web server built above.

python chromecast_live_volume.py
Starts listening and controlling the volume of the Chromecast device of your choice.

Dataset

A synthesized dataset created by mixing background noise, music and speech from the following sources:

  • Common Voice by Mozilla
    an open and publicly available dataset of voices that everyone can use to train speech-enabled applications.
  • QUT-NOISE-SRE Databases
    D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015) “The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition”. In Proceedings of Interspeech 2015, September, Dresden, Germany.
  • Music played by a home speaker recorded using Audacity
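The README does not say how the three sources are combined. One common way to synthesize such a set, sketched here as an assumption, is to overlay a speech snippet at a random offset on a 2-second noise or music clip and label the result 1, while clips left without speech get label 0. mix_clip and its parameters are hypothetical.

```python
import random


def mix_clip(background, speech=None, speech_gain=1.0, rng=random):
    """Overlay an optional speech snippet onto a background clip.

    background, speech: lists of float samples in [-1, 1] at the same rate.
    Returns (mixed_samples, label), where label is 1 if speech was inserted.
    """
    mixed = list(background)
    if speech is None:
        return mixed, 0  # background only: no-speech example
    if len(speech) > len(mixed):
        raise ValueError("speech snippet longer than background clip")
    offset = rng.randrange(len(mixed) - len(speech) + 1)
    for i, s in enumerate(speech):
        # Add the scaled speech sample, then clip to the valid range.
        mixed[offset + i] = max(-1.0, min(1.0, mixed[offset + i] + speech_gain * s))
    return mixed, 1
```

Passing a seeded random.Random instance as rng makes the generated dataset reproducible.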

Live Demo

VAD live demo

Live Demo Output

$ python chromecast_live_volume.py

Looking for chromecast devices...

Connected to: Cuisine
initial volume = 0.45
no music playing
no music playing
no music playing
no music playing
> recording
  speech probability = 0.01
  steps_without_speech = 1
> recording
  speech probability = 0.01
  steps_without_speech = 2
> recording
  speech probability = 0.00
  steps_without_speech = 3
> recording
  speech probability = 0.05
  steps_without_speech = 4
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
  set volume to 0.30
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
> recording
  speech probability = 0.09
  steps_without_speech = 1
> recording
  speech probability = 0.07
  steps_without_speech = 2
> recording
  speech probability = 0.03
  steps_without_speech = 3
  set volume to 0.45
> recording
  speech probability = 0.01
  steps_without_speech = 4
> recording
  speech probability = 0.01
  steps_without_speech = 5
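The README does not spell out the client's decision rule, but the log above suggests one: the volume drops from 0.45 to 0.30 on the first detection, stays there while speech continues, and returns to the initial 0.45 after the third consecutive window without speech. The sketch below reproduces that behaviour; the 0.5 probability threshold is an assumption, and VolumeDucker is a hypothetical name.

```python
class VolumeDucker:
    """Lower the volume while speech is detected, restore it after a quiet spell.

    Assumed values: a 0.5 probability threshold, a ducked volume of 0.30 and a
    restore after 3 consecutive non-speech windows, chosen to match the demo log.
    """

    def __init__(self, initial_volume, ducked_volume=0.30, threshold=0.5, restore_after=3):
        self.initial_volume = initial_volume
        self.ducked_volume = ducked_volume
        self.threshold = threshold
        self.restore_after = restore_after
        self.ducked = False
        self.steps_without_speech = 0

    def update(self, speech_probability):
        """Feed one 2 s window; return the new volume to set, or None for no change."""
        if speech_probability >= self.threshold:
            self.steps_without_speech = 0
            if not self.ducked:
                self.ducked = True
                return self.ducked_volume
            return None
        self.steps_without_speech += 1
        if self.ducked and self.steps_without_speech >= self.restore_after:
            self.ducked = False
            return self.initial_volume
        return None
```

In the client loop, a non-None return value would be passed to the device, for example through pychromecast's set_volume if that is the library in use.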
