esp32_kws's Introduction

ESP32 Keyword Spotting

YouTube EN | RU

~$ git clone --recursive --depth 1 https://github.com/42io/esp32_kws.git
~$ cd esp32_kws

BUILD ENV

~$ docker build --no-cache -t idf-3.3.4 - < Dockerfile
~$ docker run --rm -it -v $PWD:/home/src --device=/dev/ttyUSB0 idf-3.3.4

KEYWORD SPOTTING

The default models are pre-trained on the ten digit words 0-9: zero, one, two, three, four, five, six, seven, eight, nine.
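A classifier over these ten words typically produces one score per word, and the predicted word is the one with the highest score. The sketch below shows that decoding step. It assumes that output index i corresponds to the i-th word in the list above; the names `kws_labels`, `kws_argmax` and `kws_decode` are hypothetical, and the real label order should be checked against the project's training notebooks.

```c
#include <stddef.h>

/* The ten words the default models are trained on.
 * Assumption: model output index i maps to kws_labels[i]. */
static const char *const kws_labels[10] = {
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine"
};

/* Return the index of the highest score (argmax) among n outputs. */
static size_t kws_argmax(const float *scores, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}

/* Map a 10-element score vector to the corresponding word. */
static const char *kws_decode(const float *scores)
{
    return kws_labels[kws_argmax(scores, 10)];
}
```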

MFCC

Simple non-streaming neural network mode. The model receives the whole MFCC input sequence and then returns the classification result. Jupyter:
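To make "the whole MFCC input sequence" concrete, the arithmetic below counts how many MFCC frames a non-streaming model consumes in a single call. The front-end parameters (16 kHz sampling, 1-second clip, 30 ms window, 10 ms hop, 13 coefficients) are hypothetical illustration values, not the project's actual configuration.

```c
#include <stddef.h>

/* Hypothetical front-end parameters, for illustration only. */
#define SAMPLE_RATE_HZ 16000
#define CLIP_SAMPLES   SAMPLE_RATE_HZ                  /* 1-second clip  */
#define WIN_SAMPLES    (SAMPLE_RATE_HZ * 30 / 1000)    /* 30 ms = 480    */
#define HOP_SAMPLES    (SAMPLE_RATE_HZ * 10 / 1000)    /* 10 ms = 160    */
#define NUM_MFCC       13                              /* coeffs / frame */

/* Number of full analysis windows that fit in a clip (no padding). */
static size_t mfcc_num_frames(size_t clip, size_t win, size_t hop)
{
    if (clip < win) {
        return 0;
    }
    return 1 + (clip - win) / hop;
}

/* Total features the non-streaming model sees per call:
 * every frame of the clip times the coefficients per frame. */
static size_t mfcc_input_size(void)
{
    return mfcc_num_frames(CLIP_SAMPLES, WIN_SAMPLES, HOP_SAMPLES) * NUM_MFCC;
}
```

With these values the clip yields 98 frames, so the model input is a 98 x 13 matrix (1274 values). All of it must be computed before the model runs once, which is why non-streaming latency is at least one full clip.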

~$ make -C mfcc-nn defconfig size erase_flash flash monitor

MFCC STREAMING

Streaming neural network mode. The model receives one portion of the input sequence at a time and classifies it incrementally. Jupyter:
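One simple way to feed features to a model incrementally is a ring buffer that keeps only the most recent MFCC frames, adding each new frame as it is computed and discarding the oldest. The sketch below shows such a buffer. It is an illustration of incremental input handling under assumed sizes (`NUM_MFCC`, `WINDOW_FRAMES` and all function names are hypothetical), not the project's implementation; true streaming models often keep internal layer state instead, so they avoid recomputing over the whole window.

```c
#include <stddef.h>
#include <string.h>

#define NUM_MFCC      13  /* hypothetical coefficients per frame */
#define WINDOW_FRAMES 4   /* hypothetical number of frames kept  */

typedef struct {
    float frames[WINDOW_FRAMES][NUM_MFCC]; /* circular storage            */
    size_t head;                           /* next write position         */
    size_t count;                          /* valid frames (<= WINDOW_FRAMES) */
} frame_window_t;

static void window_init(frame_window_t *w)
{
    memset(w, 0, sizeof *w);
}

/* Add one new MFCC frame, overwriting the oldest when the buffer is full. */
static void window_push(frame_window_t *w, const float frame[NUM_MFCC])
{
    memcpy(w->frames[w->head], frame, sizeof w->frames[0]);
    w->head = (w->head + 1) % WINDOW_FRAMES;
    if (w->count < WINDOW_FRAMES) {
        w->count++;
    }
}

/* Copy the valid frames, oldest first, into out (count * NUM_MFCC floats).
 * Returns the number of frames copied. */
static size_t window_linearize(const frame_window_t *w, float *out)
{
    size_t start = (w->head + WINDOW_FRAMES - w->count) % WINDOW_FRAMES;
    for (size_t i = 0; i < w->count; i++) {
        size_t idx = (start + i) % WINDOW_FRAMES;
        memcpy(out + i * NUM_MFCC, w->frames[idx], sizeof w->frames[0]);
    }
    return w->count;
}
```

After every `window_push`, a classifier can be run on the linearized window, so a decision is available shortly after each new frame arrives rather than once per full clip.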

~$ make -C mfcc-nn-streaming defconfig size erase_flash flash monitor

TRANSFER LEARNING

Transfer learning (TL) is usually applied when your dataset has too little data to train a full-scale model from scratch. TL consists of taking features learned on one problem and leveraging them on a new, similar problem. For instance, features from a model that has learned to identify English speech may be useful to kick-start a model meant to identify Russian.

Embedding | Synthesize | Transfer

~$ make -C mfcc-nn-streaming menuconfig # kws => ru
~$ make -C mfcc-nn-streaming size erase_flash flash monitor

MFCC STREAMING DOUBLE

A false positive error, or false positive, is a result that indicates a given condition exists when it does not. This example consists of two neural networks: the first performs classification, the second confirms the result. Confirmation helps to reduce false positives. Musan | Libri:
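The classify-then-confirm idea can be expressed as a two-stage decision rule. This is a minimal sketch, assuming the confirmation network outputs a single score meaning "this really is a keyword"; the thresholds and function names are hypothetical, and the project may combine the two networks differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds, for illustration only. */
#define CLASSIFY_THRESHOLD 0.80f
#define CONFIRM_THRESHOLD  0.50f

/* Index of the largest score. */
static size_t top_index(const float *p, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (p[i] > p[best]) {
            best = i;
        }
    }
    return best;
}

/* Stage 1: the classifier proposes its top word if confident enough.
 * Stage 2: the confirmation score must also pass its threshold,
 * otherwise the detection is dropped as a likely false positive.
 * Returns true and stores the word index in *word on acceptance. */
static bool detect_confirmed(const float *cls, size_t n,
                             float confirm_score, size_t *word)
{
    size_t k = top_index(cls, n);
    if (cls[k] < CLASSIFY_THRESHOLD) {
        return false;
    }
    if (confirm_score < CONFIRM_THRESHOLD) {
        return false;
    }
    *word = k;
    return true;
}
```

A detection now requires both networks to agree, so an error made by only one of them no longer triggers a keyword.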

~$ make -C 2-mfcc-nn-streaming defconfig size erase_flash flash monitor

esp32_kws's Issues

ESP32-S3

Just wondering whether you know of, or are able to do, optimisations for the ESP32-S3?
Thanks to its vector instructions, the S3 runs the FFT routines almost 10x faster than the original ESP32.
https://docs.espressif.com/projects/esp-dsp/en/latest/esp-dsp-benchmarks.html
ML is supposedly 4-6x faster, but I am still trying to work out whether tflite-micro has been optimised for it.
Xtensa also provides an LSTM layer, which is very interesting for streaming models, but again I am unsure whether it is implemented for the S3.

Just thought I would ask in case you knew more, with the S3 being so new.

Questions no issue

You have employed MFCC, which is great: my simple tests at https://github.com/StuartIanNaylor/simple_audio_tensorflow suggest that MFCC alone adds 3-4% accuracy.

Looking at https://github.com/ARM-software/ML-KWS-for-MCU, or at the bewildering body of work on network architectures in general, what is a DCNN? Maybe I should have looked at the code, but thought I would ask: is it DFT->CNN?
I will have to read https://arxiv.org/pdf/2005.06720.pdf several times before it sinks in.

How does it compare to a CRNN in noise (low SNR)? Noise is problematic for every architecture, but from what I have read, CRNNs seem to cope quite well with higher noise levels.

As I said, I will try reading that arxiv.org paper again, but a question I keep asking about the ESP32 is: do you think it would be possible to run two instances of the KWS? (Simple unidirectional mics, using the best confidence (softmax) to forward the audio stream from the best mic.)
Even VAD can be done server-side, with just MQTT to subscribe to an 'end of sentence / stop streaming' message for the ASR sentence after the keyword.
With MFCC, MQTT and KWS, is there any chance an ESP32 could run two instances on separate mics?

Also, I always get confused about the Google Speech Commands dataset, as there seem to be many bad samples: simple bad padding, bad trims or null audio.
Is that just the state of the data, or are the bad contents deliberately there to test how well an architecture copes? Why has Google never trimmed them out, or is it just to confuse the likes of me?
