esp32_kws's Introduction

ESP32 Keyword Spotting

YouTube EN | RU

~$ git clone --recursive --depth 1 https://github.com/42io/esp32_kws.git
~$ cd esp32_kws

BUILD ENV

~$ docker build --no-cache -t idf-3.3.4 - < Dockerfile
~$ docker run --rm -it -v $PWD:/home/src --device=/dev/ttyUSB0 idf-3.3.4

KEYWORD SPOTTING

Default models are pre-trained on 0-9 words: zero one two three four five six seven eight nine.

MFCC

Simple non-streaming neural network mode. The model receives the whole MFCC input sequence and then returns the classification result. Jupyter:

~$ make -C mfcc-nn defconfig size erase_flash flash monitor

MFCC STREAMING

Streaming neural network mode. The model receives a portion of the input sequence at a time and classifies it incrementally. Jupyter:

~$ make -C mfcc-nn-streaming defconfig size erase_flash flash monitor
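
The difference between the two modes can be illustrated with a small Python sketch. It is not taken from the repository: `classify` is a hypothetical stand-in for the neural network, and the sliding window is the simplest possible way to classify incrementally. A real streaming network would typically cache intermediate layer state rather than re-run over the whole window.

```python
from collections import deque


def classify(frames):
    """Hypothetical stand-in for a network: index of the largest mean coefficient."""
    n = len(frames[0])
    means = [sum(f[i] for f in frames) / len(frames) for i in range(n)]
    return max(range(n), key=means.__getitem__)


def classify_whole(frames):
    """Non-streaming mode: the whole feature sequence is seen at once."""
    return classify(frames)


class StreamingClassifier:
    """Streaming mode: frames arrive one by one and a result is produced per frame."""

    def __init__(self, window):
        # Only the most recent `window` frames are kept.
        self.buf = deque(maxlen=window)

    def push(self, frame):
        self.buf.append(frame)
        if len(self.buf) < self.buf.maxlen:
            return None  # not enough context yet
        return classify(list(self.buf))


frames = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
whole_result = classify_whole(frames)

stream = StreamingClassifier(window=2)
stream_results = [stream.push(f) for f in frames]
```

The non-streaming call returns one answer for the full sequence, while the streaming object returns a fresh answer after every new frame once its window is full.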

TRANSFER LEARNING

Transfer learning is usually done for tasks where your dataset has too little data to train a full-scale model from scratch. TL consists of taking features learned on one problem and leveraging them on a new, similar problem. For instance, features from a model that has learned to identify English speech may be useful to kick-start a model meant to identify Russian.

Embedding | Synthesize | Transfer

~$ make -C mfcc-nn-streaming menuconfig # kws => ru
~$ make -C mfcc-nn-streaming size erase_flash flash monitor
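
As a rough illustration of the idea (not the repository's Embedding, Synthesize or Transfer notebooks), the sketch below keeps a feature extractor frozen and trains only a new head on a tiny dataset. All names, weights and data here (`FROZEN_W`, `embed`, `train_head`, the samples) are hypothetical, and a perceptron stands in for a real classification layer.

```python
# Pretend these weights were learned on the original (source) task.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]


def embed(x):
    """Frozen feature extractor (linear layer + ReLU); never updated below."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def train_head(samples, labels, epochs=20, lr=0.1):
    """Train only a new perceptron head on top of the frozen embeddings."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            e = embed(x)
            pred = 1 if sum(wi * ei for wi, ei in zip(w, e)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * ei for wi, ei in zip(w, e)]
            b += lr * err
    return w, b


def predict(head, x):
    w, b = head
    e = embed(x)
    return 1 if sum(wi * ei for wi, ei in zip(w, e)) + b > 0 else 0


# A small dataset for the new task: too little to train a full model on.
samples = [[2, 0], [3, 1], [0, 2], [1, 3]]
labels = [1, 1, 0, 0]
head = train_head(samples, labels)
predictions = [predict(head, x) for x in samples]
```

Only the head's two weights and bias change during training; the embedding weights are reused as they are.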

MFCC STREAMING DOUBLE

A false positive error, or false positive, is a result that indicates a given condition exists when it does not. This example consists of two neural networks: the first for classification, the second for confirmation. Confirmation reduces false positives. Musan | Libri:

~$ make -C 2-mfcc-nn-streaming defconfig size erase_flash flash monitor
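
The classify-then-confirm flow can be sketched in Python as below. This is an assumption about the general pattern, not the firmware's actual logic: `detect`, the `threshold` value and the callable interfaces of `classifier` and `confirmer` are all hypothetical.

```python
def detect(features, classifier, confirmer, threshold=0.8):
    """Return the detected class index, or None if the detection is rejected."""
    probs = classifier(features)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # first network is not confident enough
    if not confirmer(features, best):
        return None  # second network vetoes a likely false positive
    return best


# Toy stand-ins for the two networks (hypothetical).
def toy_classifier(features):
    return features  # features are already class probabilities here


def accept_all(features, cls):
    return True


def reject_all(features, cls):
    return False


confirmed = detect([0.05, 0.9, 0.05], toy_classifier, accept_all)
vetoed = detect([0.05, 0.9, 0.05], toy_classifier, reject_all)
unsure = detect([0.4, 0.3, 0.3], toy_classifier, accept_all)
```

The confirmation network is only consulted once the classifier has fired, so it costs nothing while no keyword is being detected.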

esp32_kws's People

Contributors

mazkobot

esp32_kws's Issues

ESP32-S3

Just wondering if you know of, or are able to do, any optimisation for the ESP32-S3?
The FFT routines on the S3 are just shy of 10x the performance of the normal ESP32 thanks to vector math.
https://docs.espressif.com/projects/esp-dsp/en/latest/esp-dsp-benchmarks.html
Supposedly ML is 4-6x faster, but I am still trying to work out whether tflite-micro has been optimised for it.
Also, Xtensa do an LSTM layer, which is very interesting for streaming models, but again I am not sure whether it is implemented for the S3.

Just thought I would ask if you knew any more with the S3 being so new.

Questions no issue

You have employed MFCC, which is great: the tests in https://github.com/StuartIanNaylor/simple_audio_tensorflow seem to add 3/4% accuracy via MFCC alone.

Looking at https://github.com/ARM-software/ML-KWS-for-MCU, or more generally at the bewildering work on network architectures: what is a DCNN? I should probably have looked at the code, but thought I would ask. Is it DFT->CNN?
I will have to read https://arxiv.org/pdf/2005.06720.pdf several times for it to sink in.

How does it compare to a CRNN under noise (SNR)? Noise is problematic for all architectures, but from what I have read CRNNs seem to cope quite well with higher noise levels.

As said, I will try reading that arxiv.org paper again, but a question I keep asking about the ESP32 is: do you think it would be possible to run 2 instances of the KWS? (Simple unidirectional mics, using the best confidence (softmax) to forward an audio stream from the best mic.)
Even VAD can be server side, with just MQTT to subscribe to an 'end of sentence / stop streaming' message for the ASR sentence after the KW.
With MFCC, MQTT & KWS, is there any chance an ESP32 could run 2 instances on separate mics?
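
The two-microphone idea above (one KWS instance per mic, forwarding the stream from the mic with the highest softmax confidence) could be sketched as follows. This is a hypothetical illustration only, not code from the repository: `pick_best_mic` and the logits are made-up names and values, and it says nothing about whether an ESP32 has the headroom to run two instances.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_best_mic(logits_per_mic):
    """Return (mic_index, keyword_index, confidence) for the most confident mic."""
    best = None
    for mic, logits in enumerate(logits_per_mic):
        probs = softmax(logits)
        kw = max(range(len(probs)), key=probs.__getitem__)
        if best is None or probs[kw] > best[2]:
            best = (mic, kw, probs[kw])
    return best


# Made-up outputs of two KWS instances for the same utterance.
mic, keyword, confidence = pick_best_mic([[1.0, 2.0, 0.5], [0.1, 4.0, 0.2]])
```

Here both instances agree on the keyword, and the second mic would be chosen because its softmax output is more peaked.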

Also, I always get confused about the Google Command set, as there seem to be many bad samples with simple bad padding, bad trims or null audio.
Is it just a datum, or is the bad content also meant to reflect how well an architecture can cope? Otherwise, why has Google never trimmed it out, or is it just to confuse the likes of me?
