
linto-ai / webvoicesdk


Building blocks for voice-enabled applications in the browser

License: GNU Affero General Public License v3.0

JavaScript 91.74% HTML 8.26%
speech-recognition javascript machine-learning tensorflow speech-to-text vocal-assistant wake-word-detection wakeword

webvoicesdk's Introduction

WebVoice SDK

WebVoice SDK is a JavaScript library that provides lightweight, fairly well-optimized building blocks for always-listening, voice-enabled applications right in the browser. This library is the main technology behind LinTO's Web Client, since it handles everything related to the user's voice input.

Functionalities

  • Hardware Microphone Handler: hooks into the microphone hardware to record and play back audio, and exports the recorded buffer as a WAV file
  • Downsampler: re-injects acquired audio into the pipeline at any given sample rate / frame size
  • Speech Preemphaser: prepares acquired audio for machine-learning tasks
  • Voice activity detection: detects when someone is speaking (even at a very low signal-to-noise ratio)
  • Feature extraction: pure JavaScript MFCC (Mel-Frequency Cepstral Coefficients) implementation
  • Wake word / hotword / trigger word: immediately triggers tasks whenever the associated chosen word is pronounced
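The downsampling and pre-emphasis stages listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SDK's code: `downsample` and `preemphasize` are hypothetical names, downsampling here simply averages the input samples that fall into each output slot (a crude box low-pass filter), and the pre-emphasis coefficient 0.97 is a common default rather than a value taken from the library.

```javascript
// Hypothetical helpers illustrating the two stages; not the WebVoice SDK API.

// Downsample by averaging every input sample that falls into each output slot.
// Assumes outputRate <= inputRate.
function downsample(input, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError("outputRate must not exceed inputRate");
  }
  const ratio = inputRate / outputRate;
  const outputLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outputLength);
  for (let i = 0; i < outputLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), input.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    output[i] = sum / (end - start);
  }
  return output;
}

// Pre-emphasis filter: y[n] = x[n] - alpha * x[n - 1].
// `previous` is the last raw sample of the preceding frame, so that
// consecutive frames are filtered without a discontinuity.
function preemphasize(frame, alpha = 0.97, previous = 0) {
  const output = new Float32Array(frame.length);
  for (let n = 0; n < frame.length; n++) {
    const prior = n === 0 ? previous : frame[n - 1];
    output[n] = frame[n] - alpha * prior;
  }
  return output;
}

// Usage: a 4096-sample frame captured at 48 kHz, brought down to 16 kHz.
const frame48k = new Float32Array(4096).fill(0.5);
const frame16k = downsample(frame48k, 48000, 16000); // 1365 samples
const emphasized = preemphasize(frame16k);
```

A production resampler would typically use a proper anti-aliasing filter instead of plain averaging; the averaging version only keeps the example short.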

Online demo

You can find an online demo of the library on this static webpage: https://webvoicesdk.netlify.app/

It showcases the entire pipeline: microphone -> voice-activity-detection -> downsampling -> speech-preemphasis -> features-extraction -> wake-word-inference
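The features-extraction stage of this pipeline turns each audio frame into MFCCs. The sketch below computes them for a single frame, assuming common textbook parameters (Hamming window, HTK mel formula, 26 triangular filters, 13 coefficients) that are not taken from the SDK. The helper names are hypothetical, and a naive O(N^2) DFT stands in for the FFT a real-time implementation would use.

```javascript
// Minimal single-frame MFCC sketch; hypothetical helpers, not the SDK's extractor.

// HTK-style mel scale conversions.
const hzToMel = (hz) => 2595 * Math.log10(1 + hz / 700);
const melToHz = (mel) => 700 * (10 ** (mel / 2595) - 1);

// Hamming-windowed power spectrum via a naive DFT (bins 0..N/2).
function powerSpectrum(frame) {
  const N = frame.length;
  const windowed = frame.map(
    (x, n) => x * (0.54 - 0.46 * Math.cos((2 * Math.PI * n) / (N - 1)))
  );
  const bins = Math.floor(N / 2) + 1;
  const power = new Float64Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let n = 0; n < N; n++) {
      const angle = (2 * Math.PI * k * n) / N;
      re += windowed[n] * Math.cos(angle);
      im -= windowed[n] * Math.sin(angle);
    }
    power[k] = (re * re + im * im) / N;
  }
  return power;
}

// Sum the power spectrum under numFilters triangular filters spaced evenly
// on the mel scale between 0 Hz and the Nyquist frequency.
function melEnergies(power, fftSize, sampleRate, numFilters) {
  const maxMel = hzToMel(sampleRate / 2);
  const edges = [];
  for (let m = 0; m < numFilters + 2; m++) {
    const hz = melToHz((maxMel * m) / (numFilters + 1));
    const bin = Math.floor(((fftSize + 1) * hz) / sampleRate);
    edges.push(Math.min(bin, power.length - 1));
  }
  const energies = new Float64Array(numFilters);
  for (let m = 1; m <= numFilters; m++) {
    const [left, center, right] = [edges[m - 1], edges[m], edges[m + 1]];
    let sum = 0;
    for (let k = left; k <= right; k++) {
      let weight = 1; // peak of the triangle (also covers zero-width sides)
      if (k < center) weight = (k - left) / (center - left);
      else if (k > center) weight = (right - k) / (right - center);
      sum += weight * power[k];
    }
    energies[m - 1] = sum;
  }
  return energies;
}

function mfcc(frame, sampleRate, numFilters = 26, numCoeffs = 13) {
  const power = powerSpectrum(frame);
  const logEnergies = melEnergies(power, frame.length, sampleRate, numFilters).map(
    (e) => Math.log(Math.max(e, 1e-10)) // floor avoids log(0) on silent bands
  );
  // Unnormalized DCT-II of the log mel energies; keep the first numCoeffs.
  const coeffs = new Float64Array(numCoeffs);
  for (let i = 0; i < numCoeffs; i++) {
    let sum = 0;
    for (let m = 0; m < numFilters; m++) {
      sum += logEnergies[m] * Math.cos((Math.PI * i * (m + 0.5)) / numFilters);
    }
    coeffs[i] = sum;
  }
  return coeffs;
}

// Usage: 13 coefficients for one 512-sample frame of a 1 kHz tone at 16 kHz.
const sampleRate = 16000;
const tone = Float64Array.from({ length: 512 }, (_, n) =>
  Math.sin((2 * Math.PI * 1000 * n) / sampleRate)
);
const coefficients = mfcc(tone, sampleRate);
```

Keeping the mel filterbank and the DCT as separate steps mirrors the usual MFCC recipe (spectrum, mel energies, log, decorrelating transform), which makes it easy to swap in a real FFT later.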

Note: To start the tool, click the start button and allow the browser to access the default audio input. The Voice Activity Detection "LED" blinks while someone is speaking. Something magic happens if someone says "LinTO" (pronounced something like "LeanToh" by English speakers, since the model was trained on our French dataset).

Note: You can select the model you want to use. The library comes prepackaged with two wake word models: one for "LinTO" and a triple-headed model that triggers on "LinTO", "Snips" or "Firefox".

Highlights

  • Complete multithreaded JavaScript implementation using Web Workers for real-time processing on any machine
  • WebAssembly optimizations whenever possible
  • State-of-the-art recurrent neural network for voice activity detection, running on a WebAssembly portable runtime. This is a modern and efficient alternative to the popular Hark voice activity detection tool
  • Supports a single inline script that can be deployed in any webpage without a mandatory bundler
  • The built library embeds everything (wasm files, TensorFlow.js models for wake words, workers...) into a single static JavaScript file
  • The wake word engine relies on TensorFlow.js and a WebAssembly portable runtime to run inference on single or multiple wake word models, with lightweight, ultra-efficient performance
  • Portable machine-learning models: use the same wake word models on embedded devices, mobile phones, desktop computers and web pages. See: LinTO Hotword Model Generator and Create your custom wake-word
  • Fully offline speech recognition in the browser with no server behind it: all the magic happens in your webpage itself
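For contrast with the RNN-based detector described above, here is the simplest kind of voice activity detection: an RMS energy threshold with a short hangover. This is a swapped-in technique, not what WebVoice SDK uses, and unlike the RNN it will not hold up at low signal-to-noise ratios. `createEnergyVad`, the -45 dBFS threshold and the hangover length are illustrative assumptions.

```javascript
// Swapped-in technique: an RMS energy threshold with a short "hangover".
// This is NOT the SDK's RNN-based detector and is far less robust at low SNR.
function createEnergyVad({ thresholdDb = -45, hangoverFrames = 5 } = {}) {
  let silentFrames = Infinity; // frames since the level last exceeded the threshold
  return function isSpeech(frame) {
    let sumSquares = 0;
    for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
    const rms = Math.sqrt(sumSquares / frame.length);
    const levelDb = 20 * Math.log10(Math.max(rms, 1e-12)); // dBFS, floored for silence
    silentFrames = levelDb > thresholdDb ? 0 : silentFrames + 1;
    // Keep reporting speech for hangoverFrames quiet frames after the last loud one,
    // so short pauses between words do not toggle the detector.
    return silentFrames <= hangoverFrames;
  };
}

// Usage: 30 ms frames at 16 kHz (480 samples).
const vad = createEnergyVad({ thresholdDb: -45, hangoverFrames: 2 });
const silence = new Float32Array(480); // all zeros
const voice = new Float32Array(480).fill(0.1); // RMS 0.1, about -20 dBFS
const decisions = [silence, voice, silence, silence, silence].map((frame) => vad(frame));
// decisions: [false, true, true, true, false]
```

The hangover is what keeps such a detector (and a VAD "LED" like the demo's) from flickering between syllables; the RNN approach instead learns to separate speech from noise, which is why it copes with much noisier input.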

Usage

Further documentation is in progress. In the meantime, you can build and test the library yourself:

npm run test

Or import it in your browser:

<script src="https://cdn.jsdelivr.net/gh/linto-ai/webVoiceSDK@master/dist/webVoiceSDK-linto.min.js"></script>

Copyright notice

This library includes modified bits from:

webvoicesdk's People

Contributors

damienlaine, lokhozt, rlopezdev

webvoicesdk's Issues

Hotwords detection is not working with a fresh build

Description

The hotword service worker throws the error ExitStatus {name: "ExitStatus", message: "Program terminated with exit(1)", status: 1} after a fresh build.

How to reproduce

  • Empty the node_modules folder
  • Run npm install
  • Run npm test
  • Go to http://localhost:1234, open the console and run the hotword pipeline

I'm afraid my knowledge and skills don't allow me to investigate further...

Calling mic.stop() does not work once the hotword has been spotted

Hi Damien,

I'm facing quite a weird issue:
When I call mic.stop() at first, everything works as expected, meaning the following icon disappears. But once the 'LinTO' hotword has been spotted, calling mic.stop() has no effect.

Chrome mic on

So I tried modifying WebVoiceSDK, since the track we got had an "ended" status:

this.stream.getTracks().forEach((track) => {
  if (track.kind === 'audio' && typeof track.stop === 'function') track.stop()
})

But still the mic icon persists.

Do you have any idea how to really stop listening to the user?

Thank you
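A browser's microphone indicator typically stays on as long as any track obtained from `getUserMedia` in the page is still live. The "ended" status reported above suggests the stopped track may not be the one that is still capturing, for example if a second stream was opened somewhere after the hotword fired; that is a guess, not something confirmed from the SDK source. A minimal teardown sketch, assuming the application keeps a reference to every stream it opened and to its `AudioContext` (`releaseMicrophone` is a hypothetical helper, not part of WebVoiceSDK):

```javascript
// Hypothetical teardown helper: stop every capture track the page still holds
// and close the AudioContext that consumed them.
// Assumes the caller tracked every MediaStream returned by getUserMedia.
async function releaseMicrophone(streams, audioContext) {
  for (const stream of streams) {
    for (const track of stream.getTracks()) {
      track.stop(); // stopping an already-ended track has no further effect
    }
  }
  if (audioContext && audioContext.state !== "closed") {
    await audioContext.close(); // releases the audio graph's processing resources
  }
}
```

For example, `await releaseMicrophone([stream, hotwordStream], audioContext)`, where `stream` and `hotwordStream` stand for every stream the page obtained (hypothetical names).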

How to use it in React.js with a custom wake word?

Hi, I am looking for a good hotword detection library purely in Node.js, and I found WebVoiceSDK; its performance in the demo is really good.
Can you tell me how to use it in React.js with a custom wake word?

Is there any reference? If so, this library has very good potential to scale up.

Needs an AGPL warning at the top of the documentation

Your library is awesome, it solved all my problems, and I will never be able to use it due to your license choice.

I appreciate it's your code, and you can do what you like with it, but doesn't it seem a waste that another project doing exactly the same thing will need to be created and maintained by someone, just to remove the license restriction that AGPL imposes? If I include your library in my larger web application, releasing all the source code to our competitors is simply not going to fly.

To avoid wasting other people's time spent integrating this, please include a large warning at the top of the readme, indicating it's basically not suitable for most commercial use. If you want to be super-helpful, pointing us to competing projects with different licenses would be rather nice.

It's such a shame, as it works really well.

Dependency update broke the library

After a fresh build the WASM file doesn't work. A month or two ago everything was working as expected.
npm install + npm test:
Errors in Firefox:

Module.asm.c.apply is not a function

this.wasmInterface is undefined

Errors in Chrome:

TypeError: ___wasm_call_ctors.apply is not a function

Uncaught TypeError: Cannot read properties of undefined (reading 'HEAPF32')
    at Rnnoise.copyPCMSampleToWasmBuffer

Also, is it possible to prevent such issues by either adding a package-lock file or pinning the dependency versions?

Recorder on mobile produces jerky WAV output

Description

When using the Recorder on a mobile device, the generated WAV is damaged.
Mic options:

{
  frameSize: 4096,
  constraints: {
    echoCancellation: true,
    autoGainControl: true,
    noiseSuppression: true
  }
}

Attached to this issue is a zip archive (GitHub does not accept audio file attachments) containing 3 audio files:

  • desktop.wav recorded on a desktop with WebVoiceSDK.Recorder
  • mobile.wav recorded on an iPhoneX with WebVoiceSDK.Recorder
  • usermedia-mediarecorder.webm recorded with native MediaRecorder and navigator.mediaDevices.getUserMedia({ audio: true })

Archive.zip

Do you have any idea how to solve this problem?
Maybe once I am able to start the project, I can add "Record mic" and "Stop record mic" buttons to the test page, and consequently to https://webvoicesdk.netlify.app/
That way we will be able to test the recorder across many devices.

Thank you

EDIT: After further investigation, the output WAV audio is cleaner if I disable another intensive task (gesture recognition) in my web app. Either way, this means that transcoding the AudioContext stream to WAV is affected when available resources are low. Maybe we could add an option to the Recorder constructor that lets developers opt for a more native recording system.
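For reference, the transcoding step mentioned in the edit above (Float32 samples from an AudioContext into a WAV file) amounts to writing a 44-byte RIFF header followed by 16-bit PCM samples. A minimal sketch, assuming mono input in the [-1, 1] range; `encodeWav` is a hypothetical helper, not the Recorder's implementation:

```javascript
// Hypothetical helper: encode mono Float32 PCM samples (range [-1, 1])
// as a 16-bit PCM WAV file.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the 8-byte RIFF header
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clip to [-1, 1]
    // Scale to the signed 16-bit range, little-endian.
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Usage: three samples at 16 kHz produce a 50-byte file (44 header + 6 data).
const wav = encodeWav(new Float32Array([0, 1, -1]), 16000);
```

The resulting `ArrayBuffer` can be wrapped in a `Blob` with type `audio/wav` for download or playback.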
