deep-dregs's Introduction

An aiohttp stream server for DeepSpeech

Implements a fully asynchronous HTTP server that converts speech to text in real time using DeepSpeech.

Since version 0.2, DeepSpeech has supported a streaming API that processes audio data in small chunks instead of all at once. This makes a real-time streaming server possible (given sufficient hardware).

Real-time streaming of audio data requires that both client and server support chunked HTTP transfers, so that data can be passed from the client to the server in small batches. This allows the server to start processing the data with DeepSpeech while more data is still being recorded and transferred by the client.
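The chunk-by-chunk flow described above can be sketched with the standard library alone. This is a minimal illustration, not the code in app.py: `FakeStreamDecoder`, `client_chunks`, and `handle_stream` are hypothetical names, and the decoder only counts bytes instead of running DeepSpeech. In an aiohttp handler, the chunks would presumably come from the request body stream (for example `request.content.iter_chunked(...)`) rather than from a simulated client.

```python
import asyncio
from typing import AsyncIterator


class FakeStreamDecoder:
    """Hypothetical stand-in for a streaming speech decoder (not the DeepSpeech API)."""

    def __init__(self) -> None:
        self.bytes_seen = 0
        self.chunks_seen = 0

    def feed(self, chunk: bytes) -> None:
        # A real decoder would run incremental inference on the chunk here.
        self.bytes_seen += len(chunk)
        self.chunks_seen += 1

    def finish(self) -> str:
        # A real decoder would return the transcript; this one reports counts.
        return f"{self.chunks_seen} chunks, {self.bytes_seen} bytes"


async def client_chunks(audio: bytes, chunk_size: int) -> AsyncIterator[bytes]:
    """Simulate a client sending audio in small chunks while still recording."""
    for start in range(0, len(audio), chunk_size):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield audio[start:start + chunk_size]


async def handle_stream(chunks: AsyncIterator[bytes], decoder: FakeStreamDecoder) -> str:
    """Feed each chunk to the decoder as soon as it arrives."""
    async for chunk in chunks:
        decoder.feed(chunk)
    # Only the work for the final chunk remains once the stream ends.
    return decoder.finish()


if __name__ == "__main__":
    audio = bytes(10_000)  # placeholder audio: 10,000 zero bytes
    print(asyncio.run(handle_stream(client_chunks(audio, 4096), FakeStreamDecoder())))
```

Because the decoder is fed while the stream is still open, most of the processing has already happened by the time the last chunk arrives, which is what keeps the response latency low.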

Usage

This project uses Pipenv to manage dependencies. To install the dependencies and get a shell, run:

$ pipenv install && pipenv shell

Once in that shell, run ./app.py to start the server.

Results

The preliminary results from this implementation are very encouraging. On my dual-core i5-6200 laptop, the server easily keeps up with a real-time stream of audio data from the microphone:

Client:

$ ./examples/mic http://localhost:8080/stt
rec WARN wav: Length in output .wav header will be wrong since can't seek to fix it

Input File     : 'default' (pulseaudio)
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Sample Encoding: 16-bit Signed Integer PCM

In:0.00% 00:00:05.63 [00:00:00.00] Out:55.8k [      |      ]        Clip:0
Done.
the quick brown fox jumped over the lazy dog

Server:

$ ./app.py
INFO:root:Loading model...
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-0-g0e40db6
2019-06-05 09:54:49.098465: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:root:Model was loaded in 0.236s
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
INFO:root:Processing Stream...
INFO:root:Inference took 1.751s for 3.486s audio sample with 0.249s latency. Total time: 7.098s

As can be seen, the server processed the streamed audio about twice as fast as it arrived (1.751s of inference for a 3.486s sample). In addition, the low latency of 0.249s (the time from when the last audio sample was received to when the server sent its response) meant that the server could respond to the client promptly once the audio clip was finished.
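The "about twice as fast" figure can be checked directly from the server log line. The sketch below parses that line and computes the real-time factor (inference time divided by audio duration); the helper names `parse_timings` and `real_time_factor` are illustrative and are not part of the project.

```python
import re

# The log line copied from the server output above.
LOG_LINE = (
    "INFO:root:Inference took 1.751s for 3.486s audio sample "
    "with 0.249s latency. Total time: 7.098s"
)

PATTERN = re.compile(
    r"Inference took (?P<inference>[\d.]+)s for (?P<audio>[\d.]+)s audio sample "
    r"with (?P<latency>[\d.]+)s latency\. Total time: (?P<total>[\d.]+)s"
)


def parse_timings(line: str) -> dict:
    """Extract the four timing values (in seconds) from a log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not match the expected log format")
    return {name: float(value) for name, value in match.groupdict().items()}


def real_time_factor(timings: dict) -> float:
    """Inference time per second of audio; below 1.0 means faster than real time."""
    return timings["inference"] / timings["audio"]


if __name__ == "__main__":
    timings = parse_timings(LOG_LINE)
    rtf = real_time_factor(timings)
    print(f"real-time factor: {rtf:.2f}")    # about 0.50
    print(f"speed-up: {1 / rtf:.2f}x")       # about 1.99x
```

A real-time factor near 0.5 means each second of audio needed roughly half a second of inference, which matches the claim that the server keeps up with the stream with room to spare.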

Contributors

jpewdev