
ishine / kaldiwebrtcserver-1


This project forked from danijel3/kaldiwebrtcserver


Python server for communicating with Kaldi from the browser using WebRTC

License: Apache License 2.0

Python 45.13% JavaScript 28.08% HTML 10.18% CSS 1.74% Dockerfile 11.83% Makefile 3.04%


Kaldi WebRTC server demo

This is a demonstration of realtime online speech recognition using the Kaldi speech recognition toolkit.

It uses WebRTC to communicate between the server and the browser. Audio from the user's microphone is sent to the server over a WebRTC audio track, and text is sent from the server back to the browser over a WebRTC data channel (most commonly used for sending chat messages).

The server program is written in Python. It uses aiohttp to display the web page and serve other static data (JavaScript, CSS, images). For WebRTC functionality it uses the excellent aiortc library. The system requires only one Python server running, but supports multiple Kaldi instances in the background. Once it receives a request from the browser, it opens a connection to the Kaldi engine and keeps forwarding audio and text between Kaldi and the user's browser.

Usage

The easiest way to use this program is with Docker, as described below. The rest of this section explains how the program works.

The server takes a JSON configuration file as an argument. The configuration file defines the hosts and ports of the Kaldi engines, as well as their sample rates (different models can have different sample rates).
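As a rough illustration of what such a configuration might contain, here is a minimal sketch of parsing engine definitions. The key names (`host`, `port`, `samplerate`), the host names and the `Engine` class are assumptions made for this example; check the `servers.json` shipped with the repository for the actual schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Engine:
    """One Kaldi engine endpoint (hypothetical structure)."""
    host: str
    port: int
    samplerate: int


def parse_engines(text: str) -> List[Engine]:
    """Parse a JSON array of engine descriptions using the assumed key names."""
    return [
        Engine(host=e["host"], port=int(e["port"]), samplerate=int(e["samplerate"]))
        for e in json.loads(text)
    ]


# Hypothetical configuration: two engines whose models use different sample rates.
SAMPLE_CONFIG = """
[
  {"host": "kaldi-1", "port": 5050, "samplerate": 16000},
  {"host": "kaldi-2", "port": 5050, "samplerate": 8000}
]
"""

engines = parse_engines(SAMPLE_CONFIG)
```

Keeping the sample rate next to each host and port lets the server resample the browser audio to whatever the selected engine's model expects.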

In the future, I plan to add the option to include different types of engines (e.g. for different languages) that can be picked from the website.

After loading the configuration, the server creates a queue and takes engines from the queue as they are requested. If the queue is exhausted, an error 500 is returned. Simply put, you need as many engines running as the number of concurrent browser sessions you intend to support.
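The queueing behaviour described above could look roughly like the following sketch. It uses the standard library `queue.Queue` as a stand-in for whatever queue the server actually uses, and the names `EnginePool`, `NoEngineAvailable` and `status_for_new_session` are hypothetical.

```python
import queue


class NoEngineAvailable(Exception):
    """Raised when every configured engine is already in use."""


class EnginePool:
    """Hands out engines one per session and takes them back afterwards."""

    def __init__(self, engines):
        self._queue = queue.Queue()
        for engine in engines:
            self._queue.put_nowait(engine)

    def acquire(self):
        # Never block: an empty queue means all engines are busy.
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            raise NoEngineAvailable("all engines are busy") from None

    def release(self, engine):
        # Return the engine so a later session can use it.
        self._queue.put_nowait(engine)


def status_for_new_session(pool):
    """Return (status code, engine) for a new browser session."""
    try:
        return 200, pool.acquire()
    except NoEngineAvailable:
        return 500, None
```

With two engines in the pool, a third concurrent session gets status 500 until one of the first two sessions releases its engine.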

Kaldi

This server relies on the connection with the online2-tcp-nnet3-decode-faster program. If you want to install it on your own, please follow the official instructions for installing Kaldi. You can find a brief version of that in docker/kaldi/Dockerfile.
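To give an idea of what talking to a TCP decoder involves, here is a minimal sketch of a client that streams raw audio bytes to a socket and collects the text written back. It assumes the engine accepts raw PCM samples, treats the end of the stream as the end of the utterance, and closes the connection once it has written its result. The function name `recognize` and the chunk size are illustrative, not taken from this project.

```python
import asyncio


async def recognize(host: str, port: int, pcm: bytes, chunk_size: int = 3200) -> str:
    """Send raw PCM audio to a TCP decoder and return all text it writes back."""
    reader, writer = await asyncio.open_connection(host, port)

    async def send_audio():
        # Stream the audio in small chunks, as a live microphone feed would arrive.
        for start in range(0, len(pcm), chunk_size):
            writer.write(pcm[start:start + chunk_size])
            await writer.drain()
        # Half-close the socket to signal the end of the audio (assumed convention).
        writer.write_eof()

    async def receive_text():
        # Read until the engine closes its side of the connection.
        data = await reader.read()
        return data.decode("utf-8", errors="replace")

    # Send and receive concurrently so partial results never block the upload.
    _, text = await asyncio.gather(send_audio(), receive_text())
    writer.close()
    await writer.wait_closed()
    return text
```

In the real server the audio would come from the WebRTC audio track and the returned text would be forwarded over the data channel instead of being collected into one string.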

Docker

This is the simplest way of setting up and testing this project. I have created a couple of Docker images with all the necessary components and uploaded them to Docker Hub. In order to use them, you don't need to copy anything from this repository. You just need to have Docker installed and run the commands as described below.

In addition to Docker, you will need to have the docker-compose program installed. This program lets you start several containers at once, configured through a yml file. A sample is provided in docker/docker-compose.yml.

To run the server, simply copy the docker/docker-compose.yml and the required docker/servers.json files into a folder of your choice and run: docker-compose up -d

The first time you run it, the program will download the images from Docker Hub, so it may take a little while. Once it's running, you can run docker-compose logs -f to monitor the logs of the running servers.

At any time you can run docker-compose stop to temporarily shut down the service and docker-compose start to restart it. Finally, you can run docker-compose down to stop and remove the containers altogether.

If you want to set up more Kaldi engines, you need to edit both the docker-compose.yml and servers.json files.
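Because the two files have to stay in step, a small consistency check can help when adding engines. The sketch below assumes that each engine entry has a `host` key and that this host is the name of a service defined in docker-compose.yml (Compose makes service names resolvable as host names on its default network). The function name and the sample data are hypothetical.

```python
def missing_services(engines, compose_services):
    """Return engine hosts that have no matching docker-compose service."""
    return sorted({e["host"] for e in engines} - set(compose_services))


# Hypothetical data: three engines configured, but only two services defined.
configured = [
    {"host": "kaldi-1", "port": 5050, "samplerate": 16000},
    {"host": "kaldi-2", "port": 5050, "samplerate": 16000},
    {"host": "kaldi-3", "port": 5050, "samplerate": 16000},
]
services = ["web", "kaldi-1", "kaldi-2"]

unmatched = missing_services(configured, services)
```

An empty result means every configured engine has a container to connect to. Here `kaldi-3` would still need a service entry.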

More details on the Dockerfiles are provided in this document.

Contributors

danijel3, milochen0418
