CommonVoice-TH Recipe

A commonvoice-th recipe for training an ASR engine using Kaldi. The recipe follows the commonvoice recipe with slight modifications.

Installation

The author uses Docker to run the recipe inside a container. A GPU is required to train tdnn_chain; without one, the script can train only up to tri3b.

Downloading Commonvoice Corpus

We need a Commonvoice corpus to train the ASR engine. This recipe uses Commonvoice Corpus 7.0 in the Thai language, which can be downloaded here. Once downloaded, unzip it; it will be mounted into the docker container later.

Downloading SRILM

Before building the docker image, the SRILM archive needs to be downloaded. You can download it from here. Once the file is downloaded, remove the version from its name (e.g. rename srilm-1.7.3.tar.gz to srilm.tar.gz) and place it inside the docker directory. Your docker directory should then contain 2 files: dockerfile and srilm.tar.gz.
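The rename step can also be scripted. The following is a minimal sketch, assuming the archive was downloaded into the docker directory and follows the srilm-<version>.tar.gz naming pattern. The helper name normalize_srilm_archive is hypothetical and is not part of the repository.

```python
from pathlib import Path


def normalize_srilm_archive(docker_dir):
    """Rename a versioned srilm-<version>.tar.gz in docker_dir to srilm.tar.gz.

    Hypothetical helper for illustration; returns the path of the renamed archive.
    """
    docker_dir = Path(docker_dir)
    target = docker_dir / "srilm.tar.gz"
    if target.exists():
        return target  # already renamed, nothing to do

    # Assumption: exactly one versioned archive is present in the directory.
    candidates = sorted(docker_dir.glob("srilm-*.tar.gz"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one srilm-*.tar.gz in {docker_dir}, "
            f"found {len(candidates)}"
        )
    candidates[0].rename(target)
    return target
```

Calling normalize_srilm_archive("docker") from the repository root would leave the archive at docker/srilm.tar.gz, the name described above.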

Building Docker for Training with Kaldi

Once you have prepared the SRILM file, you are ready to build the docker image for training this recipe. The build automatically installs the project's dependencies and stores them in the image. To build the docker image, run:

$ cd docker
$ docker build -t <docker-name> kaldi

Run docker and attach command line

Once the image has been built, attach interactively to its bash terminal with the following command:

$ docker run -it -v <path-to-repo>:/opt/kaldi/egs/commonvoice-th \
                 -v <path-to-repo>/labels:/mnt/labels \
                 -v <path-to-cv-corpus>:/mnt \
                 --gpus all --name <container-name> <built-docker-name> bash

Once you finish this step, you should be in the docker container's bash terminal.

Building Docker for inferencing via Vosk

We also provide an example of how to run inference with a trained Kaldi model using Vosk. Before we begin, let's build the Vosk docker image:

$ cd docker
$ docker build -t <docker-name> vosk-inference
$ cd ..  # back to root directory

Preparing Directories for Vosk Inferencing

The first step is to download the provided model in Vosk format from this GitHub repository's releases and unzip it into the vosk-inference directory. Alternatively, just run the following commands:

$ cd vosk-inference
$ wget https://github.com/vistec-AI/commonvoice-th/releases/download/vosk-v1/model.zip
$ unzip model.zip

Run docker and test inference script

To prevent dependency problems, the Vosk inference Python script must be run inside a container started from the docker image we just built. First, start a container:

$ docker run -it -v <path-to-repo>:/workspace \
                 --name <container-name> \
                 -p 8000:8000 \
                 <build-docker-name> bash

You will then be attached to a Linux terminal inside the container. To run inference on an audio file, run:

$ cd vosk-inference
$ python3.8 inference.py --wav-path <path-to-wav>  # test it with test.wav

Note that the audio file must have a 16 kHz sampling rate and a single (mono) channel!
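Before running inference, the format requirement can be checked programmatically. The following is a minimal sketch using only Python's standard wave module, assuming the input is an uncompressed PCM WAV file. The helper name check_wav_format is hypothetical and is not part of inference.py.

```python
import wave


def check_wav_format(wav_path, expected_rate=16000, expected_channels=1):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    ok is True only when the file matches the expected sampling rate
    and channel count (16 kHz mono by default).
    """
    with wave.open(str(wav_path), "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    ok = rate == expected_rate and channels == expected_channels
    return ok, rate, channels
```

A file that fails the check needs to be resampled or downmixed with an audio tool of your choice before it is passed to the inference script.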

Instantiating Vosk Server to Process Audio Files

We also provide a FastAPI server that lets users transcribe their own audio files via a RESTful API. To start the server, run these commands inside the docker shell:

$ cd vosk-inference
$ uvicorn server:app --host 0.0.0.0 --reload

The server will now be available at http://localhost:8000. To check that it started correctly, browse to http://localhost:8000/healthcheck. If the page loads, we are good to go!
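The same check can be done from a script instead of a browser. The following is a minimal sketch using only the standard library. It assumes the /healthcheck endpoint answers with a 2xx status when the server is up; the response body format is not documented here, so only the status code is inspected. The helper name is_server_up is hypothetical.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://localhost:8000/healthcheck", timeout=5.0):
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False
```

Calling is_server_up() in a short retry loop is one way to wait for the server to finish starting before sending audio to it.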

API Endpoint

The endpoint accepts form-data requests in which each file is attached to a form field named audios. See the Python example:

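import requests

# The scheme is required: requests rejects a bare "localhost:8000/..." URL.
url = "http://localhost:8000/transcribe"

# Replace the placeholders with your own file name and path.
# Add one ('audios', ...) tuple per file to send several files at once.
files = [
    ("audios", ("<file-name>", open("<file-path>", "rb"), "audio/wav")),
]

response = requests.post(url, files=files)

print(response.text)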
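When several files are sent in one request, it helps to build the form fields in a loop and close the file handles afterwards. The following is a minimal sketch under the same assumptions as the example above (a server at http://localhost:8000 and a form field named audios). The helper names build_audio_fields and transcribe are hypothetical, and the response is returned as raw text because its format is not documented here.

```python
from contextlib import ExitStack
from pathlib import Path


def build_audio_fields(wav_paths, stack):
    """Build the multipart `files` list, one 'audios' field per WAV file.

    Files are opened through `stack` so they are closed when the stack exits.
    """
    fields = []
    for wav_path in wav_paths:
        path = Path(wav_path)
        handle = stack.enter_context(path.open("rb"))
        fields.append(("audios", (path.name, handle, "audio/wav")))
    return fields


def transcribe(wav_paths, url="http://localhost:8000/transcribe"):
    """POST the given WAV files and return the raw response text."""
    import requests  # third-party; imported lazily so the helper above has no dependency

    with ExitStack() as stack:
        response = requests.post(url, files=build_audio_fields(wav_paths, stack))
    response.raise_for_status()  # surface HTTP errors instead of returning an error page
    return response.text
```

For example, transcribe(["a.wav", "b.wav"]) would send both files in a single request.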

Online Decoding with WebRTC Protocol

Read more at this repository, which provides an easy way to deploy a Kaldi tdnn-chain model to a WebRTC server.

Usage

To run the training pipeline, go to the recipe directory and run the run.sh script:

$ cd /opt/kaldi/egs/commonvoice-th/s5
$ ./run.sh --stage 0

Experiment Results

Here are some experiment results evaluated on the dev set:

| Model | dev WER | dev CER | dev-unique WER | dev-unique CER |
|---|---|---|---|---|
| mono | 79.13% | 57.31% | 77.79% | 48.97% |
| tri1 | 56.55% | 37.88% | 53.26% | 27.99% |
| tri2b | 50.64% | 32.85% | 47.38% | 21.89% |
| tri3b | 50.52% | 32.70% | 47.06% | 21.67% |
| tri4b | 46.81% | 29.47% | 43.18% | 18.05% |
| tdnn-chain | 29.15% | 14.96% | 30.84% | 8.75% |
| tdnn-chain-online | 29.02% | 14.64% | 30.41% | 8.28% |

Here are the final test set results for tdnn-chain:

| Model | test WER | test CER | test-unique WER | test-unique CER |
|---|---|---|---|---|
| tdnn-chain-online | 9.71% | 3.12% | 23.04% | 7.57% |
| airesearch/wav2vec2-xlsr-53-th | - | - | 13.63% | 2.81% |
| Google Web Speech API | - | - | 13.71% | 7.36% |
| Microsoft Bing Search API | - | - | 12.58% | 5.01% |
| Amazon Transcribe | - | - | 21.86% | 7.08% |
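For readers unfamiliar with the metrics, WER and CER are conventionally defined as the edit distance between the reference and the hypothesis, divided by the reference length, computed over words and characters respectively. The following is a minimal sketch of that general definition; it is not the scoring script used by this recipe. It assumes the text has already been split into words for WER (Thai is written without spaces between words, so the result depends on the tokenizer), and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(reference, hypothesis):
    """Edit distance divided by reference length, as a percentage."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference_words, hypothesis_words):
    """Word error rate over pre-tokenized word lists."""
    return error_rate(reference_words, hypothesis_words)


def cer(reference_text, hypothesis_text):
    """Character error rate over raw strings."""
    return error_rate(list(reference_text), list(hypothesis_text))


print(wer(["a", "b", "c", "d"], ["a", "x", "c"]))  # 50.0 (one substitution, one deletion)
```

Because the denominator is the reference length, an error rate can exceed 100% when the hypothesis contains many insertions.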

Author

Chompakorn Chaksangchaichot
