Belarusian Speech-to-Text

Speech-to-Text (STT), also known as Automatic Speech Recognition (ASR), is the task of producing a text transcription of an input audio file.

Description

This repository contains code to train and evaluate an STT model for the Belarusian language.

The Common Voice 8 dataset was used to train and evaluate the model.

The acoustic model (AM) was created by fine-tuning the facebook/wav2vec2-base model.

Additionally, a 5-gram language model (LM) was built using the KenLM library.

Model demo & checkpoint

You can try the model in the demo application at huggingface.co/spaces/ales/wav2vec2-cv-be-lm. It uses the full pipeline: acoustic model plus language model.

The best model checkpoint (weights) is available at huggingface.co/ales/wav2vec2-cv-be. That page also has a demo widget, but the widget uses only the acoustic model because of limitations of the HuggingFace Hosted Inference API. The widget will therefore perform worse than the demo application above, which also uses the language model.

Metrics

Current metrics for Common Voice 8:

Model	WER on Dev set	WER on Test set	Rate of fully recognized sentences on Test set
Acoustic model only	0.1761	0.187	36.688%
Acoustic model + 5-gram Language model	0.115	0.124	52.269%
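
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how they can be computed. It assumes WER is aggregated over the whole set (total word errors divided by total reference words) and that a "fully recognized" sentence is an exact string match; the source does not state either detail, and the function names are illustrative rather than taken from this repository.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total number of reference words."""
    errors = sum(
        edit_distance(ref.split(), hyp.split())
        for ref, hyp in zip(references, hypotheses)
    )
    total_words = sum(len(ref.split()) for ref in references)
    return errors / total_words


def exact_match_rate(references, hypotheses):
    """Share of sentences whose transcription equals the reference exactly."""
    matches = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return matches / len(references)


# Toy data: one perfect transcription and one with a single wrong word.
references = ["добры дзень", "як справы сёння"]
hypotheses = ["добры дзень", "як справа сёння"]

wer = corpus_wer(references, hypotheses)          # 1 error over 5 words
full = exact_match_rate(references, hypotheses)   # 1 of 2 sentences
```

On this toy pair the sketch gives one word error over five reference words and one exact match out of two sentences. Real evaluations usually normalize text (case, punctuation) before comparison, which is omitted here.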

Training

The current best model was trained for 5 epochs.

The Train, Dev, and Test sets of Common Voice 8 were used as they are (though one could enlarge them using the Validated set to achieve better model performance); see the eda/cv8be_eda.ipynb notebook.

Language model

The KenLM library was used to build the 5-gram language model (LM).

The language model is used when decoding the predictions of the wav2vec2 model (the acoustic model), which improves recognition accuracy.
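
To illustrate why a language model helps, here is a simplified sketch of rescoring candidate transcriptions with a weighted sum of acoustic and LM scores. This is one common approach and not necessarily how this repository performs decoding: the `rescore` function, the weights `alpha` and `beta`, and the toy scoring function standing in for a real 5-gram model are all assumptions made for the example.

```python
def rescore(candidates, lm_log_prob, alpha=0.5, beta=1.0):
    """Return the candidate text with the best combined score.

    candidates: list of (text, acoustic_log_prob) pairs.
    lm_log_prob: callable mapping a text to a log-probability.
    alpha: weight of the language model score (hypothetical value).
    beta: per-word bonus offsetting the LM's bias toward short outputs.
    """
    def combined(item):
        text, acoustic_score = item
        return acoustic_score + alpha * lm_log_prob(text) + beta * len(text.split())

    return max(candidates, key=combined)[0]


# Toy stand-in for an n-gram model: known words are cheap, unknown ones costly.
VOCAB = {"добры", "дзень"}


def toy_lm_log_prob(text):
    return sum(-1.0 if word in VOCAB else -10.0 for word in text.split())


# The acoustic model slightly prefers the misspelled variant.
candidates = [("добры дзен", -4.0), ("добры дзень", -4.5)]

acoustic_only = max(candidates, key=lambda item: item[1])[0]
with_lm = rescore(candidates, toy_lm_log_prob)
```

In this toy case the acoustic score alone picks the misspelled candidate, while the combined score picks the correctly spelled one, because the unknown word is penalized by the language model.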

The text corpus for the LM consists of sentences from the Train set and from the Validated set with the Dev and Test sentences excluded (~314,000 unique sentences in total).
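
The corpus selection can be sketched with set operations. This assumes the description means Train combined with (Validated minus Dev minus Test), presumably so that the language model never sees evaluation sentences; `build_lm_corpus` and the placeholder sentences are hypothetical and not taken from the repository.

```python
def build_lm_corpus(train, validated, dev, test):
    """Unique sentences from Train plus Validated, excluding Dev and Test."""
    held_out = set(dev) | set(test)
    extra = set(validated) - held_out
    # Sorting only makes the output deterministic; order does not matter for an n-gram LM.
    return sorted(set(train) | extra)


# Placeholder sentences standing in for Common Voice transcripts.
train = ["a b", "c d"]
validated = ["a b", "c d", "e f", "g h", "i j"]
dev = ["e f"]
test = ["g h"]

corpus = build_lm_corpus(train, validated, dev, test)
```

Here `corpus` keeps the two Train sentences and the one Validated sentence that appears in neither Dev nor Test.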

TODO:

Gather a much larger text corpus to build a better language model.
