navalnica / wav2vec2-belarusian

Speech-to-Text model for the Belarusian language

Belarusian Speech-to-Text

Speech-to-Text (STT), also called Automatic Speech Recognition (ASR), is the task of producing a text transcription of input audio.

Description

This repository contains code to train and evaluate an STT model for the Belarusian language.

The Common Voice 8 dataset was used to train and evaluate the model.

The acoustic model (AM) was created by fine-tuning the facebook/wav2vec2-base model.

Additionally, a 5-gram language model (LM) was built using the KenLM library.

Model demo & checkpoint

You can try the model in a demo application at huggingface.co/spaces/ales/wav2vec2-cv-be-lm. It runs the full pipeline: acoustic model + language model.

The best model checkpoint (weights) is available at huggingface.co/ales/wav2vec2-cv-be. That page also has a demo widget, but it uses only the acoustic model because of limitations of the HuggingFace Hosted Inference API. The widget will therefore perform worse than the demo application above, which also uses the language model.
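As a rough illustration of using the checkpoint locally, the sketch below loads it through the `transformers` automatic-speech-recognition pipeline. It assumes `transformers`, a PyTorch backend, and ffmpeg (for reading audio files) are installed. The helper name `transcribe` and the file name in the usage note are hypothetical, and this is not necessarily how the repository's own notebooks run inference.

```python
MODEL_ID = "ales/wav2vec2-cv-be"  # checkpoint named in this README


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the transcription of one audio file (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require the
    # library; the first call downloads the model weights.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]
```

Usage would look like `print(transcribe("sample.wav"))`, where `sample.wav` is a placeholder for your own recording.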

Metrics

Current metrics for Common Voice 8:

| Model | WER on Dev set | WER on Test set | Rate of fully recognized sentences on Test set |
| --- | --- | --- | --- |
| Acoustic model only | 0.1761 | 0.187 | 36.688% |
| Acoustic model + 5-gram language model | 0.115 | 0.124 | 52.269% |
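For readers unfamiliar with these metrics, here is a minimal sketch of how word error rate (WER) and the rate of fully recognized sentences can be computed. It assumes corpus-level WER (total word edits divided by total reference words) and exact string match after whitespace splitting. The repository's actual text normalization and metric code may differ, and the function names are illustrative.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (sub/ins/del cost 1)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        cur = [i]
        for j, h in enumerate(hyp_words, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def evaluate(references, hypotheses):
    """Return (corpus-level WER, rate of fully recognized sentences)."""
    total_edits = total_words = exact = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        total_edits += word_edit_distance(r, h)
        total_words += len(r)
        exact += (r == h)
    return total_edits / total_words, exact / len(references)


wer, full_rate = evaluate(["a b c", "d e"], ["a x c", "d e"])
```

In this toy input there is one substitution over five reference words, and one of the two sentences matches exactly.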

Training

The current best model was trained for 5 epochs.

The Train, Dev, and Test sets of Common Voice 8 were used as is; see the eda/cv8be_eda.ipynb notebook. One could enlarge them with data from the Validated set to improve model performance.

Language model

The KenLM library was used to build a 5-gram language model (LM).

The language model is used when decoding the predictions of the wav2vec2 acoustic model. On Common Voice 8 this lowers WER on the Test set from 0.187 to 0.124.
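To give an intuition for why a language model helps, the toy sketch below rescores candidate transcriptions by adding a weighted LM score to the acoustic score. This is a simplified stand-in (n-best rescoring with a hand-written bigram table) for LM-aware beam-search decoding, not the repository's decoder. The scores, the weight `alpha`, and the function names are all made up for illustration.

```python
def lm_log_prob(sentence, bigram_log_probs, unk_log_prob=-10.0):
    """Sum bigram log-probabilities, using <s> as the start-of-sentence token."""
    words = ["<s>"] + sentence.split()
    return sum(
        bigram_log_probs.get((prev, cur), unk_log_prob)  # unseen pairs get a penalty
        for prev, cur in zip(words, words[1:])
    )


def rescore(candidates, bigram_log_probs, alpha=0.5):
    """Pick the text maximizing acoustic_score + alpha * lm_score."""
    return max(
        candidates,
        key=lambda c: c[1] + alpha * lm_log_prob(c[0], bigram_log_probs),
    )[0]


# (text, acoustic log-score): the acoustic model slightly prefers the misspelling.
candidates = [("добры дзен", -1.0), ("добры дзень", -1.2)]
# Toy bigram table; a real LM would be estimated from a large corpus.
toy_lm = {("<s>", "добры"): -0.5, ("добры", "дзень"): -0.3}

best = rescore(candidates, toy_lm)
```

The acoustic scores alone would select "добры дзен", but the LM penalizes the unseen bigram, so the combined score selects "добры дзень".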

The text corpus for the LM consists of sentences from the Train set plus the Validated set with Dev and Test sentences excluded (Train + (Validated - Dev - Test)), about 314,000 unique sentences in total.
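The corpus selection can be sketched with plain set operations. This assumes the formula is read as Train plus (Validated minus Dev minus Test), so that no Dev or Test sentence from the Validated set leaks into the LM training text. The function name and the toy sentences are illustrative, and the repository's actual preprocessing may differ.

```python
def build_lm_corpus(train, validated, dev, test):
    """Return unique LM sentences: Train + (Validated - Dev - Test)."""
    held_out = set(dev) | set(test)
    corpus = set(train) | (set(validated) - held_out)
    return sorted(corpus)  # sorted only to make the output deterministic


corpus = build_lm_corpus(
    train=["a", "b"],
    validated=["a", "c", "d", "e"],
    dev=["d"],
    test=["e"],
)
```

The resulting list, written one sentence per line, is the kind of text file an n-gram toolkit such as KenLM takes as input.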

TODO:

  • Gather a much larger text corpus to build a better language model.
