
Comments (29)

nshmyrev avatar nshmyrev commented on May 22, 2024 5

Documentation about process https://github.com/alphacep/vosk-api/blob/master/doc/models.md#training-your-own-model

from vosk-api.

swentel avatar swentel commented on May 22, 2024 4

First of all: thanks for the Android library! I'm testing it in https://github.com/swentel/solfidola, and it actually works pretty great!

I use it in my app for voice commands: some words trigger an action. So I was wondering whether I could have a model that consists of only a few words. I basically only need to recognize words like 'one', 'two', 'three' and 'play'. I don't care about other words, as they don't trigger anything in the app.

I'm currently installing kaldi (make is compiling hehe), and then going to try and figure out if I can create a model with only a couple of words.

But I wonder: does this idea make sense, and will the model size be smaller in the end? I'd rather not ship 30 MB just to recognize a few words.

I'll write down the steps if I can figure it out myself. Any more detailed steps to create such a model would be awesome, but no worries if that's hard to write down in a few lines :)

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024 1

Will it be possible to have a "simple" script that takes a simple input folder with wav and csv files and does all the work to create a model?

Sure, it is called mini_librispeech recipe. It is in kaldi/egs/mini_librispeech/s5/run.sh
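
As a rough sketch of what feeding your own wav and csv files into a Kaldi recipe involves: Kaldi data directories contain `wav.scp`, `text` and `utt2spk` files keyed by utterance id. The helper below writes those three files. The two-column CSV layout (wav path, transcript), the single-speaker default and the function name are assumptions for illustration, not part of the recipe itself.

```python
import csv
from pathlib import Path


def csv_to_kaldi_data_dir(csv_path, data_dir, speaker="spk1"):
    """Write wav.scp, text and utt2spk from a two-column CSV (wav path, transcript)."""
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for wav_path, transcript in csv.reader(f):
            # Prefix with the speaker id so utterance ids group by speaker.
            utt_id = f"{speaker}-{Path(wav_path).stem}"
            rows.append((utt_id, wav_path, transcript.strip()))
    rows.sort()  # Kaldi expects these files sorted by utterance id

    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "wav.scp", "w", encoding="utf-8") as wav_scp, \
         open(out / "text", "w", encoding="utf-8") as text, \
         open(out / "utt2spk", "w", encoding="utf-8") as utt2spk:
        for utt_id, wav_path, transcript in rows:
            wav_scp.write(f"{utt_id} {wav_path}\n")
            text.write(f"{utt_id} {transcript}\n")
            utt2spk.write(f"{utt_id} {speaker}\n")
    return [row[0] for row in rows]
```

The recipe's own data download and preparation stage would presumably have to be replaced with something like this, and the remaining files (for example `spk2utt`) are normally derived with Kaldi's own utility scripts.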

from vosk-api.

swentel avatar swentel commented on May 22, 2024 1

Published a blog post at https://realize.be/blog/offline-speech-text-trigger-custom-commands-android-kaldi-and-vosk

In case I made some stupid mistakes, do let me know ;)

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

@KatPro models are trained with Kaldi. Follow the standard Kaldi training scripts, for example the mini_librispeech recipe.

from vosk-api.

KatPro avatar KatPro commented on May 22, 2024

Thank you! And is it possible to train the model for another language following Kaldi training scripts?

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

Reopen to increase visibility

from vosk-api.

nyroDev avatar nyroDev commented on May 22, 2024

Hi @nshmyrev,
Will it be possible to have a "simple" script that takes a simple input folder with wav and csv files and does all the work to create a model?

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

@swentel you can just rebuild the graph, see

https://github.com/alphacep/vosk-api/blob/master/doc/adaptation.md

You can also select words in runtime, see

https://github.com/alphacep/vosk-api/blob/master/python/example/test_words.py

let me know if you have further questions
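
A minimal sketch of the runtime approach, in the spirit of test_words.py: the recognizer accepts a JSON list of phrases as its third argument, which restricts what it will output. The helper names, the default sample rate and the model path are assumptions for illustration.

```python
import json


def build_grammar(words, include_unk=True):
    """Return the JSON phrase list passed as the third KaldiRecognizer argument."""
    phrases = [" ".join(words)]
    if include_unk:
        phrases.append("[unk]")  # lets speech outside the list map to an unknown token
    return json.dumps(phrases)


def make_recognizer(model_dir, words, sample_rate=16000):
    # Imported here so build_grammar stays usable without vosk installed.
    from vosk import Model, KaldiRecognizer
    return KaldiRecognizer(Model(model_dir), sample_rate, build_grammar(words))
```

For the command words mentioned above, `build_grammar(["one", "two", "three", "play"])` produces the string to pass to the recognizer. This limits recognition at runtime only; it does not shrink the model files on disk.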

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

@swentel also see #55

from vosk-api.

swentel avatar swentel commented on May 22, 2024

Oooh, great, thanks for the quick answer!

I'll get cracking at it after dinner. This will be awesome if it works, and I'm going to write a blog post about it, because the world needs to know about this :)

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

Thank you @swentel, let me know how it goes!

from vosk-api.

swentel avatar swentel commented on May 22, 2024

So this actually seemed to work!

Based on the adaptation readme, both commands run, although I'm not 100% sure what the first command does (fstsymbols ...)

However, after running the second command with the text file containing my custom words, Gr.fst is now only 2.6 MB (compared to 23 MB). I completely reinstalled the app on my phone and it still works. Saved 20 MB, that's great!

So looking in the model directory, I still see a couple of files which are 'relatively' large:

  • final.mdl (14mb)
  • HCLr.fst (5.9mb)
  • ivector/final.ie (8mb)

I was wondering: can I do something with those too?
Or even better, are they even needed for the recognizer to work? (To be honest, I could have tested that myself of course already by deploying a new version and leaving those files out)

(I'm almost sorry for what I guess are newbie questions, completely new to kaldi, but super excited it works!)
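
To see which files dominate the size of a model directory like the one listed above, a small helper along these lines can be used. It is a generic sketch, not part of vosk; the function name and the `top` parameter are made up for illustration.

```python
from pathlib import Path


def largest_files(model_dir, top=5):
    """Return (relative path, size in MB) for the biggest files, largest first."""
    root = Path(model_dir)
    sizes = [
        (p.relative_to(root).as_posix(), p.stat().st_size / 1_000_000)
        for p in root.rglob("*")
        if p.is_file()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```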

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

Or even better, are they even needed for the recognizer to work? (To be honest, I could have tested that myself of course already by deploying a new version and leaving those files out)

Those files are still needed.

from vosk-api.

swentel avatar swentel commented on May 22, 2024

Ok, cool, thanks!

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

@swentel amazing, thanks a lot!

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

Related #314

from vosk-api.

dazzzed avatar dazzzed commented on May 22, 2024

How do we structure the words.txt file for adaptation?

trying with

covid-19 coronavirus

in my words.txt file I get:

SymbolTable::ReadText: Bad non-negative integer "coronavirus"
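
The error message suggests that words.txt is being read as a symbol table, where each line holds a symbol followed by a non-negative integer id, rather than as free text. Presumably the custom phrases belong in the separate text file that the adaptation command takes as input. A minimal sketch for checking a file against that two-column format (the helper name is hypothetical):

```python
def bad_symbol_lines(lines):
    """Return (line number, line) pairs that are not '<symbol> <non-negative integer>'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        id_field_ok = len(fields) == 2 and fields[1].isascii() and fields[1].isdigit()
        if not id_field_ok:
            bad.append((number, line.rstrip("\n")))
    return bad
```

Running it on a file that contains the line `covid-19 coronavirus` flags that line, because the second column is a word and not an integer.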

from vosk-api.

plehal avatar plehal commented on May 22, 2024

The command mentioned here to create a new language model does not exist in a default compile of Kaldi. The egs directory is empty in a default compile.

from vosk-api.

jipinhetundu avatar jipinhetundu commented on May 22, 2024

Will it be possible to have a "simple" script that takes a simple input folder with wav and csv files and does all the work to create a model?

Sure, it is called mini_librispeech recipe. It is in kaldi/egs/mini_librispeech/s5/run.sh

I tried two things: 1. I used the mini_librispeech recipe and generated some files, and 2. I tried to arrange the files according to the “Model Structure” section in https://alphacephei.com/vosk/models.

But the first step generated many files with the same name. For ‘final.mdl’, for example, I have exp/mono/final.mdl, exp/tri2b/final.mdl, exp/tri1/final.mdl, etc. I don't know which file I should put into the final structure. Any suggestions?

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

We actually have our new recipe:

https://github.com/alphacep/vosk-api/tree/master/training

trained model is in exp/chain/tdnn.
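
Since a training run leaves a `final.mdl` in several stage directories, a quick way to list the candidates and put the one under chain/tdnn first could look like this. It is only a sketch: the function name and the `preferred` parameter are made up for illustration.

```python
from pathlib import Path


def find_final_models(exp_dir, preferred="chain/tdnn"):
    """List every final.mdl under exp_dir, with the preferred one first if present."""
    root = Path(exp_dir)
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("final.mdl"))
    target = f"{preferred}/final.mdl"
    # Stable sort: the preferred path moves to the front, the rest stay alphabetical.
    return sorted(found, key=lambda path: path != target)
```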

from vosk-api.

jipinhetundu avatar jipinhetundu commented on May 22, 2024

We actually have our new recipe:

https://github.com/alphacep/vosk-api/tree/master/training

trained model is in exp/chain/tdnn.

Thanks a lot for your answer!

I followed your steps and tried running the new recipe, but ran into a small problem at line 28 of run.sh. It reports that the usage is incorrect:

local/prepare_dict.sh data/local/lm data/local/dict
Usage: local/prepare_dict.sh [options] <lm-dir> <g2p-model-dir> <dst-dir>

............

I looked at the corresponding part of mini_librispeech/s5/run.sh, which is written as
local/prepare_dict.sh --stage 3 --nj 30 --cmd "$train_cmd" \
  data/local/lm data/local/lm data/local/dict_nosp

So I changed the corresponding part to

  1. local/prepare_dict.sh data/local/lm data/local/lm data/local/dict
  2. local/prepare_dict.sh --stage 3 --nj 30 data/local/lm data/local/lm data/local/dict
  3. local/prepare_dict.sh --stage 3 --nj 30 --cmd "$train_cmd" data/local/lm data/local/lm data/local/dict

Variants 1 and 2 produce different outputs, while 3 reports an error. I don't know much about Kaldi and I'm not sure whether this is due to a version difference; I updated to the latest version of Kaldi three days ago. What should I do next?

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

Usage: local/prepare_dict.sh [options]

It seems you are not using local/prepare_dict.sh from our recipe; you must have an old file. Ours doesn't have any options like those in the message:

https://github.com/alphacep/vosk-api/blob/master/training/local/prepare_dict.sh

from vosk-api.

jipinhetundu avatar jipinhetundu commented on May 22, 2024

It seems you are not using local/prepare_dict.sh from our recipe; you must have an old file. Ours doesn't have any options like those in the message:

https://github.com/alphacep/vosk-api/blob/master/training/local/prepare_dict.sh

Looks like I asked a stupid question. Thank you for answering patiently, I have successfully run it!

from vosk-api.

Manikandan18M avatar Manikandan18M commented on May 22, 2024

Also @nshmyrev, what changes should I make to produce high-accuracy models if I'm training a model from scratch? You have suggested training the ivector with dim 40 to save memory, but does this affect accuracy?

It would also be helpful if you could share which directories to look in to build the final_result model compatible with vosk, which has files such as final.mdl, final.ie, conf, etc.

from vosk-api.

Manikandan18M avatar Manikandan18M commented on May 22, 2024

@nshmyrev

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

Also @nshmyrev, what changes should I make to produce high-accuracy models if I'm training a model from scratch?

It depends on too many factors: the domain of speech, the amount of audio, the number of GPUs. It is hard to guess.

from vosk-api.

ankur995 avatar ankur995 commented on May 22, 2024

Documentation about process https://github.com/alphacep/vosk-api/blob/master/doc/models.md#training-your-own-model

I'm not able to open this URL.

from vosk-api.

nshmyrev avatar nshmyrev commented on May 22, 2024

@ankur995 yes, it is obsolete. Our training setup is here:

https://github.com/alphacep/vosk-api/tree/master/training

There is also a Colab notebook:

https://github.com/alphacep/vosk-api/blob/master/python/example/colab/vosk-training.ipynb

from vosk-api.
