daanzu / kaldi-active-grammar Goto Github PK

View Code? Open in Web Editor NEW

334.0 31.0 49.0 593 KB

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

License: GNU Affero General Public License v3.0

Python 93.36% CMake 3.37% Shell 3.27%

kaldi-asr speech-recognition grammars python speech-to-text kaldi kaldi-grammar dictation voice coding

kaldi-active-grammar's Introduction

Kaldi Active Grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

Python package developed to enable context-based command & control of computer applications, as in the Dragonfly speech recognition framework, using the Kaldi automatic speech recognition engine.

Normally, Kaldi decoding graphs are monolithic, require expensive up-front off-line compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals, to be compiled separately and stitched together dynamically at decode-time, but all the grammars are always active and capable of being recognized.

This project extends that to allow each grammar/rule to be independently marked as active/inactive dynamically on a per-utterance basis (set at the beginning of each utterance). Dragonfly is then capable of activating only the appropriate grammars for the current environment, resulting in increased accuracy due to fewer possible recognitions. Furthermore, the dictation grammar can be shared between all the command grammars, which can be compiled quickly without needing to include large-vocabulary dictation directly.

See the Changelog for the latest updates.

Features

Binaries: The Python package includes all necessary binaries for decoding on Windows/Linux/MacOS. Available on PyPI.
- Binaries are generated from my fork of Kaldi, which is only intended to be used by kaldi-active-grammar directly, and not as a stand-alone library.
Pre-trained model: A compatible general English Kaldi nnet3 chain model is trained on ~3000 hours of open audio. Available under project releases.
- Model info and comparison
- Improved models are under development.
Plain dictation: Do you just want to recognize plain dictation? Seems kind of boring, but okay! There is an interface for plain dictation (see below), using either your specified HCLG.fst file, or KaldiAG's included pre-trained dictation model.
Dragonfly/Caster: A compatible backend for Dragonfly is under development in the kaldi branch of my fork, and has been merged as of Dragonfly v0.15.0.
- See its documentation, try out a demo, or use the loader to run all normal dragonfly scripts.
- You can try it out easily on Windows using a simple no-install package: see Getting Started below.
- Caster is supported as of KaldiAG v0.6.0 and Dragonfly v0.16.1.
Bootstrapped since v0.2: development of KaldiAG is done entirely using KaldiAG.

Demo Video

Donations are appreciated to encourage development.

Related Repositories

daanzu/kaldi-grammar-simple
daanzu/speech-training-recorder
daanzu/dragonfly_daanzu_tools
kmdouglass/caster-kaldi: Docker image to run KaldiAG + Dragonfly + Caster inside a container on Linux, using the host's microphone.

Getting Started

Want to get started quickly & easily on Windows? Available under project releases:

kaldi-dragonfly-winpython: A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
kaldi-dragonfly-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
kaldi-caster-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2 + caster. Just unzip and run!

Otherwise...

Setup

Requirements:

Python 3.6+; 64-bit required!
OS: Windows/Linux/MacOS all supported
Only supports Kaldi left-biphone models, specifically nnet3 chain models, with specific modifications
~1GB+ disk space for model plus temporary storage and cache, depending on your grammar complexity
~1GB+ RAM for model and grammars, depending on your model and grammar complexity

Installation:

Download compatible generic English Kaldi nnet3 chain model from project releases. Unzip the model and pass the directory path to kaldi-active-grammar constructor.
- Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.
Install Python package, which includes necessary Kaldi binaries:
- The easy way to use kaldi-active-grammar is as a backend to dragonfly, which makes it easy to define grammars and resultant actions.
  - For this, simply run pip install 'dragonfly2[kaldi]' to install all necessary packages. See the dragonfly documentation for details on installation, plus how to define grammars and actions.
- Alternatively, if you only want to use it directly (via a more low level interface), you can just run pip install kaldi-active-grammar
To support automatic generation of pronunciations for unknown words (not in the lexicon), you have two choices:
- Local generation: Install the g2p_en package with pip install 'kaldi-active-grammar[g2p_en]'
  - The necessary data files are now included in the latest speech models I released with v3.0.0.
- Online/cloud generation: Install the requests package with pip install 'kaldi-active-grammar[online]' AND pass allow_online_pronunciations=True to Compiler.add_word() or Model.add_word()
- If both are available, the former is preferentially used.

Troubleshooting

Errors installing
- Make sure you're using a 64-bit Python.
- You should install via pip install kaldi-active-grammar (directly or indirectly), not python setup.py install, in order to get the required binaries.
- Update your pip (to at least 19.0+) by executing python -m pip install --upgrade pip, to support the required python binary wheel package.
Errors running
- Windows: The code execution cannot proceed because VCRUNTIME140.dll was not found. (or similar)
  - You must install the VC2017+ redistributable from Microsoft: download page, direct link. (This is usually already installed globally by other programs.)
- Try deleting the Kaldi model .tmp directory, and re-running.
- Try deleting the Kaldi model directory itself, re-downloading and/or re-extracting it, and re-running. (Note: You may want to make a copy of your user_lexicon.txt file before deleting, to put in the new model directory.)
For reporting issues, try running with import logging; logging.basicConfig(level=1) at the top of your main/loader file to enable full debugging logging.

Documentation

Formal documentation is somewhat lacking currently. To see example usage, examine:

Plain dictation interface: Set up recognizer for plain dictation; perform decoding on given wav file.
Full example: Set up grammar compiler & decoder; set up a rule; perform decoding on live, real-time audio from microphone.
Backend for Dragonfly: Many advanced features and complex interactions.

The KaldiAG API is fairly low level, but basically: you define a set of grammar rules, then send in audio data, along with a bit mask of which rules are active at the beginning of each utterance, and receive back the recognized rule and text. The easy way is to go through Dragonfly, which makes it easy to define the rules, contexts, and actions.

Building

Recommendation: use the binary wheels distributed for all major platforms.
- Significant work has gone into allowing you to avoid the many repo/dependency downloads, GBs of disk space, and vCPU-hours needed for building from scratch.
- They are built in public by automated Continuous Integration run on GitHub Actions: see manifest.
Alternatively, to build for use locally:
- Linux/MacOS:
  1. python -m pip install -r requirements-build.txt
  2. python setup.py bdist_wheel (see CMakeLists.txt for details)
- Windows:
  - Less easily generally automated
  - You can follow the steps for Continuous Integration run on GitHub Actions: see the build-windows section of the manifest.
Note: the project (and python wheel) is built from a duorepo (2 separate repos used together):
1. This repo, containing the external interface and higher-level logic, written in Python.
2. My fork of Kaldi, containing the lower-level code, written in C++.

Contributing

Issues, suggestions, and feature requests are welcome & encouraged. Pull requests are considered, but project structure is in flux.

Donations are appreciated to encourage development.

Author

David Zurow (@daanzu)

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE.txt file for details. If this license is problematic for you, please contact me.

Acknowledgments

Based on and including code from Kaldi ASR, under the Apache-2.0 license.
Code from OpenFST and OpenFST port for Windows, under the Apache-2.0 license.
Intel Math Kernel Library, copyright (c) 2018 Intel Corporation, under the Intel Simplified Software License, currently only used for Windows build.

kaldi-active-grammar's People

Contributors

Stargazers

Watchers

Forkers

rohithkodali entn-at multimeric comodoro cocoakeith gitter-badger lexiconcode dmakarov lahiruts wolfmanstout shaunholt anniyanvr shiyuzh2007 adamchau hobbit19 ffos brucerennie hannes1 xqq2018rebuild shakacode cjbassi sumitcoder1 wil31 ishine eyonjoshua91 sciai-ai bluecamel matthewmcintire rxhmdia bbvachhani codebold madkote zhuangweiji yh646492956 centaurioun jgeofil denvaar bhavesh-97 shervinemami mqnc bogdan0083 etfre lykos153 kaiserzandrich psyrun weimeng23 jwebmeister c-nichols cliveo

kaldi-active-grammar's Issues

Fine Tuning

Do you have the procedure for fine tuning the model with our own data?
Would we use this model
https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.4.0/kaldi_model_daanzu_20200328_1ep-mediumlm.zip
or
https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.4.0/vosk-model-en-us-daanzu-20200328.zip

then apply this script and update the paths?
https://github.com/kaldi-asr/kaldi/blob/master/egs/rm/s5/local/chain/tuning/run_tdnn_wsj_rm_1c.sh

Whitespace in model path

I'm running into the following error when I try to use a model that contains a whitespace in its path:

[StardewSpeak] Speech engine error: ERR:     self._compiler = KaldiCompiler(self._options['model_dir'], tmp_dir=self._options['tmp_dir'],
[StardewSpeak] Speech engine error: ERR:   File "C:\Program Files (x86)\GOG Galaxy\Games\Stardew Valley\Mods\StardewSpeak\StardewSpeak\lib\speech-client\lib\site-packages\dragonfly\engines\backend_kaldi\compiler.py", line 70, in __init__
[StardewSpeak] Speech engine error: ERR:     KaldiAGCompiler.__init__(self, model_dir=model_dir, tmp_dir=tmp_dir, **kwargs)
[StardewSpeak] Speech engine error: ERR:   File "C:\Program Files (x86)\GOG Galaxy\Games\Stardew Valley\Mods\StardewSpeak\StardewSpeak\lib\speech-client\lib\site-packages\kaldi_active_grammar\compiler.py", line 201, in __init__
[StardewSpeak] Speech engine error: ERR:     self.model = Model(model_dir, tmp_dir)
[StardewSpeak] Speech engine error: ERR:   File "C:\Program Files (x86)\GOG Galaxy\Games\Stardew Valley\Mods\StardewSpeak\StardewSpeak\lib\speech-client\lib\site-packages\kaldi_active_grammar\model.py", line 221, in __init__
[StardewSpeak] Speech engine error: ERR:     self.phone_to_int_dict = { phone: i for phone, i in load_symbol_table(self.files_dict['phones.txt']) }
[StardewSpeak] Speech engine error: ERR:   File "C:\Program Files (x86)\GOG Galaxy\Games\Stardew Valley\Mods\StardewSpeak\StardewSpeak\lib\speech-client\lib\site-packages\kaldi_active_grammar\utils.py", line 187, in load_symbol_table
[StardewSpeak] Speech engine error: ERR:     with open(filename, 'r', encoding='utf-8') as f:
[StardewSpeak] Speech engine error: ERR: OSError: [Errno 22] Invalid argument: '"C:\\Users\\evfre\\.stardew speak\\models\\kaldi_model\\phones.txt"'

However, when I comment out the line self.files_dict.update({ k: '"%s"' % v for (k, v) in self.files_dict.items() if v and ' ' in v }) in the Model constructor, everything works as expected.

Kaldi is too fast

I know that this is a very good problem to have! Several times (at least since the last update) I have found Kaldi will interpret my speech well before I'm done uttering a phrase. For example, in Caster we can say "go line sixteen" to navigate to line sixteen.

Sometimes the system will interpret this as "go line six" then interpret "teen" as something completely different as Kaldi almost always trys to interpret any real speech. For me, I get "teen" interpreted as "doon" (my word for page down), for example.

Is there a setting we can tweak to prevent this from happening?

Non-exported nested rule references recognized as top level rules

In the following example i would expect nothing to happen, unless i prepend a letter keyword with "spell". However if i just say "alpha" I get the printout "RECOGNIZED SPELLING". You can see that in this example the nested referenced rule Alphabet is not exported.
_spelling.txt

For the sake of reproduction, I tested this by running:
python -m dragonfly load _spelling.py --engine kaldi -o vad_padding_end_ms=300 --no-recobs-messages

I have tried the same example with the test engine:
python -m dragonfly test _spelling.py --delay 0.1
and it worked as expected - nothing is recognized unless i prepended it with "spell", which makes me think this issue is caused by the Kaldi backend.

Name: dragonfly2 Version: 0.24.0

Name: kaldi-active-grammar Version: 1.4.0

Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)] on win32

Issues with Key Actions

using kaldi-caster-winpython-dev after the first shock, only does shock when I say shock 2, any time I try a something that has a key combo e.g. new tab in chrome it just prints the "t" and doesn't do ctrl + t.

Didnt have this issue on my desktop :( any ideas?

Complexity problems when using with caster (Text Manipulation)

When I follow the instructions here: https://caster.readthedocs.io/en/latest/readthedocs/Installation/Kaldi/

then say "Enable Text Manipulation", which is a moderately complex CCR grammar, Kaldi stalls and no longer accepts speech. I found that with a different grammar if I stripped out complexity right until the limit then Kaldi would still perform really well so would it be possible to tweak the settings a little to allow a bit more complexity?

cc @LexiconCode

The feature extraction technique (or the input to the neural network model) used in this implementation?

The papers with regard to DeepSpeech 1 and 2 from Baidu claimed that they used the spectrogram of power normalized audio clips as the features to the system.
The implementation of Mozilla DeepSpeech, based on TensorFlow, said that they turned to apply the MFCC features.
Now I am curious about that of your implementation for my research purpose on the comparison of various feature extraction algorithms. So, I will appreciate it if you could solve my puzzle. Thank you.

Empty Rule used in Breathe fails to compile, bombing out

From Gitter:

https://gist.github.com/droundy/5e1da5f7a67545d51184dd38a71e0c06

David Roundy @droundy Jul 29 11:51
I see an error that gives:

kaldi.compiler (DEBUG): KaldiRule(6, SG2::CommandsRef3): Compiling 4state/2arc/64byte fst.txt file to 
165c0c93589069547b50df71a9d25adf49375d35.fst
kaldi.compiler (Level 2): KaldiRule(6, SG2::CommandsRef3): FST text:
    0 1 <eps> <eps> -0.000000
    1 2 <eps> <eps> -0.000000
    3 -0.000000

David Roundy @droundy 13:03
Okay, I've enabled/disabled parts of my grammar until I got down to the culprit:

Breathe.add_commands(
    context=context,
    mapping={
       "<expr1> frac <expr2> over <expr3>":
            Exec("expr1") + Text("\\frac{") + Exec("expr2") + Text("}{") + Exec("expr3") + Text("}"),
       "<expr1> over <expr2>":
           Text("\\frac{") + Exec("expr1") + Text("}{") + Exec("expr2") + Text("}"),
       "<expr1> of <expr2>":
           Exec("expr1") + Text("\\left(") + Exec("expr2") + Text("\\right)"),
       "<expr1> e to the <expr2>":
           Exec("expr1") + Text(" e^{") + Exec("expr2") + Text("}"),
        'end math': Text('$')+Function(end_math),
    },
    extras = [
        CommandsRef("expr1", 8),
        CommandsRef("expr2", 8),
        CommandsRef("expr3", 8),
    ],
    top_level=True,
)

Obviously I'm using Breathe here, and this is not a rule that I'm even enabling, so I'm fine with just commenting it out here. But obviously, if you could pin down and announce which rule was causing trouble this would have been relatively easy for me to debug.

add python dependencies

I just installed kaldi-active-grammar on a new system. The prebuilt pip package didn't depend on sounddevice and webrtcvad, and I got runtime errors from my grammar until I installed them. It would be good to list these as python dependencies in the pip package, assuming they would be required by any grammar.

How can I configure cmvn?

I have a model trained with cmn and without ivector, I want to konw how to configure cmvn to make the model work.

Import vocabulary

Would be neat to be able to import vocabularies from DNS. When exported, they are of the format:

<text_to_print>\\<spoken phrase>
<text_to_print>

Where if spoken phrase isn't specified it is implied that the text and spoken phrase are the same.

Here's a snippet from my real one:

Guo & Costello\\Guo and Costello
Hoel & Sterner\\whole and sterner
Hsiang\\Solomon Chung last name
htop
iffae
insolation\\word insulation
kia ora\\kyah order
kia ora koutou
[email protected]\\my gmail
knitr\\knitter
Kupe
Landcare Research
LaTeX
Lebesque
LINZ

Is this feasible in Kaldi?

Alternate dictation source: Dragon / Natlink

I don't have experience with Natlink, and don't currently have Dragon installed, but I'd be happy to help implementing this.

Is there a way with Natlink to just get straight dictation recognition text from audio data passed to it?

Model and working directories should be separate

KaldiAG currently writes to several files in the model directory (such as file_cache.json, align_lexicon.int etc.), even when a separate temp directory is specified.

I think this breaks standard expectations of the model dir being a data repository, rather than a working directory for KaldiAG. It might also lead to hard-to-debug issues if multiple instances of KaldiAG are using the same model directory. When installing KaldiAG models for all users on a Linux system (e.g. using a package manager), they will likely be located under /usr/share, and will be read-only for unprivileged users, which again will lead to failure.

The best approach IMO would be to allow the user to specify a "working directory" when constructing a Compiler object (the default could be the model directory as it is now). This will enable a clean separation of immutable model data and mutable working cache if the application or the installation environment requires it.

Portaudio related crashes with kag-v1.8.0

Today, I pulled the update to kag-1.8.0 and downloaded the new large language model. After upgrade of kaldi-active-grammar I ran python setup.py bdist_wheel to ensure that kaldi is rebuild as well and of course pulled the updates for dragonfly.

However, after a few voice commands I am experiencing crashes:

Speech start detected.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
Traceback (most recent call last):
  File "kaldi_module_loader_plus.py", line 180, in <module>
    self._target(*self._args, **self._kwargs)
  File "/home/user/versioned/voice-coding/dragonfly/dragonfly/engines/backend_kaldi/audio.py", line 100, in _reader_thread
    main()
    in_data, overflowed = self.stream.read(self.stream.blocksize)
  File "kaldi_module_loader_plus.py", line 174, in main
  File "/home/user/versioned/voice-coding/env/lib/python3.8/site-packages/sounddevice.py", line 1196, in read
    engine.do_recognition(on_begin, on_recognition, on_failure)
  File "/home/user/versioned/voice-coding/dragonfly/dragonfly/engines/base/engine.py", line 260, in do_recognition
    _check(err)
  File "/home/user/versioned/voice-coding/env/lib/python3.8/site-packages/sounddevice.py", line 2653, in _check
    self._do_recognition(*args, **kwargs)
  File "/home/user/versioned/voice-coding/dragonfly/dragonfly/engines/backend_kaldi/engine.py", line 372, in _do_recognition
    raise PortAudioError(errormsg, err)
    kaldi_rule, words, words_are_dictation_mask, in_dictation = self._compiler.parse_partial_output(output)
sounddevice.PortAudioError: Stream is stopped [PaErrorCode -9983]
  File "/home/user/versioned/voice-coding/kaldi-active-grammar/kaldi_active_grammar/compiler.py", line 573, in parse_partial_output
    assert nonterm_token.startswith('#nonterm:rule')
AssertionError

Personalized/Incremental Speech Model Training

I'm not sure if Kaldi is capable of this, but DPI 15 seems to allow for "incremental" training, where the program starts with a base model, then very quickly learns from the user's training speech. On mine it seems to get better in a few seconds after just saving the user profile. Is Kaldi capable of this type of learning?

Error on Linux: Did not reach requested beam in determinize-lattice

I'm running Kaldi through Caster on Linux (Kubuntu).
After the start of listening no commands are activated. After very roughly ~20 commands. I get the error below. After which everything seems to be working as expected.
It happens every time on Linux with both the small and big models (20200905_1ep ones)
On Windows 10 (on the same laptop) it works as expected.

- Starting Caster v 1.6.16 -
INFO:engine:Listening...
[KALDI severity=-1] Did not reach requested beam in determinize-lattice: size exceeds maximum 50000000 bytes; (repo,arcs,elems) = (31613856,742496,17659488), after rebuilding, repo size was 21616064, effective beam was 7.36808 vs. requested beam 8

log of Caster
log of Caster with debug logging mode

PortAudioError from sounddevice

I'm getting the error below on startup when running Caster with Kaldi backend. I've tried kaldi-active-grammar versions 1.8.0 and 1.8.1, as well as both the latest medium and big models (kaldi_model_daanzu_20200905_1ep-mediumlm, kaldi_model_daanzu_20200905_1ep-biglm). I'm on an enterprise Dell workstation with builtin Realtek audio. I've tried a cheap USB headset as well as the Scarlett 2i2 audio interface. The crash happens in all configurations.

    If this free, open source engine is valuable to you, please consider donating
    https://github.com/daanzu/kaldi-active-grammar
    Disable message by calling `kaldi_active_grammar.disable_donation_message()`
Traceback (most recent call last):
  File "C:\Python27-64bit\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27-64bit\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27-64bit\lib\site-packages\dragonfly\__main__.py", line 408, in <module>
    main()
  File "C:\Python27-64bit\lib\site-packages\dragonfly\__main__.py", line 403, in main
    return_code = func(args)
  File "C:\Python27-64bit\lib\site-packages\dragonfly\__main__.py", line 174, in cli_cmd_load
    with engine.connection():
  File "C:\Python27-64bit\lib\site-packages\dragonfly\engines\base\engine.py", line 50, in __enter__
    self._engine.connect()
  File "C:\Python27-64bit\lib\site-packages\dragonfly\engines\backend_kaldi\engine.py", line 197, in connect
    reconnect_callback=self._options['audio_reconnect_callback'],
  File "C:\Python27-64bit\lib\site-packages\dragonfly\engines\backend_kaldi\audio.py", line 228, in __init__
    super(VADAudio, self).__init__(**kwargs)
  File "C:\Python27-64bit\lib\site-packages\dragonfly\engines\backend_kaldi\audio.py", line 75, in __init__
    self._connect(start=start)
  File "C:\Python27-64bit\lib\site-packages\dragonfly\engines\backend_kaldi\audio.py", line 89, in _connect
    callback=proxy_callback if not self.self_threaded else None,
  File "C:\Python27-64bit\lib\site-packages\sounddevice.py", line 1153, in __init__
    **_remove_self(locals()))
  File "C:\Python27-64bit\lib\site-packages\sounddevice.py", line 861, in __init__
    'Error opening {0}'.format(self.__class__.__name__))
  File "C:\Python27-64bit\lib\site-packages\sounddevice.py", line 2651, in _check
    raise PortAudioError(errormsg, err, hosterror_info)
sounddevice.PortAudioError: Error opening RawInputStream: Unanticipated host error [PaErrorCode -9999]: 'Undefined external error.' [MME error 1]

On what kind of datasets does the model trained on?

What are the datasets does this kaldi active grammar model trained on?
If you would have included public datasets, could you name them?
The pretrained model which you mentioned, is that Zamia speech model?

Particular mappings not recognized for non-exported subrules

I am having some problems with non-exported rules. I am having this top level ccr rule (which is exported) which in turn references subrules via repetitions/alternatives. Those subrules are non-exported. I now found this weird case, where one of my mapping entries works if and only if I set the associated subrule to exported. All other mapping entries of the same subrule work as expected either way. I first thought that this is an issue with dragonfly, as kaldi indeed reports that it only found shitty matches in case I set the subrule to non-exported while giving a very low error rate and a positive match if I set the subrule to exported.

However, the mapping entry works correctly if I use dragonfly's text engine. So it looks like the problem does stem from kaldi-active-grammar.

This is an excerpt from the respective rule:

        "open <text> dot <toplevel_domain>": Key("o") + Text("%(text)s.%(toplevel_domain)s") + Key("enter"),
        "edit <text> dot <toplevel_domain>": Key("o") + Text("%(text)s.%(toplevel_domain)s"),

"open google dot com" gives complete garbage:

LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:698)     1st best: #nonterm:rule6 alt rangle git add comma #nonterm:end
LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:699)     2nd best: #nonterm:rule6 harp rangle git add comma #nonterm:end
VLOG[1] ([5.5.779~1-db0af]:stop():utils.h:32) ExecutionTimer: Completed confidence in 2656 microseconds
LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:712) MBR(SER): 2.11014 : #nonterm:rule6 alt rangle git add comma #nonterm:end

While "edit google dot com" works as expected:

LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:698)     1st best: #nonterm:rule4 edit #nonterm:dictation google #nonterm:end dot com #nonterm:end
LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:699)     2nd best: #nonterm:rule4 air edit #nonterm:dictation google #nonterm:end dot com #nonterm:end
VLOG[1] ([5.5.779~1-db0af]:stop():utils.h:32) ExecutionTimer: Completed confidence in 232 microseconds
LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:712) MBR(SER): 0.0165027 : #nonterm:rule4 edit #nonterm:dictation google #nonte

KaldiAG 2.0.0 crashes with "Illegal instruction" on Linux

Create a Linux VM using QEMU on a Linux host (I use Fedora 33 as both guest and host, and GNOME Boxes to create the VM)
Install KaldiAG 2.0.0 with the "big" language model
Run full_example.py from the examples directory

KaldiAG crashes with

Illegal instruction (core dumped)

This does not happen with KaldiAG 1.8.0 on the same VM.

Running the example under gdb, I get

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fffd2434278 in ddot_k ()
   from [...]/kaldi_active_grammar/exec/linux/../../../kaldi_active_grammar.libs/libkaldi-matrix-01efe3e4.so

I don't have debug symbols, so that's as deep as I can go without building everything from scratch.

My guess is that for some reason, OpenBLAS is misdetecting what instruction set extensions are available, and tries to use architecture-specific optimized code that the (virtual) CPU cannot execute. If that is the case, there is probably a bug in how the KaldiAG wheels are built though, because OpenBLAS certainly works on VMs (NumPy would just fall apart otherwise).

prebuild models: decision tree

please consider including the decision tree file with pre-built models (./am/tree)

this file is required to use the existing kaldi wsj utils/mkgraph.sh scripts with your models

What is the vocab size in kaldi-active-grammar?

Is there a future for Text to Speech integration in Kaldi engine ? A way around ?

Hello,
I'm interrrested in having some Text to Speech feedback while using the voice commands.
I assume it isn't planned for integration in this engine yet, so is there a way around we could use ? (I would like to keep using kaldi-active-grammar as engine and no internet connection)

def speak(self, text):
        """ Speak the given *text* using text-to-speech. """
        # FIXME
        self._log.warning("Text-to-speech is not implemented for this engine; printing text instead.")
        print_(text)

Exception: Command exited with status 1: steps/nnet3/get_egs.sh

hello,

I am a beginner under Kaldi and I'am trying to finetune danzu model by mini-librispeech data (juste a simple try) to understand the process.

I have firsty prepared data, coputed MFCC and i have then used this script for finetuning https://github.com/kaldi-asr/kaldi/blob/master/egs/aishell2/s5/local/nnet3/tuning/finetune_tdnn_1a.sh

I have used https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.8.0/tree_sp.zip as it is the tree directory for the most recent models (i have used mainly the ali.x.gz files).

i have faced the below issue:

2021-05-30 14:27:20,165 [steps/nnet3/train_dnn.py:36 - - INFO ] Starting DNN trainer (train_dnn.py)
steps/nnet3/train_dnn.py --stage=-10 --cmd=run.pl --mem 4G --feat.cmvn-opts=--norm-means=false --norm-vars=false --trainer.input-model exp/nnet3/tdnn_sp_train/input.raw --trainer.num-epochs 5 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final 1 --trainer.optimization.initial-effective-lrate 0.0005 --trainer.optimization.final-effective-lrate 0.00002 --trainer.optimization.minibatch-size 1024 --feat-dir data/train_hires --lang data/lang --ali-dir exp/train_ali --dir exp/nnet3/tdnn_sp_train
['steps/nnet3/train_dnn.py', '--stage=-10', '--cmd=run.pl --mem 4G', '--feat.cmvn-opts=--norm-means=false --norm-vars=false', '--trainer.input-model', 'exp/nnet3/tdnn_sp_train/input.raw', '--trainer.num-epochs', '5', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final', '1', '--trainer.optimization.initial-effective-lrate', '0.0005', '--trainer.optimization.final-effective-lrate', '0.00002', '--trainer.optimization.minibatch-size', '1024', '--feat-dir', 'data/train_hires', '--lang', 'data/lang', '--ali-dir', 'exp/train_ali', '--dir', 'exp/nnet3/tdnn_sp_train']
2021-05-30 14:27:20,172 [steps/nnet3/train_dnn.py:178 - train - INFO ] Arguments for the experiment
{'ali_dir': 'exp/train_ali',
'backstitch_training_interval': 1,
'backstitch_training_scale': 0.0,
'cleanup': True,
'cmvn_opts': '--norm-means=false --norm-vars=false',
'combine_sum_to_one_penalty': 0.0,
'command': 'run.pl --mem 4G',
'compute_per_dim_accuracy': False,
'dir': 'exp/nnet3/tdnn_sp_train',
'do_final_combination': True,
'dropout_schedule': None,
'egs_command': None,
'egs_dir': None,
'egs_opts': None,
'egs_stage': 0,
'email': None,
'exit_stage': None,
'feat_dir': 'data/train_hires',
'final_effective_lrate': 2e-05,
'frames_per_eg': 8,
'initial_effective_lrate': 0.0005,
'input_model': 'exp/nnet3/tdnn_sp_train/input.raw',
'lang': 'data/lang',
'max_lda_jobs': 10,
'max_models_combine': 20,
'max_objective_evaluations': 30,
'max_param_change': 2.0,
'minibatch_size': '1024',
'momentum': 0.0,
'num_epochs': 5.0,
'num_jobs_compute_prior': 10,
'num_jobs_final': 1,
'num_jobs_initial': 1,
'num_jobs_step': 1,
'online_ivector_dir': None,
'preserve_model_interval': 100,
'presoftmax_prior_scale_power': -0.25,
'prior_subset_size': 20000,
'proportional_shrink': 0.0,
'rand_prune': 4.0,
'remove_egs': True,
'reporting_interval': 0.1,
'samples_per_iter': 400000,
'shuffle_buffer_size': 5000,
'srand': 0,
'stage': -10,
'train_opts': [],
'use_gpu': 'yes'}
nnet3-info exp/nnet3/tdnn_sp_train/input.raw
2021-05-30 14:27:20,373 [steps/nnet3/train_dnn.py:238 - train - INFO ] Generating egs
steps/nnet3/get_egs.sh --cmd run.pl --mem 4G --cmvn-opts --norm-means=false --norm-vars=false --online-ivector-dir --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs
steps/nnet3/get_egs.sh: creating egs. To ensure they are not deleted later you can do: touch exp/nnet3/tdnn_sp_train/egs/.nodelete
steps/nnet3/get_egs.sh: feature type is raw, with 'apply-cmvn'
steps/nnet3/get_egs.sh: working out number of frames of training data
steps/nnet3/get_egs.sh: working out feature dim
*** steps/nnet3/get_egs.sh: warning: the --frames-per-eg is too large to generate one archive with
*** as many as --samples-per-iter egs in it. Consider reducing --frames-per-eg.
steps/nnet3/get_egs.sh: creating 1 archives, each with 238983 egs, with
steps/nnet3/get_egs.sh: 8 labels per example, and (left,right) context = (34,34)
steps/nnet3/get_egs.sh: copying data alignments
copy-int-vector ark:- ark,scp:exp/nnet3/tdnn_sp_train/egs/ali.ark,exp/nnet3/tdnn_sp_train/egs/ali.scp
ERROR (copy-int-vector[5.5.929~1539-9bca2]:ReadBasicType():base/io-funcs-inl.h:68) ReadBasicType: did not get expected integer type, 0 vs. 4. You can change this code to successfully read it later, if needed.

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f27463671c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5650dbd339dd]
copy-int-vector(kaldi::BasicVectorHolder::Read(std::istream&)+0xba9) [0x5650dbd3adcb]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::Next()+0xf3) [0x5650dbd3b159]
copy-int-vector(main+0x484) [0x5650dbd3320d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2745f5f0b3]
copy-int-vector(_start+0x2e) [0x5650dbd32cce]

WARNING (copy-int-vector[5.5.9291539-9bca2]:Read():util/kaldi-holder-inl.h:308) BasicVectorHolder::Read, read error or unexpected data at archive entry beginning at file position 18446744073709551615
WARNING (copy-int-vector[5.5.9291539-9bca2]:Next():util/kaldi-table-inl.h:574) Object read failed, reading archive standard input
LOG (copy-int-vector[5.5.9291539-9bca2]:main():copy-int-vector.cc:83) Copied 2697018 vectors of int32.
ERROR (copy-int-vector[5.5.9291539-9bca2]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive standard input

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f27463671c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5650dbd339dd]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::~SequentialTableReaderArchiveImpl()+0x121) [0x5650dbd3734d]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::~SequentialTableReaderArchiveImpl()+0xd) [0x5650dbd3764d]
copy-int-vector(kaldi::SequentialTableReader<kaldi::BasicVectorHolder >::~SequentialTableReader()+0x16) [0x5650dbd37816]
copy-int-vector(main+0x520) [0x5650dbd332a9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2745f5f0b3]
copy-int-vector(_start+0x2e) [0x5650dbd32cce]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
steps/nnet3/get_egs.sh: line 272: 1101381 Exit 1 for id in $(seq $num_ali_jobs);
do
gunzip -c $alidir/ali.$id.gz;
done
1101382 Aborted (core dumped) | copy-int-vector ark:- ark,scp:$dir/ali.ark,$dir/ali.scp
Traceback (most recent call last):
File "steps/nnet3/train_dnn.py", line 459, in main
train(args, run_opts)
File "steps/nnet3/train_dnn.py", line 253, in train
stage=args.egs_stage)
File "steps/libs/nnet3/train/frame_level_objf/acoustic_model.py", line 61, in generate_egs
egs_opts=egs_opts if egs_opts is not None else ''))
File "steps/libs/common.py", line 129, in execute_command
p.returncode, command))
Exception: Command exited with status 1: steps/nnet3/get_egs.sh --cmd "run.pl --mem 4G" --cmvn-opts "--norm-means=false --norm-vars=false" --online-ivector-dir "" --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs

(GPU) /scratch/asma-kaldi/egs/aishell2/s5$ bash local/nnet3/tuning/finetune_tdnn_1a.sh
2021-05-30 14:31:18,075 [steps/nnet3/train_dnn.py:36 - - INFO ] Starting DNN trainer (train_dnn.py)
steps/nnet3/train_dnn.py --stage=-10 --cmd=run.pl --mem 4G --feat.cmvn-opts=--norm-means=false --norm-vars=false --trainer.input-model exp/nnet3/tdnn_sp_train/input.raw --trainer.num-epochs 5 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final 1 --trainer.optimization.initial-effective-lrate 0.0005 --trainer.optimization.final-effective-lrate 0.00002 --trainer.optimization.minibatch-size 1024 --feat-dir data/train_hires --lang data/lang --ali-dir exp/train_ali --dir exp/nnet3/tdnn_sp_train
['steps/nnet3/train_dnn.py', '--stage=-10', '--cmd=run.pl --mem 4G', '--feat.cmvn-opts=--norm-means=false --norm-vars=false', '--trainer.input-model', 'exp/nnet3/tdnn_sp_train/input.raw', '--trainer.num-epochs', '5', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final', '1', '--trainer.optimization.initial-effective-lrate', '0.0005', '--trainer.optimization.final-effective-lrate', '0.00002', '--trainer.optimization.minibatch-size', '1024', '--feat-dir', 'data/train_hires', '--lang', 'data/lang', '--ali-dir', 'exp/train_ali', '--dir', 'exp/nnet3/tdnn_sp_train']
2021-05-30 14:31:18,082 [steps/nnet3/train_dnn.py:178 - train - INFO ] Arguments for the experiment
{'ali_dir': 'exp/train_ali',
'backstitch_training_interval': 1,
'backstitch_training_scale': 0.0,
'cleanup': True,
'cmvn_opts': '--norm-means=false --norm-vars=false',
'combine_sum_to_one_penalty': 0.0,
'command': 'run.pl --mem 4G',
'compute_per_dim_accuracy': False,
'dir': 'exp/nnet3/tdnn_sp_train',
'do_final_combination': True,
'dropout_schedule': None,
'egs_command': None,
'egs_dir': None,
'egs_opts': None,
'egs_stage': 0,
'email': None,
'exit_stage': None,
'feat_dir': 'data/train_hires',
'final_effective_lrate': 2e-05,
'frames_per_eg': 8,
'initial_effective_lrate': 0.0005,
'input_model': 'exp/nnet3/tdnn_sp_train/input.raw',
'lang': 'data/lang',
'max_lda_jobs': 10,
'max_models_combine': 20,
'max_objective_evaluations': 30,
'max_param_change': 2.0,
'minibatch_size': '1024',
'momentum': 0.0,
'num_epochs': 5.0,
'num_jobs_compute_prior': 10,
'num_jobs_final': 1,
'num_jobs_initial': 1,
'num_jobs_step': 1,
'online_ivector_dir': None,
'preserve_model_interval': 100,
'presoftmax_prior_scale_power': -0.25,
'prior_subset_size': 20000,
'proportional_shrink': 0.0,
'rand_prune': 4.0,
'remove_egs': True,
'reporting_interval': 0.1,
'samples_per_iter': 400000,
'shuffle_buffer_size': 5000,
'srand': 0,
'stage': -10,
'train_opts': [],
'use_gpu': 'yes'}
nnet3-info exp/nnet3/tdnn_sp_train/input.raw
2021-05-30 14:31:18,286 [steps/nnet3/train_dnn.py:238 - train - INFO ] Generating egs
steps/nnet3/get_egs.sh --cmd run.pl --mem 4G --cmvn-opts --norm-means=false --norm-vars=false --online-ivector-dir --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs
steps/nnet3/get_egs.sh: creating egs. To ensure they are not deleted later you can do: touch exp/nnet3/tdnn_sp_train/egs/.nodelete
steps/nnet3/get_egs.sh: feature type is raw, with 'apply-cmvn'
steps/nnet3/get_egs.sh: working out number of frames of training data
steps/nnet3/get_egs.sh: working out feature dim
*** steps/nnet3/get_egs.sh: warning: the --frames-per-eg is too large to generate one archive with
*** as many as --samples-per-iter egs in it. Consider reducing --frames-per-eg.
steps/nnet3/get_egs.sh: creating 1 archives, each with 238983 egs, with
steps/nnet3/get_egs.sh: 8 labels per example, and (left,right) context = (34,34)
steps/nnet3/get_egs.sh: copying data alignments
copy-int-vector ark:- ark,scp:exp/nnet3/tdnn_sp_train/egs/ali.ark,exp/nnet3/tdnn_sp_train/egs/ali.scp
ERROR (copy-int-vector[5.5.9291539-9bca2]:ReadBasicType():base/io-funcs-inl.h:68) ReadBasicType: did not get expected integer type, 0 vs. 4. You can change this code to successfully read it later, if needed.

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f3acf1181c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x55f3b30bd9dd]
copy-int-vector(kaldi::BasicVectorHolder::Read(std::istream&)+0xba9) [0x55f3b30c4dcb]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::Next()+0xf3) [0x55f3b30c5159]
copy-int-vector(main+0x484) [0x55f3b30bd20d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f3aced100b3]
copy-int-vector(_start+0x2e) [0x55f3b30bccce]

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f3acf1181c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x55f3b30bd9dd]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::~SequentialTableReaderArchiveImpl()+0x121) [0x55f3b30c134d]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::~SequentialTableReaderArchiveImpl()+0xd) [0x55f3b30c164d]
copy-int-vector(kaldi::SequentialTableReader<kaldi::BasicVectorHolder >::~SequentialTableReader()+0x16) [0x55f3b30c1816]
copy-int-vector(main+0x520) [0x55f3b30bd2a9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f3aced100b3]
copy-int-vector(_start+0x2e) [0x55f3b30bccce]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
steps/nnet3/get_egs.sh: line 272: 1101630 Exit 1 for id in $(seq $num_ali_jobs);
do
gunzip -c $alidir/ali.$id.gz;
done
1101631 Aborted (core dumped) | copy-int-vector ark:- ark,scp:$dir/ali.ark,$dir/ali.scp
Traceback (most recent call last):
File "steps/nnet3/train_dnn.py", line 459, in main
train(args, run_opts)
File "steps/nnet3/train_dnn.py", line 253, in train
stage=args.egs_stage)
File "steps/libs/nnet3/train/frame_level_objf/acoustic_model.py", line 61, in generate_egs
egs_opts=egs_opts if egs_opts is not None else ''))
File "steps/libs/common.py", line 129, in execute_command
p.returncode, command))
Exception: Command exited with status 1: steps/nnet3/get_egs.sh --cmd "run.pl --mem 4G" --cmvn-opts "--norm-means=false --norm-vars=false" --online-ivector-dir "" --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs

Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group

You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/60e0ba04-247d-4a26-93f8-8bdd106ca987n%40googlegroups.com.

Any idea please ( for information, i am using two GPUs with 25 GB each )?

Error switching to new model

Hi, just tried upgrading to the new (medium) model and I'm getting the following error message as soon as recognition begins. I'm on the latest versions and this grammar works fine on the zamia model. I've tried grepping a_I and it isn't coming from my files I don't think.

Traceback (most recent call last):
  File "kaldi_module_loader_medium.py", line 162, in <module>
    main()
  File "kaldi_module_loader_medium.py", line 140, in main
    engine.connect()
  File "C:\ProgramData\Anaconda3\lib\site-packages\dragonfly\engines\backend_kaldi\engine.py", line 132, in connect
    cloud_dictation_lang=self._options['cloud_dictation_lang'],
  File "C:\ProgramData\Anaconda3\lib\site-packages\dragonfly\engines\backend_kaldi\compiler.py", line 73, in __init__
    KaldiAGCompiler.__init__(self, model_dir=model_dir, tmp_dir=tmp_dir, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\kaldi_active_grammar\compiler.py", line 194, in __init__
    self.model = Model(model_dir, tmp_dir)
  File "C:\ProgramData\Anaconda3\lib\site-packages\kaldi_active_grammar\model.py", line 198, in __init__
    self.generate_lexicon_files()
  File "C:\ProgramData\Anaconda3\lib\site-packages\kaldi_active_grammar\model.py", line 279, in generate_lexicon_files
    generate_file('align_lexicon.int', lambda word, word_id, phones:
  File "C:\ProgramData\Anaconda3\lib\site-packages\kaldi_active_grammar\model.py", line 275, in generate_file
    file.write(write_func(word, word_id, phones) + '\n')
  File "C:\ProgramData\Anaconda3\lib\site-packages\kaldi_active_grammar\model.py", line 280, in <lambda>
    str_space_join([word_id, word_id] + [self.phone_to_int_dict[phone] for phone in phones]))
  File "C:\ProgramData\Anaconda3\lib\site-packages\kaldi_active_grammar\model.py", line 280, in <listcomp>
    str_space_join([word_id, word_id] + [self.phone_to_int_dict[phone] for phone in phones]))
KeyError: 'a_I'

Let me know if you need any more details :) thanks

Edit: 892ab77 maybe the fix for this? I tried to update to the latest but there is an assert statement in
the dragonfly engine code which fails on the dev version

audio_self_threaded cannot be set from command line

Using the command line below, Kaldi reports audio_self_threaded=True on startup.

%PYTHON_PATH%\python -m dragonfly load _caster.py --engine kaldi --engine-options "audio_self_threaded=False" --no-recobs-messages

kaldi-active-grammar 1.8.1
dragonfly2[kaldi] 0.28.1

Python 3 compatibility

Hey, I've been checking out the package and it looks really cool!

Currently KAG is targeting Python 2 and raises exceptions in Python 3. In my brief attempts at running KAG on py3, the main changes seemed to be bytes/unicode issues and module imports. I realize it can be a bit of a pain to write polyglot. Would you be interested in supporting both versions?

Other languages

In the future, it should be able to support a lot more, but the work is in finding decent models for other languages, and then some minor modifications to enable their use in KaldiAG.

dictation-toolbox/dragonfly#241

Getting breathe working

Hi, the dragonfly API I'm working on has initial features implemented, but it does a couple of slightly unorthodox things to manage grammars and I'm struggling to debug it on kaldi. I've made an issue to track the bugs here.

The grammar management works by generating grammars as necessary on the fly during process_begin. The implementation is pretty efficient so at least on natlink this is done quickly enough to be imperceptible.

Issues so far:

Grammars loaded with empty lists cause an exception during compilation. I am using lists to keep track of contexts which are activated using an "enable" command (as in caster).
After all grammars are loaded, the first phrase starts, I get a message saying that a new grammar is being loaded (as expected), and then I just get "press any key to continue..." (implying that the program exited and i'm back in the batch script). It looks like the new grammar is loaded correctly but the recognition callback is never reached. Do you know what might be causing the program to exit? pdb is not catching the exit, so not even sure it's Python code that is causing this but I don't know enough about kaldi to be confident about anything.
EDIT: setting lazy_compilation=False seems to fix the problem so I am assuming it is some sort of concurrency bug with grammars being loaded at the same time as recognition occurring. Initial start-up is still significantly quicker than with a caster-based system, and the slight delay when a new context is encountered is hardly noticeable.

The grammar I'm using to test is just:

from dragonfly import *
from breathe import Breathe

Breathe.add_commands(
    None,
    {
        "test": Text("test")
    }
)

Thanks!

Version lock NumPy to `1.19.3`

There is an ongoing issue with NumPy 1.19.4 specific to Windows 10, version 20H2 and Windows Server, version 20H2 with AMD processors.

Downgrading pip install numpy==1.19.3 fixes the issue.

ERROR:engine.kaldi:Exception during import of Kaldi engine module: The current Numpy installation ('C:\\Users\\Main\\Desktop\\Caster3-Test-Kaldi-py3-64bit\\WPy64-3860\\python-3.8.6.amd64\\lib\\site-packages\\numpy\\__init__.py') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86
Traceback (most recent call last):
  File "C:\Users\Main\Desktop\Caster3-Test-Kaldi-py3-64bit\WPy64-3860\python-3.8.6.amd64\lib\site-packages\dragonfly\engines\backend_kaldi\__init__.py", line 51, in is_engine_available
    from .engine import KaldiEngine
  File "C:\Users\Main\Desktop\Caster3-Test-Kaldi-py3-64bit\WPy64-3860\python-3.8.6.amd64\lib\site-packages\dragonfly\engines\backend_kaldi\engine.py", line 30, in <module>
    import kaldi_active_grammar
  File "C:\Users\Main\Desktop\Caster3-Test-Kaldi-py3-64bit\WPy64-3860\python-3.8.6.amd64\lib\site-packages\kaldi_active_grammar\__init__.py", line 22, in <module>
    from .wrapper import KaldiAgfNNet3Decoder, KaldiPlainNNet3Decoder
  File "C:\Users\Main\Desktop\Caster3-Test-Kaldi-py3-64bit\WPy64-3860\python-3.8.6.amd64\lib\site-packages\kaldi_active_grammar\wrapper.py", line 17, in <module>
    import numpy as np
  File "C:\Users\Main\Desktop\Caster3-Test-Kaldi-py3-64bit\WPy64-3860\python-3.8.6.amd64\lib\site-packages\numpy\__init__.py", line 305, in <module>
    _win_os_check()
  File "C:\Users\Main\Desktop\Caster3-Test-Kaldi-py3-64bit\WPy64-3860\python-3.8.6.amd64\lib\site-packages\numpy\__init__.py", line 302, in _win_os_check
    raise RuntimeError(msg.format(__file__)) from None
RuntimeError: The current Numpy installation ('C:\\Users\\Main\\Desktop\\Caster3-Test-Kaldi-py3-64bit\\WPy64-3860\\python-3.8.6.amd64\\lib\\site-packages\\numpy\\__init__.py') fails to pass a sanity

trouble running sample code

python crashes when running this sample code:

##########
import sys, wave
from kaldi_active_grammar import PlainDictationRecognizer
recognizer = PlainDictationRecognizer()
wfi = 'test.wav' ### load a 16 kHz 16-bit example wav file from kaldi repository
wave_file = wave.open(wfi, 'rb')
data = wave_file.readframes(wave_file.getnframes())
print(type(data), len(data)) #bytes object with in16 audio data
output_str, likelihood = recognizer.decode_utterance(data)
print('won't get here, decode_utterance crashes python')
print(repr(output_str), likelihood)
###########

type(data), len(data) = (<class 'bytes'>, 46002)

test.wav is from here:
https://github.com/kaldi-asr/kaldi/tree/master/src/feat/test_data/test.wav

kaldi_model is from here:
https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.2.0/kaldi_model_zamia.zip

kaldi_active_grammar.version = '1.2.0'
sys.version = '3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]'

Any idea what might be going wrong? Is there a sample wav file that I should try?

Thanks

Precompiled webrtcvad Python 3

This is not so much of a problem for 2.7 due to the standalone installer "Microsoft Visual C++ Compiler for Python 2.7" or plus your precompiled version. However for Python 3 it looks like Visual Studio is required. Due to restrictions in some outside of DEV environments may not be possible for the end user to install Microsoft Python Compiler.

Feature Request: going to sleep after minutes of inactivity

DNS engine automatically goes to sleep after roughly 5 minutes of inactivity, which is a nice feature to reduce the possibility of commands being detected from random noises while we're away from the computer such as if we've run off to the bathroom! The user has to say "start listening" or similar to wake it up.

If it's easy to add an inactivity timeout to the kaldi backend, that would be handy!

network.xconfig for kaldi_model_daanzu_20200328_1ep-mediumlm

Do you have the network configurations for this model?
Is it?

    cat <<EOF > $dir/configs/network.xconfig
input dim=100 name=ivector
input dim=40 name=input

# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor
fixed-affine-layer name=lda input=Append(-1,0,1,ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat

# the first splicing is moved before the lda layer, so no splicing here
relu-batchnorm-dropout-layer name=tdnn1 $affine_opts dim=1536
tdnnf-layer name=tdnnf2 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf3 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf4 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf5 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=0
tdnnf-layer name=tdnnf6 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf7 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf8 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf9 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf10 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf11 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf12 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf13 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf14 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf15 $tdnnf_opts dim=1536 bottleneck-dim=160 time-stride=3
linear-component name=prefinal-l dim=256 $linear_opts

prefinal-layer name=prefinal-chain input=prefinal-l $prefinal_opts big-dim=1536 small-dim=256
output-layer name=output include-log-softmax=false dim=$num_targets $output_opts

prefinal-layer name=prefinal-xent input=prefinal-l $prefinal_opts big-dim=1536 small-dim=256
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor $output_opts

Bluetooth Headset: No good block received

I have the following issue with my bluetooth headset. I am on Windows 10 running caster with kaldi.
If I use separate sources for input (bluetooth microphone from headset) and output (Speakers hooked up to my screen), i get the following error.

...
INFO:engine:streaming audio from 'Headset (VXi B350-XT v3.33 Hand' using MME: 16000 sample_rate, 10 block_duration_ms, 30 latency_ms
...

*- Starting Caster v 1.6.16 -*
INFO:engine:Listening...
WARNING:engine:<dragonfly.engines.backend_kaldi.audio.VADAudio object at 0x000000000ED90E88>: no good block received recently, so reconnecting audio
Press any key to continue . . .

The mic works in audacity or for skype. Caster seems to recognize it,
but then it crashes once the caster/kaldi engine is listening.

The Intel MKL is not Free Software

Hey there, I was really excited to finally see a high-quality Kaldi distribution that can be pip installed. Unfortunately, the Intel Math Kernel Library required to build KaldiAG is not Free Software:

Its license, the "Intel Simplified Software License", is not OSI approved
The Apache Software Foundation has outright rejected the ISSL "because it is not an open source license"
The Kaldi documentation itself states that the MKL "is still a closed-source commercial product"

Therefore, building KaldiAG currently requires nonfree software, and it is questionable whether the binary wheels can even be distributed under the AGPL, as claimed on PyPI.

It would be great if it was possible to build Free Software using KaldiAG. I therefore recommend switching the build process to link against OpenBLAS instead of the MKL by default, especially for the wheels so they can be used in Free Software.

Disclaimer: I am not a lawyer and this is not legal advice.

Add a list of available models to the readme

I was wondering if you could add a section to the readme with a list of the models that you have released, so users don't have to search through the releases to find one they like. It would be good to only list the up to date models. You could just have the links point to the artifacts in the releases tab, and maybe add the model size next to the links to. Thanks!

edit: Also, It might be good to list the amount of memory each model uses since that seems to be the main bottleneck from using the larger models. And maybe described the benefits of using the larger models (i.e. larger vocabulary).

Add a google cloud speech feature flag

We should add an extra feature flag for google cloud speech that automatically adds the dependency.

Where do I download kaldi-caster-winpython-dev?

I do not see that package in your releases only kaldi-dragonfly-winpython37.zip

Thanks.

VC Redistributable missing after installing kaldi-active-grammar

After the installation of dragonfly2 and its kaldi backend (following the setup instructions), upon starting dragonfly using the kaldi engine, fstcompile process crashes with the following error:
The code execution cannot proceed because VCRUNTIME140_1.dll was not found.

Need to find a way to install the VC redistributable automatically, or update the setup instructions.

Name: dragonfly2 Version: 0.24.0
Name: kaldi-active-grammar Version: 1.4.0

Python 3 bug when make_lexicon_fst.py is run

Hey @daanzu. I've discovered a bug specific to Python 3 that occurs when first running the Dragonfly Kaldi module loader. I've posted here instead because it seems to be a KAG bug caused by running the make_lexicon_fst.py script with python instead of python3. python is Python 2.7 on my system, which would explain the confusing SyntaxError.

Not sure how to fix this. I guess you need to use the interpreter's commandline path instead?

  File "/home/dane/.local/lib/python3.7/site-packages/kaldi_active_grammar/kaldi/make_lexicon_fst.py", line 78
    sys.argv[0], line.strip(" \t\r\n"), filename), file=sys.stderr)
                                                       ^
SyntaxError: invalid syntax
Traceback (most recent call last):
  File "kaldi_module_loader.py", line 101, in <module>
    main()
  File "kaldi_module_loader.py", line 82, in main
    engine.connect()
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/engine.py", line 123, in connect
    cloud_dictation_lang=self._options['cloud_dictation_lang'],
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 73, in __init__
    KaldiAGCompiler.__init__(self, model_dir=model_dir, tmp_dir=tmp_dir, **kwargs)
  File "/home/dane/.local/lib/python3.7/site-packages/kaldi_active_grammar/compiler.py", line 190, in __init__
    self.model = Model(model_dir, tmp_dir)
  File "/home/dane/.local/lib/python3.7/site-packages/kaldi_active_grammar/model.py", line 197, in __init__
    self.generate_lexicon_files()
  File "/home/dane/.local/lib/python3.7/site-packages/kaldi_active_grammar/model.py", line 301, in generate_lexicon_files
    command()
  File "/home/dane/.local/lib/python3.7/site-packages/ush.py", line 585, in __call__
    return wait(procs, raise_on_error)
  File "/home/dane/.local/lib/python3.7/site-packages/ush.py", line 189, in wait
    result = tuple(iterate_outputs(procs, raise_on_error, status_codes))
  File "/home/dane/.local/lib/python3.7/site-packages/ush.py", line 221, in iterate_outputs
    raise ProcessError(process_info)
ush.ProcessError: One or more commands failed: [(['python', '/home/dane/.local/lib/python3.7/site-packages/kaldi_active_grammar/kaldi/make_lexicon_fst.py', '--left-context-phones=kaldi_model_zamia/left_context_phones.txt', '--nonterminals=kaldi_model_zamia/nonterminals.txt', '--sil-prob=0.5', '--sil-phone=SIL', '--sil-disambig=#14', 'kaldi_model_zamia/lexiconp_disambig.txt'], 24795, 1), (['/home/dane/.local/lib/python3.7/site-packages/kaldi_active_grammar/exec/linux/fstcompile', '--isymbols=kaldi_model_zamia/phones.txt', '--osymbols=kaldi_model_zamia/words.txt', '--keep_isymbols=false', '--keep_osymbols=false'], 24796, 0), (['/home/dane/.local/lib/python3.7/site-packages/kaldi_active_grammar/exec/linux/fstaddselfloops', 'kaldi_model_zamia/wdisambig_phones.int', 'kaldi_model_zamia/wdisambig_words.int'], 24797, 0), (['/home/dane/.local/lib/python3.7/site-packages/kaldi_active_grammar/exec/linux/fstarcsort', '--sort_type=olabel'], 24798, 0)]

Missing documentation: Import of a custom kaldi model

What steps are necessary to import a custom kaldi model (trained from scratch, not transfer-learned as in #33) into KAG?

In the readme it is currently stated that:

Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.

What steps are necessary to kick off the mentioned partial implementation for automatic conversion?
What steps remain to be carried out by the user?

Issue updating lexicon in frozen app

I'm running into this error updating the lexicon in an application bundled with cx_Freeze on Windows 10. I tried manually adding all of the kaldi_active_grammar/kaldi/*.py files to the dist as cx_Freeze wasn't picking up modules invoked by utils.ExternalProcess but am still getting the same exception.

Support for arm7l or aarch64 Raspberry Pi

Hey so i tried this and it works great on my macosx.. however is there a way i can compile this for raspberry pi4 arm7l or aarch64 ? :)

full_example.py failed with no attribute 'init_decoder'

I have the following problem running the full_example.py

setup:

windows 10
pip install dragonfly2[kaldi]
pip install kaldi-active-grammar

`python full_example.py

Kaldi-Active-Grammar v1.8.3:
If this free, open source engine is valuable to you, please consider donating
https://github.com/daanzu/kaldi-active-grammar
Disable message by calling kaldi_active_grammar.disable_donation_message()
Traceback (most recent call last):
File "full_example.py", line 12, in
decoder = compiler.init_decoder()
AttributeError: 'Compiler' object has no attribute 'init_decoder'`

simple demo

Do you have a simple demo, like the one in dragonfly that doesn't require dragonfly?
I just want to transcribe an audio file with grammar weights added to the language model

Thanks

Question about the license

Copied from the gitter:

Hey, thanks for the great library @daanzu. I have a question about the license if you don't mind. I plan on writing a program that uses dragonfly with the kaldi backend, and i saw that dragonfly uses lgpl while kaldi-active-grammar uses agpl. Isn't that a license conflict? Also, i plan on using the mit license, which seems like it would also conflict with the agpl, but not with dragonfly. Thanks for the help.

Python 3.8 'time.clock()' removed

Python: 3.8
Caster: (Py 3 pull request)
kaldi_module_loader_plus: latest dragonfly master
kaldi_active_grammar: 1.2

Traceback (most recent call last):
  File "kaldi_module_loader_plus.py", line 156, in <module>
    main()
  File "kaldi_module_loader_plus.py", line 150, in main
    engine.do_recognition(on_begin, on_recognition, on_failure)
  File "C:\Users\Laptop\AppData\Local\Programs\Python\Python38\lib\site-packages\dragonfly\engines\base\engine.py", line 242, in do_recognition
    self._do_recognition(*args, **kwargs)
  File "C:\Users\Laptop\AppData\Local\Programs\Python\Python38\lib\site-packages\dragonfly\engines\backend_kaldi\engine.py", line 309, in _do_recognition
    self._decoder.decode(block, False, kaldi_rules_activity)
  File "C:\Users\Laptop\AppData\Local\Programs\Python\Python38\lib\site-packages\kaldi_active_grammar\wrapper.py", line 445, in decode
    self._start_decode_time(len(frames))
  File "C:\Users\Laptop\AppData\Local\Programs\Python\Python38\lib\site-packages\kaldi_active_grammar\wrapper.py", line 64, in _start_decode_time
    self.decode_start_time = time.clock()
AttributeError: module 'time' has no attribute 'clock'

time.clock() has been deprecated since Python 3.3

See issue
Recommended to use: time.perf_counter() or time.process_time() for py3
Not sure about a py2/py3 solution using the six or other compatibility library.

kaldi_active_grammar v2.1.0 incompatible with dragonfly2 ?

I have installed successfully the new versions of kaldi-active-grammar and dragonfly2 with 'pip install dragonfly2==0.30.1' and 'pip install kaldi-active-grammar==2.1.0'.
But any demo or script I try to run returns this message:

  File "C:\Users\xxx\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\dragonfly\engines\__init__.py", line 132, in get_engine
    raise EngineError(message)
dragonfly.engines.base.engine.EngineError: Exception while initializing kaldi engine: Incompatible kaldi_active_grammar version

I am on Windows 10 and everything worked well with kaldi-active-grammar v1.8.1, I wanted to try the new version with the fixes and changes.

the weighting of dictation mixed with commands could use improvement

From Gitter:

ileben @ileben 00:53
Now that i fully switched to Kaldi, I'm having problems with it making unexpected decisions when faced with multiple options. In the example file below (_test.txt) using any of the three example rules when i say "title space" the engine interprets it as "title S space". Basically, instead of just interpreting it as the "title <dictation=space>", it opts for a more complex option "title <dictation=S> space". In the first example both options are part of the same mapping rule with the spec for the simple option having an elevated weight, in the second option the two halves of the complex option are in a mapping rule with an elevated weight inside a repetition with a normal weight, in the third option the two halves are further split into two sub rules, with only the first half having an elevated weight. Kaldi always chooses to use the complex option and/or using the repetition and seems to ignore the weight.

daanzu / kaldi-active-grammar Goto Github PK

kaldi-active-grammar's Introduction

Kaldi Active Grammar

Features

Demo Video

Donations are appreciated to encourage development.

Related Repositories

Getting Started

Setup

Troubleshooting

Documentation

Building

Contributing

Author

License

Acknowledgments

kaldi-active-grammar's People

Contributors

Stargazers

Watchers

Forkers

kaldi-active-grammar's Issues

Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group

Recommend Projects

Recommend Topics

Recommend Org