coqui-ai / stt
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Home Page: https://coqui.ai
License: Mozilla Public License 2.0
Support Chinese stt?
Is your feature request related to a problem? Please describe.
You can only train on batches ordered by audio length, or by reverse audio length.
Describe the solution you'd like
I'd like to have more control over the order of batches, in particular random ordering
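One way to get random ordering while still keeping clips of similar length together is to sort by length, cut the sorted list into batches, and then shuffle the batches. The sketch below only illustrates that idea in plain Python; `make_shuffled_batches` and the `(name, duration)` tuples are hypothetical and not part of the 🐸STT training code.

```python
import random

def make_shuffled_batches(samples, batch_size, seed=None):
    """Group samples of similar length into batches, then shuffle the batch order.

    `samples` is a list of (name, duration) tuples (hypothetical format).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # Sort by duration so each batch holds clips of similar length.
    ordered = sorted(samples, key=lambda s: s[1])
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    # Shuffle the order of the batches, not the samples inside them.
    random.Random(seed).shuffle(batches)
    return batches

if __name__ == "__main__":
    clips = [("a.wav", 3.2), ("b.wav", 1.1), ("c.wav", 7.5), ("d.wav", 2.0), ("e.wav", 5.4)]
    for batch in make_shuffled_batches(clips, batch_size=2, seed=0):
        print(batch)
```

Shuffling whole batches keeps padding inside each batch small, at the cost of less randomness than shuffling individual samples.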
The .whl file DOES exist in the latest pre-release files! So maybe not a real bug anymore, but already fixed?
Describe the bug
The documentation says the project can run on a Raspberry Pi 4 or higher, but there is no .whl for ARMv7 at the moment.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
stt-tflite should have an arm compatible wheel
(after looking at the supported systems - stt should not have the arm .whl)
Environment (please complete the following information):
Additional context
I made a successful switch from deepspeech to coqui with a "manual" setup from: https://github.com/coqui-ai/STT/releases/download/v0.10.0-alpha.6/stt_tflite-0.10.0a6-cp37-cp37m-linux_armv7l.whl - so it is working, just the wheel is missing!
WIP implementation: https://github.com/project-alice-assistant/ProjectAlice/blob/1.0.0-rc1/core/asr/model/CoquiAsr.py
Thank you for making this awesome program publicly available! :)
If you have a feature request, then please provide the following information:
Is your feature request related to a problem? Please describe.
Training a 5-gram language model on 13G of text takes a very long time (hours?) and only uses one CPU.
Describe the solution you'd like
I'd like to use all my CPUs in parallel to finish the job faster.
Describe alternatives you've considered
Training the LM on a lower resource machine is about the only option I can think of, to not waste time on bigger machines.
Additional context
None.
Describe the bug
When installing the Node version via npm install stt (v0.10.0-alpha.4) and running the code example, you get an error:
TypeError: Ds.Model is not a constructor
More importantly, if you look inside node_modules/stt/index.js, the file is empty.
To Reproduce
npm install stt
node_modules/stt/index.js
index.js is empty
Expected behavior
To be able to install stt via npm, import/require the module, and use it.
Environment (please complete the following information):
Mac OS X
Additional context
Where is the code that is installed via npm? Is it in one of the folders in this repo, https://github.com/coqui-ai/STT?
The binary "generate_scorer_package" is missing.
From the manual here:
https://stt.readthedocs.io/en/latest/LANGUAGE_MODEL.html
"Finally, we package the trained KenLM model for deployment with generate_scorer_package. You can find pre-built binaries for generate_scorer_package on the official 🐸STT release page (inside native_client.*.tar.xz). "
However, no binaries are available.
If you have a feature request, then please provide the following information:
Is your feature request related to a problem? Please describe.
We have an AArch64 device (NVIDIA Jetson AGX) and would like to use its GPU cores, because the non-CUDA version of Coqui tends to be too slow for real-time inference on our platform and is too CPU-intensive.
Describe the solution you'd like
Add linux/Aarch64 + gpu to supported platforms
Describe alternatives you've considered
We have tried TFLite models on these devices. They run, but tend to be of worse quality and, at least with DeepSpeech, ran slower than CUDA-enabled full-scale models.
With DeepSpeech, the project at https://github.com/domcross/DeepSpeech-for-Jetson-Nano/releases managed to port 0.9.x to a Jetson Nano and AGX, which shows that this is possible at least with the DeepSpeech code base before your fork. We have also built it ourselves for an older version of DeepSpeech, but the build process was tricky and is likely beyond the scope of many teams.
Description of the bug and steps to Reproduce the behaviour
I was trying to follow the "Quickstart: Deployment" section. When I ran the final command, stt --model ...
it did not give any output, only the following traceback:
Traceback (most recent call last):
File "/home/sammit/.local/bin/stt", line 5, in <module>
from stt.client import main
File "/home/sammit/.local/lib/python3.7/site-packages/stt/__init__.py", line 23, in <module>
from stt.impl import Version as version
File "/home/sammit/.local/lib/python3.7/site-packages/stt/impl.py", line 13, in <module>
from . import _impl
ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /home/sammit/.local/lib/python3.7/site-packages/stt/lib/libstt.so)
The documentation says Coqui STT supports GLIBC>=2.19, and I have GLIBC=2.23. Below is the output of ldd --version:
ldd (Ubuntu GLIBC 2.23-0ubuntu11.3) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
And my OS Description:
Operating System: Ubuntu 16.04.6 LTS
Kernel: Linux 4.4.0-184-generic
Architecture: x86-64
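The traceback shows that the bundled libstt.so asks for the symbol version GLIBC_2.27 while the system provides 2.23, so the import fails whatever the documentation states. A minimal sketch for checking this before installing, assuming a glibc-based Linux system; `parse_version` and `glibc_satisfies` are hypothetical helpers, and the 2.27 requirement is taken from the error message above:

```python
import platform

def parse_version(text):
    """Turn '2.23' into (2, 23) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def glibc_satisfies(installed, required):
    """Return True if the installed glibc version is at least the required one."""
    return parse_version(installed) >= parse_version(required)

if __name__ == "__main__":
    # platform.libc_ver() returns a (lib, version) pair; both may be empty
    # strings when the C library cannot be detected.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        status = "OK" if glibc_satisfies(version, "2.27") else "too old"
        print("glibc", version, status)
    else:
        print("could not determine the glibc version")
```

Comparing tuples of integers avoids the string-comparison trap where "2.9" would sort after "2.19".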
Describe the bug
The transcript is written without periods or commas.
To Reproduce
Steps to reproduce the behavior:
tts --text "To help with the large amounts of pull requests, we would appreciate your reviews of other pull requests, especially simple package updates. Just leave a comment describing what you have tested in the relevant package/service. Reviewing helps to reduce the average time-to-merge for everyone. Thanks a lot if you do!"
[nix-shell:~]$ stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio '/home/davidak/Downloads/tts-0.0.14.wav'
TensorFlow: v2.3.0-6-g23ad988fcde
Coqui STT: v0.10.0-alpha.4-74-g49cdf7a6
2021-05-21 23:11:10.956472: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
rate: rate clipped 1 samples; decrease volume?
to help with the large amounts of pull requests we would appreciate your reviews of other poll requests especially simple package up dates just leave a comment describing what you have tested in the relevant packages service reviewing helps to reduce the average time to merge for every one thanks a lot if you do
Expected behavior
Ideally, the output would be exactly the same as the input text.
I also tried with my own speech, and it ignored pauses and intonation.
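The transcript above contains only lowercase words, so a fair comparison against the input has to normalize the reference text first. The sketch below does that and computes a word error rate with a standard word-level edit distance; `normalize` and `word_error_rate` are illustrative helpers, not 🐸STT functions.

```python
import string

def normalize(text):
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words."""
    drop = string.punctuation.replace("'", "")
    table = str.maketrans(drop, " " * len(drop))
    return text.lower().translate(table).split()

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1] / len(ref)

if __name__ == "__main__":
    print(word_error_rate("other pull requests", "other poll requests"))
```

With this normalization, missing punctuation no longer counts as an error, and only word substitutions such as "poll" for "pull" remain.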
Environment (please complete the following information):
Describe the bug
When running docker build -f Dockerfile.build . from the STT directory, Dockerfile.build redownloads the entire STT source code instead of using the existing source code.
To Reproduce
Clone STT source code with git
Make some commits / check out the tag you want to build
docker build -f Dockerfile.build .
Expected behavior
The Dockerfile would build the source that exists in the STT directory, including any changes that have been made to the source.
Actual behaviour
The Dockerfile downloads a fresh version of the source and builds that. This is unexpected and unintuitive, and a waste of disk space, bandwidth and time.
sox is missing when importing librispeech
libopusfile0 is missing when importing MLS
Suggested by @bernardohenz:
import math

beta_total = beta_weight * num_words
# Discount weights from confidence (we're using deepspeech 0.7.1)
total_confidence = (confidence - beta_total) / alpha_weight
raw_probability = math.exp(total_confidence)  # Transform logarithm to probability
perplexity = math.pow(1 / raw_probability, 1 / num_words)  # Compute perplexity
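Wrapped into a function, the suggestion above might look like the following sketch. It assumes, as the snippet implies, that `confidence` is a natural-log score of the form `alpha_weight * log(p) + beta_weight * num_words`; the function name and its argument names are illustrative only.

```python
import math

def perplexity_from_confidence(confidence, num_words, alpha_weight, beta_weight):
    """Recover a per-word perplexity from a weighted log confidence (assumed form)."""
    if num_words < 1:
        raise ValueError("num_words must be >= 1")
    # Remove the per-word bonus, then undo the language-model weight.
    log_prob = (confidence - beta_weight * num_words) / alpha_weight
    # Same value as (1 / exp(log_prob)) ** (1 / num_words), computed in log space.
    return math.exp(-log_prob / num_words)

if __name__ == "__main__":
    # Synthetic example: two words with a joint probability of 0.25.
    score = 0.5 * math.log(0.25) + 1.0 * 2
    print(perplexity_from_confidence(score, 2, alpha_weight=0.5, beta_weight=1.0))
```

Staying in log space avoids computing `1 / raw_probability` directly, which can overflow when the probability is very small.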
This should help with alphabet incompatibility problems.
I have TensorFlow 2.4.1 installed, but when I run
python setup.py install
I get this:
Searching for tensorflow==1.15.4
Reading https://pypi.org/simple/tensorflow/
No local packages or working download links found for tensorflow==1.15.4
error: Could not find suitable distribution for Requirement.parse('tensorflow==1.15.4')
Why does it force a specific version? I also cannot find where this requirement comes from, so I don't know where to comment it out.
While running the STT docker image on Tesla K80 GPU,
docker run -it --gpus all --mount type=bind,source="$(pwd)"/stt-data,target=/code/stt-data 5adb1e5d8af5
The container starts and once the image is loaded, I'm welcomed by the following message:
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container
NOTE: Legacy NVIDIA Driver detected. Compatibility mode ENABLED.
That's the message I get while running on a p2.xlarge (1 GPU, 4 vCPUs, and 61 GB RAM), which is still a Tesla K80. I get the following message:
NVIDIA Release 21.05-tf1 (build 22596046)
TensorFlow Version 1.15.5
Container image Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2021 The TensorFlow Authors. All rights reserved.
NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container
NOTE: Legacy NVIDIA Driver detected. Compatibility mode ENABLED.
Am I right to say that this image only works with a subset of NVIDIA GPUs?
After checking out the project and opening it in Android Studio (4.1), running the app throws this error:
Could not resolve all files for configuration ':app:debugRuntimeClasspath'.
Could not find ai.coqui:libstt:0.9.3.
Searched in the following locations:
Describe the bug
(As I understand the native_client for .Net is still 'DeepSpeech'; Please correct me if I am wrong.)
I am using DeepSpeech for inference on a microphone stream captured via the CSCore audio module. I have custom code for VAD and use intermediate decoding to get sentence-wise live transcription.
Models: 9.0.3 Pre-Trained English Audio Model and custom Scorers with the same hyper-parameters as the Pre-Trained Scorer.
This works but at random times I get the following "unhandled" exceptions from the .so file.
StackOverflowException
from objDeepSpeech.FeedAudioContent(objStream, buffers, Convert.ToUInt32(buffers.Length));
or
System.AccessViolationException: 'Attempted to read or write protected memory. This is often an indication that other memory is corrupt.'
from objDeepSpeech.IntermediateDecodeWithMetadata(objStream, 1);
The errors originate from within libdeepspeech.so, so I am not able to debug any further. Any help is much appreciated. Thanks.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Environment (please complete the following information):
Additional context
No failure is seen when using NAudio, but STT accuracy is much worse than when using CSCore for the same audio.
Can NAudio be made better instead for Intermediate Decoding?
Is any other language implicitly better at this than C#?
Describe the bug
When training or converting a model to TFLite with the latest Docker image, it fails with the error:
Traceback (most recent call last):
File "train.py", line 12, in <module>
stt_train.main()
File "/code/training/coqui_stt_training/train.py", line 1266, in main
export()
File "/code/training/coqui_stt_training/train.py", line 1107, in export
converter = tf.lite.TFLiteConverter(
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow' has no attribute 'lite'
To Reproduce
I am pulling the Docker image and adding Sox as follows:
# Get the latest 🐸STT image
FROM ghcr.io/coqui-ai/stt-train:latest
ENV DEBIAN_FRONTEND=noninteractive
# Install nano editor
RUN apt-get -y update && apt-get install -y nano
# Install sox for inference and for processing Common Voice data
RUN apt-get -y update && apt-get install -y sox
Then I am attempting to convert the yesno model checkpoints to TFLite as follows (note - I also got this error with the same Docker image when training a model from scratch)
#!/bin/bash
set -x
docker build -t stt-image .
docker run \
--rm \
-it \
--entrypoint /bin/bash \
--name stt-train \
--gpus all \
--mount type=bind,source="$(pwd)"/stt-data,target=/code/stt-data \
stt-image \
-c "cd /code && \
python3 train.py \
--checkpoint_dir /code/stt-data/checkpoints \
--export_dir /code/stt-data/exported-model \
--n_hidden 64 \
--alphabet_config_path /code/stt-data/alphabet.txt \
--export_tflite=true "
Expected behavior
The model should train / convert / export without an error.
Environment (please complete the following information):
ghcr.io/coqui-ai/stt-train latest 5adb1e5d8af5
Additional context
I believe this is probably an issue with TensorFlow versions, and I am going to try adding .contrib to the offending lines in train.py
From the gitter channel <https://gitter.im/coqui-ai/STT>
While running this
python3 lm_optimizer.py --test_files stt-data/cv-corpus-7.0-2021-07-21/lg/clips/test.csv --checkpoint_dir stt-data/best_dev-1594273
I get
Traceback (most recent call last):
File "lm_optimizer.py", line 15, in <module>
from coqui_stt_training.util.flags import FLAGS, create_flags
ModuleNotFoundError: No module named 'coqui_stt_training.util.flags'
Does this speech-to-text conversion work offline, or is it online? Is there a C# program that lets me speak into the microphone to test this model? How does it work?
This page: https://stt.readthedocs.io/en/latest/TRAINING_ADVANCED.html#advanced-training-docs is only accessible from Training: Quickstart, which isn't a great user flow.
See for example #1844 (comment)
Hi,
My team and I use STT for Brazilian Portuguese, and we were having problems with consecutive OOV (out-of-vocabulary) words. When the decoder receives two or more OOV words in a row, it enters a state in which it stops accepting any other word.
After some experimentation, I removed the early return of OOV_SCORE
(in https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/scorer.cpp#L247) and instead added a penalty to the BaseScore
as follows:
// encounter OOV
// if (word_index == lm::kUNK) {
// return OOV_SCORE;
// }
cond_prob = language_model_->BaseScore(in_state, word_index, out_state);
if (word_index == lm::kUNK) {
cond_prob-=10;
}
I believe there could be a better solution, so I am opening this issue to discuss one.
As your LM is built over a huge corpus, I suppose that your models do not suffer from OOV words, but I believe that many people may have problems with OOV words when their LMs are built over smaller corpora.
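To make the change easier to discuss, here is a toy Python sketch of the two scoring policies. It is not the decoder code: `ToyLM`, `score_hard_oov`, `score_penalized_oov`, and the numeric values are made up for illustration. Only the idea (a fixed OOV score versus the base score minus a penalty of 10) comes from the snippet above.

```python
UNK = "<unk>"
OOV_SCORE = -1000.0  # placeholder value, not necessarily the constant in scorer.cpp
OOV_PENALTY = 10.0   # mirrors the `cond_prob-=10` in the snippet above

class ToyLM:
    """A unigram stand-in for the language model: maps words to log scores."""

    def __init__(self, log_probs, unk_log_prob):
        self.log_probs = dict(log_probs)
        self.unk_log_prob = unk_log_prob

    def lookup(self, word):
        """Return the word itself if known, otherwise the UNK marker."""
        return word if word in self.log_probs else UNK

    def base_score(self, word):
        """Score of a known word, or the unknown-word score for anything else."""
        return self.log_probs.get(word, self.unk_log_prob)

def score_hard_oov(lm, word):
    """Original policy: any OOV word gets the same fixed, very low score."""
    if lm.lookup(word) == UNK:
        return OOV_SCORE
    return lm.base_score(word)

def score_penalized_oov(lm, word):
    """Proposed policy: always use the base score, minus a penalty for OOV words."""
    score = lm.base_score(word)
    if lm.lookup(word) == UNK:
        score -= OOV_PENALTY
    return score

if __name__ == "__main__":
    lm = ToyLM({"hello": -1.0, "world": -2.0}, unk_log_prob=-5.0)
    for word in ["hello", "xyz"]:
        print(word, score_hard_oov(lm, word), score_penalized_oov(lm, word))
```

In the toy model the penalized score still ranks OOV words below known words, but by a bounded margin rather than a fixed floor.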
Is your feature request related to a problem? Please describe.
When training an STT model on some data set with some GPU(s), it's impossible to know the largest possible batch size ahead of time. So, the user must do some trial and error to find the largest possible batch size. The trial and error involves setting a batch size and starting a training run with --reverse_train --reverse_dev --reverse_test. If the user hits OOM errors, then they can set a lower batch size and try again.
This process is sub-optimal for a few reasons:
(1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 195, 128, 2048]
(exact values will vary). As such, new users hit this error and don't realize this is an OOM.
--reverse_train --reverse_test --reverse_dev
Describe the solution you'd like
I would like a separate script to quickly find the largest possible batchsize for train and test and dev. Something like get_batchsize.py
Describe alternatives you've considered
manually sorting data and running a single epoch on a subset of largest samples in train/test/dev
Additional context
This error comes up often on Gitter
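The search such a script would perform can be sketched as a binary search over batch sizes. This is only an outline under stated assumptions: `fits` is a hypothetical callable that would run a short training step on the longest samples and return False when it hits an OOM error, and the search assumes that whenever a batch size fits, every smaller one fits too.

```python
def find_max_batch_size(fits, upper_bound):
    """Binary-search the largest batch size in [1, upper_bound] for which fits() is True.

    Returns 0 if even a batch size of 1 does not fit.
    """
    low, high = 0, upper_bound
    while low < high:
        # Round up so the loop always makes progress when low == high - 1.
        mid = (low + high + 1) // 2
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low

if __name__ == "__main__":
    # Stand-in for a real trial run: pretend anything above 37 runs out of memory.
    def fake_fits(batch_size):
        return batch_size <= 37

    print(find_max_batch_size(fake_fits, upper_bound=128))
```

With an upper bound of 128, this needs at most eight trial runs instead of a manual guess-and-retry loop.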
Add jupyter notebooks for:
If you don't configure the alphabet, you get an error about the dimensions of layer 6. You should get an error about the alphabet instead.
root@e02b8a0e3e26:/code# python3 train.py \
> --train_files stt-data/clips/train.csv \
> --dev_files stt-data/clips/dev.csv \
> --test_files stt-data/clips/test.csv
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Traceback (most recent call last):
File "train.py", line 12, in <module>
stt_train.main()
File "/code/training/coqui_stt_training/train.py", line 1258, in main
train()
File "/code/training/coqui_stt_training/train.py", line 619, in train
gradients, loss, non_finite_files = get_tower_results(
File "/code/training/coqui_stt_training/train.py", line 417, in get_tower_results
avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(
File "/code/training/coqui_stt_training/train.py", line 335, in calculate_mean_edit_distance_and_loss
logits, _ = create_model(
File "/code/training/coqui_stt_training/train.py", line 294, in create_model
"layer_6", layer_5, Config.n_hidden_6, relu=False
File "/code/training/coqui_stt_training/util/config.py", line 28, in __getattr__
raise RuntimeError(
RuntimeError: Configuration option n_hidden_6 not found in config.
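A check along the following lines, run before the model is built, would surface the real cause. This is a hypothetical sketch, not the project's code: `check_alphabet_config` is an invented name, and only the `--alphabet_config_path` flag comes from the training commands in this thread.

```python
import os

def check_alphabet_config(alphabet_config_path):
    """Raise a descriptive error when no usable alphabet file is configured."""
    if not alphabet_config_path:
        raise RuntimeError(
            "No alphabet configured. Pass --alphabet_config_path with the "
            "path to your alphabet file."
        )
    if not os.path.isfile(alphabet_config_path):
        raise RuntimeError(f"Alphabet file not found: {alphabet_config_path}")
    return alphabet_config_path

if __name__ == "__main__":
    try:
        check_alphabet_config("")
    except RuntimeError as err:
        print(err)
```

Failing early with a message that names the flag is easier to act on than a missing n_hidden_6 option several frames deep in model construction.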
In the Coqui STT 0.10.0-alpha.13 documentation, at https://stt.readthedocs.io/en/latest/playbook/DATA_FORMATTING.html, where it says
`Data from Common Voice
If you are using data from Common Voice for training a model, you will need to prepare it as outlined in the 🐸STT documentation.`
The link to https://stt.readthedocs.io/en/latest/TRAINING.html#common-voice-training-data is broken.
root@2335bec676a7:/code# bash -x train.sh ; bash -x test.sh ; bash -x lm.sh ; bash -x export.sh
+ LLENGUA=el
+ mkdir -p checkpoints
+ TF_CUDNN_RESET_RND_GEN_STATE=1
+ python3 train.py --show_progressbar True --train_cudnn --epochs 25 --es_epochs 3 --max_to_keep 3 --drop_source_layers 2 --train_batch_size 8 --test_batch_size 8 --dev_batch_size 8 --alphabet_config_path /media/cv-corpus-6.1-2020-12-11/el/alphabet.txt --save_checkpoint_dir checkpoints --load_checkpoint_dir deepspeech-0.9.3-checkpoint/ --train_files /media/cv-corpus-6.1-2020-12-11/el/clips/train.csv --dev_files /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv --test_files /media/cv-corpus-6.1-2020-12-11/el/clips/test.csv
W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
I Loading best validating checkpoint from deepspeech-0.9.3-checkpoint/best_dev-1466475
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I Initializing variable: layer_5/bias
I Initializing variable: layer_5/bias/Adam
I Initializing variable: layer_5/bias/Adam_1
I Initializing variable: layer_5/weights
I Initializing variable: layer_5/weights/Adam
I Initializing variable: layer_5/weights/Adam_1
I Initializing variable: layer_6/bias
I Initializing variable: layer_6/bias/Adam
I Initializing variable: layer_6/bias/Adam_1
I Initializing variable: layer_6/weights
I Initializing variable: layer_6/weights/Adam
I Initializing variable: layer_6/weights/Adam_1
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:01:58 | Steps: 289 | Loss: 150.440151
Epoch 0 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 118.190063 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 118.190063 to: checkpoints/best_dev-1466764
--------------------------------------------------------------------------------
Epoch 1 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 112.087529
Epoch 1 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 108.616011 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 108.616011 to: checkpoints/best_dev-1467053
--------------------------------------------------------------------------------
Epoch 2 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 99.295714
Epoch 2 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 94.945360 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 94.945360 to: checkpoints/best_dev-1467342
--------------------------------------------------------------------------------
Epoch 3 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 83.251152
Epoch 3 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 80.855902 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 80.855902 to: checkpoints/best_dev-1467631
--------------------------------------------------------------------------------
Epoch 4 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 69.715410
Epoch 4 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 71.249459 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 71.249459 to: checkpoints/best_dev-1467920
--------------------------------------------------------------------------------
Epoch 5 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 60.592198
Epoch 5 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 65.730284 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 65.730284 to: checkpoints/best_dev-1468209
--------------------------------------------------------------------------------
Epoch 6 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 54.503149
Epoch 6 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 62.088597 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 62.088597 to: checkpoints/best_dev-1468498
--------------------------------------------------------------------------------
Epoch 7 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 50.214056
Epoch 7 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 59.742852 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 59.742852 to: checkpoints/best_dev-1468787
--------------------------------------------------------------------------------
Epoch 8 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 46.843261
Epoch 8 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 58.021387 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 58.021387 to: checkpoints/best_dev-1469076
--------------------------------------------------------------------------------
Epoch 9 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 44.179930
Epoch 9 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 56.870370 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 56.870370 to: checkpoints/best_dev-1469365
--------------------------------------------------------------------------------
Epoch 10 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 41.892102
Epoch 10 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 55.820707 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 55.820707 to: checkpoints/best_dev-1469654
--------------------------------------------------------------------------------
Epoch 11 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 39.862837
Epoch 11 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 54.944336 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 54.944336 to: checkpoints/best_dev-1469943
--------------------------------------------------------------------------------
Epoch 12 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 38.057278
Epoch 12 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 54.388507 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 54.388507 to: checkpoints/best_dev-1470232
--------------------------------------------------------------------------------
Epoch 13 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 36.473011
Epoch 13 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 53.870240 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 53.870240 to: checkpoints/best_dev-1470521
--------------------------------------------------------------------------------
Epoch 14 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 34.977883
Epoch 14 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 53.513870 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 53.513870 to: checkpoints/best_dev-1470810
--------------------------------------------------------------------------------
Epoch 15 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 33.688198
Epoch 15 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 53.115888 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 53.115888 to: checkpoints/best_dev-1471099
--------------------------------------------------------------------------------
Epoch 16 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 32.372648
Epoch 16 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.871868 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 52.871868 to: checkpoints/best_dev-1471388
--------------------------------------------------------------------------------
Epoch 17 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 31.218126
Epoch 17 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.653315 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 52.653315 to: checkpoints/best_dev-1471677
--------------------------------------------------------------------------------
Epoch 18 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 30.027386
Epoch 18 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.504788 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 52.504788 to: checkpoints/best_dev-1471966
--------------------------------------------------------------------------------
Epoch 19 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 28.985256
Epoch 19 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.341194 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 52.341194 to: checkpoints/best_dev-1472255
--------------------------------------------------------------------------------
Epoch 20 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 28.030723
Epoch 20 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.125567 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 52.125567 to: checkpoints/best_dev-1472544
--------------------------------------------------------------------------------
Epoch 21 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 27.094344
Epoch 21 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.197986 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
--------------------------------------------------------------------------------
Epoch 22 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 26.126977
Epoch 22 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.169392 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
--------------------------------------------------------------------------------
Epoch 23 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 25.329971
Epoch 23 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 51.964646 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
I Saved new best validating model with loss 51.964646 to: checkpoints/best_dev-1473411
--------------------------------------------------------------------------------
Epoch 24 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 24.420492
Epoch 24 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.078386 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 1:00:13.756332
I Loading best validating checkpoint from deepspeech-0.9.3-checkpoint/best_dev-1466475
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
Traceback (most recent call last):
File "train.py", line 12, in <module>
ds_train.run_script()
File "/code/training/coqui_stt_training/train.py", line 986, in run_script
absl.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/code/training/coqui_stt_training/train.py", line 962, in main
test()
File "/code/training/coqui_stt_training/train.py", line 682, in test
samples = evaluate(FLAGS.test_files.split(','), create_model)
File "/code/training/coqui_stt_training/evaluate.py", line 87, in evaluate
load_graph_for_evaluation(session)
File "/code/training/coqui_stt_training/util/checkpoints.py", line 151, in load_graph_for_evaluation
_load_or_init_impl(session, methods, allow_drop_layers=False)
File "/code/training/coqui_stt_training/util/checkpoints.py", line 98, in _load_or_init_impl
return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
File "/code/training/coqui_stt_training/util/checkpoints.py", line 71, in _load_checkpoint
v.load(ckpt.get_tensor(v.op.name), session=session)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
session.run(self.initializer, {self.initializer.inputs[1]: value})
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
(np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(37,)'
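A note on reading this traceback: the log shows the test phase loading deepspeech-0.9.3-checkpoint/best_dev-1466475 rather than the freshly saved checkpoints/best_dev-1473411, and the output-layer bias in that checkpoint has 29 entries while the graph expects 37. One way to spot this kind of problem before loading is to compare variable shapes up front. The sketch below is only an illustration in plain Python: the helper name and the dictionaries are hypothetical, and in practice the shapes would have to be read from the checkpoint and the graph.

```python
def find_shape_mismatches(checkpoint_shapes, graph_shapes):
    """Return {name: (checkpoint_shape, graph_shape)} for variables whose shapes differ."""
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Variables missing from the checkpoint are ignored here; only
        # variables present on both sides are compared.
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(graph_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(graph_shape))
    return mismatches


# layer_6/bias shapes are taken from the ValueError above;
# the layer_5/bias values are made up for illustration.
checkpoint_shapes = {"layer_5/bias": (2048,), "layer_6/bias": (29,)}
graph_shapes = {"layer_5/bias": (2048,), "layer_6/bias": (37,)}

mismatches = find_shape_mismatches(checkpoint_shapes, graph_shapes)
for name, (ckpt, graph) in mismatches.items():
    print(f"{name}: checkpoint has {ckpt}, graph expects {graph}")
```

A mismatch that is limited to the last layer usually points to a different alphabet size between the checkpoint and the current run, though that is an inference from the shapes and not something the log states.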
The iOS test project references a libstt.so file, for example here.
I wonder if this is still intended? I thought the stt_ios.framework would cover everything needed to run the test project.
Or am I wrong here?
This would allow us to publish aarch64 and armv7 wheels, as well as avoid weird binary incompatibility issues caused by the platform tag hacking we currently do. The caveat is that installation will fail if the user does not upgrade pip, but at this point I think we instruct people to update pip in every piece of documentation that does a pip install.
Describe the bug
I got the following error when running the command as instructed in Train the Language Model
/code/kenlm/build/bin/lmplz: error while loading shared libraries: libboost_program_options.so.1.71.0: cannot open shared object file: No such file or directory
To Reproduce
Steps to reproduce the behavior:
python3 data/lm/generate_lm.py \
--input_txt /experiment/librispeech-lm-norm.txt.gz \
--output_dir . \
--top_k 500000 \
--kenlm_bins /code/kenlm/build/bin/ \
--arpa_order 5 \
--max_arpa_memory "85%" \
--arpa_prune "0|0|1" \
--binary_a_bits 255 \
--binary_q_bits 8 \
--binary_type trie
================
== TensorFlow ==
================
NVIDIA Release 20.06-tf1 (build 13409399)
TensorFlow Version 1.15.2
Container image Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2020 The TensorFlow Authors. All rights reserved.
NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
NOTE: MOFED driver for multi-node communication was not detected.
Multi-node communication performance may be reduced.
python3 data/lm/generate_lm.py \
--input_txt /experiment/librispeech-lm-norm.txt.gz \
--output_dir . \
--top_k 500000 \
--kenlm_bins /code/kenlm/build/bin/ \
--arpa_order 5 \
--max_arpa_memory "85%" \
--arpa_prune "0|0|1" \
--binary_a_bits 255 \
--binary_q_bits 8 \
--binary_type trie
root@f48fa96f3b9c:/code# python3 data/lm/generate_lm.py \
> --input_txt /experiment/librispeech-lm-norm.txt.gz \
> --output_dir . \
> --top_k 500000 \
> --kenlm_bins /code/kenlm/build/bin/ \
> --arpa_order 5 \
> --max_arpa_memory "85%" \
> --arpa_prune "0|0|1" \
> --binary_a_bits 255 \
> --binary_q_bits 8 \
> --binary_type trie
Converting to lowercase and counting word occurrences ...
| | # | 40418260 Elapsed Time: 0:13:52
Saving top 500000 words ...
Calculating word statistics ...
Your text file has 803288729 words in total
It has 973673 unique words
Your top-500000 words are 99.9354 percent of all words
Your most common word "the" occurred 49059384 times
The least common word in your top-k is "corders" with 2 times
The first word with 3 occurrences is "zungwan" at place 420186
Creating ARPA file ...
/code/kenlm/build/bin/lmplz: error while loading shared libraries: libboost_program_options.so.1.71.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "data/lm/generate_lm.py", line 210, in <module>
main()
File "data/lm/generate_lm.py", line 201, in main
build_lm(args, data_lower, vocab_str)
File "data/lm/generate_lm.py", line 97, in build_lm
subprocess.check_call(subargs)
File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/code/kenlm/build/bin/lmplz', '--order', '5', '--temp_prefix', '.', '--memory', '85%', '--text', './lower.txt.gz', '--arpa', './lm.arpa', '--prune', '0', '0', '1']' returned non-zero exit status 127.
root@f48fa96f3b9c:/code#
Expected behavior
I should be able to generate lm.binary and vocab-500000.txt
Environment (please complete the following information):
Solution
I actually got it working by removing the original KenLM and re-compiling KenLM using the following command in the /code folder:
git clone https://github.com/kpu/kenlm.git && cd kenlm && mkdir build && cd build/ && cmake .. && make -j 4
Describe the bug
Stream.intermediateDecode() ignores part of the buffer data that has been sent to the stream.
To Reproduce
(In Python)
# 1. Create a stream
model = stt.Model(args.model)
stream_context = model.createStream()
# 2. Feed it a buffer
stream_context.feedAudioContent(some_buffer)
# 3. Do an intermediate decode
text = stream_context.intermediateDecode()
Current behavior
Depending on the size of some_buffer:
Expected behavior
All of the audio data that has been sent to the stream API so far would be processed and an intermediate result returned.
This is important when streaming to STT with VAD, where a stream will be automatically stopped whenever VAD detection is negative. If the stream is stopped at a point that is not an exact multiple of the internal batch buffer, parts of the audio will not have been processed by the acoustic model and therefore will be missing from the intermediate result. This causes mis-recognition of words.
Stream.finalizeStream() does not suffer from this defect, but it cannot provide intermediate results because it shuts down the stream completely. Intermediate results are essential for fast / low latency voice recognition.
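To make the reported behavior easier to picture, here is a toy model of a stream that only processes audio in fixed-size windows. It is not the STT implementation: the class, the window size, and the flushing variant are all assumptions made up for illustration. It just shows how samples that do not fill a complete window can be left out of an intermediate result.

```python
class ToyStream:
    """Toy windowed stream. NOT the real STT stream, only an illustration."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.pending = []    # samples still waiting for a full window
        self.processed = []  # samples already handed to the pretend model

    def feed(self, samples):
        self.pending.extend(samples)
        # Only complete windows are processed; the remainder stays pending.
        while len(self.pending) >= self.window_size:
            self.processed.extend(self.pending[:self.window_size])
            del self.pending[:self.window_size]

    def intermediate_decode(self):
        # Behavior described in the report: the pending tail is ignored.
        return list(self.processed)

    def intermediate_decode_flushed(self):
        # Behavior the report asks for: the pending tail is included,
        # without shutting the stream down.
        return self.processed + self.pending


stream = ToyStream(window_size=4)
stream.feed(list(range(10)))  # 10 samples: two full windows plus 2 left over

seen = stream.intermediate_decode()
seen_flushed = stream.intermediate_decode_flushed()
print(len(seen), len(seen_flushed))
```

In this toy, the two leftover samples only show up in the flushed variant, which mirrors the difference between the current and the expected behavior described above.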
Environment (please complete the following information):
Is your feature request related to a problem? Please describe.
Yes. I find myself saving training logs to a text file in what I consider a hacky way. On a server, using the docker training image, I always end up doing this:
$ docker run train.py &> /my/local/file.log &
This seems non-obvious to me, and I see people training models in docker containers who are then surprised when the training logs and loss data are gone. Saving the loss curves to a text file makes viewing the training progress on a remote server much easier than using tensorboard, for example.
Describe the solution you'd like
I'd like to be able to save training loss information to a text file with a command-line flag such as --training_log_file /path/to/file.txt
Describe alternatives you've considered
Redirecting output from stdout to a text file (as mentioned above). Inspecting tensorboard graphs (but I haven't tried this on a remote server, and afaik you can't access them during training). I like to keep an eye on training curves during training with gnuplot, by piping a single column of (train or dev) losses into | gnuplot -p -e "set terminal dumb $(tput cols) $(tput lines); plot '-' using 1"
Additional context
None
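Until such a flag exists, the loss values can be pulled out of a redirected log with a few lines of Python. This is a minimal sketch that assumes the log lines look like the "Epoch N | Training | ... | Loss: X" lines that train.py prints; the regular expression and function name are my own and not part of STT.

```python
import re

# Matches lines such as:
# Epoch 21 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 27.094344
LOSS_LINE = re.compile(r"^Epoch (\d+) \| (Training|Validation) \|.*\| Loss: ([0-9.]+)")


def extract_losses(log_lines):
    """Return a list of (epoch, phase, loss) tuples found in the log lines."""
    rows = []
    for line in log_lines:
        match = LOSS_LINE.match(line)
        if match:
            epoch, phase, loss = match.groups()
            rows.append((int(epoch), phase, float(loss)))
    return rows


sample_log = [
    "Epoch 21 | Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 27.094344",
    "Epoch 21 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.197986"
    " | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv",
    "I Saved new best validating model with loss 51.964646 to: checkpoints/best_dev-1473411",
]

rows = extract_losses(sample_log)
for epoch, phase, loss in rows:
    print(epoch, phase, loss)
```

Filtering the rows by phase and printing only the loss gives the single column that the gnuplot pipe mentioned above expects.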
advise to not build, but to download
Following the Quickstart: Deployment instructions results in an error.
To Reproduce
- Install WSL 2 using Debian on Windows.
- Follow the "Quickstart: Deployment" instructions at: https://stt.readthedocs.io/en/latest/
- When executing the last step "# Transcribe an audio file", this error is thrown:
Loading model from file coqui-stt-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988fcde
Coqui STT: v0.10.0-alpha.14-0-g07ed4176
ERROR: Model provided has model identifier 'u/3', should be 'TFL3'
Error at reading model file coqui-stt-0.9.3-models.pbmm
Traceback (most recent call last):
File "/home/bitbarrel/venv-stt/bin/stt", line 8, in <module>
sys.exit(main())
File "/home/bitbarrel/venv-stt/lib/python3.8/site-packages/stt/client.py", line 148, in main
ds = Model(args.model)
File "/home/bitbarrel/venv-stt/lib/python3.8/site-packages/stt/__init__.py", line 40, in __init__
raise RuntimeError(
RuntimeError: CreateModel failed with 'Failed to initialize memory mapped model.' (0x3000)
Edit:
It works when tflite is used instead of pbmm, so it looks like the wrong model was downloaded. How to fix this?
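The error message says the loader expected the identifier 'TFL3'. That is the FlatBuffers file identifier that TensorFlow Lite model files carry in bytes 4 to 7, which suggests the installed package is a TFLite build and therefore rejects the .pbmm file. A quick way to check what kind of file you have is to read those bytes yourself. The helper names below are hypothetical and the demo files are fakes written only for this example.

```python
import os
import tempfile

TFLITE_IDENTIFIER = b"TFL3"


def flatbuffer_identifier(path):
    """Return bytes 4..7 of the file, where FlatBuffers stores its file identifier."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    return header[4:8]


def looks_like_tflite(path):
    return flatbuffer_identifier(path) == TFLITE_IDENTIFIER


def _write_temp(data):
    # Helper for the demo only: writes bytes to a temporary file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Fake files for demonstration; real model files are much larger.
fake_tflite = _write_temp(b"\x1c\x00\x00\x00TFL3" + b"\x00" * 16)
fake_other = _write_temp(b"\x00" * 24)

results = (looks_like_tflite(fake_tflite), looks_like_tflite(fake_other))
print(results)

os.remove(fake_tflite)
os.remove(fake_other)
```

If the check fails for the file you downloaded, using the .tflite model from the same release (as the edit above already found) is the likely fix.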
Right now we use both progressbar2 and tqdm, but the latter is better maintained and the former has caused issues. We should completely replace progressbar2 with tqdm.
Current behavior: If a flag is duplicated at the CLI and passed to train.py, there is no warning or error message, and the last setting is saved. This was first observed with the --augment flag, where only the last augmentation ends up being applied. The behavior was replicated with --reverse_train true --reverse_train false, with the final result in flags.txt being --reverse_train false.
Expected behavior: An error message stating that a flag has been duplicated, and aborting the training attempt.
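A rough idea of what such a check could look like, written as a standalone sketch. It only inspects the raw argument list, knows nothing about how train.py actually parses flags, and the function name is hypothetical.

```python
from collections import Counter


def find_duplicate_flags(argv):
    """Return the sorted names of --flags that appear more than once in argv."""
    names = []
    for token in argv:
        if token.startswith("--"):
            # Handle both "--flag value" and "--flag=value" spellings.
            names.append(token[2:].split("=", 1)[0])
    return sorted(name for name, count in Counter(names).items() if count > 1)


argv = ["--reverse_train", "true", "--reverse_train", "false", "--epochs=3"]
duplicates = find_duplicate_flags(argv)
if duplicates:
    print("Duplicated flags: " + ", ".join("--" + name for name in duplicates))
```

A real implementation would abort at this point with a non-zero exit code, and would need an exception list for any flag that is meant to be repeatable.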
Is your feature request related to a problem? Please describe.
When training a language model in an official docker image obtained via $ docker pull ghcr.io/coqui-ai/stt-train, it is possible to train a language model with generate_lm.py, but it is not possible to generate a scorer package via ./generate_scorer_package.
Describe the solution you'd like
I'd like a compiled generate_scorer_package binary to be included in the docker image.
Describe alternatives you've considered
I usually just wget the compiled version from the Linux native_clients on GitHub releases.
Additional context
NA
OOM errors happen all the time in training, for newcomers and pros alike. We should have a simple guide on using --reverse_{train,test,dev} and on the steps for choosing the right batch size for a given setup.
Also, maybe we just need --reverse_batches instead of all three --reverse_{train,test,dev}. I can't think of a situation where you'd want a subset, but not all, set to reverse.
This page in the docs no longer renders, (I think) because there's no flags.py anymore: https://stt.readthedocs.io/en/latest/TRAINING_FLAGS.html#training-flags
Describe the bug
When attempting to build the stt binary in version v0.10.0-alpha.14, the linker fails with the following errors:
c++ -std=c++11 -o stt -I/STT/sox-build/include client.cc -Wl,--no-as-needed -Wl,-rpath,\$ORIGIN -L/STT/tensorflow/bazel-bin/native_client -L/STT/tensorflow/bazel-bin/tensorflow/lite -lstt -lkenlm -ltensorflowlite -L/STT/sox-build/lib -lsox
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::DefaultErrorReporter()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::GetString(TfLiteTensor const*, int)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `vtable for tflite::MutableOpResolver'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::SetNumThreads(int)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::SetExecutionPlan(std::vector<int, std::allocator<int> > const&)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::Invoke()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::~Interpreter()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::InterpreterBuilder::InterpreterBuilder(tflite::FlatBufferModel const&, tflite::OpResolver const&)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::ModifyGraphWithDelegate(TfLiteDelegate*)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::InterpreterBuilder::operator()(std::unique_ptr<tflite::impl::Interpreter, std::default_delete<tflite::impl::Interpreter> >*)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::AllocateTensors()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::FlatBufferModel::~FlatBufferModel()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::InterpreterBuilder::~InterpreterBuilder()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::ops::builtin::BuiltinOpResolver::BuiltinOpResolver()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::FlatBufferModel::BuildFromFile(char const*, tflite::ErrorReporter*)'
collect2: error: ld returned 1 exit status
Makefile:22: recipe for target 'stt' failed
make: *** [stt] Error 1
The command '/bin/sh -c make NUM_PROCESSES=$(nproc) stt' returned a non-zero code: 2
To Reproduce
Download STT source.
From STT dir, run
docker build -f Dockerfile.build .
Expected behavior
The Dockerfile should build without errors.
Note I am also getting the same errors in my own Dockerfile (that worked with past versions) and in my custom Yocto build.
Environment (please complete the following information):
docker build -f Dockerfile.build .
Is your feature request related to a problem? Please describe.
I have a scorer trained for language X, and it was compiled with alphabet Y. Now I have a new acoustic model (*.pbmm file) which was trained with alphabet Z. I'd like to use my old scorer with the new acoustic model, but because the alphabets are not exactly the same, the models are incompatible. I would have to retrain one of the models with a compatible alphabet to use the two together. This is burdensome because of the data and compute resources it requires.
Describe the solution you'd like
I'd like to be able to specify a new alphabet, and re-export the scorer to be compatible with my acoustic model.
Describe alternatives you've considered
Re-train the language model and re-export the scorer.
Additional context
This is a common problem for sharing models.
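As a first step toward judging how far apart two alphabets are, a small comparison helper can list the labels that differ. The sketch below assumes the usual alphabet.txt layout (one label per line, lines starting with # are comments, \# escapes a literal hash); the function names and the sample alphabets are made up for illustration and are not part of STT.

```python
def parse_alphabet(text):
    """Parse alphabet file contents into an ordered list of labels."""
    labels = []
    for line in text.split("\n"):
        if line.startswith("\\#"):
            labels.append("#")  # escaped hash is a real label
        elif line.startswith("#") or line == "":
            continue            # comment or empty line
        else:
            labels.append(line)
    return labels


def compare_alphabets(scorer_text, model_text):
    """Report labels unique to each alphabet and whether the order matches."""
    scorer = parse_alphabet(scorer_text)
    model = parse_alphabet(model_text)
    return {
        "only_in_scorer": sorted(set(scorer) - set(model)),
        "only_in_model": sorted(set(model) - set(scorer)),
        "same_order": scorer == model,
    }


# Hypothetical stand-ins for alphabet Y (scorer) and alphabet Z (acoustic model).
alphabet_y = "# scorer alphabet\na\nb\nc\n"
alphabet_z = "a\nb\nd\n"

report = compare_alphabets(alphabet_y, alphabet_z)
print(report)
```

If both difference lists are empty but the order differs, a re-export with remapped labels might be enough in principle; whether the scorer format allows that is exactly what this feature request is asking about.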