Comments (11)
Hey Beatrice, in fact it does work with the new Colab, just not in the old way: you need to restart the kernel.
Here is the test notebook I put together after this change landed. Run it step by step; before the last step it shows you a button to restart the kernel, and after that you continue from where you left off. Do not initialize any variables before that point, as they will be lost (or if you have to, redefine them afterwards).
https://colab.research.google.com/drive/1CfZbtNLht4h0ShOJR1qUqucNg893rsOP?usp=share_link
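In case the notebook link goes stale: the restart step usually boils down to killing the kernel process so Colab spins up a fresh one. A minimal sketch (the function name is mine, not from the notebook):

```python
import os
import signal

def restart_colab_kernel():
    """Kill the current kernel process; Colab restarts it automatically.
    All in-memory variables are lost, so define anything you still need
    only after the restart."""
    os.kill(os.getpid(), signal.SIGKILL)
```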
AFAIK, there is no immediate plan to update the code to TF v2 as it would be a huge undertaking.
Bülent
from stt.
Ok, thanks for the information! I can confirm that after installing Python 3.7 on Colab and creating a virtual env with 3.7, TensorFlow 1.15.4 and STT 1.4.0 install successfully.
Meanwhile, I finished the grid search on my local machine, and it seems that increasing the train batch size leads to a higher WER and CER, so I will go for a lower batch size with my audio data samples.
I can train up to batch size 16 with my graphics card. I wanted to include batch size 32 as a comparison in my thesis as well, but I can do that later if I have time :)
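For reference, the install steps described above look roughly like this on a fresh Colab VM. This is a hedged sketch: the deadsnakes PPA and the venv path are my assumptions, not from the thread.

```shell
# Install Python 3.7 alongside Colab's default interpreter
# (deadsnakes PPA is one common source; an assumption here).
sudo apt-get update
sudo apt-get install -y python3.7 python3.7-venv python3.7-dev

# Create a 3.7 virtual environment (path is a placeholder)
python3.7 -m venv /content/venv37
/content/venv37/bin/pip install --upgrade pip

# Under Python 3.7, TF 1.15.4 and STT 1.4.0 resolve again
/content/venv37/bin/pip install tensorflow==1.15.4 coqui-stt-training==1.4.0
```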
Hey Bülent,
thanks for your answer and for sharing your notebook! It seems they changed a lot; pip on the new Colab no longer finds any version below 2:
ERROR: Could not find a version that satisfies the requirement tensorflow==1.15.4 (from coqui-stt-training) (from versions: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0)
ERROR: No matching distribution found for tensorflow==1.15.4
This error occurs when installing STT from git and also when trying to manually install TensorFlow in a later cell. I'll try to install it from source; maybe that works.
Beatrice
I found the reason: they switched the image to Python 3.8, which only supports TF 2.2+.
I checked; they still have Python 3.6 installed. Can you try creating a virtual env that uses Python 3.6?
Hey Bülent,
thanks for the hint about the installed Python version! I tried it with a virtual environment; it works, but then the following error occurs:
ERROR: Package 'coqui-stt-training' requires a different Python: 3.6.9 not in '<3.9,>=3.7'
I'll try installing version 3.7 when I have time and see what happens. Thanks for helping, though!
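The error above comes from the package's `Requires-Python` metadata (`>=3.7,<3.9`). A quick way to sanity-check an interpreter before building a venv for it (the helper name is mine):

```python
import sys

def python_ok(version=sys.version_info[:2], lower=(3, 7), upper=(3, 9)):
    """Return True if a (major, minor) version falls inside
    the half-open range [lower, upper), matching pip's
    '>=3.7,<3.9' constraint from the error message."""
    return lower <= tuple(version) < upper

# Colab's 3.6.9 fails the check; 3.7 and 3.8 pass, 3.9 does not.
```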
br
Beatrice
For a quick check you could use an older version of STT. I couldn't find the exact point where Python 3.6 support was dropped, but you can use v1.0.0, for example. With respect to the underlying DeepSpeech model nothing changed, but there have been changes to the parameters, so use that version's documentation for the parameters.
It would be very nice if you could share the relevant cells here for people with the same question in the future.
it seems that increasing the train batch size leads to a higher WER and CER
That might be data dependent. I did a similar test last year; the results were not conclusive (rather erratic), but I'll share them here anyway:
As you can see, with the training batch size set to 32 (and 16), the best epoch was reached too early. I did not check the loss graphs at the time, but maybe you should for the thesis, to pinpoint possible overfitting etc.
Sure, here you go:
https://colab.research.google.com/drive/1mLXfqVXIQLbgyfa2pXzVay0fWoh9Geod?usp=sharing
I'm not a heavy Colab user; somehow one has to activate the virtual env in every cell, I think.
Thanks for sharing your results! I don't have mine ready yet (but I'm happy to share the thesis once it's finished).
Thank you for sharing the solution Beatrice.
AFAIK, in Colab (probably in all IPython/notebook implementations), each cell starts a new shell, so you need to re-activate every time. I found that defining functions and calling them in succession from a single cell makes it easier. It kind of defeats the purpose of using a notebook and results in many linting underlines, but it works.
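One way to make the per-cell activation less painful is a small helper that wraps each command in a freshly activated shell. A sketch, assuming a venv path you created earlier (the helper name and path are mine):

```python
import subprocess

def run_in_venv(venv_dir, cmd):
    """Run `cmd` in a bash shell with the given venv activated.
    Each Colab cell spawns a fresh shell, so the activation has
    to be repeated for every command."""
    return subprocess.run(
        f"source {venv_dir}/bin/activate && {cmd}",
        shell=True,
        executable="/bin/bash",
        capture_output=True,
        text=True,
    )

# e.g. run_in_venv("/content/venv37", "pip list")
```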
Good luck with your thesis :)
AFAIK, there is no immediate plan to update the code to TF v2 as it would be a huge undertaking
I hate it when that happens :-/ And since almost all frameworks released in recent years are basically beta versions that constantly introduce breaking changes and drop backwards compatibility, it has become the everyday nightmare of programmers 😞.
The question is how long will you be able to live with TF < 2 :-|.
Maybe the tf.compat module can help?
I'm already facing a situation where I need to use two libraries in the same program (one being Coqui) and one requires TF 2 :-(
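For what it's worth, the `tf.compat.v1` shim lets TF1-style graph code run under a TF2 install, which can help when mixing libraries. A minimal sketch (not Coqui's code; the guard makes it a no-op when TensorFlow isn't installed):

```python
# TF1-style code under a TF2 installation via the compat shim.
try:
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()  # restore graph-mode / Session semantics
    TF_AVAILABLE = True
except ImportError:
    TF_AVAILABLE = False

def tf1_style_demo():
    """Build a tiny TF1 graph and run it in a Session."""
    if not TF_AVAILABLE:
        return None
    a = tf.placeholder(tf.float32)
    b = a * 2.0
    with tf.Session() as sess:
        return sess.run(b, feed_dict={a: 21.0})
```

Note this only papers over API differences; it doesn't help when one dependency genuinely needs TF2 eager behavior enabled.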
You can totally use TFv2 for inference already.
Training is another beast in itself.
The question is how long will you be able to live with TF < 2 :-|.
As long as we can't use TFv2 for training. We have some very specific requirements for training models that TFv2 is lacking for now.
I'm already facing a situation where I need to use two libraries in the same program
You shouldn't mix dependencies like that. Training should be performed inside its own dedicated environment.
Meaning you should have one notebook for training using STT, and create other notebooks for your other needs.
@HarikalarKutusu can tell you that notebooks are not made for training models. They are good tools to learn and play with code, but not to seriously produce models at scale.
If you followed the docs, you'll have seen that we actually recommend using our Docker image to train your models, as it's the easiest way to train and comes with everything you need out of the box.
We suggest you use our Docker image as a base for training.
- https://stt.readthedocs.io/en/latest/TRAINING_INTRO.html
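The Docker route from the training docs looks roughly like this. The image tag matches the STT documentation; treat the data paths and flags shown here as placeholders for your own setup.

```shell
# Pull the official training image (tag per the STT training docs)
docker pull ghcr.io/coqui-ai/stt-train:latest

# Run training inside the container, mounting your data directory
docker run --gpus all -it \
  -v /path/to/data:/data \
  ghcr.io/coqui-ai/stt-train:latest \
  python -m coqui_stt_training.train \
    --train_files /data/train.csv \
    --dev_files /data/dev.csv \
    --test_files /data/test.csv
```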
I'll move this ticket to a discussion as there is really not much we can do about it. We have made some progress towards it but there is still a long way before we can fully use TF2 as base for training.