Comments (14)
That's because the defaults changed: r126 has a bug with the compute_type default on CUDA. It is erroneously set to int8 there, which uses less memory but is slower, I think.
from whisper-standalone-win.
Ok thanks, forcing "--compute_type int8" solved the issue. (I'm using an old GeForce 1070.)
What model did you use, and what is the VRAM size of your GPU?
--model large-v2
8 GB VRAM
And verbose mode says there are only two compute_type values available for my card: int8 and float32.
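As a rough sanity check on whether 8 GB fits, weight memory scales with bytes per parameter; large-v2 has roughly 1.55 billion parameters (an approximate figure, and activations plus decoding state come on top, which is where spikes bite):

```python
PARAMS_LARGE_V2 = 1.55e9  # approximate parameter count of Whisper large-v2

def weight_memory_gb(params: float, bytes_per_value: int) -> float:
    """Memory for the model weights alone: parameter count x bytes per value."""
    return params * bytes_per_value / 1024**3

print(f"float32: {weight_memory_gb(PARAMS_LARGE_V2, 4):.1f} GB")  # ~5.8 GB
print(f"float16: {weight_memory_gb(PARAMS_LARGE_V2, 2):.1f} GB")  # ~2.9 GB
print(f"int8:    {weight_memory_gb(PARAMS_LARGE_V2, 1):.1f} GB")  # ~1.4 GB
```

So in float32 the weights alone leave only about 2 GB of an 8 GB card for everything else, which is consistent with running out of memory during transient usage spikes.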
I think 8 GB should be enough for float32; it looks like a spike in memory use happens during that temperature fallback:
File "faster_whisper\transcribe.py", line 603, in generate_with_fallback
Is this error always reproducible with that file?
Post your other settings if you changed them from the defaults.
Could you check whether it's gone with --compute_type=float32 --best_of=1?
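For background on what is being probed here: "temperature fallback" means the decoder retries a segment at progressively higher temperatures when the first attempt fails quality checks (in real Whisper, compression-ratio and average-log-probability thresholds). A minimal sketch of that control flow, as an illustration of the idea rather than faster_whisper's actual code:

```python
def transcribe_with_fallback(decode, quality_ok,
                             temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Retry decoding at increasing temperatures until quality checks pass.

    decode:     callable temperature -> candidate transcript
    quality_ok: callable transcript -> bool (stand-in for Whisper's
                compression-ratio / log-probability checks)
    """
    result = None
    for t in temperatures:
        result = decode(t)
        if quality_ok(result):
            return result, t
    # Every attempt failed the checks; keep the last (highest-temperature) one.
    return result, temperatures[-1]
```

best_of is relevant here because, at the non-zero temperatures used in later attempts, it controls how many candidates are sampled per attempt, which can also multiply memory use during the fallback.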
Yes, the problem, for all files, is in the "temperature fallback".
It works fine when using --compute_type=float32 --best_of=1
But I'm quite confused by these parameters.
In short, is there an ideal parameter for accuracy in speech detection and timestamps?
I was using this script (and using Directory Opus to send all selected files to this .bat)
@echo off
REM WARNING: the original version of this script failed if the path of the file
REM to subtitle contained a space, or if the file name contained an &.
REM The quoting below should handle both cases.
REM Initialize a variable to store the concatenated arguments
set "concatenatedArgs="
set "file_path=%~dp1"
REM Loop through all command-line arguments
:loop
if "%~1"=="" goto done
echo "%~1"
REM Concatenate the current argument, wrapped in quotes, onto "concatenatedArgs"
set "concatenatedArgs=%concatenatedArgs% "%~1""
shift
goto loop
:done
REM Display the variable containing the concatenated arguments
echo Concatenated arguments: %concatenatedArgs%
D:\Application\Python\Subtitle\whisper-faster.exe --model large-v2 --language en --output_format srt --model_dir c:\temp --compute_type int8 --output_dir "%file_path%" %concatenatedArgs%
I was using this script
You don't need it; you can do the same out of the box. A few usage examples:
whisper-faster.exe "D:\Clips\*.mkv" --language=en --model=medium --batch_recursive
whisper-faster.exe "D:\Audio" --language=en --model=medium --batch_recursive
whisper-faster.exe "D:\Band\Album.m3u" --language=en --model=medium --vad_filter=False
In short, is there an ideal parameter for accuracy in speech detection and timestamps?
Imo, the defaults are generally good for that. There are no ideal parameters for everything; if there were, all those other parameters wouldn't be needed. :)
Thanks for the tests, maybe I'll adjust some defaults in the next release.
Could you run a benchmark for me with --compute_type=float32 --best_of=1
and --compute_type=int8 --best_of=1,
where each test runs for at least a few minutes?
I need the "Transcription speed" lines from the tests.
Video file duration: 39 min 57 s
int8: Transcription speed: 6.29 audio seconds/s in 458s
float32 : Transcription speed: 6.56 audio seconds/s in 425s
But the flag best_of=1 gives poorer recognition results. Lots of hallucinations.
But the flag best_of=1 gives poorer recognition results. Lots of hallucinations.
Can you share the audio (remuxed, not transcoded) and examples of those hallucinations?
The test file is an English adult movie. So maybe not...
Are you sure that best_of=1 affects transcription?
Maybe you are mixing up the int8 vs float32 results. I never noticed that parameter making a single difference.
Here are the results with different best_of and compute_type parameters, and the resulting file sizes.
As you can see, some combinations give more text for the same file.
Testfile.1080p.MP4-WRB.int8.bestof1.result.srt 21.2 KB
Testfile.1080p.MP4-WRB.int8.bestof5.result.srt 41.1 KB
Testfile.1080p.MP4-WRB.int8.bestof7.result.srt 36.8 KB
Testfile.1080p.MP4-WRB.float32.bestof1.result.srt 31.3 KB
Testfile.1080p.MP4-WRB.float32.bestof2.result.srt 40.3 KB
I can't go higher than best_of=2 with float32 without an error.
But you shouldn't spend so much time investigating this particular file. Moreover, adult films contain groans and grunts that fool both the AI and the code that tries to detect sections containing voice. (I guess the quickly repeating non-word vocalizations also give false results for the compression ratio threshold.)
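The compression ratio check mentioned here is easy to illustrate: highly repetitive output compresses far better than normal speech, so its ratio shoots past Whisper's default threshold of 2.4 and triggers the temperature fallback. A quick sketch, assuming zlib as the compressor (which is what Whisper's own check uses on the transcript text):

```python
import zlib

def compression_ratio(text: str) -> float:
    """len(raw) / len(compressed): repetitive text compresses well, so it scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))

normal = "But you should not take so much time investigating this particular file."
loop = "Oh. Ah. Oh. Ah. " * 20  # the kind of non-word output that fools the checks

print(f"normal speech: {compression_ratio(normal):.2f}")  # around 1.0
print(f"repetitive:    {compression_ratio(loop):.2f}")    # far above the 2.4 default
```

So a segment of transcribed moans can trip the threshold even when the transcription is, in some sense, accurate.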
I downloaded an adult clip, and yeah, now I see that best_of has an effect.
But the effect in my results is the opposite of yours:
float32 + best_of=1: 7.6 KB
float32 + best_of=5: 5.5 KB
float32 + best_of=5 + beam_size=5: 1.9 KB
At a quick glance, all versions have both better and worse lines; it's not clear which subs are best. The size difference is mostly due to those pointless ah/oh lines.