I was using version 126 without any problems, and have now switched to version 136.

More memory usage for r134.6? about whisper-standalone-win HOT 14 CLOSED

purfview commented on May 13, 2024

More memory usage for r134.6?

from whisper-standalone-win.

Comments (14)

Purfview commented on May 13, 2024

That's because the defaults changed: r126 has a bug in the compute_type default when running on CUDA.
It is set to int8 there by mistake, which uses less memory but is slower, I think.
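
To make the memory side of this concrete, here is a rough weight-only estimate per compute type. It is a sketch under stated assumptions: large-v2 is taken to have about 1.55 billion parameters, each compute type is taken to store one value per parameter at its nominal width, and activations, decoding buffers, and the CUDA context are ignored, so real usage will be higher.

```python
# Rough weight-only memory estimate per compute type.
# Assumptions: ~1.55 billion parameters for large-v2, and one stored value
# per parameter at the nominal width of each type. Activations, decoding
# buffers, and the CUDA context are NOT included.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}
LARGE_V2_PARAMS = 1_550_000_000


def weight_memory_gb(num_params: int, compute_type: str) -> float:
    """Return the approximate size of the model weights in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[compute_type] / 1e9


for compute_type in ("float32", "float16", "int8"):
    size = weight_memory_gb(LARGE_V2_PARAMS, compute_type)
    print(f"{compute_type}: ~{size:.2f} GB")
```

Under these assumptions float32 weights alone take roughly 6.2 GB, which leaves little headroom on an 8 GB card, while int8 needs about a quarter of that.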

oevesque commented on May 13, 2024

OK, thanks, forcing "--compute_type int8" solved the issue. (I'm using an old GeForce 1070.)

Purfview commented on May 13, 2024

What model did you use, and what is the VRAM size of your GPU?

oevesque commented on May 13, 2024

--model large-v2
8 GB VRAM
And verbose mode says there are only 2 compute_type values available for my card: int8 and float32

Purfview commented on May 13, 2024

I think 8 GB should be enough for float32; it looks like a spike in memory use happens on that temperature fallback:

File "faster_whisper\transcribe.py", line 603, in generate_with_fallback

Is this error always reproducible on that file?
Post your other settings if you changed them from the defaults.
Could you check if it's gone with --compute_type=float32 --best_of=1?
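
For readers unfamiliar with the temperature fallback, the general idea can be sketched as below. This is a simplified illustration, not the code in faster_whisper\transcribe.py: the `Candidate` class, the `decode` callback, and the fake decoder are hypothetical, and the default values shown are the commonly documented Whisper defaults, assumed here rather than confirmed for this build. The point is that best_of hypotheses are only requested once the temperature rises above 0, which would fit a memory spike appearing on fallback.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:  # hypothetical container for one decoding result
    text: str
    avg_logprob: float
    compression_ratio: float


def decode_with_fallback(
    decode: Callable[[float, int], Candidate],  # hypothetical decoder callback
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    best_of: int = 5,
    compression_ratio_threshold: float = 2.4,
    log_prob_threshold: float = -1.0,
) -> Candidate:
    """Retry decoding at rising temperatures until a result passes both checks."""
    if not temperatures:
        raise ValueError("at least one temperature is required")
    result = None
    for temperature in temperatures:
        # Sampling only happens above temperature 0, so that is where
        # best_of candidates are requested (beam search details omitted).
        num_hypotheses = best_of if temperature > 0 else 1
        result = decode(temperature, num_hypotheses)
        too_repetitive = result.compression_ratio > compression_ratio_threshold
        too_unlikely = result.avg_logprob < log_prob_threshold
        if not (too_repetitive or too_unlikely):
            break  # good enough, stop falling back
    return result


# Fake decoder: the greedy pass is too repetitive, the first sampled pass is fine.
calls = []


def fake_decode(temperature: float, num_hypotheses: int) -> Candidate:
    calls.append((temperature, num_hypotheses))
    ratio = 3.0 if temperature == 0 else 1.5
    return Candidate("hello", avg_logprob=-0.3, compression_ratio=ratio)


chosen = decode_with_fallback(fake_decode)
print(calls)
```

With --best_of=1 the sampled passes in this sketch would request a single hypothesis each, which matches the workaround suggested above.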

oevesque commented on May 13, 2024

Yes, the problem, for all files, is in the "temperature fallback".
It works fine when using --compute_type=float32 --best_of=1
But I'm quite confused by these parameters.
In short, is there an ideal parameter for accuracy in speech detection and timestamps?

oevesque commented on May 13, 2024

I was using this script (and using Directory Opus to send all selected files to this .bat)

@echo off

REM WARNING: THIS SCRIPT DOES NOT WORK IF THE PATH OF THE FILE TO SUBTITLE CONTAINS A SPACE
REM  OR IF THE FILE NAME CONTAINS AN &

REM Initialize a variable to hold the concatenated arguments
set "concatenatedArgs="
set file_path=%~dp1

REM Loop through all command-line arguments
:loop
if "%~1"=="" goto done

REM Store the current argument, wrapped in quotes, in a temporary variable
set "currentArg="%~1""

echo %currentArg%

REM Append each argument to the "concatenatedArgs" variable
set "concatenatedArgs=%concatenatedArgs% %currentArg%"

shift
goto loop

:done
REM Display the variable containing the concatenated arguments
echo Concatenated arguments: %concatenatedArgs%
D:\Application\Python\Subtitle\whisper-faster.exe --model large-v2 --language en --output_format srt --model_dir c:\temp  --compute_type int8 --output_dir %file_path%  %concatenatedArgs%
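
The space and & limitation noted at the top of the script comes from building one command string. A minimal Python sketch of the same wrapper is shown below, assuming Python is available on the machine; it passes each file as a separate list element, so no manual quoting is needed. The executable path and flags are copied from the script above, while `build_command` and `run` are hypothetical helper names.

```python
import subprocess
from pathlib import PureWindowsPath

# Executable path and flags copied from the batch script above.
WHISPER_EXE = r"D:\Application\Python\Subtitle\whisper-faster.exe"


def build_command(files: list[str]) -> list[str]:
    """Build the argument list; each file stays a single argument."""
    # Like %~dp1 in the batch file: use the folder of the first file.
    output_dir = str(PureWindowsPath(files[0]).parent)
    return [
        WHISPER_EXE,
        "--model", "large-v2",
        "--language", "en",
        "--output_format", "srt",
        "--model_dir", r"c:\temp",
        "--compute_type", "int8",
        "--output_dir", output_dir,
        *files,
    ]


def run(files: list[str]) -> int:
    """Launch the transcription without a shell and return the exit code."""
    return subprocess.run(build_command(files)).returncode


# Demo: print the command for a path containing both a space and an ampersand.
print(build_command([r"D:\My Clips\a & b.mkv"]))
```

Calling `run(sys.argv[1:])` from a small script would give Directory Opus a target that behaves like the .bat file.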

Purfview commented on May 13, 2024

> I was using this script

You don't need to use it; you can do the same out of the box. A few usage examples:

whisper-faster.exe "D:\Clips\*.mkv" --language=en --model=medium --batch_recursive

whisper-faster.exe "D:\Audio" --language=en --model=medium --batch_recursive

whisper-faster.exe "D:\Band\Album.m3u" --language=en --model=medium --vad_filter=False

> In short, is there an ideal parameter for accuracy in speech detection and timestamps?

Imo, the defaults are generally good for that. There are no ideal parameters for everything; if there were, all those parameters wouldn't be needed. :)
Thanks for the tests, maybe I'll adjust some defaults in the next release.

Could you run a benchmark for me with --compute_type=float32 --best_of=1 and --compute_type=int8 --best_of=1, where each test runs for at least a few minutes?
I need the "Transcription speed" lines from the tests.

oevesque commented on May 13, 2024

Video file duration: 39 min 57 s
int8: Transcription speed: 6.29 audio seconds/s in 458s
float32: Transcription speed: 6.56 audio seconds/s in 425s
But the best_of=1 flag gives poorer recognition results. Lots of hallucinations.
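
As a quick way to read these two runs side by side, the relative difference can be computed from the reported figures. This only compares the numbers as posted; how the tool derives its "Transcription speed" value is not stated in this thread, so the two percentages need not agree.

```python
# Figures copied from the benchmark above: (audio seconds/s, elapsed seconds).
runs = {"int8": (6.29, 458), "float32": (6.56, 425)}

int8_speed, int8_secs = runs["int8"]
f32_speed, f32_secs = runs["float32"]

# Relative gain of float32 over int8, by reported speed and by elapsed time.
speed_gain = f32_speed / int8_speed - 1
time_saved = 1 - f32_secs / int8_secs

print(f"float32 reported speed gain: {speed_gain:.1%}")
print(f"float32 elapsed time saved: {time_saved:.1%}")
```

On this card the difference between the two compute types is a few percent either way.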

Purfview commented on May 13, 2024

> But the best_of=1 flag gives poorer recognition results. Lots of hallucinations.

Can you share the audio (remuxed, not transcoded) and examples of those hallucinations?

oevesque commented on May 13, 2024

The test file is an English adult movie. So maybe not...

Purfview commented on May 13, 2024

Are you sure that best_of=1 affects the transcription?
Maybe you are mixing up the int8 and float32 results. I never noticed that parameter making any difference.

oevesque commented on May 13, 2024

Here are the results for different best_of and compute_type settings, with the resulting file sizes.
As you can see, some combinations give more text for the same file.

Testfile.1080p.MP4-WRB.int8.bestof1.result.srt 21.2 KB
Testfile.1080p.MP4-WRB.int8.bestof5.result.srt 41.1 KB
Testfile.1080p.MP4-WRB.int8.bestof7.result.srt 36.8 KB
Testfile.1080p.MP4-WRB.float32.bestof1.result.srt 31.3 KB
Testfile.1080p.MP4-WRB.float32.bestof2.result.srt 40.3 KB

I can't go higher than best_of=2 with float32 without an error.

But you should not spend so much time investigating this particular file. Moreover, adult films contain groans and grunts that fool both the AI and the code that tries to detect sections containing voice. (I guess the quickly repeating non-word vocalizations give false results for the compression ratio threshold too.)
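
The guess about the compression ratio threshold can be illustrated with a short sketch. Whisper-style decoders commonly measure repetitiveness as the ratio of the text's byte length to its zlib-compressed length; the threshold of 2.4 used here is the usual default and is assumed, not confirmed, for this build. The sample strings are made up.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw to zlib-compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


THRESHOLD = 2.4  # assumed default compression ratio threshold

speech = "I think eight gigabytes should be enough for this model."
moans = "oh " * 60  # made-up stand-in for repeated non-word vocalizations

for label, text in (("speech", speech), ("moans", moans)):
    ratio = compression_ratio(text)
    verdict = "would trigger fallback" if ratio > THRESHOLD else "passes"
    print(f"{label}: ratio {ratio:.2f}, {verdict}")
```

A segment that is transcribed correctly as a long run of "oh" would therefore look like a repetition failure and be retried at a higher temperature.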

Purfview commented on May 13, 2024

I downloaded an adult clip, and yeah, now I see that best_of has an effect.

But the effect on the results is the opposite of yours:

float32 + best_of=1: 7.6 KB
float32 + best_of=5: 5.5 KB
float32 + best_of=5 + beam_size=5: 1.9 KB

At a quick look, all versions have some worse and some better lines; it's not clear which subs are better. The size difference is mostly because of those pointless ah/oh lines.
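
Since file size mostly reflects those filler lines, counting cues may be a more useful comparison than kilobytes. The sketch below parses SRT text and reports how many cues consist only of interjections. The interjection list and the sample subtitle are hypothetical, and the parser assumes the plain index / timing / text layout of SRT blocks.

```python
import re

# Hypothetical list of filler words; adjust to taste.
INTERJECTIONS = {"ah", "oh", "uh", "mm", "hmm"}


def split_cues(srt_text: str) -> list[str]:
    """Return the text of each cue, skipping the index and timing lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # A cue block is: index line, timing line, then one or more text lines.
        if len(lines) >= 3 and "-->" in lines[1]:
            cues.append(" ".join(lines[2:]).strip())
    return cues


def is_filler(cue: str) -> bool:
    """True if every word in the cue is an interjection."""
    words = re.findall(r"[a-z]+", cue.lower())
    return bool(words) and all(word in INTERJECTIONS for word in words)


sample = """1
00:00:01,000 --> 00:00:02,000
Oh! Ah, ah...

2
00:00:03,000 --> 00:00:05,000
Where did you put the keys?
"""

cues = split_cues(sample)
fillers = [cue for cue in cues if is_filler(cue)]
print(f"{len(fillers)} of {len(cues)} cues are filler")
```

Running this over each result file would show whether a larger .srt actually contains more dialogue or just more ah/oh lines.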
