Comments (14)

Purfview commented on May 13, 2024

That's because the defaults changed: r126 has a bug in the compute_type default when running on CUDA.
It is set to int8 there by mistake, which uses less memory but is slower, I think.

from whisper-standalone-win.

oevesque commented on May 13, 2024

OK, thanks. Forcing "--compute_type int8" solved the issue. (I'm using an old GeForce 1070.)

Purfview commented on May 13, 2024

Which model did you use, and what is the VRAM size of your GPU?

oevesque commented on May 13, 2024

--model large-v2
8 GB VRAM
And verbose mode says there are only 2 compute types available for my card: int8 and float32.
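
As a rough check on whether large-v2 fits in 8 GB, the memory taken by the weights alone can be estimated from the parameter count and the bytes per parameter. The sketch below assumes about 1.55 billion parameters for the large models (the approximate figure from OpenAI's published Whisper model table); the function name and the byte table are illustrative only, and activations, decoding caches and framework overhead come on top of these numbers.

```python
# Rough weight-memory estimate; this is not a measurement of actual VRAM use.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

LARGE_PARAMS = 1.55e9  # assumption: approximate parameter count of large-v2


def weights_gib(num_params: float, compute_type: str) -> float:
    """Return the approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[compute_type] / 2**30


for compute_type in ("float32", "int8"):
    size = weights_gib(LARGE_PARAMS, compute_type)
    print(f"{compute_type}: ~{size:.1f} GiB of weights")
```

Under these assumptions the float32 weights take roughly 5.8 GiB, which leaves limited headroom on an 8 GB card, while the int8 weights take roughly 1.4 GiB.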

Purfview commented on May 13, 2024

I think 8 GB should be enough for float32. It looks like a spike in memory use happens on that temperature fallback:

File "faster_whisper\transcribe.py", line 603, in generate_with_fallback

Is this error always reproducible on that file?
Post your other settings if you changed them from the defaults.
Could you check whether it's gone with --compute_type=float32 --best_of=1?
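
To see why best_of can matter for memory during a temperature fallback, here is a simplified sketch of the general idea. It is not faster-whisper's actual code. The `Attempt` class and the `decode` callable are hypothetical stand-ins, and the temperature list and the thresholds (2.4 and -1.0) are the defaults used by OpenAI's Whisper, which the standalone build may or may not share. The assumption illustrated is that a pass at temperature 0 uses beam search, while a retry at a higher temperature samples best_of candidates at once.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Attempt:
    """Result of one decoding pass (hypothetical container for this sketch)."""
    text: str
    avg_logprob: float
    compression_ratio: float


def decode_with_fallback(
    decode: Callable[..., Attempt],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    beam_size: int = 5,
    best_of: int = 5,
    compression_ratio_threshold: float = 2.4,
    log_prob_threshold: float = -1.0,
) -> Attempt:
    """Retry decoding at rising temperatures until a pass looks acceptable."""
    attempt: Optional[Attempt] = None
    for temperature in temperatures:
        if temperature > 0:
            # Sampling pass: best_of candidates are generated together,
            # which is where additional memory may be needed.
            attempt = decode(temperature=temperature, beam_size=1,
                             num_hypotheses=best_of)
        else:
            # Beam-search pass at temperature 0.
            attempt = decode(temperature=temperature, beam_size=beam_size,
                             num_hypotheses=1)
        too_repetitive = attempt.compression_ratio > compression_ratio_threshold
        too_unlikely = attempt.avg_logprob < log_prob_threshold
        if not (too_repetitive or too_unlikely):
            break  # this pass is good enough, stop falling back
    if attempt is None:
        raise ValueError("temperatures must not be empty")
    return attempt
```

In this sketch, setting best_of=1 means a fallback pass only ever holds one candidate, which would be consistent with the suggestion to test --best_of=1. If every pass fails the checks, the sketch simply returns the last one; a real implementation may choose differently.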

oevesque commented on May 13, 2024

Yes, the problem occurs in the "temperature fallback" for all files.
It works fine with --compute_type=float32 --best_of=1.
But I'm quite confused by these parameters.
In short, is there an ideal set of parameters for accurate speech detection and timestamps?

oevesque commented on May 13, 2024

I was using this script (with Directory Opus sending all selected files to this .bat):

@echo off

REM WARNING: THIS SCRIPT DOES NOT WORK IF THE PATH OF THE FILE TO SUBTITLE CONTAINS A SPACE
REM  OR IF THE FILE NAME CONTAINS AN &

REM Initialize a variable to hold the concatenated arguments
set "concatenatedArgs="
set file_path=%~dp1

REM Loop over all command-line arguments
:loop
if "%~1"=="" goto done

REM Wrap the current argument in quotes and store it in a temporary variable
set "currentArg="%~1""

echo %currentArg%

REM Append each argument to the "concatenatedArgs" variable
set "concatenatedArgs=%concatenatedArgs% %currentArg%"

shift
goto loop

:done
REM Display the variable containing the concatenated arguments
echo Concatenated arguments: %concatenatedArgs%
D:\Application\Python\Subtitle\whisper-faster.exe --model large-v2 --language en --output_format srt --model_dir c:\temp  --compute_type int8 --output_dir %file_path%  %concatenatedArgs%
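
The limitation in the script's warning (spaces or & in paths) comes from assembling one command string by hand. A possible alternative, sketched here in Python, is to pass the arguments as a list so that no manual quoting is needed. The executable path and every flag are copied from the script; the function names `build_command` and `run` are made up for this example, and the output directory is taken from the first file's folder, as `%~dp1` does.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List, Sequence

# Executable path copied from the .bat script.
EXE = r"D:\Application\Python\Subtitle\whisper-faster.exe"


def build_command(files: Sequence[str], compute_type: str = "int8") -> List[str]:
    """Build the argument list; each file path stays one separate argument."""
    if not files:
        raise ValueError("no input files given")
    # Same idea as %~dp1: write the subtitles next to the first input file.
    output_dir = str(PureWindowsPath(files[0]).parent)
    return [
        EXE,
        "--model", "large-v2",
        "--language", "en",
        "--output_format", "srt",
        "--model_dir", r"c:\temp",
        "--compute_type", compute_type,
        "--output_dir", output_dir,
        *files,
    ]


def run(files: Sequence[str]) -> int:
    """Launch the transcription and return the process exit code."""
    return subprocess.run(build_command(files), check=False).returncode
```

Calling `run(sys.argv[1:])` from a small wrapper script would mirror what the .bat does with the files sent by Directory Opus.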



Purfview commented on May 13, 2024

> I was using this script

You don't need it; you can do the same out of the box. A few usage examples:

whisper-faster.exe "D:\Clips\*.mkv" --language=en --model=medium --batch_recursive

whisper-faster.exe "D:\Audio" --language=en --model=medium --batch_recursive

whisper-faster.exe "D:\Band\Album.m3u" --language=en --model=medium --vad_filter=False

> In a short way, is there an ideal parameter for accuracy in speech detection and timestamp?

IMO, the defaults are generally good for that. There are no ideal parameters for everything; if there were, all those parameters wouldn't be needed. :)
Thanks for the tests; maybe I'll adjust some defaults in the next release.

Could you run a benchmark for me with --compute_type=float32 --best_of=1 and with --compute_type=int8 --best_of=1, where each test runs for at least a few minutes?
I need the "Transcription speed" lines from the tests.

oevesque commented on May 13, 2024

Video file duration: 39 min 57 s
int8: Transcription speed: 6.29 audio seconds/s in 458s
float32: Transcription speed: 6.56 audio seconds/s in 425s
But the best_of=1 flag gives poorer recognition results, with lots of hallucinations.
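
For repeated benchmarks, the "Transcription speed" lines can be pulled out of saved console output and compared automatically. This is a minimal sketch that assumes the line format shown in the results above; the function name and the `logs` dictionary are hypothetical.

```python
import re
from typing import Optional

# Matches lines such as "Transcription speed: 6.29 audio seconds/s in 458s".
SPEED_LINE = re.compile(r"Transcription speed:\s*([\d.]+)\s*audio seconds/s")


def transcription_speed(log_text: str) -> Optional[float]:
    """Return the reported speed in audio seconds per second, or None."""
    match = SPEED_LINE.search(log_text)
    return float(match.group(1)) if match else None


# Hypothetical captured output, using the two lines reported above.
logs = {
    "int8": "Transcription speed: 6.29 audio seconds/s in 458s",
    "float32": "Transcription speed: 6.56 audio seconds/s in 425s",
}
speeds = {name: transcription_speed(text) for name, text in logs.items()}
gain = speeds["float32"] / speeds["int8"] - 1
print(f"float32 vs int8: {gain:+.1%}")
```

With the two reported figures, float32 comes out about 4% faster than int8 on this card.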

Purfview commented on May 13, 2024

> but the flag best_of=1 give poorer recognition result. Lots of hallucination.

Can you share the audio (remuxed, not transcoded) and some examples of those hallucinations?

oevesque commented on May 13, 2024

The test file is an English adult movie, so maybe not...

Purfview commented on May 13, 2024

Are you sure that best_of=1 affects the transcription?
Maybe you are mixing up the int8 and float32 results. I have never noticed that parameter making any difference.

oevesque commented on May 13, 2024

Here are the results for different best_of and compute_type settings, with the resulting file sizes.
As you can see, some combinations give more text for the same file.

Testfile.1080p.MP4-WRB.int8.bestof1.result.srt 21.2 KB
Testfile.1080p.MP4-WRB.int8.bestof5.result.srt 41.1 KB
Testfile.1080p.MP4-WRB.int8.bestof7.result.srt 36.8 KB
Testfile.1080p.MP4-WRB.float32.bestof1.result.srt 31.3 KB
Testfile.1080p.MP4-WRB.float32.bestof2.result.srt 40.3 KB

I can't go higher than best_of=2 with float32 without getting the error.

But you shouldn't spend too much time investigating this particular file. Moreover, adult films contain groans and grunts that fool both the AI and the code that tries to detect sections containing voice. (I guess the quickly repeating non-word vocal sounds give false results for the compression ratio threshold too.)
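
The guess about the compression ratio threshold can be illustrated with a small sketch. OpenAI's Whisper measures repetitiveness as the ratio between the raw and zlib-compressed size of the decoded text, and treats a pass as failed when that ratio exceeds a threshold (2.4 by default there; whether this build uses the same value is an assumption). The two sample strings are invented for the example.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw size divided by zlib-compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


repetitive = "ah oh " * 50  # made-up stand-in for repeated non-word sounds
ordinary = "The quick brown fox jumps over the lazy dog near the river bank."

for label, sample in (("repetitive", repetitive), ("ordinary", ordinary)):
    print(f"{label}: {compression_ratio(sample):.2f}")
```

The repetitive sample lands far above 2.4 while the ordinary sentence stays well below it, so a correct transcription of repeated groans could be rejected as if it were a hallucinated loop and sent to the fallback.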

Purfview commented on May 13, 2024

I downloaded an adult clip, and yes, now I see that best_of has an effect.

But the effect on my results is the opposite of yours:

float32 + best_of=1: 7.6 KB
float32 + best_of=5: 5.5 KB
float32 + best_of=5 + beam_size=5: 1.9 KB

At a quick look, every version has some worse and some better lines, so it's not clear which subtitles are better. The size difference is mostly due to those pointless ah/oh lines.
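
Since file size is dominated by those filler lines, a fairer comparison might count subtitle cues and how many of them contain nothing but interjections. This is a rough sketch that assumes standard SRT layout (index line, timing line with "-->", then text); the interjection list and the function names are arbitrary choices for illustration.

```python
import re
from typing import List, Tuple

# Assumption: a small, hand-picked set of filler words.
INTERJECTIONS = {"ah", "oh", "uh", "mm", "hmm"}


def cue_texts(srt_text: str) -> List[str]:
    """Return the text of each cue in an SRT document."""
    texts = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # timing line; the cue text follows it
                texts.append(" ".join(lines[i + 1:]).strip())
                break
    return texts


def is_interjection_only(text: str) -> bool:
    """True if every word in the cue is one of the filler words."""
    words = re.findall(r"[a-z]+", text.lower())
    return bool(words) and all(word in INTERJECTIONS for word in words)


def summarize(srt_text: str) -> Tuple[int, int]:
    """Return (total cues, cues made only of interjections)."""
    texts = cue_texts(srt_text)
    return len(texts), sum(is_interjection_only(t) for t in texts)
```

Running `summarize` on each result file would show whether a larger file actually contains more speech or just more ah/oh cues.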
