I've been using Faster Whisper from the start and since it was integrated into Subtitl

192.3 in Subtitle Edit, incomplete transcriptions about whisper-standalone-win HOT 7 CLOSED

rsmith02ct commented on June 28, 2024

192.3 in Subtitle Edit, incomplete transcriptions

from whisper-standalone-win.

Comments (7)

Purfview commented on June 28, 2024

Is there a way to figure out why it quits?

Use it directly in a terminal/console.

Is it that it encounters a gap in the audio or a different language and stops?

It's because you don't have enough VRAM, so it crashes with an out-of-memory error.
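Running the tool in a console mostly matters because the error text becomes visible. A minimal Python sketch of the same idea follows: it runs a command, captures stdout and stderr, and checks for the out-of-memory message quoted later in this thread. The executable name and the `--model` flag in the usage comment are assumptions, not something confirmed here; only `--verbose true` appears in the thread.

```python
import subprocess

# Message text as quoted in the traceback later in this thread.
OOM_MARKER = "CUDA failed with error out of memory"


def is_cuda_oom(output: str) -> bool:
    """Return True if captured console output contains the OOM message."""
    return OOM_MARKER.lower() in output.lower()


def run_and_diagnose(cmd: list[str]) -> str:
    """Run a command, capture its output, and classify the result."""
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    combined = proc.stdout + proc.stderr
    if is_cuda_oom(combined):
        return "out_of_memory"
    return "ok" if proc.returncode == 0 else "failed"


# Usage (assumed executable name and arguments; adjust to the local install):
# run_and_diagnose(["faster-whisper-xxl.exe", "audio.wav",
#                   "--model", "large-v2", "--verbose", "true"])
```

A front end that only shows the transcript would hide this distinction, which is why an incomplete result can look like the tool simply stopped.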

rsmith02ct commented on June 28, 2024

I just had a chance to do more testing. On my desktop (RTX 2080, 8GB VRAM) I had no issues with any of the video files I tried. On my laptop it generally worked, though one PCM WAV file is giving me trouble: it gets to 4:54 of a 1 hour 34 minute file and stops.

I also assumed it was a VRAM issue, though watching it in Task Manager the GPU goes from 2 to 3GB used out of 4 and then back again. From the command line with the same large v2 model I see less usage, 1.6 to 3.1GB. You are right, though, it has the error:

File "D:\whisper-fast\__main__.py", line 1600, in <module>
File "D:\whisper-fast\__main__.py", line 1527, in cli
File "faster_whisper\transcribe.py", line 1373, in restore_speech_timestamps
File "faster_whisper\transcribe.py", line 722, in generate_segments
File "faster_whisper\transcribe.py", line 1072, in generate_with_fallback
RuntimeError: CUDA failed with error out of memory
[19540] Failed to execute script '__main__' due to unhandled exception!

Is Windows not showing all the GPU VRAM usage? It's too bad, as large v2 does work sometimes.

I tried it again with the medium model and it easily gets past 4:50. It's at about 1.7GB VRAM maximum. GPU-Z reports up to 2420MB but mostly under 1800MB. I guess there is nothing to be done other than a smaller model or a better GPU?
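One way to cross-check the Task Manager and GPU-Z readings is to poll `nvidia-smi` while the transcription runs and keep the peak value. The sketch below assumes an NVIDIA driver install with `nvidia-smi` on the PATH; the function names are illustrative. Polling can still miss a brief spike, and an allocation request that fails may never show up as used memory, so a peak below the card's total does not rule out an out-of-memory error.

```python
import subprocess
import time


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one 'used, total' line (values in MiB) from nvidia-smi CSV output."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def read_vram_mib(gpu_index: int = 0) -> tuple[int, int]:
    """Query used and total VRAM in MiB for one GPU via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_line(out.strip().splitlines()[0])


def watch_peak(seconds: float, interval: float = 0.5) -> int:
    """Poll for `seconds` and return the highest used-VRAM reading in MiB."""
    peak = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        used, _total = read_vram_mib()
        peak = max(peak, used)
        time.sleep(interval)
    return peak
```

Starting `watch_peak` in a second console just before the point where the transcription usually fails gives a rough upper reading to compare against the 4GB limit.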

Purfview commented on June 28, 2024

8GB should be enough for the large models.
4GB is barely enough for the large models; maybe your web browser or other software is using VRAM too. Try Faster-Whisper-XXL.

Check with --verbose true what compute type is in use.

rsmith02ct commented on June 28, 2024

I just downloaded and tested XXL (on my laptop, 4GB) with large v2. With just this browser (Firefox) open, testing it on the same audio file it says it is using my GTX1050 with int8_float32. I see nearly 100% CUDA activity in task manager.

With large v2... it made it to 10 minutes and then gave up when it ran out of memory. Oh well!

Purfview commented on June 28, 2024

Try reducing --best_of until it takes less than 4GB of memory.
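The suggestion above amounts to a step-down search: start at the usual `--best_of` value and lower it until a run finishes without the out-of-memory error. A minimal Python sketch under stated assumptions follows. The `--model` flag and any executable name passed in are assumptions, and `first_fitting_best_of` and `build_command` are hypothetical helpers, not part of the tool.

```python
import subprocess
from typing import Callable, Iterable, Optional

# Message text as it appears in the traceback quoted in this thread.
OOM_MARKER = "CUDA failed with error out of memory"


def build_command(exe: str, audio: str, model: str, best_of: int) -> list[str]:
    """Build the argument list; `--model` and the exe name are assumptions."""
    return [exe, audio, "--model", model, "--best_of", str(best_of)]


def subprocess_runner(exe: str, audio: str, model: str) -> Callable[[int], tuple[int, str]]:
    """Return a run(best_of) function that executes the real command."""
    def run(best_of: int) -> tuple[int, str]:
        proc = subprocess.run(build_command(exe, audio, model, best_of),
                              capture_output=True, text=True, errors="replace")
        return proc.returncode, proc.stdout + proc.stderr
    return run


def first_fitting_best_of(
    run: Callable[[int], tuple[int, str]],
    candidates: Iterable[int] = (5, 4, 3, 2, 1),
) -> Optional[int]:
    """Try each best_of value in order and return the first that completes.

    `run(best_of)` must return (exit_code, captured_output). A failure that
    is not an out-of-memory error stops the search, since lowering best_of
    would not be expected to help with it.
    """
    for best_of in candidates:
        code, output = run(best_of)
        if code == 0:
            return best_of
        if OOM_MARKER not in output:
            raise RuntimeError(f"non-OOM failure at best_of={best_of}")
    return None
```

Each attempt restarts the whole file, so on a long recording it may be quicker to test on a short clip that includes the point where the failure occurs.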

rsmith02ct commented on June 28, 2024

5 is the default, right? 4 went a few more minutes before failing. 3 was able to complete the 1 hour 30 min interview!

Is there a quality downside to this setting? In general I use the large model as it just does a better job with proper nouns. Is best of 3 with large v2 likely better for specific names than medium at 5? A quick review of the output seems acceptable.

Purfview commented on June 28, 2024

Is there a quality downside to this setting?

Yes, but going from 5 to 3 has a very small impact on quality, much less than going from the large to the medium model.

If it's audio from a movie then you can use --ff_mdx_kim2 --vad_alt_method pyannote_v3 to increase quality [that's with Faster-Whisper-XXL].

EDIT:
Oh, you wrote that it's an interview, so don't bother with those settings unless there is background music.
