Comments (7)

Purfview avatar Purfview commented on June 28, 2024

Is there a way to figure out why it quits?

Use it directly in a terminal/console.

Is it that it encounters a gap in the audio or a different language and stops?

It's because you don't have enough VRAM, so it crashes with an out-of-memory error.

from whisper-standalone-win.

rsmith02ct avatar rsmith02ct commented on June 28, 2024

Purfview avatar Purfview commented on June 28, 2024

8GB should be enough for the large models.
4GB is barely enough for the large models; maybe your internet browser or other software is using VRAM too. Try Faster-Whisper-XXL.
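
To see whether other programs are already holding VRAM, one option is to query `nvidia-smi` before starting a transcription. The following is a minimal sketch, assuming an NVIDIA driver with `nvidia-smi` on the PATH; the helper names `parse_vram` and `query_vram` are made up for illustration and are not part of Faster-Whisper-XXL.

```python
import subprocess


def parse_vram(csv_text):
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({"used_mib": used, "total_mib": total, "free_mib": total - used})
    return gpus


def query_vram():
    """Ask nvidia-smi for current memory use (needs an NVIDIA driver installed)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram(out)
```

Calling `query_vram()` with the browser open and again with it closed should show roughly how much of the 4GB is taken before the model is even loaded.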

Check with --verbose true what compute type is in use.

rsmith02ct avatar rsmith02ct commented on June 28, 2024

I just downloaded and tested XXL (on my laptop, 4GB) with large v2. With just this browser (Firefox) open, testing it on the same audio file it says it is using my GTX1050 with int8_float32. I see nearly 100% CUDA activity in task manager.

With large v2... it made it to 10 minutes and then gave up when it ran out of memory. Oh well!

Purfview avatar Purfview commented on June 28, 2024

Try to reduce --best_of till it takes less than 4GB memory.
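
Lowering `--best_of` one step at a time can be automated. This is only a rough sketch under several assumptions: the executable is callable as `faster-whisper-xxl`, the model is selected with `--model large-v2`, the starting value is 5, and an out-of-memory crash shows up as a non-zero exit code. The function names are hypothetical.

```python
import subprocess


def build_command(audio_path, best_of, exe="faster-whisper-xxl", model="large-v2"):
    """Build the command line; exe name and --model value are assumptions."""
    return [exe, audio_path, "--model", model, "--best_of", str(best_of)]


def transcribe_with_fallback(audio_path, start=5, minimum=1, run=None):
    """Retry with a smaller --best_of until a run exits with code 0.

    Returns the --best_of value that succeeded, or None if every run failed.
    """
    if run is None:
        # Default runner: launch the process and report its exit code.
        run = lambda cmd: subprocess.run(cmd).returncode
    for best_of in range(start, minimum - 1, -1):
        if run(build_command(audio_path, best_of)) == 0:
            return best_of
    return None
```

Each failed attempt starts over from the beginning of the file, so on long recordings it may be quicker to pick a low value by hand.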

rsmith02ct avatar rsmith02ct commented on June 28, 2024

5 is the default, right? 4 went a few more minutes before failing. 3 was able to complete the 1 hour 30 min interview!

Is there a quality downside to this setting? In general I use the large model as it just does a better job with proper nouns. Is best of 3 with large v2 likely better for specific names than medium at 5? From a quick review, the result seems acceptable.

Purfview avatar Purfview commented on June 28, 2024

Is there a quality downside to this setting?

Yes, but going from 5 to 3 has a very small impact on quality, much less than going from the large to the medium model.

If it's audio from a movie then you can use --ff_mdx_kim2 --vad_alt_method pyannote_v3 to increase quality [that's with Faster-Whisper-XXL].

EDIT:
Oh, you wrote that it's an interview, so don't bother with those settings unless there is background music.
