Comments (3)

jordimas commented on May 15, 2024

whisper-ctranslate2 version 0.1.8 or higher exposes the following VAD filter parameters:

  --vad_filter VAD_FILTER
                        Enable the voice activity detection (VAD) to filter out parts of the audio without speech. This step is using the Silero VAD model
                        https://github.com/snakers4/silero-vad. (default: False)
  --vad_threshold VAD_THRESHOLD
                        When `vad_filter` is enabled, probabilities above this value are considered as speech. (default: None)
  --vad_min_speech_duration_ms VAD_MIN_SPEECH_DURATION_MS
                        When `vad_filter` is enabled, final speech chunks shorter min_speech_duration_ms are thrown out. (default: None)
  --vad_max_speech_duration_s VAD_MAX_SPEECH_DURATION_S
                        When `vad_filter` is enabled, Maximum duration of speech chunks in seconds. Longer will be split at the timestamp of the last silence. (default: None)
  --vad_min_silence_duration_ms VAD_MIN_SILENCE_DURATION_MS
                        When `vad_filter` is enabled, in the end of each speech chunk time to wait before separating it. (default: None)

You can play with the different parameters to suit your audio file. You probably want to start with the vad_threshold parameter. A more detailed description of what these parameters mean is here: https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py#L28
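As a rough sketch of how one might experiment with these flags, the helper below assembles a command line from the options listed above. The function name `build_vad_command` and the sample values (0.6, 500) are made up for illustration and are not recommended settings; only the flag names come from the help text above.

```python
# Flags taken from the help output above; everything else here is illustrative.
VAD_FLAGS = (
    "vad_threshold",
    "vad_min_speech_duration_ms",
    "vad_max_speech_duration_s",
    "vad_min_silence_duration_ms",
)


def build_vad_command(audio_path, **vad_options):
    """Return an argument list that enables the VAD filter with the given options.

    Options set to None are skipped, so the tool keeps its own default for them.
    """
    command = ["whisper-ctranslate2", audio_path, "--vad_filter", "True"]
    for name, value in vad_options.items():
        if name not in VAD_FLAGS:
            raise ValueError(f"unknown VAD option: {name}")
        if value is not None:
            command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of the command.
    args = build_vad_command(
        "audio.wav",
        vad_threshold=0.6,
        vad_min_silence_duration_ms=500,
    )
    print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` once the values have been tuned for a given audio file; changing one parameter at a time makes it easier to see which one causes sentences to go missing.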

from whisper-ctranslate2.

old9 commented on May 15, 2024

Thanks, tuning some of these parameters does improve the issue, but it also introduces other issues, such as occasional missing sentences. I'm still trying to figure out the right combination.

mayeaux commented on May 15, 2024

> Thanks, tuning some of these parameters does improve the issue, but it also introduces other issues, such as occasional missing sentences. I'm still trying to figure out the right combination.

It's a known issue and a pretty bad one.

SYSTRAN/faster-whisper#125

The issue is that Whisper timestamps can be slightly inaccurate. If a timestamp starts a little early in the audio that was processed with VAD, then once the time splits are applied back to the existing transcription, the subtitle can appear to the post-processing to start before the silence. I'm not sure what the best approach is. Perhaps transcribing each segment separately and then re-attaching them would work, but in my testing this leads to issues of its own. I know guillaumekln is working on it, so we'll see what he comes up with. I am a little mystified myself as to what the solution is.
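To make the failure mode concrete, here is a simplified sketch of mapping a timestamp from the speech-only audio back onto the original timeline. It is not the faster-whisper implementation; the function `restore_time` and the chunk values are assumptions made up for illustration.

```python
def restore_time(t, chunks):
    """Map time t (seconds) on the speech-only timeline back to the original audio.

    chunks is a list of (start, end) speech intervals, in seconds, in the
    original audio. The speech-only audio is these intervals joined end to end.
    """
    offset = 0.0  # where the current chunk begins on the speech-only timeline
    for start, end in chunks:
        length = end - start
        if t < offset + length:
            return start + (t - offset)
        offset += length
    # Past the end of the speech-only audio: clamp to the end of the last chunk.
    return chunks[-1][1]


# Hypothetical VAD result: speech at 0-2 s and 10-13 s, with 8 s of silence between.
chunks = [(0.0, 2.0), (10.0, 13.0)]

# A segment that really starts with the second chunk sits at 2.0 s on the
# speech-only timeline and is restored to just after the silence.
on_time = restore_time(2.0, chunks)

# If the reported timestamp is only 0.1 s early, it falls inside the first
# chunk, so the restored start lands before the silence instead of after it.
early = restore_time(1.9, chunks)
```

In this sketch a 0.1 s error on the speech-only timeline turns into an error of roughly eight seconds on the original timeline, which is why a subtitle can appear to start before the silence.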
