Comments (3)
whisper-ctranslate2 version 0.1.8 or higher exposes the following VAD filter parameters:
--vad_filter VAD_FILTER
Enable the voice activity detection (VAD) to filter out parts of the audio without speech. This step is using the Silero VAD model
https://github.com/snakers4/silero-vad. (default: False)
--vad_threshold VAD_THRESHOLD
When `vad_filter` is enabled, probabilities above this value are considered as speech. (default: None)
--vad_min_speech_duration_ms VAD_MIN_SPEECH_DURATION_MS
When `vad_filter` is enabled, final speech chunks shorter than min_speech_duration_ms are thrown out. (default: None)
--vad_max_speech_duration_s VAD_MAX_SPEECH_DURATION_S
When `vad_filter` is enabled, maximum duration of speech chunks in seconds. Longer chunks will be split at the timestamp of the last silence. (default: None)
--vad_min_silence_duration_ms VAD_MIN_SILENCE_DURATION_MS
When `vad_filter` is enabled, how long to wait at the end of each speech chunk before separating it. (default: None)
You can experiment with the different parameters to find values that suit your audio file. A more detailed description of what these parameters mean is here: https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py#L28 You probably want to start with the vad_threshold parameter.
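For trying several combinations, a small helper that assembles the command line can save some typing. This is only a sketch: the flag names are the ones listed above, but the helper name `build_vad_command`, the file name `audio.wav`, and the example values are made up for illustration and are not recommended settings.

```python
import shlex


def build_vad_command(audio_path, threshold=None, min_speech_ms=None,
                      max_speech_s=None, min_silence_ms=None):
    """Build a whisper-ctranslate2 argument list with the VAD filter enabled.

    Hypothetical helper: only the flags listed in the comment above are used.
    Options left as None are omitted so the tool falls back to its defaults.
    """
    cmd = ["whisper-ctranslate2", audio_path, "--vad_filter", "True"]
    optional = {
        "--vad_threshold": threshold,
        "--vad_min_speech_duration_ms": min_speech_ms,
        "--vad_max_speech_duration_s": max_speech_s,
        "--vad_min_silence_duration_ms": min_silence_ms,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Illustrative values only; start by varying the threshold, then the others.
command = build_vad_command("audio.wav", threshold=0.4, min_silence_ms=1000)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command)` once whisper-ctranslate2 is installed, changing one parameter at a time to see its effect.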
from whisper-ctranslate2.
Thanks, tuning some of these parameters does improve things, but it also introduces other issues, such as an occasional missing sentence. I'm still trying to figure out the right combination.
from whisper-ctranslate2.
It's a known issue and a pretty bad one.
The issue is that Whisper timestamps can be a bit inaccurate. If a timestamp starts a little early in the audio that was processed with VAD, then once the time splits are applied back to the existing transcription, it can appear to the post-processing that the subtitle starts before the silence. I'm not sure what the best approach is. Perhaps transcribing each segment separately and then re-attaching them, but in my testing this leads to issues of its own. I know guillaumekln is working on it, so we'll see what he comes up with. I am a little mystified myself as to what the solution is.
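To make the failure mode concrete, here is a rough sketch of mapping a timestamp from the silence-stripped audio back onto the original timeline. It is not the faster-whisper implementation, just an illustration of the idea; the function name `restore_time` and the chunk values are invented for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def restore_time(t, chunks):
    """Map time t (seconds) in the silence-stripped audio to the original audio.

    chunks is a sorted list of (start, end) speech intervals, in seconds,
    as detected by VAD in the original audio. Hypothetical helper.
    """
    durations = [end - start for start, end in chunks]
    # offsets[i] is where chunk i begins in the silence-stripped audio.
    offsets = [0.0] + list(accumulate(durations))[:-1]
    i = max(bisect_right(offsets, t) - 1, 0)
    start, end = chunks[i]
    return min(start + (t - offsets[i]), end)


# Speech at 0-2 s and 10-13 s, with 8 s of silence in between.
chunks = [(0.0, 2.0), (10.0, 13.0)]

exact = restore_time(2.0, chunks)  # timestamp exactly at the chunk boundary
early = restore_time(1.9, chunks)  # same timestamp reported 0.1 s too early
print(exact, early)
```

With these made-up chunks, a 0.1 s error in the stripped audio moves the restored start from the beginning of the second chunk to the tail of the first one, so the subtitle appears to start before the 8 s silence.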