The WhisperHallu option addSRT produces two outputs:
- one with noise and silence filtering, to get a transcription without hallucinations;
- one without cuts, to get a proper SRT with good timestamps but possibly with hallucinations (which should not affect timestamp quality).
You then use WhisperTimeSync to transfer the good timestamps onto the good text.
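Putting "good timestamps over good text" is essentially a sequence alignment problem. WhisperTimeSync's actual algorithm is not described in this thread, so the sketch below swaps in Python's standard `difflib.SequenceMatcher` as a plain word-level aligner: words of the clean (filtered) transcript that also appear, in the same order, in the uncut timed transcript inherit that word's start and end time. The function name `transfer_timestamps` and the `(word, start, end)` input format are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical helper: NOT WhisperTimeSync's implementation, just a
# word-level alignment built on difflib to illustrate the idea.


def _norm(word: str) -> str:
    # Case-insensitive comparison that ignores surrounding punctuation.
    return word.lower().strip(".,!?;:\"'")


def transfer_timestamps(
    timed_words: list[tuple[str, float, float]],
    clean_words: list[str],
) -> list[tuple[str, float | None, float | None]]:
    """Copy (start, end) times from the uncut transcript onto the clean words.

    timed_words: (word, start, end) from the uncut run (may hallucinate).
    clean_words: words of the filtered, hallucination-free transcript.
    Clean words that cannot be matched keep None for both times.
    """
    source = [_norm(w) for w, _, _ in timed_words]
    target = [_norm(w) for w in clean_words]
    result: list[tuple[str, float | None, float | None]] = [
        (w, None, None) for w in clean_words
    ]
    # autojunk=False so frequent words are not treated as junk on long texts.
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = timed_words[block.a + k]
            result[block.b + k] = (clean_words[block.b + k], start, end)
    return result
```

Hallucinated words that exist only in the uncut transcript are simply skipped by the alignment. Clean words with no counterpart come back with `None` timings; a fuller tool would interpolate them from their neighbours.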
from whispertimesync.
Hi, thanks for the response. You said "one without cut to get a proper SRT with good timestamps, but possibly with hallucinations", which assumes that timestamp quality is not affected.
The issue is that with some hallucinations, where the same line simply repeats over and over, each cue spans anywhere from 5 to 30 seconds.
So when these timestamps are synced with the correct subtitles, each line ends up as an extremely long chunk of subtitle text, which is inaccurate and defeats the purpose of using WhisperHallu. I was wondering if there is a way, even with hallucinations, to get accurate timestamps from Whisper or Faster Whisper.
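One way to make the problem measurable is to flag cues whose duration is far longer than a normal subtitle line. A minimal sketch, assuming standard SRT timing lines (`HH:MM:SS,mmm --> HH:MM:SS,mmm`); the `long_cues` name and the 7-second default threshold are arbitrary choices for illustration.

```python
from __future__ import annotations

import re

# Matches an SRT timing line such as "00:30:30,500 --> 00:30:52,540".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    # Convert the four captured timestamp fields to seconds.
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def long_cues(srt_text: str, max_seconds: float = 7.0) -> list[tuple[float, float]]:
    """Return (start, end) in seconds for every cue longer than max_seconds."""
    flagged = []
    for match in TIMING.finditer(srt_text):
        start = _seconds(*match.group(1, 2, 3, 4))
        end = _seconds(*match.group(5, 6, 7, 8))
        if end - start > max_seconds:
            flagged.append((start, end))
    return flagged
```

The threshold is a judgment call: it should sit well above the longest legitimate cue in the file but below the durations produced by a repetition loop.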
I have never seen such a timestamp shift caused by hallucinations: even though the timestamps are not always fully accurate, I never had the impression that this inaccuracy was due to hallucinations.
Here is an example of Vietnamese subtitles with timestamps produced by Faster Whisper:
600
00:30:21,250 --> 00:30:24,500
Mà tôi không

601
00:30:24,500 --> 00:30:26,500
Trách ông Cát Mát

602
00:30:26,500 --> 00:30:28,500
Vì Cát Mát

603
00:30:28,500 --> 00:30:30,500
Là người đưa ra

604
00:30:30,500 --> 00:30:52,540
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

605
00:31:05,010 --> 00:31:30,350
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

606
00:31:42,370 --> 00:32:04,700
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

607
00:32:16,050 --> 00:32:37,360
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

608
00:32:48,830 --> 00:33:11,550
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

609
00:33:11,550 --> 00:33:34,560
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

610
00:33:34,560 --> 00:33:56,830
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

611
00:33:56,830 --> 00:34:18,720
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

612
00:34:28,940 --> 00:34:48,940
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

613
00:34:48,940 --> 00:35:12,620
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

614
00:35:12,620 --> 00:35:35,150
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

615
00:35:35,150 --> 00:35:55,340
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

616
00:35:55,340 --> 00:36:16,720
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!
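As you can see, the timestamps are fine at first, with cues usually lasting 2 to 5 seconds. As soon as the model starts hallucinating, each cue suddenly stretches to 20 seconds or more, which is useless for subtitles. (I'm still saving these timestamps so that I can combine them later with WhisperHallu and WhisperTimeSync.) For reference, the first four cues read "But I don't / blame Mr. Cát Mát / because Cát Mát / is the one who put forward", and the repeated line means "Thank you for watching, please subscribe to support my channel!"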
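In the sample, cues 604 to 616 carry exactly the same text, a pattern that can be detected mechanically before the timestamps are handed to WhisperTimeSync. A hypothetical heuristic, not part of Faster Whisper, WhisperHallu, or WhisperTimeSync: report every run of at least `min_run` identical consecutive cue texts. The function name and the default of 3 are assumptions.

```python
from __future__ import annotations

from itertools import groupby


def repetition_runs(texts: list[str], min_run: int = 3) -> list[tuple[int, int]]:
    """Return (first_index, last_index) for each run of identical consecutive cues.

    Only runs with at least min_run cues are reported; comparison ignores
    leading/trailing whitespace.
    """
    runs = []
    pos = 0
    for _text, group in groupby(texts, key=lambda t: t.strip()):
        length = sum(1 for _ in group)
        if length >= min_run:
            runs.append((pos, pos + length - 1))
        pos += length
    return runs
```

Applied to the texts of cues 600 to 616 above, it would report a single run covering the 13 repeated cues (list indices 4 to 16), which could then be dropped or merged before syncing.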
My Faster Whisper parameters are as follows:
model_size="large-v2"
device="cuda"
compute_type="float32"
beam_size=7
vad_filter=True
vad_parameters=dict(min_silence_duration_ms=50)
language="vi"
max_initial_timestamp=2.0
condition_on_previous_text=True
length_penalty=1.5
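The parameter list above mixes `WhisperModel` constructor arguments (model size, device, compute type) with options of `WhisperModel.transcribe()`. A minimal sketch of how they fit together in faster-whisper, assuming the `faster_whisper` package is installed; the function names and the audio path are hypothetical. The decoding options are kept in a plain dict so they can be inspected without loading the model.

```python
def transcribe_options() -> dict:
    """Decoding options passed to WhisperModel.transcribe(), as listed above."""
    return {
        "beam_size": 7,
        "vad_filter": True,
        # Forwarded to faster-whisper's Silero VAD options.
        "vad_parameters": {"min_silence_duration_ms": 50},
        "language": "vi",
        "max_initial_timestamp": 2.0,
        "condition_on_previous_text": True,
        "length_penalty": 1.5,
    }


def transcribe_segments(audio_path: str) -> list[tuple[float, float, str]]:
    """Run faster-whisper and return (start, end, text) tuples in seconds."""
    # Imported lazily so transcribe_options() works without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v2", device="cuda", compute_type="float32")
    segments, _info = model.transcribe(audio_path, **transcribe_options())
    # `segments` is a lazy generator: decoding happens while iterating.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Usage would be something like `transcribe_segments("episode.wav")` (hypothetical file), after which the `(start, end, text)` tuples can be written out as SRT cues.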
In my own experiments, using the original audio file worked better for getting accurate timestamps.
In your case, you could try adapting WhisperHallu to use all the filters (especially blank and noise removal) but without cutting.
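The point of running filters "without cut" is that the processed audio keeps the same length as the original, so any timestamp produced on it still points at the right place in the original file. A minimal sketch of that principle, assuming mono float samples in [-1, 1]: instead of deleting quiet stretches, a crude RMS energy gate replaces them with digital silence. This gate is a stand-in chosen for illustration, not WhisperHallu's actual blank or noise filter, and the function name and thresholds are hypothetical.

```python
from __future__ import annotations


def mute_quiet_frames(
    samples: list[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.01,
) -> list[float]:
    """Zero out frames whose RMS is below threshold without removing them.

    The output has exactly as many samples as the input, so timestamps
    computed on it line up with the original recording.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    out = list(samples)
    for begin in range(0, len(out), frame_len):
        frame = out[begin:begin + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms < threshold:
            # Replace (not delete) the quiet frame to preserve the time axis.
            out[begin:begin + frame_len] = [0.0] * len(frame)
    return out
```

A cutting filter would instead shorten the audio, and every timestamp after the first removed segment would then need to be shifted back by the total removed duration.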