The issue with WhisperTimeSync with WhisperHallu is that if you need to use Whisper Ha

Hallucinations about whispertimesync HOT 5 OPEN

joseph2mi commented on June 18, 2024

Hallucinations

from whispertimesync.

Comments (5)

EtienneAb3d commented on June 18, 2024

@joseph2mi

The WhisperHallu option addSRT produces two outputs:

one with noise and silence filtering, to get a transcription without hallucinations;
one without cuts, to get a proper SRT with good timestamps, but possibly with hallucinations (which should not damage timestamp quality).

You then use WhisperTimeSync to put the good timestamps over the good text.

joseph2mi commented on June 18, 2024

Hi, thanks for the response. You said "one without cut to get a proper SRT with good timestamps, but possibly with hallucinations", so the assumption is that timestamp quality is not affected.

The issue is that for some hallucinations, which simply repeat themselves line after line, the timestamps vary between 5 seconds and 30 seconds.

So when those timestamps are synced with the correct subtitles, each line ends up as an extremely long chunk of subtitle text, which is inaccurate and defeats the purpose of using WhisperHallu. Is there a way to get accurate timestamps from Whisper or Faster Whisper even when hallucinations occur?

EtienneAb3d commented on June 18, 2024

I have never seen such a timestamp shift due to hallucinations: even though timestamps are not always fully accurate, I never had the impression that the inaccuracy was caused by hallucinations.

joseph2mi commented on June 18, 2024

Here is an example of the timestamps Faster Whisper produces for Vietnamese audio:

600
00:30:21,250 --> 00:30:24,500
Mà tôi không

601
00:30:24,500 --> 00:30:26,500
Trách ông Cát Mát

602
00:30:26,500 --> 00:30:28,500
Vì Cát Mát

603
00:30:28,500 --> 00:30:30,500
Là người đưa ra

604
00:30:30,500 --> 00:30:52,540
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

605
00:31:05,010 --> 00:31:30,350
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

606
00:31:42,370 --> 00:32:04,700
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

607
00:32:16,050 --> 00:32:37,360
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

608
00:32:48,830 --> 00:33:11,550
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

609
00:33:11,550 --> 00:33:34,560
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

610
00:33:34,560 --> 00:33:56,830
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

611
00:33:56,830 --> 00:34:18,720
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

612
00:34:28,940 --> 00:34:48,940
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

613
00:34:48,940 --> 00:35:12,620
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

614
00:35:12,620 --> 00:35:35,150
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

615
00:35:35,150 --> 00:35:55,340
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

616
00:35:55,340 --> 00:36:16,720
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

As you can see, the timestamps are fine at first, with cues usually 2-5 seconds long. The moment it starts hallucinating (I'm still saving the timestamps so I can integrate them later with WhisperHallu and WhisperTimeSync), the cues suddenly stretch to intervals of up to 30 seconds, which is no help for subtitles.
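To make the pattern above easy to spot in a long file, here is a minimal sketch that parses an SRT string and flags cues that are unusually long or that repeat the previous cue's text. The names `parse_srt`, `suspicious`, and the 10-second threshold are assumptions chosen for illustration; they are not part of WhisperHallu, WhisperTimeSync, or Faster Whisper.

```python
import re
from dataclasses import dataclass

@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str

_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def _seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:30:21,250 to seconds."""
    h, m, s, ms = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def parse_srt(srt: str) -> list[Cue]:
    """Parse blank-line-separated SRT blocks into Cue objects."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        text = " ".join(lines[2:]).strip()
        cues.append(Cue(int(lines[0]), _seconds(start), _seconds(end), text))
    return cues

def suspicious(cues: list[Cue], max_duration: float = 10.0) -> list[int]:
    """Return indices of cues that are too long or repeat the previous text.

    max_duration is a hypothetical threshold; tune it for your material.
    """
    flagged = []
    previous_text = None
    for cue in cues:
        too_long = (cue.end - cue.start) > max_duration
        repeated = cue.text == previous_text
        if too_long or repeated:
            flagged.append(cue.index)
        previous_text = cue.text
    return flagged

# Three cues taken from the example above.
SAMPLE = """603
00:30:28,500 --> 00:30:30,500
Là người đưa ra

604
00:30:30,500 --> 00:30:52,540
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

605
00:31:05,010 --> 00:31:30,350
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!
"""

print(suspicious(parse_srt(SAMPLE)))
```

On the three sample cues this flags 604 and 605, the two long, repeated lines, and leaves 603 alone.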

My Faster Whisper parameters are as follows:
model_size = "large-v2"
device = "cuda"
compute_type = "float32"
beam_size = 7
vad_filter = True
vad_parameters = dict(min_silence_duration_ms=50)
language = "vi"
max_initial_timestamp = 2.0
condition_on_previous_text = True
length_penalty = 1.5
EtienneAb3d commented on June 18, 2024

In my own experiments, using the original sound file was more effective for getting proper timestamps.
In your case, you could try adapting WhisperHallu with a configuration that uses all the filters (especially blank and noise removal), but without the cut.
