Hallucinations about whispertimesync HOT 5 OPEN

joseph2mi avatar joseph2mi commented on June 18, 2024
Hallucinations

from whispertimesync.

Comments (5)

EtienneAb3d avatar EtienneAb3d commented on June 18, 2024

@joseph2mi

The WhisperHallu option addSRT produces two outputs:

  • one with noise and silence filtering, to get a transcription without hallucinations.
  • one without cuts, to get a proper SRT with good timestamps, though possibly with hallucinations (which should not degrade the timestamp quality).

You then use WhisperTimeSync to put the good timestamps over the good text.
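As a rough illustration of the idea of putting timestamps from one transcript onto the text of another, here is a minimal sketch using Python's standard `difflib`. This is not WhisperTimeSync's actual algorithm. The function name `transfer_timestamps` and the word-level input format are assumptions made for this example only.

```python
from difflib import SequenceMatcher


def transfer_timestamps(timed_words, clean_words):
    """Copy timestamps from a timed transcript onto a clean word list.

    timed_words: list of (word, start_seconds, end_seconds), possibly
                 containing hallucinated words.
    clean_words: list of words from the hallucination-free transcript.
    Returns a list of (word, start, end); start/end are None for clean
    words that have no exact match in the timed transcript.
    """
    timed_tokens = [w.lower() for w, _, _ in timed_words]
    clean_tokens = [w.lower() for w in clean_words]
    matcher = SequenceMatcher(a=timed_tokens, b=clean_tokens, autojunk=False)

    result = [(w, None, None) for w in clean_words]
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = timed_words[block.a + k]
            result[block.b + k] = (clean_words[block.b + k], start, end)
    return result


# Hypothetical data: "thanks" stands in for a hallucinated word that is
# present in the timed transcript but absent from the clean one.
timed = [("hello", 0.0, 0.5), ("thanks", 0.5, 1.0), ("world", 1.0, 1.5)]
clean = ["Hello", "world"]
aligned = transfer_timestamps(timed, clean)
print(aligned)
```

Words that only appear in the timed transcript are simply skipped, so their timestamps are never carried over to the clean text.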


joseph2mi avatar joseph2mi commented on June 18, 2024

Hi, thanks for the response. You said "one without cut to get a proper SRT with good timestamps, but possibly with hallucinations". The assumption there is that the timestamp quality is not affected.

The issue is that for some hallucinations, which simply repeat the same line over and over, the cue durations vary between 5 and 30 seconds.

Therefore, when these timestamps are synced with the correct subtitles, you get extremely long chunks of subtitle text for each line, which is inaccurate and defeats the purpose of using WhisperHallu. I was wondering whether there is a way, even with hallucinations, to get accurate timestamps from Whisper or Faster Whisper.


EtienneAb3d avatar EtienneAb3d commented on June 18, 2024

I have never seen such a timestamp shift caused by hallucinations: even if the timestamps are not always fully accurate, I never had the impression that this inaccuracy was due to hallucinations.


joseph2mi avatar joseph2mi commented on June 18, 2024

Here is an example of a timestamp in Vietnamese through Faster Whisper:

600
00:30:21,250 --> 00:30:24,500
Mà tôi không

601
00:30:24,500 --> 00:30:26,500
Trách ông Cát Mát

602
00:30:26,500 --> 00:30:28,500
Vì Cát Mát

603
00:30:28,500 --> 00:30:30,500
Là người đưa ra

604
00:30:30,500 --> 00:30:52,540
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

605
00:31:05,010 --> 00:31:30,350
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

606
00:31:42,370 --> 00:32:04,700
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

607
00:32:16,050 --> 00:32:37,360
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

608
00:32:48,830 --> 00:33:11,550
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

609
00:33:11,550 --> 00:33:34,560
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

610
00:33:34,560 --> 00:33:56,830
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

611
00:33:56,830 --> 00:34:18,720
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

612
00:34:28,940 --> 00:34:48,940
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

613
00:34:48,940 --> 00:35:12,620
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

614
00:35:12,620 --> 00:35:35,150
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

615
00:35:35,150 --> 00:35:55,340
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

616
00:35:55,340 --> 00:36:16,720
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

As you can see here, the timestamps are okay at first, with cues usually lasting 2-5 seconds. The moment it starts hallucinating, the timestamps suddenly go up to 30-second intervals, which doesn't help for subtitles. (I'm still saving these timestamps so I can integrate them later with WhisperHallu and WhisperTimeSync.)
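The pattern in this example (the same line repeated, roughly "Thanks for watching, please subscribe to support my channel!", with unusually long cues) can be detected mechanically. Below is a minimal sketch that parses an SRT string and flags cues that are both long and repeated. The helper names and the thresholds `max_duration=10.0` and `min_repeats=2` are assumptions chosen for illustration, not values from WhisperHallu or WhisperTimeSync.

```python
import re
from collections import Counter

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp such as '00:30:21,250' to seconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text):
    """Return a list of (index, start, end, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = (to_seconds(p) for p in lines[1].split("-->"))
        cues.append((int(lines[0]), start, end, " ".join(lines[2:])))
    return cues


def flag_suspect_cues(cues, max_duration=10.0, min_repeats=2):
    """Flag cues that are long AND whose text occurs several times."""
    counts = Counter(text for _, _, _, text in cues)
    return [
        idx
        for idx, start, end, text in cues
        if end - start > max_duration and counts[text] >= min_repeats
    ]


SAMPLE = """603
00:30:28,500 --> 00:30:30,500
Là người đưa ra

604
00:30:30,500 --> 00:30:52,540
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

605
00:31:05,010 --> 00:31:30,350
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!
"""

suspects = flag_suspect_cues(parse_srt(SAMPLE))
print(suspects)
```

Requiring both conditions keeps legitimately long cues and legitimately repeated short lines from being flagged.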

My parameters on Faster Whisper are as follows:
model_size=large-v2
device="cuda"
compute_type="float32"
beam_size=7,
vad_filter=True,
vad_parameters=dict(min_silence_duration_ms=50),
language = "vi",
max_initial_timestamp = 2.0,
condition_on_previous_text = True,
length_penalty = 1.5,
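For readers who want to reproduce this setup, the parameters above would map onto the faster-whisper Python API roughly as sketched below. The split between constructor and `transcribe` arguments is my reading of the list, and the `transcribe_file` wrapper and its `audio_path` argument are hypothetical. The import is done lazily so the configuration can be inspected without the library or a GPU.

```python
# Arguments for faster_whisper.WhisperModel (model size passed separately).
MODEL_SIZE = "large-v2"
MODEL_KWARGS = dict(device="cuda", compute_type="float32")

# Arguments for WhisperModel.transcribe, as listed in the post.
TRANSCRIBE_KWARGS = dict(
    beam_size=7,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=50),
    language="vi",
    max_initial_timestamp=2.0,
    condition_on_previous_text=True,
    length_penalty=1.5,
)


def transcribe_file(audio_path):
    """Hypothetical wrapper: returns a list of (start, end, text)."""
    from faster_whisper import WhisperModel  # requires faster-whisper

    model = WhisperModel(MODEL_SIZE, **MODEL_KWARGS)
    segments, _info = model.transcribe(audio_path, **TRANSCRIBE_KWARGS)
    # `segments` is a generator; iterating it runs the transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```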


EtienneAb3d avatar EtienneAb3d commented on June 18, 2024

In my own experiments, using the original sound file was more effective for getting proper timestamps.
In your case, you could try adapting WhisperHallu with a configuration that uses all filters (especially blank and noise removal), but without cuts.

