Purfview commented on May 28, 2024

Sometimes the last 8-10 s of a segment is missing from the transcript. This can happen with any file longer than 20 s.

Share an audio sample where it happens in first ~5 mins and when you use --compute_type int8_float32!

The discrepancy occurs in different segments with different models.

Did you try the medium model?

I used: --initial_prompt=none

It should be None.

from whisper-standalone-win.

yukihana2k3 commented on May 28, 2024

Share an audio sample where it happens in first ~5 mins and when you use --compute_type int8_float32!

I find that this error is less common in smaller files than in larger ones, although even in large files, the error tends to occur more towards the end of the file.

[03:10.740 --> 03:12.500] 秦國豔眼含笑地看著秦百玉
[03:12.500 --> 03:13.420] 問他有什麼想法
Processing segment at 03:19.080
[03:19.480 --> 03:20.960] 隨後秦百玉感慨到
[03:20.960 --> 03:22.160] 他們家老四向來孤傲

Did you try the medium model?

I just ran the medium model and found that the errors occur even more often.

[05:36.840 --> 05:37.640] 她脚步一顿
[05:37.640 --> 05:39.080] 渣男陪堂聚议的轮廓
Processing segment at 05:46.760
[05:46.760 --> 05:47.820] 你们的事我不感兴趣
[05:47.820 --> 05:48.480] 我还有事

[06:53.860 --> 06:55.500] 秦百玉随手放下文件
[06:55.500 --> 06:56.000] 下一秒
Processing segment at 07:01.600
[07:01.600 --> 07:02.120] 小女人
[07:02.120 --> 07:02.860] 纹身僵硬

[10:52.640 --> 10:53.880] 至于她有没有觉得不愉快
[10:53.880 --> 10:54.700] 我就不清楚了
Processing segment at 10:59.960
[11:00.000 --> 11:01.180] 您看一下这张照片
[11:01.180 --> 11:02.700] 这辆车是你昨天乘坐的吗
...
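
Gaps like the ones above can also be located automatically instead of by eye. The following is a minimal sketch, assuming the verbose output uses the `[mm:ss.mmm --> mm:ss.mmm]` format shown in the excerpts (files longer than an hour may use a different format that this pattern does not handle). The function name `find_gaps` and the 5-second threshold are illustrative choices, not part of the tool.

```python
import re

# Matches lines such as "[03:12.500 --> 03:13.420] text" from the verbose output.
# Assumption: timestamps are mm:ss.mmm, as in the excerpts in this thread.
SEGMENT_RE = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\]")


def to_seconds(minutes: str, seconds: str) -> float:
    """Convert the minute and second fields of a timestamp to seconds."""
    return int(minutes) * 60 + float(seconds)


def find_gaps(log_text: str, min_gap: float = 5.0):
    """Return (gap_start, gap_end) pairs, in seconds, where consecutive
    transcript lines are separated by at least min_gap seconds."""
    gaps = []
    prev_end = None
    for line in log_text.splitlines():
        match = SEGMENT_RE.search(line)
        if match is None:
            continue  # skip lines such as "Processing segment at ..."
        start = to_seconds(match.group(1), match.group(2))
        end = to_seconds(match.group(3), match.group(4))
        if prev_end is not None and start - prev_end >= min_gap:
            gaps.append((prev_end, start))
        prev_end = end
    return gaps


# Timestamps copied from the first excerpt; the text itself is omitted.
sample = """\
[03:10.740 --> 03:12.500] ...
[03:12.500 --> 03:13.420] ...
Processing segment at 03:19.080
[03:19.480 --> 03:20.960] ...
[03:20.960 --> 03:22.160] ...
"""

for gap_start, gap_end in find_gaps(sample):
    print(f"gap of {gap_end - gap_start:.2f}s between {gap_start:.2f}s and {gap_end:.2f}s")
```

A gap found this way is only a candidate: it may be genuine silence in the audio, so each one still has to be checked by listening.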

Purfview commented on May 28, 2024

I meant share the audio, not copy-pasted text.

yukihana2k3 commented on May 28, 2024

It's here bro
https://drive.google.com/file/d/1H-tIwn-UiOMmVGEVbPW1vf5NnrqDYvmE/view?usp=sharing

Purfview commented on May 28, 2024

And where is anything missing, and what settings did you use?

yukihana2k3 commented on May 28, 2024

My command options:
--initial_prompt=none --language=Chinese --verbose=true --reprompt=0 --model=large-v2
[03:10.740 --> 03:12.500] 秦國豔眼含笑地看著秦百玉
[03:12.500 --> 03:13.420] 問他有什麼想法
Processing segment at 03:19.080
[03:19.480 --> 03:20.960] 隨後秦百玉感慨到
[03:20.960 --> 03:22.160] 他們家老四向來孤傲

yukihana2k3 commented on May 28, 2024

The speech from 03:13.420 to 03:19.480 is missing. It should read:

秦百玉看着晏十七
捏紧了她的手指
露出柔和的笑容
说道
她说的
我没意见
长辈们意识都愣住了

Purfview commented on May 28, 2024

my prompt
--initial_prompt=none --language=Chinese --verbose=true --reprompt=0 --model=large-v2

That's not what I wrote in the second post, you may want to read it again.

Use --compute_type int8_float32 and model medium.
And "none" is not "None".
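
A minimal sketch of why the capitalization can matter, assuming the option is parsed by an argparse-style converter that treats only the exact string `None` as "no prompt". The `optional_str` helper is hypothetical and is not taken from the tool's source; it only illustrates the general pattern.

```python
import argparse


def optional_str(value: str):
    """Hypothetical converter: only the exact string "None" means "no value"."""
    return None if value == "None" else value


parser = argparse.ArgumentParser()
parser.add_argument("--initial_prompt", type=optional_str, default=None)

# "None" is converted to Python's None, so no prompt is used.
print(repr(parser.parse_args(["--initial_prompt=None"]).initial_prompt))

# "none" is kept as ordinary text and would be used as the prompt itself.
print(repr(parser.parse_args(["--initial_prompt=none"]).initial_prompt))
```

Under that assumption, `--initial_prompt=none` does not disable the prompt; it feeds the word "none" to the model as context.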

yukihana2k3 commented on May 28, 2024

There's a bit of a misunderstanding here. I already covered both points in my earlier comment using the long file, where the medium model produced even more errors than before, so I won't repeat the test with the shorter file.
As for --compute_type int8_float32, it is already the default on my computer:

Supported compute types by GPU: {'int8_float32', 'float32', 'int8'}

[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] CPU: AuthenticAMD (SSE4.1=true, AVX=true, AVX2=true, AVX512=false)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Selected ISA: AVX2
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Use Intel MKL: false
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - SGEMM backend: DNNL (packed: false)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - GEMM_S16 backend: none (packed: false)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - GEMM_S8 backend: DNNL (packed: false, u8s8 preferred: true)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] GPU #0: NVIDIA GeForce GTX 1060 6GB (CC=6.1)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Allow INT8: true
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Allow FP16: false (with Tensor Cores: false)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Allow BF16: false
[2024-05-03 12:46:07.860] [ctranslate2] [thread 11004] [info] Using CUDA allocator: cuda_malloc_async
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] Loaded model C:\Users\thaytu\Desktop\Faster-Whisper-XXL_models\faster-whisper-medium on device cuda:0
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] - Binary version: 6
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] - Model specification revision: 3
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] - Selected compute type: int8_float32

Model loaded in: 3.99 seconds
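
To confirm which compute type a run actually used, the value can be read back from the log rather than from the command line. This is a small sketch that assumes the log contains a `Selected compute type:` line in the form shown above; the function name `selected_compute_type` is illustrative.

```python
import re

# Three lines copied from the log above.
LOG = """\
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] GPU #0: NVIDIA GeForce GTX 1060 6GB (CC=6.1)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Allow FP16: false (with Tensor Cores: false)
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] - Selected compute type: int8_float32
"""


def selected_compute_type(log_text: str):
    """Return the compute type reported in the log, or None if it is absent."""
    match = re.search(r"Selected compute type: (\S+)", log_text)
    return match.group(1) if match else None


print(selected_compute_type(LOG))
```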

Purfview commented on May 28, 2024

OK, I'll check it.

yukihana2k3 commented on May 28, 2024

Running different models on the same file results in missing segments at different points.

Purfview commented on May 28, 2024

Test if there are similar problems with Whisper-OpenAI [with medium model] -> https://github.com/Purfview/whisper-standalone-win/releases/tag/Whisper-OpenAI

Purfview commented on May 28, 2024

missing 03:13.420 to 03:19.480

I don't see anything missing:

Screenshot_4

yukihana2k3 commented on May 28, 2024

Can you try with this larger file
https://drive.google.com/file/d/1H2pVXvWxdfTW2YmgGbxKq-HPoeeSbRSq/view?usp=sharing

yukihana2k3 commented on May 28, 2024

Test if there are similar problems with Whisper-OpenAI [with medium model] -> https://github.com/Purfview/whisper-standalone-win/releases/tag/Whisper-OpenAI

still missing
image

Purfview commented on May 28, 2024

Whisper-OpenAI
still missing

Then you can ask there, as it's the reference implementation -> https://github.com/openai/whisper/discussions

Can you try with this larger file

I'll check later.

yukihana2k3 commented on May 28, 2024

Thanks for your support.

Purfview commented on May 28, 2024

I don't see anything missing in that big file either:

Screenshot_5

What version are you using?
And post the exact file and the exact command needed to reproduce the issue.

EDIT:
I see you didn't post any details there either. Do you think anyone will be able to answer you?
I guess some people can't be helped... 😄

yukihana2k3 commented on May 28, 2024

Could you please provide me with the SRT file and the command you used?

Purfview commented on May 28, 2024

I used the same command that you posted.

yukihana2k3 commented on May 28, 2024

Could you send me your SRT file? I wonder if there's any issue with my hardware.

Purfview commented on May 28, 2024

Could you send me your SRT file?

I deleted those SRT files. Anyway, you wouldn't see anything there.

I wonder if there's any issue with my hardware.

No. I think the problem is somewhere between a chair and a keyboard, but that's not accurate.

yukihana2k3 commented on May 28, 2024

I added the options --ff_mdx_kim2 --vad_alt_method=pyannote_v3, and the issue was completely resolved.

Purfview commented on May 28, 2024

Why Whisper sometimes skips speech for no apparent reason is a question for the Whisper devs.
Maybe later I'll post there with a reproducible case, because no one will bother answering your post.

VAD is irrelevant to your example, as there are no gaps in the speech for it to remove.
And there is no background noise in your example, so it's probably just a coincidence that --ff_mdx_kim2 helped.

You can check if --ff_rnndn_sh or --ff_rnndn_xiph helps with your issue too.
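
One way to run that comparison systematically is to generate one command per filter plus a baseline, then check each output for gaps. The sketch below only builds and prints the commands. The executable name `faster-whisper-xxl`, the positional audio path, and the file name `sample.mp3` are assumptions; the options themselves are the ones mentioned in this thread.

```python
import shlex

# Assumption: adjust the executable name and path to match your installation.
EXECUTABLE = "faster-whisper-xxl"

# Options taken from this thread.
BASE_OPTIONS = [
    "--language=Chinese",
    "--model=medium",
    "--compute_type",
    "int8_float32",
]

# Audio filters to compare, one run per filter.
FILTER_OPTIONS = ["--ff_mdx_kim2", "--ff_rnndn_sh", "--ff_rnndn_xiph"]


def build_commands(audio_path: str):
    """Return a dict mapping a run label to its argument list."""
    commands = {"baseline": [EXECUTABLE, audio_path, *BASE_OPTIONS]}
    for option in FILTER_OPTIONS:
        commands[option] = [EXECUTABLE, audio_path, *BASE_OPTIONS, option]
    return commands


for label, command in build_commands("sample.mp3").items():
    print(label, "->", shlex.join(command))
```

Keeping every other option identical between runs makes it easier to tell whether a filter really helps or whether the change is a coincidence.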
