Comments (24)
Sometimes the last 8-10 s of a segment is missing from the transcript. This can happen with any file longer than 20 s.
Share an audio sample where it happens in the first ~5 minutes while using --compute_type int8_float32!
The discrepancy occurs in different segments with different models.
Did you try the medium model?
I used: --initial_prompt=none
It should be None.
from whisper-standalone-win.
Share an audio sample where it happens in the first ~5 minutes while using --compute_type int8_float32!
I find that this error is less common in smaller files than in larger ones, although even in large files, the error tends to occur more towards the end of the file.
[03:10.740 --> 03:12.500] 秦國豔眼含笑地看著秦百玉
[03:12.500 --> 03:13.420] 問他有什麼想法
Processing segment at 03:19.080
[03:19.480 --> 03:20.960] 隨後秦百玉感慨到
[03:20.960 --> 03:22.160] 他們家老四向來孤傲
Did you try the medium model?
I just ran the medium model and found that the errors are even more frequent.
[05:36.840 --> 05:37.640] 她脚步一顿
[05:37.640 --> 05:39.080] 渣男陪堂聚议的轮廓
Processing segment at 05:46.760
[05:46.760 --> 05:47.820] 你们的事我不感兴趣
[05:47.820 --> 05:48.480] 我还有事
[06:53.860 --> 06:55.500] 秦百玉随手放下文件
[06:55.500 --> 06:56.000] 下一秒
Processing segment at 07:01.600
[07:01.600 --> 07:02.120] 小女人
[07:02.120 --> 07:02.860] 纹身僵硬
[10:52.640 --> 10:53.880] 至于她有没有觉得不愉快
[10:53.880 --> 10:54.700] 我就不清楚了
Processing segment at 10:59.960
[11:00.000 --> 11:01.180] 您看一下这张照片
[11:01.180 --> 11:02.700] 这辆车是你昨天乘坐的吗
...
I meant share the audio, not the text copy/pastes.
It's here bro
https://drive.google.com/file/d/1H-tIwn-UiOMmVGEVbPW1vf5NnrqDYvmE/view?usp=sharing
And where exactly is anything missing, and what settings did you use?
My command-line options:
--initial_prompt=none --language=Chinese --verbose=true --reprompt=0 --model=large-v2
[03:10.740 --> 03:12.500] 秦國豔眼含笑地看著秦百玉
[03:12.500 --> 03:13.420] 問他有什麼想法
Processing segment at 03:19.080
[03:19.480 --> 03:20.960] 隨後秦百玉感慨到
[03:20.960 --> 03:22.160] 他們家老四向來孤傲
missing 03:13.420 to 03:19.480
秦百玉看着晏十七
捏紧了她的手指
露出柔和的笑容
说道
她说的
我没意见
长辈们意识都愣住了
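Gaps like this one can be located automatically instead of by eye. The following is a minimal sketch, assuming the verbose output uses the `[mm:ss.mmm --> mm:ss.mmm] text` format shown in this thread (timestamps with an hours field are not handled); the function names and the 5-second threshold are illustrative choices, not part of the tool.

```python
import re

# Matches verbose output lines such as "[03:10.740 --> 03:12.500] text".
LINE_RE = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\]")


def to_seconds(minutes: str, seconds: str) -> float:
    """Convert the mm and ss.mmm fields of a timestamp to seconds."""
    return int(minutes) * 60 + float(seconds)


def find_gaps(log_text: str, min_gap: float = 5.0):
    """Return (prev_end, next_start, gap) for every silence of at least min_gap seconds."""
    gaps = []
    prev_end = None
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skips "Processing segment at ..." and other non-cue lines
        start = to_seconds(match.group(1), match.group(2))
        end = to_seconds(match.group(3), match.group(4))
        if prev_end is not None and start - prev_end >= min_gap:
            gaps.append((prev_end, start, round(start - prev_end, 3)))
        prev_end = end
    return gaps


# Lines copied from the output posted in this thread.
sample = """\
[03:10.740 --> 03:12.500] 秦國豔眼含笑地看著秦百玉
[03:12.500 --> 03:13.420] 問他有什麼想法
Processing segment at 03:19.080
[03:19.480 --> 03:20.960] 隨後秦百玉感慨到
[03:20.960 --> 03:22.160] 他們家老四向來孤傲
"""

for prev_end, start, gap in find_gaps(sample):
    print(f"gap of {gap:.2f}s between {prev_end:.2f}s and {start:.2f}s")
```

A gap in the timestamps is only a hint: it may also be a genuine pause in the speech, so each reported gap still needs to be checked against the audio.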
My command-line options:
--initial_prompt=none --language=Chinese --verbose=true --reprompt=0 --model=large-v2
That's not what I wrote in the second post; you may want to read it again.
Use --compute_type int8_float32 and the medium model.
And "none" is not "None".
Share an audio sample where it happens in the first ~5 minutes while using --compute_type int8_float32!
I find that this error is less common in smaller files than in larger ones, although even in large files, the error tends to occur more towards the end of the file.
[03:10.740 --> 03:12.500] 秦國豔眼含笑地看著秦百玉
[03:12.500 --> 03:13.420] 問他有什麼想法
Processing segment at 03:19.080
[03:19.480 --> 03:20.960] 隨後秦百玉感慨到
[03:20.960 --> 03:22.160] 他們家老四向來孤傲
Did you try the medium model?
I just ran the medium model and found that the errors are even more frequent.
[05:36.840 --> 05:37.640] 她脚步一顿
[05:37.640 --> 05:39.080] 渣男陪堂聚议的轮廓
Processing segment at 05:46.760
[05:46.760 --> 05:47.820] 你们的事我不感兴趣
[05:47.820 --> 05:48.480] 我还有事
[06:53.860 --> 06:55.500] 秦百玉随手放下文件
[06:55.500 --> 06:56.000] 下一秒
Processing segment at 07:01.600
[07:01.600 --> 07:02.120] 小女人
[07:02.120 --> 07:02.860] 纹身僵硬
[10:52.640 --> 10:53.880] 至于她有没有觉得不愉快
[10:53.880 --> 10:54.700] 我就不清楚了
Processing segment at 10:59.960
[11:00.000 --> 11:01.180] 您看一下这张照片
[11:01.180 --> 11:02.700] 这辆车是你昨天乘坐的吗
...
There's a bit of a misunderstanding here. I already applied those suggestions to the long file, as described in this post, and got more errors than before, so I won't redo it with the shorter file.
As for --compute_type int8_float32, it is currently the default on my computer:
Supported compute types by GPU: {'int8_float32', 'float32', 'int8'}
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] CPU: AuthenticAMD (SSE4.1=true, AVX=true, AVX2=true, AVX512=false)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Selected ISA: AVX2
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Use Intel MKL: false
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - SGEMM backend: DNNL (packed: false)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - GEMM_S16 backend: none (packed: false)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - GEMM_S8 backend: DNNL (packed: false, u8s8 preferred: true)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] GPU #0: NVIDIA GeForce GTX 1060 6GB (CC=6.1)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Allow INT8: true
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Allow FP16: false (with Tensor Cores: false)
[2024-05-03 12:46:04.320] [ctranslate2] [thread 11004] [info] - Allow BF16: false
[2024-05-03 12:46:07.860] [ctranslate2] [thread 11004] [info] Using CUDA allocator: cuda_malloc_async
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] Loaded model C:\Users\thaytu\Desktop\Faster-Whisper-XXL_models\faster-whisper-medium on device cuda:0
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] - Binary version: 6
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] - Model specification revision: 3
[2024-05-03 12:46:08.204] [ctranslate2] [thread 11004] [info] - Selected compute type: int8_float32
Model loaded in: 3.99 seconds
OK, I'll check it.
Running different models on the same file results in missing segments at different points.
Test if there are similar problems with Whisper-OpenAI [with medium model] -> https://github.com/Purfview/whisper-standalone-win/releases/tag/Whisper-OpenAI
missing 03:13.420 to 03:19.480
I don't see anything missing:
Can you try with this larger file
https://drive.google.com/file/d/1H2pVXvWxdfTW2YmgGbxKq-HPoeeSbRSq/view?usp=sharing
Test if there are similar problems with Whisper-OpenAI [with medium model] -> https://github.com/Purfview/whisper-standalone-win/releases/tag/Whisper-OpenAI
Whisper-OpenAI still missing
Then you can ask there, as it's the reference implementation -> https://github.com/openai/whisper/discussions
Can you try with this larger file
I'll check later.
Thanks for your support.
I don't see anything missing in that big file either:
What version are you using?
And post the exact file with exact command to reproduce the issue.
EDIT:
I see you posted nothing there either. Do you think anyone will answer you?
I guess some people can't be helped... 😄
Could you please provide me with the SRT file and the prompt you used?
I used the same command that you posted.
Could you send me your SRT file? I wonder if there's any issue with my hardware.
Could you send me your SRT file?
I deleted those SRT files; anyway, you wouldn't see anything there.
I wonder if there's any issue with my hardware.
No. I think the problem is somewhere between a chair and a keyboard, but that's not accurate.
I used the options --ff_mdx_kim2 --vad_alt_method=pyannote_v3, and the issue was completely resolved.
Why Whisper sometimes skips speech for no apparent reason is a question for the Whisper devs.
Maybe later I'll post a reproducible issue there, because no one will bother answering your post.
VAD is irrelevant to your example, as there are no pauses in the speech for it to remove.
And there is no background noise in your example, so it's probably just a coincidence that --ff_mdx_kim2 helped.
You can check whether --ff_rnndn_sh or --ff_rnndn_xiph helps with your issue too.
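To compare these filters on equal footing, it helps to run the same file with identical base options and change only the filter. The sketch below just builds and prints the command lines; the options are the ones quoted in this thread, while the executable name `faster-whisper-xxl` and the file name `sample.mp3` are placeholders assumed for illustration.

```python
import shlex

# Placeholder names (assumptions): adjust to your own executable and audio file.
EXE = "faster-whisper-xxl"
AUDIO = "sample.mp3"

# Base options quoted in this thread, kept identical across all runs.
COMMON = ["--language=Chinese", "--model=medium", "--compute_type", "int8_float32"]

# One set of extra options per run; "baseline" adds nothing.
VARIANTS = {
    "baseline": [],
    "mdx_kim2": ["--ff_mdx_kim2"],
    "mdx_kim2+pyannote_v3": ["--ff_mdx_kim2", "--vad_alt_method=pyannote_v3"],
    "rnndn_sh": ["--ff_rnndn_sh"],
    "rnndn_xiph": ["--ff_rnndn_xiph"],
}


def build_commands(exe: str = EXE, audio: str = AUDIO) -> dict:
    """Return one argument list per variant, differing only in the extra options."""
    return {name: [exe, audio, *COMMON, *extra] for name, extra in VARIANTS.items()}


for name, cmd in build_commands().items():
    # shlex.join quotes arguments for a POSIX shell; on Windows, pass the list to subprocess.run instead.
    print(f"{name}: {shlex.join(cmd)}")
```

Since the missing parts reportedly move around between runs and models, a single successful run with one filter does not prove that the filter fixed anything; repeating each variant a few times gives a fairer comparison.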