cheshirecc / faster-whisper-gui

753 stars · 6 watchers · 55 forks · 96.28 MB

faster_whisper GUI with PySide6

License: GNU Affero General Public License v3.0

Python 100.00% QMake 0.01%
faster-whisper openai transcribe vad voice-transcription whisper whisperx asr

faster-whisper-gui's People

Contributors

cheshirecc


faster-whisper-gui's Issues

Found a bug in version 0.030

After transcription finishes, clicking "Save subtitle file" directly generates the subtitle file normally. But if I click "WhisperX timeline alignment" first and then click "Save subtitle file", no subtitle file is generated.

Could you add an option to save modified parameters?

Could you add an option to save modified parameters? Using the software sometimes involves changing a few parameters, but they have to be changed again every time it is opened. Could a feature to save the modified parameters be added? Thanks!
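Such a feature could be a small JSON round-trip; a minimal sketch (hypothetical `config.json` path and key names, not the project's actual code):

```python
import json
from pathlib import Path

CONFIG_PATH = Path("config.json")  # hypothetical: saved next to the executable

def save_params(params: dict) -> None:
    # Persist the user's modified parameters so they survive a restart.
    CONFIG_PATH.write_text(json.dumps(params, indent=2), encoding="utf-8")

def load_params(defaults: dict) -> dict:
    # Merge saved values over defaults, so newly added options keep working.
    if CONFIG_PATH.exists():
        saved = json.loads(CONFIG_PATH.read_text(encoding="utf-8"))
        return {**defaults, **saved}
    return dict(defaults)
```

Calling `save_params` whenever a setting changes, and `load_params` at startup, would cover the request without any schema migration logic.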

0.4.0 WhisperX alignment error, and saving the subtitle file does nothing

After transcription finished, I clicked timestamp alignment and then save subtitle file on the WhisperX page. The UI reported success for both, but no file was written.
[screenshot]

The log:
==========2023-11-14_17:20:49==========
==========TimeStample_Alignment==========

alignment Error
Error: [WinError 2] The system cannot find the file specified.
UPdata DataModel

==========OutputSubtitleFiles==========

【Over】

==========2023-11-14_17:21:36==========
==========TimeStample_Alignment==========

alignment Error
Error: [WinError 2] The system cannot find the file specified.
UPdata DataModel

==========OutputSubtitleFiles==========

【Over】


For now I can only manually fish the auto-saved srt from the transcription step out of the temp folder.

Great project!

Hey, I just want to thank you for this great project!

Could you consider replacing the "德国完蛋了!.mp4" transcription screenshot sample with something less controversial? That clip comes from a rant by a far-right (racist) politician.
(You probably wouldn't promote your software with the ranting part of a Trump speech about ** either? ;-) )

Anyway, keep up the good work. 👍

It deleted everything on my desktop

In "Download and convert model" I picked the desktop as the output folder, because I wanted to see what the conversion produced. Partway through it errored out, and everything on my desktop was gone. Hard to take.

The log is below. I wanted to check whether some rm-style call was involved, and sure enough there was:
model_size_or_path: tiny
device: cuda
device_index: 0
compute_type: float32
cpu_threads: 4
num_workers: 1
download_root: C:/Users/arr2/.cache/huggingface/hub
local_files_only: True
Exception in thread Thread-34 (go):
Traceback (most recent call last):
File "C:\PROGRA~2\FASTER~1\threading.py", line 1016, in _bootstrap_inner
File "C:\PROGRA~2\FASTER~1\threading.py", line 953, in run
File "C:\PROGRA~2\FASTER~1\faster_whisper_GUI\mainWindows.py", line 197, in go
File "C:\PROGRA~2\FASTER~1\faster_whisper_GUI\modelLoad.py", line 39, in loadModel
File "C:\PROGRA~2\FASTER~1\threading.py", line 953, in run
File "C:\PROGRA~2\FASTER~1\faster_whisper_GUI\modelLoad.py", line 22, in go
File "C:\PROGRA~2\FASTER~1\faster_whisper\transcribe.py", line 122, in __init__
File "C:\PROGRA~2\FASTER~1\faster_whisper\utils.py", line 99, in download_model
File "C:\Program Files (x86)\FasterWhisperGUI\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "C:\Program Files (x86)\FasterWhisperGUI\huggingface_hub\_snapshot_download.py", line 169, in snapshot_download
with open(ref_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/arr2/.cache/huggingface/hub\models--guillaumekln--faster-whisper-tiny\refs\main'
....

Downloading (…)in/added_tokens.json: 100%|##########| 34.6k/34.6k [00:00<00:00, 111kB/s]
...
Downloading (…)cial_tokens_map.json: 0%| | 0.00/2.08k [00:00<?, ?B/s]
Downloading (…)cial_tokens_map.json: 100%|##########| 2.08k/2.08k [00:00<?, ?B/s]
Exception in thread Thread-6 (go_2):
Traceback (most recent call last):
File "C:\PROGRA~2\FASTER~1\threading.py", line 1016, in _bootstrap_inner
File "C:\PROGRA~2\FASTER~1\threading.py", line 953, in run
File "C:\PROGRA~2\FASTER~1\faster_whisper_GUI\convertModel.py", line 87, in go_2
File "C:\PROGRA~2\FASTER~1\ctranslate2\converters\converter.py", line 102, in convert
File "C:\PROGRA~2\FASTER~1\shutil.py", line 750, in rmtree
File "C:\PROGRA~2\FASTER~1\shutil.py", line 624, in _rmtree_unsafe
File "C:\PROGRA~2\FASTER~1\shutil.py", line 622, in _rmtree_unsafe
PermissionError: [WinError 5] Access is denied.: 'C:/Users/arr2/Desktop'

Over

Two-language transcription

I have Arabic audio that contains some English sentences and words. In the previous version, with Whisper large-v2, it could detect those words and transcribe them. In the new version with Whisper v3 it only transcribes the Arabic speech, omitting the English words. Is this caused by a program update or by the new Whisper version?

Cannot enable "Capture Audio" on the Transcription page

The Transcription page has a "Capture Audio" button, but it is always greyed out and nothing I try enables it. Could you add/fix support for this feature? Without it there is no real-time transcription, only conversion of offline files, and other tools that do have audio capture lack this software's hallucination handling. I would like to use it to live-transcribe some videos.
[screenshot]
Thanks for the author's hard work.

ImportError: cannot import name 'TranscriptionInfo' from 'faster_whisper'

Two problems, please take a look:

1. I tried installing requirements.txt and found several of the packages problematic. joblib==1.12.0 cannot be found on PyPI; it is either 1.1.0 or 1.2.0, there is no 1.12.

2. During startup I get the following error:
==========2023-11-14_18:38:45==========
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Traceback (most recent call last):
File "d:\faster-whisper-GUI-main\FasterWhisperGUI.py", line 81, in <module>
from faster_whisper_GUI.mainWindows import MainWindows
File "d:\faster-whisper-GUI-main\faster_whisper_GUI\mainWindows.py", line 29, in <module>
from faster_whisper import TranscriptionInfo
ImportError: cannot import name 'TranscriptionInfo' from 'faster_whisper' (D:\conda_env\envs\fastgui\lib\site-packages\faster_whisper\__init__.py)

I looked inside the package directory and there really is no TranscriptionInfo. What is going on?

Hello author, thank you for your work!

I found that when using this software on MP4 videos downloaded from YouTube with the large-v3 model, the audio language has to be set explicitly to en-English; otherwise the output comes out in some other language or script. The "use v3 model" option is already enabled.

Error when loading the model

Error log:
`==========LoadModel==========

model_size_or_path: C:/Users/Extre/Downloads/model-large-v3
device: auto
device_index: 0
compute_type: float32
cpu_threads: 4
num_workers: 1
download_root: C:/Users/Extre/.cache/huggingface/hub
local_files_only: True
Exception in thread Thread-9 (go):
Traceback (most recent call last):
File "C:\PROGRA~2\FASTER~1\threading.py", line 1016, in _bootstrap_inner
File "C:\PROGRA~2\FASTER~1\threading.py", line 953, in run
File "C:\PROGRA~2\FASTER~1\faster_whisper_GUI\mainWindows.py", line 200, in go
File "C:\PROGRA~2\FASTER~1\faster_whisper_GUI\modelLoad.py", line 40, in loadModel
File "C:\PROGRA~2\FASTER~1\threading.py", line 953, in run
File "C:\PROGRA~2\FASTER~1\faster_whisper_GUI\modelLoad.py", line 22, in go
File "C:\PROGRA~2\FASTER~1\faster_whisper\transcribe.py", line 128, in __init__
RuntimeError: Unable to open file 'model.bin' in model 'C:/Users/Extre/Downloads/model-large-v3'
`

The model was downloaded from https://huggingface.co/gradjitta/ct2-whisper-large-v3/tree/main and loaded via "Model parameters – Local Model" in the app.

Other information:

  • OS: Windows 11 Pro for Workstations
  • App version: 4.0.3
  • GPU: NVIDIA GeForce MX330, CUDA installed
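One quick check worth running first: "Unable to open file 'model.bin'" is what faster-whisper raises when the chosen folder lacks the converted CTranslate2 weights, e.g. an incomplete download or pointing at a parent directory. A stdlib sketch of that sanity check (the required-file assumption is based on typical CTranslate2 conversions, which always ship a `model.bin`):

```python
from pathlib import Path

def looks_like_ct2_model(model_dir: str) -> bool:
    # A converted CTranslate2 Whisper folder normally carries model.bin
    # alongside its config/vocabulary files; without it, faster-whisper
    # fails with "Unable to open file 'model.bin'".
    d = Path(model_dir)
    return d.is_dir() and (d / "model.bin").is_file()
```

If this returns False for the downloaded folder, re-downloading (or selecting the subfolder that actually contains `model.bin`) is the likely fix.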

Please add error notifications and elapsed time

  • Using FasterWhisperGUI 0.5.0
  • Problem encountered with Transcription:
    After starting a conversion, if an error occurs there is no notification; the UI still shows that audio processing has started, and only the log file reveals the failure, e.g.
    ValueError: Frame does not match AudioFifo parameters.
  • It would also help to display the elapsed time of each audio conversion.

Finally, thanks to the author for the open-source contribution.
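The elapsed-time part of the request is cheap to satisfy with `time.perf_counter`; a sketch (hypothetical wrapper, not the project's code) of timing one transcription task and formatting the duration for the log:

```python
import time

def format_elapsed(seconds: float) -> str:
    # Render a duration as H:MM:SS for the log / status bar.
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    return f"{h}:{m:02d}:{s:02d}"

def timed(task, *args, **kwargs):
    # Run a task and return (result, wall-clock seconds taken).
    start = time.perf_counter()
    result = task(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping each per-file transcription call in `timed` and appending `format_elapsed(elapsed)` to the completion message would cover the feature request.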

RuntimeError: [json.exception.type_error.305] cannot use operator[] with a string argument with null

Traceback (most recent call last):
File "D:\P\FASTER~1\faster_whisper_GUI\transcribe.py", line 333, in run
File "D:\P\FASTER~1\concurrent\futures\_base.py", line 621, in result_iterator
File "D:\P\FASTER~1\concurrent\futures\_base.py", line 319, in _result_or_cancel
File "D:\P\FASTER~1\concurrent\futures\_base.py", line 458, in result
File "D:\P\FASTER~1\concurrent\futures\_base.py", line 403, in __get_result
File "D:\P\FASTER~1\concurrent\futures\thread.py", line 58, in run
File "D:\P\FASTER~1\faster_whisper_GUI\transcribe.py", line 207, in transcribe_file
File "D:\P\FASTER~1\faster_whisper\transcribe.py", line 311, in transcribe
RuntimeError: [json.exception.type_error.305] cannot use operator[] with a string argument with null

Demucs works fine; the error occurs elsewhere. The model used is large-v2.

Unstable

Frequent crashes.
Even after applying the author's recommended settings it still crashes often.
I have confirmed it is not a problem with my computer's configuration.
In particular, after switching models it reliably crashes during conversion.

Error message

Traceback (most recent call last):
File "D:\ProgramAI\GUI\fasterwhispergui.py", line 9, in <module>
File "D:\ProgramAI\GUI\faster_whisper_GUI\__init__.py", line 1, in <module>
File "D:\ProgramAI\GUI\whisperx\__init__.py", line 1, in <module>
File "D:\ProgramAI\GUI\whisperx\transcribe.py", line 6, in <module>
ModuleNotFoundError: No module named 'numpy'

Output subtitles cover silent regions, plus a kernel-size error

We mainly use this for making video subtitles. Currently the output subtitles absorb the silence before and after the speech: for example, if the video is silent for 4 seconds and then has 2 seconds of speech, the subtitle for those 2 seconds of speech stays on screen for the full 6 seconds.
[screenshot]
Is there a setting I should change?
Thanks to the developer.

I hit a timestamp problem in version 2.3:

1
00:00:07,220 --> 00:00:14,410
2
00:00:14,410 --> 00:00:20,450
3
00:00:20,450 --> 00:00:25,010
4
00:00:25,010 --> 00:00:30,210
5
00:00:30,210 --> 00:00:34,510
The end times overlap the next start times. If a given creator's first video does not have this problem, none of their remaining videos do; if a creator's video does have it, all of their videos have the overlap problem.
For example, below: once "too." has been spoken the subtitle should disappear instead of staying on screen.
20
00:01:56,740 --> 00:01:59,380
too.
21
00:01:59,380 --> 00:02:04,940
So if you're looking at the eye from the side, this is kind of where the iris is.
How can I avoid this? The timestamp alignment in 2.4 can fix it.
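Short of the 2.4 alignment feature, lingering cues like the example above can also be patched in post-processing; a stdlib sketch (the maximum cue duration is an arbitrary assumption, not a project setting) that parses srt timestamps and caps each cue's end time:

```python
def parse_srt_time(t: str) -> float:
    # "00:01:59,380" -> seconds as a float
    hms, ms = t.split(",")
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s + int(ms) / 1000

def cap_cue(start: float, end: float, max_dur: float = 7.0) -> tuple[float, float]:
    # Whisper often extends a segment's end through trailing silence up to
    # the next segment's start; clamping the duration makes the subtitle
    # disappear shortly after the speech instead of lingering.
    return start, min(end, start + max_dur)
```

Running every cue through `cap_cue` before writing the srt back out would shorten the cue for "too." without touching correctly timed lines.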

[Bug] AttributeError

I ran into some errors while trying to save files:

version: 11070
torchaudio_cuda_version: 11070
_lfilter_core_cpu_loop:torchaudio._lfilter_core_loop
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
torchvision is not available - cannot save figures
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
language: en_GB

Traceback (most recent call last):
File "C:\FASTER~1\qfluentwidgets\components\navigation\navigation_panel.py", line 506, in _onWidgetClicked
AttributeError: 'NoneType' object has no attribute 'isSelectable'
Traceback (most recent call last):
File "C:\FASTER~1\qfluentwidgets\components\navigation\navigation_panel.py", line 506, in _onWidgetClicked
AttributeError: 'NoneType' object has no attribute 'isSelectable'
Traceback (most recent call last):
File "C:\FASTER~1\qfluentwidgets\components\navigation\navigation_panel.py", line 506, in _onWidgetClicked
AttributeError: 'NoneType' object has no attribute 'isSelectable'

==========Speaker_Diarize==========

Speaker diarize and alignment
load speaker brain model...
failed to diarize speaker!
Error: No module named 'asteroid_filterbanks'
UPdata DataModel

==========OutputSubtitleFiles==========

【Over】
Traceback (most recent call last):
File "C:\FASTER~1\qfluentwidgets\components\navigation\navigation_panel.py", line 506, in _onWidgetClicked
AttributeError: 'NoneType' object has no attribute 'isSelectable'



This sometimes causes a crash.

Multiple speaker separation failure

Log below:

==========2023-10-28_19:44:46==========
==========Speaker_Diarize==========


Speaker diarize and alignment
load speaker brain model...

Could not download 'pyannote/speaker-diarization' pipeline.
It might be because the pipeline is private or gated so make
sure to authenticate. Visit https://hf.co/settings/tokens to
create your access token and retry with:

   >>> Pipeline.from_pretrained('pyannote/speaker-diarization',
   ...                          use_auth_token=YOUR_AUTH_TOKEN)

If this still does not work, it might be because the pipeline is gated:
visit https://hf.co/pyannote/speaker-diarization to accept the user conditions.
speaker diarize...
failed to diarize speaker!
Error: 'NoneType' object is not callable
UPdata DataModel

After changing the auth_token to my own valid access token, it still seems unavailable.

Amazing project

Hello, congratulations on your amazing project! But could you please tell me, step by step, what I have to do to use it? I am new to this and don't know how!

About the format of generated subtitle files

Hi. This software makes batch transcription very convenient, and I combine it with the offline translation software PROMT to translate English to Chinese and produce bilingual subtitles. But when I feed the generated srt files to PROMT, it reports that the format is unsupported; I also tried VTT and LRC, both rejected. For the same audio, srt files produced by whisperDesktop translate without any problem, yet whisperDesktop cannot batch-convert. This is a headache; could you look into where the problem lies?

RuntimeError: CUDA failed with error no CUDA-capable device is detected, but I do have a GPU

==========2023-10-31_16:58:12==========
==========LoadModel==========

model_size_or_path: Z:/Program Files/FasterWhisperGUI/models/faster-whisper/whisper-large-v2-ct2-32
device: cuda
device_index: 0
compute_type: float32
cpu_threads: 4
num_workers: 1
download_root: C:/Users/PKR/.cache/huggingface/hub
local_files_only: True
Exception in thread Thread-1 (go):
Traceback (most recent call last):
File "Z:\Program Files\FasterWhisperGUI\threading.py", line 1016, in _bootstrap_inner
File "Z:\Program Files\FasterWhisperGUI\threading.py", line 953, in run
File "Z:\Program Files\FasterWhisperGUI\faster_whisper_GUI\mainWindows.py", line 197, in go
File "Z:\Program Files\FasterWhisperGUI\faster_whisper_GUI\modelLoad.py", line 39, in loadModel
File "Z:\Program Files\FasterWhisperGUI\threading.py", line 953, in run
File "Z:\Program Files\FasterWhisperGUI\faster_whisper_GUI\modelLoad.py", line 22, in go
File "Z:\Program Files\FasterWhisperGUI\faster_whisper\transcribe.py", line 128, in __init__
RuntimeError: CUDA failed with error no CUDA-capable device is detected

It reports that no CUDA device is found, but my local environment has one, and running the faster-whisper project from PyCharm works; as the screenshot shows, torch can enumerate the device. Does this project use my system Python or one bundled with the project? My local environment is torch 2.1.0 + CUDA 12.1.
[screenshot]
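To partly answer the question above: packaged builds typically ship their own Python environment, so a working torch + CUDA setup in a PyCharm interpreter does not carry over. One way to separate the two failure modes (driver problem vs. bundled runtime problem) from any environment is to ask the NVIDIA driver directly; a stdlib sketch:

```python
import shutil
import subprocess

def driver_sees_gpu() -> bool:
    # True only if the NVIDIA driver is installed and `nvidia-smi -L`
    # lists at least one device. Note this checks the *driver* level only;
    # the app's bundled CUDA runtime can still fail independently of this.
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    out = subprocess.run([exe, "-L"], capture_output=True, text=True)
    return out.returncode == 0 and "GPU" in out.stdout
```

If this returns True while the app still reports no CUDA device, the mismatch is inside the app's bundled environment rather than the system installation.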

Model loading always fails

Model loading always fails; I am asking here as a last resort.

What I have already tried:

  1. Searched the whole web for faster-whisper-GUI content; I only found two related Bilibili articles, but their content is almost identical to the readme here, so they were little help;
  2. Installed CUDA 12.2.2, PyTorch 2 and cuDNN 8.9.5.29 myself, set the environment variables, rebooted the computer, and so on.

Software settings:
[screenshot]

I created a model folder in the app's root directory myself:
[screenshot]

Contents of fasterwhispergui.log:

==========2023-10-04_16:44:51==========
version: 11070
torchaudio_cuda_version: 11070
_lfilter_core_cpu_loop:torchaudio._lfilter_core_loop
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
torchvision is not available - cannot save figures
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.

Sorry, I have no background in deep learning. But with a step-by-step deployment guide I should be able to use this the same way I deploy other models.

The hardware is a 13900K + 4090, so that should not be the bottleneck.

Thanks!

Model error.

Hey, I have a problem: I can't load the model. Can you please tell me what I did wrong?
[screenshot]
[screenshot]

Could a temporary result file be generated when each transcription task finishes?

When transcribing multiple tasks, could a temporary result file be written as each task completes?
With multi-task transcription, if something unexpected happens (quitting before the results are exported, or the computer crashing), the results of the tasks that already finished are lost and their transcription time is wasted.
So could you consider adding a per-task temporary result file, so that after an abnormal exit the program can pick up the previously finished results on the next start?
Thanks!
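Crash-safe batch output could be as simple as flushing each finished task to disk immediately instead of holding everything until the end; a minimal sketch (the `temp_results` folder layout is hypothetical, not how the project stores files):

```python
from pathlib import Path

RESULTS_DIR = Path("temp_results")  # hypothetical recovery folder

def save_partial(task_name: str, srt_text: str) -> Path:
    # Write one finished transcription to disk right away, so a crash
    # later in the batch cannot lose it.
    RESULTS_DIR.mkdir(exist_ok=True)
    out = RESULTS_DIR / f"{task_name}.srt"
    out.write_text(srt_text, encoding="utf-8")
    return out

def recover_partials() -> list:
    # On next start, list what survived so the user can pick it up.
    return sorted(RESULTS_DIR.glob("*.srt")) if RESULTS_DIR.exists() else []
```

`save_partial` would run at the end of each task's worker; `recover_partials` at startup would implement the "process last session's finished results" part of the request.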

Support Large v3

Hello!
Yesterday I installed version 0.7.4 of the program and loaded the local faster-whisper-large-v3 model from HF.
Your application loaded the model. Then I loaded audio in Russian; the program started recognizing it and then just hung. Nothing else happened.

A couple of fixes.

I'm honestly not much of a coder, but it took me half an evening just to get it running...

1. Libraries in requirements.txt.

pyside6-fluent-widgets
faster-whisper==0.10.0
pyAV
ffmpeg-python
pyAudio
nltk
CTranslate2>=3.21.0
joblib==1.2.0
pyside6==6.4.2
webvtt-py
pandas
transformers
pyannote.audio

and
Cuda 12.1

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

or Cuda 11.8

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

2. Log lines.

Why do the download progress bars need to be written to the log file? Why not just parse the download state, or only log it inside the program?

3. Incorrect imports of the faster-whisper library.

In many places the module was imported incorrectly (too many spots to describe each one).
From

from faster_whisper import  (Segment, Word)
from typing import List

to

from faster_whisper.transcribe import  (Segment, Word)
from typing import List

4. The cuBLAS problem.

To be more precise, faster-whisper requires the cuBLAS libraries; on Windows they come from the NVIDIA toolkit.
Here are the links: 12.3 or (preferably) 11.8 for Europe; for China I don't know which mirror.

Edit: you can just install the PyTorch CUDA 11.8 build and everything works fine without them.

5. I don't understand the translation system.

I haven't figured out how to add to it, and the English translation is simply missing in a couple of places.

That's what I've found so far, and what I needed to get it up and running. I suspect there are plenty more bugs...

faster-whisper-GUI 0.4.0: conversion is very slow

Using the same faster-whisper-large-v2 model on audio of the same length, faster-whisper-GUI 0.4.0 takes one to two hours to finish.

Since the resulting subtitles need adjusting, I use Subtitle Edit afterwards, and discovered it ships an engine called Purfview's Whisper Faster. I copied faster-whisper-large-v2 into the directory Subtitle Edit expects and converted audio of the same length again: it finished in just a few minutes.

During conversion both programs keep actual GPU utilization floating between 90-100%, yet the conversion times differ enormously. I don't know what the problem is.

Version 0.43: using the v3 model errors out, while the v2 model works fine

File "E:\fast\FASTER~1\faster_whisper_GUI\transcribe.py", line 347, in run
File "E:\fast\FASTER~1\concurrent\futures\_base.py", line 621, in result_iterator
File "E:\fast\FASTER~1\concurrent\futures\_base.py", line 319, in _result_or_cancel
File "E:\fast\FASTER~1\concurrent\futures\_base.py", line 458, in result
File "E:\fast\FASTER~1\concurrent\futures\_base.py", line 403, in __get_result
File "E:\fast\FASTER~1\concurrent\futures\thread.py", line 58, in run
File "E:\fast\FASTER~1\faster_whisper_GUI\transcribe.py", line 261, in transcribe_file
File "E:\fast\FASTER~1\faster_whisper\transcribe.py", line 922, in restore_speech_timestamps
File "E:\fast\FASTER~1\faster_whisper\transcribe.py", line 426, in generate_segments
File "E:\fast\FASTER~1\faster_whisper\transcribe.py", line 610, in encode
ValueError: Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead
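The shape mismatch above (80 vs. 128) is characteristic of feeding a large-v3 model, which expects 128 mel bins, features extracted with the older 80-bin configuration; enabling the app's v3 option or updating faster-whisper usually resolves it. A stdlib sketch for checking what a local model folder declares (this assumes the Hugging Face `preprocessor_config.json` layout with a `feature_size` key, which converted Whisper models typically carry):

```python
import json
from pathlib import Path

def declared_mel_bins(model_dir):
    # large-v3 conversions normally declare feature_size: 128 here and
    # earlier models 80; None means the config file is absent.
    cfg = Path(model_dir) / "preprocessor_config.json"
    if not cfg.is_file():
        return None
    return json.loads(cfg.read_text(encoding="utf-8")).get("feature_size")
```

Comparing this value against the "use v3 model" setting makes it obvious whether the model and the feature extractor disagree.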

The video errors out with the problem the author described in another thread, but after converting the video to audio as instructed I get RuntimeError: CUDA failed with error out of memory

==========2023-12-10_08:01:27==========
==========Process==========

redirect std output
vad_filter : True
-threshold : 0.5
-min_speech_duration_ms : 250
-max_speech_duration_s : inf
-min_silence_duration_ms : 2000
-window_size_samples : 1024
-speech_pad_ms : 400
Transcribes options: -audio : ['D:/提取音频/BUENA-143 ☆高画質DVD-ROM☆F●O@ジャッ●・ザ・リッパー.wav'] -language : ja -task : False -beam_size : 5 -best_of : 1 -patience : 1.0 -length_penalty : 1.0 -temperature : [0.0] -compression_ratio_threshold : 2.4 -log_prob_threshold : -1.0 -no_speech_threshold : 0.6 -condition_on_previous_text : False -initial_prompt : None -prefix : None -suppress_blank : True -suppress_tokens : [-1] -without_timestamps : False -max_initial_timestamp : 1.0 -word_timestamps : True -prepend_punctuations : "'“¿([{- -append_punctuations : "'.。,,!!??::”)]}、 -repetition_penalty : 1.0 -no_repeat_ngram_size : 0 -prompt_reset_on_temperature : 0.5
create transcribe process with 1 workers
start transcribe process
Traceback (most recent call last):
File "D:\FASTER~1\faster_whisper_GUI\transcribe.py", line 353, in run
File "D:\FASTER~1\concurrent\futures\_base.py", line 621, in result_iterator
File "D:\FASTER~1\concurrent\futures\_base.py", line 319, in _result_or_cancel
File "D:\FASTER~1\concurrent\futures\_base.py", line 458, in result
File "D:\FASTER~1\concurrent\futures\_base.py", line 403, in __get_result
File "D:\FASTER~1\concurrent\futures\thread.py", line 58, in run
File "D:\FASTER~1\faster_whisper_GUI\transcribe.py", line 266, in transcribe_file
File "D:\FASTER~1\faster_whisper\transcribe.py", line 922, in restore_speech_timestamps
File "D:\FASTER~1\faster_whisper\transcribe.py", line 538, in generate_segments
File "D:\FASTER~1\faster_whisper\transcribe.py", line 765, in add_word_timestamps
File "D:\FASTER~1\faster_whisper\transcribe.py", line 875, in find_alignment
RuntimeError: CUDA failed with error out of memory

> To accommodate users with small GPU memory (especially those doing batch transcription), the code frequently calls gc and clears RAM and VRAM, which may slow transcription somewhat. But that is only one possible cause; the thread checks done to support thread cancellation and similar operations may also cost some performance. If you want faster transcription, you can try:

  1. Whatever model you use, set the compute precision to int8. This should speed up transcription considerably. It does affect accuracy somewhat, but in my view the impact is negligible for clear, good-quality voice audio. int8 also sharply reduces the VRAM, RAM and GPU-compute requirements: roughly 3 GB of VRAM and about 70% of the original CUDA compute is enough to run transcription.
  2. Set the temperature option (the sampling temperature candidates) to a single 0. This may speed transcription up a little and can fix some crashes, but it also disables the temperature fallback; if the model hallucinates or falls into failure loops (repeating the same sentence or emitting meaningless text), set the loop-prompt option (condition on previous text) to False.
  3. Set the chunk size to 1. In actual testing this did speed up extraction. Some tests hold that chunk size affects transcription quality, while other related tests (the Purfview Faster Whisper you mentioned) find it has little impact. In practice, larger values consume more compute and VRAM, and lowering the value did speed up transcription, although according to faster-whisper's own guidance larger values may transcribe better. Personally I have not yet seen a clear effect on the results.
  4. This project used to use cuda118 as the CUDA engine, and transcription was indeed faster then; also, because torch and the related packages were bytecode-compiled, it ran faster than the current versions. But to stay compatible with the whisperX engine I had to abandon the compatibility changes made to the CT2 engine and go back to the cuda117 engine. Recompiling the GPU build of torch is no longer feasible and of little value, so I dropped the compilation of torch and the related packages; that part of the program now runs under plain Python, but I am not sure whether this is related to the performance bottleneck.

Regarding GPU utilization while the software runs: if the CUDA engine is used, CUDA schedules resources automatically and this cannot be interfered with. By comparison, whisperDesktop uses the whisper.cpp backend; instead of the CUDA engine it uses Direct3D compute shaders as its calculator, and the program controls resource scheduling and overhead itself, so its GPU utilization can stay near 100%, though that is not necessarily more efficient than using the CUDA cores. Still, as an exploratory project it proved that, even without depending on NVIDIA's CUDA stack, it is possible to run deep-learning projects on a wider range of devices, hardware brands, GPUs and CPU models. I will keep investigating the transcription efficiency and the remaining stability problems.
