Giter Site home page Giter Site logo

Comments (12)

bigcash avatar bigcash commented on June 13, 2024 1

我这边,使用 v2,v3转到 faster-whisper 的模型,好像也没有 vad 成功。

Name: whisperx Version: 3.1.2

Name: faster-whisper Version: 1.0.1

测试用视频:https://www.youtube.com/watch?v=we8vNy6DYMI

v2 偶尔还会出现乱码 v3 的话,就算设置了 vad 也一样是30s 一个切片片段。

model = WhisperModel(model_size, device="cuda", compute_type="float16") model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128) segments, info = model.transcribe(file, vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500), language=language)

您好,我使用ct2-transformers-converter --model BELLE-2--Belle-whisper-large-v3-zh --output_dir BELLE-2--Belle-whisper-large-v3-zh-ct2 --copy_files preprocessor_config.json --quantization float16 这个命令将模型转换为faster-whisper格式,在加载模型时model = WhisperModel(model_size, device="cuda", compute_type="float16")提示错误:Max retries exceeded with url: /openai/whisper-tiny/resolve/main/tokenizer.json,请问为什么还要去huggingface.co下载这个tokenizer.json呀,正确的做法该怎么做呢,谢谢拉

from belle.

houmochenliu avatar houmochenliu commented on June 13, 2024

+1

from belle.

shuaijiang avatar shuaijiang commented on June 13, 2024

根据上面结果,大概原因可能是使用belle-whisper没有做vad切分,所以都是按照最长30秒做的识别,这样有一定的影响。
建议把belle-whisper转为fasterwhisper模型格式,基于faster-whisper框架去做推理,faster-whisper内置了vad 模块。速度和效果都有一定保证。

from belle.

chenquan avatar chenquan commented on June 13, 2024

根据上面结果,大概原因可能是使用belle-whisper没有做vad切分,所以都是按照最长30秒做的识别,这样有一定的影响。 建议把belle-whisper转为fasterwhisper模型格式,基于faster-whisper框架去做推理,faster-whisper内置了vad 模块。速度和效果都有一定保证。

belle-whisper转为fasterwhisper模型格式,请问这个怎么处理呢?有相关的技术资料吗?

from belle.

chenquan avatar chenquan commented on June 13, 2024

根据上面结果,大概原因可能是使用belle-whisper没有做vad切分,所以都是按照最长30秒做的识别,这样有一定的影响。 建议把belle-whisper转为fasterwhisper模型格式,基于faster-whisper框架去做推理,faster-whisper内置了vad 模块。速度和效果都有一定保证。

belle-whisper转为fasterwhisper模型格式,请问这个怎么处理呢?有相关的技术资料吗?

ct2-transformers-converter --model BELLE-2/Belle-whisper-large-v2-zh --output_dir Belle-whisper-large-v2-ct2 --copy_files  preprocessor_config.json --quantization int8_float32

https://opennmt.net/CTranslate2/quantization.html#quantize-on-model-conversion

from belle.

drilistbox avatar drilistbox commented on June 13, 2024

但是whisper里默认是有vad的呀,你是指belle-whisper里把vad去掉了?

from belle.

shuaijiang avatar shuaijiang commented on June 13, 2024

你说的应该是 timestamps, belle-whisper 微调时没有进一步优化timestamp。如果需要timestamps需要在推理时主动打开。faster-whisper框架有vad,切分效果更好一些。所以建议用faster-whisper框架调用belle-whisper

from belle.

drilistbox avatar drilistbox commented on June 13, 2024

多谢大佬 我试试

from belle.

dogvane avatar dogvane commented on June 13, 2024

我这边,使用 v2,v3转到 faster-whisper 的模型,好像也没有 vad 成功。

Name: whisperx
Version: 3.1.2

Name: faster-whisper
Version: 1.0.1

测试用视频:https://www.youtube.com/watch?v=we8vNy6DYMI

v2 偶尔还会出现乱码
v3 的话,就算设置了 vad 也一样是30s 一个切片片段。

model = WhisperModel(model_size, device="cuda", compute_type="float16")
model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)
segments, info = model.transcribe(file, vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500), language=language)

from belle.

Xuyaoyan avatar Xuyaoyan commented on June 13, 2024

e: whispe

请问你是怎么转的,我自己用命令行转没成功

from belle.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.