
Comments (14)

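wanghuii1 commented on July 30, 2024

Performance on TV dramas and variety shows is limited, especially when there is background audio interference. If possible, please share the specific audio.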

from 3d-speaker.

yfchenlucky commented on July 30, 2024

To add: you can change speaker_model_id=damo/speech_campplus_sv_zh-cn_16k-common in run.sh to choose between eres2net and cam++. You can also use the open-source model on ModelScope: https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary which may give better results. Have a try~


lucasjinreal commented on July 30, 2024

It doesn't seem to separate speakers as well as FunASR, and even FunASR does fairly poorly.


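yfchenlucky commented on July 30, 2024

That shouldn't be the case: FunASR calls our model. You could try again with a different speaker model by changing speaker_model_id in the code. Beyond that, the model at https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary includes post-processing, so give it a try. If convenient, could you share the specific audio?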


lucasjinreal commented on July 30, 2024

@yfchenlucky I'm currently using this pipeline and the results are mediocre. Do you have any suggestions?

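# Imports needed to build a ModelScope pipeline
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# ASR pipeline with VAD and punctuation models attached
pipeline_ms = pipeline(
    task=Tasks.auto_speech_recognition,
    model="damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn",
    model_revision="v0.0.2",
    vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    punc_model="damo/punc_ct-transformer_cn-en-common-vocab471067-large",
    output_dir="results",
)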

Is there a big difference between paraformer and campplus? Which one is better?


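yfchenlucky commented on July 30, 2024

The speaker diarization part of FunASR currently uses the campplus model. paraformer does speech recognition; campplus does speaker diarization.

  1. In this repository you can use campplus or eres2net to extract speaker embeddings for the diarization task. To switch, change speaker_model_id on line 38 of https://github.com/alibaba-damo-academy/3D-Speaker/blob/main/egs/3dspeaker/speaker-diarization/run.sh.
  2. If that does not work well, try adding a speaker change point localization model. ModelScope provides two such models: https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary and https://modelscope.cn/models/damo/speech_eres2net-large_speaker-diarization_common/summary. Follow the quick-start section of the README to run speaker diarization.
  3. If neither approach meets your needs, you can send us the audio and we will help analyze it. For audio with background interference, diarization performance is indeed limited.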
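
To make the first suggestion more concrete: an embedding-based diarization system extracts one speaker embedding per short audio window and then groups windows whose embeddings are similar. The sketch below illustrates only that grouping step, using a simple greedy cosine-similarity clustering over toy 2-D vectors. It is not the clustering used in 3D-Speaker; the function names, the 0.8 threshold, and the toy vectors are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def greedy_cluster(embeddings, threshold=0.8):
    """Assign each embedding to the most similar existing cluster if the
    similarity reaches the threshold; otherwise open a new cluster."""
    centroids = []  # running sum per cluster (cosine ignores the scale)
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
            labels.append(best)
    return labels

# Toy "embeddings": windows 0, 1, 4 point one way, windows 2, 3 another.
toy = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (1.0, 0.0)]
labels = greedy_cluster(toy)
print(labels)
```

Swapping the embedding extractor (campplus versus eres2net) changes the vectors fed into a step like this, which is why the choice of speaker_model_id can change the final speaker labels.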


lucasjinreal commented on July 30, 2024

@yfchenlucky I tried this pipeline, and the DER looks slightly better than with the one above:

from modelscope.pipelines import pipeline

# Speaker diarization pipeline from ModelScope
sd_pipeline = pipeline(
    task='speaker-diarization',
    model='damo/speech_campplus_speaker-diarization_common',
    model_revision='v1.0.0'
)

Why is that, if both use the same model?
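
For readers unfamiliar with the metric: DER (diarization error rate) is the share of reference speech time that is missed, falsely detected, or attributed to the wrong speaker, under the best mapping between hypothesis and reference speaker labels. The sketch below is a simplified frame-level version that assumes at most one speaker per frame and at least one reference speech frame. Standard scoring tools also handle overlapping speech and boundary collars, which this sketch does not. The function name and the toy label sequences are assumptions for illustration.

```python
from itertools import permutations

def frame_der(ref, hyp):
    """Simplified frame-level DER. `ref` and `hyp` are equal-length lists of
    speaker labels, with None marking non-speech frames."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    pairs = list(zip(ref, hyp))
    ref_speech = sum(r is not None for r in ref)
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    ref_ids = sorted({r for r in ref if r is not None})
    hyp_ids = sorted({h for h in hyp if h is not None})
    # Pad with None so a hypothesis speaker may stay unmatched.
    targets = ref_ids + [None] * len(hyp_ids)

    best_confusion = None
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        confusion = sum(
            r is not None and h is not None and mapping[h] != r
            for r, h in pairs
        )
        if best_confusion is None or confusion < best_confusion:
            best_confusion = confusion
    return (missed + false_alarm + best_confusion) / ref_speech

ref = ["A", "A", "A", "B", "B", None, "B", "B"]
hyp = ["s1", "s1", "s2", "s2", "s2", "s2", None, "s2"]
der = frame_der(ref, hyp)
print(round(der, 3))
```

The brute-force search over label mappings is only practical for a handful of speakers; it is used here to keep the example short.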


yfchenlucky commented on July 30, 2024

The model published on ModelScope, 'damo/speech_campplus_speaker-diarization_common', includes a speaker change point localization model, so it performs somewhat better.
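
One plausible reading of why change points help: a speech segment can contain a switch between speakers, and cutting it at the detected switch keeps each piece to a single voice before speaker labels are assigned. The sketch below shows only that cutting step on toy timestamps. It is not the ModelScope model's post-processing; the function name, segment times, and change point times are hypothetical.

```python
def split_at_change_points(segments, change_points):
    """Split each (start, end) segment at every change point that falls
    strictly inside it. Times are in seconds."""
    pieces = []
    for start, end in segments:
        cuts = sorted(t for t in change_points if start < t < end)
        bounds = [start] + cuts + [end]
        pieces.extend(zip(bounds[:-1], bounds[1:]))
    return pieces

# Two speech segments and four detected change points (12.0 lies outside both).
segments = [(0.0, 4.0), (5.0, 9.0)]
change_points = [2.5, 7.0, 8.0, 12.0]
pieces = split_at_change_points(segments, change_points)
print(pieces)
```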


jpyjpr commented on July 30, 2024

> The model published on ModelScope, 'damo/speech_campplus_speaker-diarization_common', includes a speaker change point localization model, so it performs somewhat better.

Can the two models iic/speech_campplus_speaker-diarization_common and iic/speech_eres2net-large_speaker-diarization_common be combined with ASR, VAD, and so on inside FunASR?


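yfchenlucky commented on July 30, 2024

The campplus model is already integrated into FunASR, and the eres2net-large model is being integrated. See https://github.com/alibaba-damo-academy/FunASR for details. For specific combination requests, please raise them in the FunASR DingTalk group.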
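
For readers who want to combine the outputs themselves in the meantime, one common approach is to label each recognized sentence with the speaker whose turns overlap it the most. The sketch below assumes you already have sentence timestamps from an ASR system and speaker turns from a diarization system. It is not FunASR's internal logic, and the function names, tuple layouts, and toy data are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(sentences, speaker_turns):
    """sentences: list of (start, end, text).
    speaker_turns: list of (start, end, speaker).
    Returns (speaker, text) pairs; speaker is None if no turn overlaps."""
    labelled = []
    for s_start, s_end, text in sentences:
        totals = {}
        for t_start, t_end, speaker in speaker_turns:
            duration = overlap(s_start, s_end, t_start, t_end)
            if duration > 0:
                totals[speaker] = totals.get(speaker, 0.0) + duration
        best = max(totals, key=totals.get) if totals else None
        labelled.append((best, text))
    return labelled

sentences = [(0.0, 2.0, "hello"), (2.1, 5.0, "hi there"), (9.0, 10.0, "bye")]
turns = [(0.0, 2.2, 0), (2.2, 6.0, 1)]
result = assign_speakers(sentences, turns)
print(result)
```

Sentences that fall outside every speaker turn are left unlabelled rather than guessed, so gaps in the diarization output stay visible.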


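jpyjpr commented on July 30, 2024

> The campplus model is already integrated into FunASR, and the eres2net-large model is being integrated. See https://github.com/alibaba-damo-academy/FunASR for details. For specific combination requests, please raise them in the FunASR DingTalk group.

OK, thanks!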


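jpyjpr commented on July 30, 2024

> The campplus model is already integrated into FunASR, and the eres2net-large model is being integrated. See https://github.com/alibaba-damo-academy/FunASR for details. For specific combination requests, please raise them in the FunASR DingTalk group.

If I want to use iic/speech_campplus_speaker-diarization_common within the FunASR framework, what value should I pass for spk_model? And what about the revision that goes with it?

If I write cam++, it downloads iic/speech_campplus_sv_zh-cn_16k-common by default.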

from funasr import AutoModel
# spk_model and spk_model_revision are left empty: they are the values being asked about
paraformer_model = AutoModel(model="/mydata/model/download/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", model_revision="v2.0.4",
                vad_model="/mydata/model/download/speech_fsmn_vad_zh-cn-16k-common-pytorch", vad_model_revision="v2.0.4",
                punc_model="/mydata/model/download/punc_ct-transformer_zh-cn-common-vocab272727-pytorch", punc_model_revision="v2.0.4",
                spk_model="", spk_model_revision="")


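yfchenlucky commented on July 30, 2024

Please refer to the FunASR examples, or join the DingTalk group via the GitHub link above, where team members can answer your questions. Thanks for asking; if this was helpful, feel free to give a star!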


dfengpo commented on July 30, 2024

Does the community software package not yet support speaker diarization with the speaker model iic/speech_campplus_sv_zh-cn_16k-common?
I added the spk_model parameter and got this error:
PARSE ERROR: Argument: --spk_model
Couldn't find match for argument
