Comments (14)
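Performance on TV dramas and variety shows is limited, especially when there is background sound interference. If possible, please provide the specific audio.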
from 3d-speaker.
To add to that: you can change speaker_model_id=damo/speech_campplus_sv_zh-cn_16k-common in run.sh to choose between ERes2Net and CAM++. You can also try the open-source model on ModelScope: https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary which may give better results. Have a try~
from 3d-speaker.
It feels like the diarization quality is not as good as FunASR's, and even FunASR's is fairly poor.
from 3d-speaker.
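That shouldn't be the case. FunASR calls our model. You can try a different speaker model again by changing speaker_model_id in the code. Beyond that, the model at https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary includes post-processing, so give it a try. If convenient, could you provide the specific audio?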
from 3d-speaker.
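@yfchenlucky I'm currently using this pipeline and the results are mediocre. Do you have any suggestions?
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Paraformer ASR pipeline with VAD and punctuation models attached
pipeline_ms = pipeline(
    task=Tasks.auto_speech_recognition,
    model="damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn",
    model_revision="v0.0.2",
    vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    punc_model="damo/punc_ct-transformer_cn-en-common-vocab471067-large",
    output_dir="results",
)
Is there a big difference in quality between paraformer and campplus? Which one is better?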
from 3d-speaker.
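The speaker diarization part of FunASR currently uses the campplus model. paraformer is for speech recognition, while campplus is for speaker diarization.
- In this repository you can use campplus or eres2net to extract speaker embeddings for the diarization task. To switch, change speaker_model_id on line 38 of https://github.com/alibaba-damo-academy/3D-Speaker/blob/main/egs/3dspeaker/speaker-diarization/run.sh.
- If the method above does not work well, you can try adding a speaker change point detection model. ModelScope provides https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary and https://modelscope.cn/models/damo/speech_eres2net-large_speaker-diarization_common/summary; follow the quick-start section of each README to run speaker diarization.
- If neither of the two methods meets your needs, you can send us the audio and we will help analyze it. For audio with background sound interference, diarization performance is indeed limited.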
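As a rough illustration of the first suggestion (extract one speaker embedding per segment, then group segments by speaker), here is a toy sketch. The 3-dimensional embeddings and the 0.7 threshold are made-up assumptions, and this greedy cosine-similarity grouping is only meant to show the idea; it is not the clustering that run.sh actually uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(embeddings, threshold=0.7):
    """Assign each embedding to the most similar existing cluster, or start a new one.

    Returns one integer speaker label per embedding, in input order.
    """
    centroids = []  # running mean embedding per cluster
    counts = []     # number of members per cluster
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, cen in enumerate(centroids):
            sim = cosine(emb, cen)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # Nothing is similar enough: treat this segment as a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched cluster.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
        labels.append(best)
    return labels


# Hypothetical embeddings for four segments (real ones have far more dimensions).
segments = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],
]
labels = greedy_cluster(segments)
print(labels)
```

With a different speaker model the embeddings change, so the same grouping step can produce different speaker labels, which is why switching speaker_model_id is worth trying.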
from 3d-speaker.
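@yfchenlucky I tried this pipeline, and the DER looks slightly better than with the one above:
from modelscope.pipelines import pipeline

# Speaker diarization pipeline published on ModelScope
sd_pipeline = pipeline(
    task='speaker-diarization',
    model='damo/speech_campplus_speaker-diarization_common',
    model_revision='v1.0.0'
)
Why is that, if both use the same model?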
from 3d-speaker.
The model 'damo/speech_campplus_speaker-diarization_common' released on ModelScope includes a speaker change point detection model, so it performs somewhat better.
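To check a DER difference like this on your own audio, a simplified frame-level DER can be computed as sketched below. This assumes non-overlapping speech, no forgiveness collar, and a brute-force speaker mapping that is only practical for a handful of speakers; the segment lists are made-up examples, and standard scoring tools will give somewhat different numbers.

```python
from itertools import permutations


def to_frames(segments, n_frames, step=0.01):
    """Turn (start_sec, end_sec, speaker) segments into one label per frame."""
    frames = [None] * n_frames
    for start, end, spk in segments:
        for i in range(round(start / step), min(round(end / step), n_frames)):
            frames[i] = spk
    return frames


def simple_der(reference, hypothesis, duration, step=0.01):
    """Frame-level DER: (missed + false alarm + confusion) / reference speech."""
    n = round(duration / step)
    ref = to_frames(reference, n, step)
    hyp = to_frames(hypothesis, n, step)

    ref_speech = sum(r is not None for r in ref)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")
    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    # Count frames where both sides have speech, per (ref, hyp) speaker pair.
    overlap = {}
    for r, h in zip(ref, hyp):
        if r is not None and h is not None:
            overlap[(r, h)] = overlap.get((r, h), 0) + 1
    both = sum(overlap.values())

    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    # Pad the reference side so every hypothesis speaker gets some target,
    # then try every one-to-one mapping and keep the best one.
    padded = ref_spk + [None] * max(0, len(hyp_spk) - len(ref_spk))
    best_match = 0
    for perm in permutations(padded, len(hyp_spk)):
        matched = sum(overlap.get((r, h), 0) for r, h in zip(perm, hyp_spk))
        best_match = max(best_match, matched)

    confusion = both - best_match
    return (missed + false_alarm + confusion) / ref_speech


# Hypothetical 10-second recording with two speakers.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hyp_a = [(0.0, 6.0, "spk0"), (6.0, 10.0, "spk1")]  # change point 1.0 s late
hyp_b = [(0.0, 5.2, "spk0"), (5.2, 10.0, "spk1")]  # change point 0.2 s late

der_a = simple_der(reference, hyp_a, duration=10.0)
der_b = simple_der(reference, hyp_b, duration=10.0)
print(f"DER A: {der_a:.3f}, DER B: {der_b:.3f}")
```

In this toy case the only difference between the two hypotheses is how accurately the speaker change point is placed, which is the kind of error a change point detection model is meant to reduce.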
from 3d-speaker.
Can the two models iic/speech_campplus_speaker-diarization_common and iic/speech_eres2net-large_speaker-diarization_common be combined with ASR, VAD, and so on inside FunASR?
from 3d-speaker.
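The campplus model is already integrated in FunASR, and the eres2net-large model is being integrated. See https://github.com/alibaba-damo-academy/FunASR for details. Specific combination requests can be raised in the FunASR DingTalk group.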
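If the diarization pipeline and the ASR pipeline are run separately in the meantime, their outputs can be merged afterwards. The sketch below labels each recognized sentence with the speaker it overlaps most in time. The field names and sample values are assumptions for illustration only, not the FunASR or ModelScope result schema, and this is not how FunASR combines the models internally.

```python
def assign_speakers(sentences, diar_segments):
    """Label each ASR sentence with the diarization speaker it overlaps most.

    sentences:     list of dicts with "start", "end" (seconds) and "text"
    diar_segments: list of (start, end, speaker) tuples
    """
    labeled = []
    for sent in sentences:
        totals = {}  # overlap duration per speaker for this sentence
        for seg_start, seg_end, spk in diar_segments:
            overlap = min(sent["end"], seg_end) - max(sent["start"], seg_start)
            if overlap > 0:
                totals[spk] = totals.get(spk, 0.0) + overlap
        # None means no diarization segment covers this sentence.
        speaker = max(totals, key=totals.get) if totals else None
        labeled.append({**sent, "speaker": speaker})
    return labeled


# Hypothetical outputs from the two pipelines.
sentences = [
    {"start": 0.0, "end": 2.4, "text": "hello"},
    {"start": 2.6, "end": 5.0, "text": "hi there"},
]
diar_segments = [(0.0, 2.5, 0), (2.5, 5.0, 1)]

result = assign_speakers(sentences, diar_segments)
for item in result:
    print(item["speaker"], item["text"])
```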
from 3d-speaker.
OK, thanks!
from 3d-speaker.
If I want to use iic/speech_campplus_speaker-diarization_common within the FunASR framework, what value should the spk_model parameter take? And what about the revision that follows it?
If I write cam++, it downloads iic/speech_campplus_sv_zh-cn_16k-common by default.
from funasr import AutoModel

paraformer_model = AutoModel(
    model="/mydata/model/download/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", model_revision="v2.0.4",
    vad_model="/mydata/model/download/speech_fsmn_vad_zh-cn-16k-common-pytorch", vad_model_revision="v2.0.4",
    punc_model="/mydata/model/download/punc_ct-transformer_zh-cn-common-vocab272727-pytorch", punc_model_revision="v2.0.4",
    spk_model="", spk_model_revision="",  # left empty: these are the values being asked about
)
from 3d-speaker.
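For details, please refer to the FunASR examples, or join the DingTalk group via the GitHub link above, where specialists will answer questions. Thanks for asking, and if you find this helpful, please give a star!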
from 3d-speaker.
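Does the community software package still not support using the speaker model iic/speech_campplus_sv_zh-cn_16k-common for speaker diarization?
I added the spk_model argument and got an error: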
PARSE ERROR: Argument: --spk_model
Couldn't find match for argument
from 3d-speaker.