modelscope / 3d-speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
License: Apache License 2.0
For CAM++ inference, a pre-extracted audio numpy array can be passed in directly. Could ERes2Net be given the same unified interface? Thanks.
What exactly are the input and output parameters of compute_score_metrics.py under speakerlab/bin? Is there a sample to refer to? Thanks!
I reproduced the code with the DINO framework using multi-crop (2 local crops of 2 s, 1 global crop of 3 s) and ECAPA (512), without RDINO. The final result is 5.0. Is that a normal result?
Is CAM++ suitable for text-dependent speaker verification?
Hello, after training DINO I want to fine-tune with labeled data. How do I load the previously trained model's .pth file?
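For questions like this one, the general PyTorch pattern is to load the checkpoint's state dict into the new model before fine-tuning. The sketch below is a minimal illustration, not the repo's actual API; the `'model'` wrapper key and the `module.` prefix handling are assumptions about how the checkpoint was saved.

```python
import torch
import torch.nn as nn

def load_pretrained(model: nn.Module, ckpt_path: str) -> nn.Module:
    """Copy matching weights from a saved checkpoint into `model`."""
    ckpt = torch.load(ckpt_path, map_location='cpu')
    # Checkpoints are often wrapped, e.g. {'model': state_dict, ...};
    # fall back to treating the file as a bare state_dict.
    state = ckpt.get('model', ckpt) if isinstance(ckpt, dict) else ckpt
    # Strip a possible 'module.' prefix left by DistributedDataParallel.
    state = {k.removeprefix('module.'): v for k, v in state.items()}
    # strict=False tolerates a new classifier head added for fine-tuning.
    model.load_state_dict(state, strict=False)
    return model
```

With this, fine-tuning is just `model = load_pretrained(model, 'epochN.pth')` followed by training with the labeled data and a fresh (usually lower) learning rate.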
Thank you for your excellent work. 🙂
We have observed that whenever we resume training with a different number of epochs after training completion, the loaded historical model exhibits significantly lower accuracy compared to the corresponding epoch during the original training. For instance, when loading a model trained for 100 epochs, its performance is only comparable to that of a model trained for 30 epochs.
This inconsistency in performance after resuming training poses a challenge for us to continue training from a checkpoint and obtain the desired results.
Is an ONNX model available for speech_campplus_speaker-diarization_common?
/opt/conda/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:1416: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
Traceback (most recent call last):
File "/vepfs/code/MossFormer/3D-Speaker/egs/3dspeaker/speaker-diarization/local/cluster_and_postprocess_h5.py", line 93, in audio_only_func_getnums
labels = cluster(embeddings)
File "/vepfs/code/MossFormer/3D-Speaker/speakerlab/process/cluster.py", line 186, in __call__
labels = self.filter_minor_cluster(labels, X, self.min_cluster_size)
File "/vepfs/code/MossFormer/3D-Speaker/speakerlab/process/cluster.py", line 203, in filter_minor_cluster
major_center = np.stack([x[labels == i].mean(0)
File "/opt/conda/lib/python3.10/site-packages/numpy/core/shape_base.py", line 445, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
During handling of the above exception, another exception occurred:
===========================
When running labels = cluster(embeddings)  # embeddings shape [14, 192]
the error above is raised.
What causes this problem, and how can it be solved?
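A plausible reading of the traceback above: if every cluster found over the 14 embeddings is smaller than `min_cluster_size`, the list of "major" cluster centers is empty and `np.stack` raises "need at least one array to stack". The sketch below is an illustration of that failure mode and one possible guard, not the repo's code; the fallback policy is an assumption.

```python
import numpy as np

def major_centers(x: np.ndarray, labels: np.ndarray,
                  min_cluster_size: int) -> np.ndarray:
    """Mean embedding of every cluster at least min_cluster_size large."""
    uniq, counts = np.unique(labels, return_counts=True)
    major = uniq[counts >= min_cluster_size]
    if major.size == 0:
        # Fallback: with too few segments, keep the largest cluster as
        # the only major one instead of calling np.stack on an empty list.
        major = uniq[[np.argmax(counts)]]
    return np.stack([x[labels == i].mean(0) for i in major])
```

With only 14 subsegments (a very short recording), lowering `min_cluster_size`, or guarding as above, avoids the crash.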
Hi, my name is Nathan, and I am trying to test 3D-Speaker to get an RTTM from the pretrained model on ModelScope,
but I get the error below.
(3D-Speaker) [asr@0419bb3cf325 speaker-diarization]$ bash run.sh
Stage 1: Prepare input wavs...
--2024-02-05 09:07:39-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2528044 (2.4M) [application/octet-stream]
Saving to: 'examples/2speakers_example.wav'
100%[===========================================================================>] 2,528,044 831KB/s in 3.0s
2024-02-05 09:07:43 (831 KB/s) - 'examples/2speakers_example.wav' saved [2528044/2528044]
--2024-02-05 09:07:43-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 380 [application/octet-stream]
Saving to: 'examples/2speakers_example.rttm'
100%[===========================================================================>] 380 --.-K/s in 0s
2024-02-05 09:07:44 (40.0 MB/s) - 'examples/2speakers_example.rttm' saved [380/380]
Stage2: Do vad for input wavs...
2024-02-05 09:07:46,885 - modelscope - INFO - PyTorch version 1.13.1 Found.
2024-02-05 09:07:46,886 - modelscope - INFO - Loading ast index from /home/asr/.cache/modelscope/ast_indexer
2024-02-05 09:07:47,056 - modelscope - INFO - Updating the files for the changes of local files, first time updating will take longer time! Please wait till updating done!
2024-02-05 09:07:47,083 - modelscope - INFO - AST-Scanning the path "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/modelscope" with the following sub folders ['models', 'metrics', 'pipelines', 'preprocessors', 'trainers', 'msdatasets', 'exporters']
2024-02-05 09:08:18,037 - modelscope - INFO - Scanning done! A number of 964 components indexed or updated! Time consumed 30.954344987869263s
2024-02-05 09:08:18,114 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 ccb085697b83dbefd09232fac3402a63 and a total number of 964 components indexed
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please Requires the ffmpeg CLI and ffmpeg-python
package to be installed.
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
2024-02-05 09:08:22,477 - modelscope - WARNING - Model revision not specified, use revision: v2.0.4
2024-02-05 09:08:22,825 - modelscope - INFO - initiate model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,826 - modelscope - INFO - initiate model from location /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-02-05 09:08:22,827 - modelscope - INFO - initialize model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,874 - modelscope - WARNING - No preprocessor field found in cfg.
2024-02-05 09:08:22,875 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-02-05 09:08:22,875 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch'}. trying to build by task and model information.
2024-02-05 09:08:22,875 - modelscope - WARNING - No preprocessor key ('funasr', 'voice-activity-detection') found in PREPROCESSOR_MAP, skip building preprocessor.
2024-02-05 09:08:22,876 - modelscope - INFO - cuda is not available, using cpu instead.
[INFO]: Start computing VAD...
rtf_avg: 0.043: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.22it/s]
Traceback (most recent call last):
  File "local/voice_activity_detection.py", line 90, in <module>
    main()
  File "local/voice_activity_detection.py", line 71, in main
    for vad_t in vad_time['text']:
TypeError: list indices must be integers or slices, not str
If I print vad_time, I get:
[{'key': 'rand_key_2yW4Acq9GFz6Y', 'value': [[5240, 29010], [29290, 37360], [37640, 67570], [67860, 78980]]}]
I don't understand where the 'text' key is supposed to come from.
Please look into this problem.
Thank you.
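The printed result above suggests the VAD pipeline's output shape changed between FunASR versions: instead of a dict with a `'text'` field, it is now a list of `{'key': ..., 'value': [[start_ms, end_ms], ...]}` entries. A small adapter handling both shapes could look like this (field names are taken from the printed output above; the old `'text'` layout is assumed from the failing code):

```python
def vad_segments(vad_result):
    """Return a flat list of [start_ms, end_ms] VAD segments from either
    the old dict-style or the new list-style FunASR output."""
    if isinstance(vad_result, dict):
        return vad_result['text']  # old-style: {'text': [[s, e], ...]}
    # new-style: one {'key': ..., 'value': [[s, e], ...]} dict per input wav
    return [seg for item in vad_result for seg in item['value']]
```

Replacing the `vad_time['text']` loop with `for vad_t in vad_segments(vad_time):` should work on both FunASR versions.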
Hello, thank you for the wonderful repository! It really helped.
Currently, our team is trying to fine-tune ERes2Net-200k published in modelscope
using a large amount of speech data. As I was not able to fine-tune properly, I think that several parameters within the configuration need to be modified for the task. Could you please share those details? If my fine-tuning is successful with good results, I will share the methodologies for the community.
Hello,
First of all, many thanks for the models and code you contributed on ModelScope.
I saw that ModelScope recently released the ERes2Net 250k model: "speech_eres2net_base_250k_sv_zh-cn_16k-common".
Below is the code for local inference with this model:
model_id=damo/speech_eres2net_base_250k_sv_zh-cn_16k-common
python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path
However, I found that this model is missing some required config entries, such as:
ERes2Net_Large_3D_Speaker = {
    'obj': 'speakerlab.models.eres2net.ResNet.ERes2Net',
    'args': {
        'feat_dim': 80,
        'embedding_size': 512,
        'm_channels': 64,
    },
}
and
supports = {...}
I hope you can help. Many thanks!
I ran the exact same script for the ERes2Net experiment on VoxCeleb. The EER and minDCF I got are 1.0105 and 0.1146, which are not comparable to the paper. The only difference is that I trained the model on 4 A100 machines, but I doubt that is the reason. Could you please provide the train.log and train_epoch.log files?
I also noticed that in prepare_data_csv.csv the default segment duration is 4 seconds, but in conf/eres2net.yaml it is 3 seconds. May I ask why that is?
When loading the model speech_eres2net_sv_zh-cn_16k-common with torch.load, I get _pickle.UnpicklingError: invalid load key, '\x08'. Has anyone run into this? Environment: Python 3.10.9, torch 1.12.1.
Loading speech_campplus_sv_zh-cn_16k-common with the same code works fine.
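An "invalid load key" from torch.load usually means the file on disk is not a real checkpoint at all, e.g. a Git LFS pointer or an HTML error page that was saved in place of the weights. A quick diagnostic (a generic sketch, not tied to this repo) is to inspect the first bytes before blaming the loader:

```python
def inspect_checkpoint(path: str) -> str:
    """Classify a would-be checkpoint file by its leading bytes."""
    with open(path, 'rb') as f:
        head = f.read(64)
    if head.startswith(b'PK'):
        return 'zip archive (modern torch.save format)'
    if head.startswith(b'version https://git-lfs'):
        return 'git-lfs pointer: the real weights were never downloaded'
    if head.lstrip().startswith(b'<'):
        return 'HTML/XML: likely an error page saved instead of the model'
    return 'unknown header: %r' % head[:16]
```

If the file turns out to be a pointer or error page, re-downloading the model (with LFS enabled, or via the ModelScope SDK) typically resolves the error.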
Some audio clips in 3D-Speaker have no corresponding transcription.
With the latest FunASR==1.0.4, model_revision has to be added and vad_pipeline(wpath) modified, but step 6 then fails with the error below; downgrading to the older 0.8.8 does not work either.
Stage 1: Prepare input wavs...
--2024-01-30 18:07:32-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30720078 (29M) [application/octet-stream]
Saving to: 'examples/example.wav'
examples/example.wav 100%[==========================================================================>] 29.30M 43.9MB/s in 0.7s
2024-01-30 18:07:34 (43.9 MB/s) - 'examples/example.wav' saved [30720078/30720078]
--2024-01-30 18:07:34-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1329 (1.3K) [application/octet-stream]
Saving to: 'examples/example.rttm'
examples/example.rttm 100%[==========================================================================>] 1.30K --.-KB/s in 0s
2024-01-30 18:07:34 (29.3 MB/s) - 'examples/example.rttm' saved [1329/1329]
Stage2: Do vad for input wavs...
2024-01-30 18:07:37,343 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:07:37,345 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:07:37,470 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
[2024-01-30 18:07:38,659] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
2024-01-30 18:07:44,757 - modelscope - INFO - Use user-specified model revision: v2.0.4
2024-01-30 18:07:45,018 - modelscope - INFO - initiate model from /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-01-30 18:07:45,018 - modelscope - INFO - initiate model from location /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-01-30 18:07:45,019 - modelscope - INFO - initialize model from /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-01-30 18:07:49,164 - modelscope - WARNING - No preprocessor field found in cfg.
2024-01-30 18:07:49,164 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-01-30 18:07:49,164 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch'}. trying to build by task and model information.
2024-01-30 18:07:49,164 - modelscope - WARNING - No preprocessor key ('funasr', 'voice-activity-detection') found in PREPROCESSOR_MAP, skip building preprocessor.
[INFO]: Start computing VAD...
rtf_avg: 0.225: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.69s/it]
rtf_avg: 594.604: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:11<00:00, 11.90s/it]
[INFO]: VAD json is prepared in exp/json/vad.json
Stage3: Prepare subsegments info...
[INFO]: Generate sub-segmetns...
[INFO]: Subsegments json is prepared in exp/json/subseg.json
Stage4: Extract speaker embeddings...
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2024-01-30 18:08:21,239 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,241 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,262 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,264 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,274 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,275 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,362 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,363 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,382 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,384 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,386 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,388 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,394 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,414 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,430 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,486 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,502 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,510 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,716 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,718 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,829 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,835 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,837 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,968 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
[2024-01-30 18:08:22,719] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,719] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,743] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,763] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,797] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,825] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:23,048] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:23,275] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-01-30 18:08:32,879 - modelscope - INFO - Use user-specified model revision: v1.0.0
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
[INFO] Start computing embeddings...
[INFO] Start computing embeddings...
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
[WARNING] Embeddings has been saved previously. Skip it.
[WARNING] Embeddings has been saved previously. Skip it.
WARNING: The number of threads exceeds the number of files
Stage5: Perform clustering and output sys rttms...
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[INFO] Start clustering...
[INFO] Start clustering...
[INFO] Start clustering...
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
Stage6: Get the final metrics...
Computing DER...
2024-01-30 18:08:53,245 - INFO: Concatenating individual RTTM files...
2024-01-30 18:08:53,285 - INFO: MS: 2.069159, FA: 0.203668, SER: 0.000000, DER: 2.272828
Computing ACC...
error,there is no fileid_sys in ref rttm: output
seg pur error,there is no fileid_sys in ref rttm: %s output
eval_elems_seg error,there is no fileid_sys in ref rttm: %s output
All metrics have been done.
We downloaded the train.tar.gz-part-{a-f}, but the md5 value of the merged file is wrong. We are not sure which file is the wrong one.
Hi, I am Nathan, and I am facing a problem with the training part.
My env
Centos7.5
#PIP
pytorch-wpe 0.0.1
rotary-embedding-torch 0.5.3
torch 1.12.1+cu113  # to use CUDA, I reinstalled torch and torchaudio
torch-complex 0.4.3
torchaudio 0.12.1+cu113
torchvision 0.13.1+cu113
#rpm
libcudnn8-devel-8.2.0.53-1.cuda11.3.x86_64
libcudnn8-8.2.0.53-1.cuda11.3.x86_64
libnccl-devel-2.9.9-1+cuda11.3.x86_64
libnccl-2.9.9-1+cuda11.3.x86_64
To run a script, I followed 'egs/voxceleb/sv-ecapa/run.sh'.
I set 4 GPUs (it doesn't work with a single GPU either),
but I got the error below.
Stage3: Training the speaker model...
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Root Cause (first observed failure):
[0]:
time : 2024-02-15_14:32:03
host : e7bcf3a85e2c
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 121547)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Using SV for speaker verification: one audio clip contains speech, the other is almost silent (no voice). The score should be below the 0.6 threshold, yet the result is above 0.6. Is there a way to see the basis for the model's decision? Also, what is a reasonable value for the threshold in general?
damo/speech_campplus_sv_cn_cnceleb_16k
{'score': 0.68535, 'text': 'yes'}
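Cosine scores between a speech clip and near-silence can be unreliable, since the embedding of a silent clip is essentially noise. A cheap pre-check before calling the verification pipeline is an energy gate; the sketch below is an illustrative workaround, and the RMS threshold is an assumed value that needs tuning per recording setup, not a repo default.

```python
import numpy as np

def has_speech(samples: np.ndarray, rms_threshold: float = 1e-3) -> bool:
    """Reject clips whose overall RMS energy is too low to contain voice.

    `samples` is a mono waveform scaled to [-1, 1]; the threshold is a
    rough placeholder, not a calibrated value.
    """
    rms = np.sqrt(np.mean(np.square(samples.astype(np.float64))))
    return rms > rms_threshold
```

Only if both clips pass the gate (or, better, an actual VAD model) is the SV score meaningful; the 0.6 threshold otherwise compares two arbitrary embeddings.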
Hello, the ERes2Net model consists of two parts, the embedding extractor and the classifier, but only the embedding-extraction pretrained model is provided. Would you consider releasing the classifier's pretrained weights as well?
1. The paper says 200k speakers were used for training, but 3D-Speaker only contains 10,000 speakers. Were additional data used?
2. I used this model to extract embeddings for the CNCeleb test and enrollment sets, then computed EER with the project's compute_score_metrics.py. My result is 4.08; is that expected? It is quite a bit higher than the reported 2.8.
from modelscope.pipelines import pipeline
sv_pipeline = pipeline(
    task='speaker-verification',
    model='damo/speech_campplus_sv_zh-cn_16k-common',
    model_revision='v1.0.0'
)
speaker1_a_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
speaker1_b_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
speaker2_a_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker2_a_cn_16k.wav'
# same-speaker audio
result = sv_pipeline([speaker1_a_wav, speaker1_b_wav])
print(result)
# different-speaker audio
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav])
print(result)
# a custom score threshold can be set; the higher the threshold,
# the stricter the same-speaker decision
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], thr=0.31)
print(result)
The old model extracting speaker features is no longer supported
Hello, I use the same model params as your config in https://github.com/alibaba-damo-academy/3D-Speaker/blob/6f6ed3189a4d1db040586a518c8e5d80f4fc0665/egs/3dspeaker/sv-eres2net/conf/eres2net.yaml, but I get 9.88M (yours is 4.6M).
Here is the way I compute the model params:
I'm wondering where the difference comes from.
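The poster's own counting snippet was not included above, so for reference this is only the standard way such numbers are usually computed, not necessarily theirs; discrepancies like 9.88M vs. 4.6M often come from counting frozen or classifier-head parameters that the reported figure excludes.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> float:
    """Return the number of trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```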
On a TV-drama audio clip with background sound, the 4 speakers are barely distinguished at all; everything is labeled speaker 0.
Is there a training environment deployed under Windows?
When applying the speaker classification module to hundreds of millions of utterances, how can VAD and embedding extraction be run as batch inference?
Thanks to the author for the replies and suggestions.
Hello, thank you for open-sourcing the CAM++ model. The results are impressive!
I tried to train CAM++ but found it a little slower than ResNet34. The same training configs are used for both models (2×A100).
Interestingly, after exporting both models to ONNX and running them with onnxruntime on CPU, I can still see that CAM++ is about 3 times faster than ResNet34 (about 1/3 the RTF), which is consistent with the conclusion in your recent PR from 20230420.
My question is: do you also observe that CAM++ trains slower than ResNet34? And how do you explain this phenomenon: lower inference RTF on CPU but lower training speed on GPU?
Stage5: Get the final metrics...
Refrttm.list is not detected. Can't calculate the result
I read the FAQ on the page, but I still find some transcripts missing; for example, speaker 3D_SPK_00001 does not exist in transcription/train_transcription or transcription/test_transcription.
Did I miss something, or are transcripts only provided for part of the data?
Should num_class in the CNCeleb CAM++ config be 2793?
https://github.com/alibaba-damo-academy/3D-Speaker/blob/main/egs/cnceleb/sv-cam%2B%2B/conf/cam%2B%2B.yaml
Hello, I have my own dataset. How can I run these models on it? My dataset does not include some of the files that the 3D-Speaker dataset has, such as the trials file. Any guidance would be appreciated!
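For a custom dataset, a trials file can be generated from the speaker-to-utterance mapping. The sketch below assumes a VoxCeleb-style format of `<label> <wav1> <wav2>` with label 1 for same-speaker and 0 for different-speaker pairs; check the format the chosen recipe actually expects before using it, since this is an assumption, not the repo's documented spec.

```python
import itertools
import random

def make_trials(spk2wavs: dict, num_nontarget: int, seed: int = 0) -> list:
    """Build (label, wav1, wav2) trials from a {speaker: [wav, ...]} map."""
    rng = random.Random(seed)
    trials = []
    # Target trials: all same-speaker pairs.
    for spk, wavs in spk2wavs.items():
        for a, b in itertools.combinations(wavs, 2):
            trials.append((1, a, b))
    # Non-target trials: random cross-speaker pairs.
    spks = list(spk2wavs)
    for _ in range(num_nontarget):
        s1, s2 = rng.sample(spks, 2)
        trials.append((0, rng.choice(spk2wavs[s1]), rng.choice(spk2wavs[s2])))
    return trials
```

Writing each tuple as one space-separated line then gives a trials list that scoring scripts in this style can consume.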
I've been trying to train sv-rdino, and my code reported the following error at runtime:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2048]] is at version 3;
How should I solve this problem?
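A usual first step for this error is `torch.autograd.set_detect_anomaly(True)`, which makes autograd report the forward op whose saved tensor was later modified in place. The toy function below (an illustration, unrelated to the sv-rdino code) reproduces the error class: sigmoid's backward re-uses its own output, so mutating that output in place invalidates the graph.

```python
import torch

def inplace_error_demo() -> bool:
    """Return True if the classic in-place autograd error is raised."""
    x = torch.ones(3, requires_grad=True)
    y = torch.sigmoid(x)  # autograd saves y for sigmoid's backward
    y.mul_(2)             # in-place edit bumps y's version counter
    try:
        y.sum().backward()
        return False
    except RuntimeError:
        # The fix is the out-of-place form: y = y * 2
        return True
```

Once anomaly detection points at the offending layer, replacing the in-place op (`add_`, `mul_`, `relu(inplace=True)`, or an in-place BatchNorm update in a custom head) with its out-of-place version typically resolves it.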
In the spectral clustering in speakerlab/process/cluster.py, the following code is used to estimate the number of speakers:
lambda_gap_list = self.getEigenGaps(
lambdas[self.min_num_spks - 1:self.max_num_spks + 1])
num_of_spk = np.argmax(lambda_gap_list) + self.min_num_spks
But in other related projects, the following code is used to estimate the number of speakers
num_spks = num_spks if num_spks is not None \
else cp.argmax(cp.diff(eig_values[:max_num_spks + 1])) + 1
num_spks = max(num_spks, min_num_spks)
# another
lambda_gap_list = self.getEigenGaps(lambdas[1 : self.max_num_spkrs])
num_of_spk = (
np.argmax(
lambda_gap_list[
: min(self.max_num_spkrs, len(lambda_gap_list))
]
)
if lambda_gap_list
else 0
) + 2
I would like to know the theoretical basis for your design. If the speakers' utterances are unevenly distributed, e.g. one speaker speaks very little, is this estimate still valid? Perhaps you can point me to relevant references? Thank you in advance.
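For context, both snippets above are variants of the standard eigengap heuristic from spectral clustering: the number of near-zero Laplacian eigenvalues approximates the number of well-separated clusters, so the largest gap between consecutive sorted eigenvalues marks the cluster count. A minimal sketch of the bounded variant discussed above (standard reasoning, not the repo's exact code):

```python
import numpy as np

def estimate_num_spks(lambdas: np.ndarray, min_spks: int, max_spks: int) -> int:
    """Eigengap estimate of the speaker count, bounded to [min, max]."""
    lambdas = np.sort(lambdas)  # Laplacian eigenvalues, ascending
    # Only consider gaps in the allowed speaker-count window; the bounds
    # keep the estimate sane when one speaker has very few segments.
    gaps = np.diff(lambdas[min_spks - 1:max_spks + 1])
    return int(np.argmax(gaps)) + min_spks
```

The concern about a speaker with very few utterances is valid: with few segments, that speaker's "component" may not produce a near-zero eigenvalue, and the gap can shrink below noise level, which is one reason implementations clamp the estimate with min/max bounds.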
I found that when preparing CNCeleb, the flac2wav step has no flac2wav.py file under local; when I used the one from sv-ecapa, some FLAC files failed to convert to WAV.
While processing the data with a script, I found a 0 KB file: 3dspeaker/train/3D_SPK_00014/3D_SPK_00014_008_Device06_Distance08_Dialect00.wav
I noticed that you have released the ERes2Net-Large-200k-Spkrs model on ModelScope; could you also release the ERes2Net base model trained with 200k speakers?
For example, in 3D_SPK_07854_005_Device03_Distance03_Dialect09.wav,
what do the numeric labels for Device, Distance, and Dialect stand for? The paper doesn't say, so I'd like to ask.
Does the Apache-2.0 license cover the code, the speech corpus, and the pretrained models?
I have recently been reproducing your project. I trained DINO on VoxCeleb2, but in the first few epochs the EER is only 14%. I'm not sure whether this is normal; could you share a copy of your training log? Many thanks.
In scenarios with overlapping speech from multiple speakers, is it possible to separate out a specified speaker's voice?