modelscope / 3d-speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
License: Apache License 2.0
For CAM++ inference, a pre-extracted audio numpy array can be passed in directly. Could ERes2Net be given the same unified interface? Thanks.
What exactly are the input and output parameters of compute_score_metrics.py under speakerlab/bin? Is there a sample to refer to? Thanks!
I reproduced the code with the DINO framework using multi-crop (2 local crops of 2 s, 1 global crop of 3 s) and ECAPA (512), without RDINO. The final result is 5.0. Is that a normal result?
Is CAM++ suitable for text-dependent speaker verification?
Hello, after training DINO I want to fine-tune with labeled data. How do I load the previously trained model's .pth file?
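For questions like this one, the general PyTorch pattern is to load the checkpoint's state dict into the new model before fine-tuning. The sketch below is a minimal illustration, not the repo's actual API; the `'model'` wrapper key and the `module.` prefix handling are assumptions about how the checkpoint was saved.

```python
import torch
import torch.nn as nn

def load_pretrained(model: nn.Module, ckpt_path: str) -> nn.Module:
    """Copy matching weights from a saved checkpoint into `model`."""
    ckpt = torch.load(ckpt_path, map_location='cpu')
    # Checkpoints are often wrapped, e.g. {'model': state_dict, ...};
    # fall back to treating the file as a bare state_dict.
    state = ckpt.get('model', ckpt) if isinstance(ckpt, dict) else ckpt
    # Strip a possible 'module.' prefix left by DistributedDataParallel.
    state = {k.removeprefix('module.'): v for k, v in state.items()}
    # strict=False tolerates a new classifier head added for fine-tuning.
    model.load_state_dict(state, strict=False)
    return model
```

With this, fine-tuning is just `model = load_pretrained(model, 'epochN.pth')` followed by training with the labeled data and a fresh (usually lower) learning rate.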
Thank you for your excellent work. 🙂
We have observed that whenever we resume training with a different number of epochs after training completion, the loaded historical model exhibits significantly lower accuracy compared to the corresponding epoch during the original training. For instance, when loading a model trained for 100 epochs, its performance is only comparable to that of a model trained for 30 epochs.
This inconsistency in performance after resuming training poses a challenge for us to continue training from a checkpoint and obtain the desired results.
Is an ONNX model available for speech_campplus_speaker-diarization_common?
/opt/conda/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:1416: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
Traceback (most recent call last):
File "/vepfs/code/MossFormer/3D-Speaker/egs/3dspeaker/speaker-diarization/local/cluster_and_postprocess_h5.py", line 93, in audio_only_func_getnums
labels = cluster(embeddings)
File "/vepfs/code/MossFormer/3D-Speaker/speakerlab/process/cluster.py", line 186, in __call__
labels = self.filter_minor_cluster(labels, X, self.min_cluster_size)
File "/vepfs/code/MossFormer/3D-Speaker/speakerlab/process/cluster.py", line 203, in filter_minor_cluster
major_center = np.stack([x[labels == i].mean(0)
File "/opt/conda/lib/python3.10/site-packages/numpy/core/shape_base.py", line 445, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
During handling of the above exception, another exception occurred:
===========================
When running labels = cluster(embeddings)  # embeddings shape [14, 192]
the error above is raised.
What causes this problem, and how can it be solved?
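A plausible reading of the traceback above: if every cluster found over the 14 embeddings is smaller than `min_cluster_size`, the list of "major" cluster centers is empty and `np.stack` raises "need at least one array to stack". The sketch below is an illustration of that failure mode and one possible guard, not the repo's code; the fallback policy is an assumption.

```python
import numpy as np

def major_centers(x: np.ndarray, labels: np.ndarray,
                  min_cluster_size: int) -> np.ndarray:
    """Mean embedding of every cluster at least min_cluster_size large."""
    uniq, counts = np.unique(labels, return_counts=True)
    major = uniq[counts >= min_cluster_size]
    if major.size == 0:
        # Fallback: with too few segments, keep the largest cluster as
        # the only major one instead of calling np.stack on an empty list.
        major = uniq[[np.argmax(counts)]]
    return np.stack([x[labels == i].mean(0) for i in major])
```

With only 14 subsegments (a very short recording), lowering `min_cluster_size`, or guarding as above, avoids the crash.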
Hi, my name is Nathan, and I am trying to test 3D-Speaker to get an RTTM from the pretrained model on ModelScope,
but I get the error below.
(3D-Speaker) [asr@0419bb3cf325 speaker-diarization]$ bash run.sh
Stage 1: Prepare input wavs...
--2024-02-05 09:07:39-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2528044 (2.4M) [application/octet-stream]
Saving to: 'examples/2speakers_example.wav'
100%[===========================================================================>] 2,528,044 831KB/s in 3.0s
2024-02-05 09:07:43 (831 KB/s) - 'examples/2speakers_example.wav' saved [2528044/2528044]
--2024-02-05 09:07:43-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 380 [application/octet-stream]
Saving to: 'examples/2speakers_example.rttm'
100%[===========================================================================>] 380 --.-K/s in 0s
2024-02-05 09:07:44 (40.0 MB/s) - 'examples/2speakers_example.rttm' saved [380/380]
Stage2: Do vad for input wavs...
2024-02-05 09:07:46,885 - modelscope - INFO - PyTorch version 1.13.1 Found.
2024-02-05 09:07:46,886 - modelscope - INFO - Loading ast index from /home/asr/.cache/modelscope/ast_indexer
2024-02-05 09:07:47,056 - modelscope - INFO - Updating the files for the changes of local files, first time updating will take longer time! Please wait till updating done!
2024-02-05 09:07:47,083 - modelscope - INFO - AST-Scanning the path "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/modelscope" with the following sub folders ['models', 'metrics', 'pipelines', 'preprocessors', 'trainers', 'msdatasets', 'exporters']
2024-02-05 09:08:18,037 - modelscope - INFO - Scanning done! A number of 964 components indexed or updated! Time consumed 30.954344987869263s
2024-02-05 09:08:18,114 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 ccb085697b83dbefd09232fac3402a63 and a total number of 964 components indexed
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please Requires the ffmpeg CLI and ffmpeg-python
package to be installed.
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
2024-02-05 09:08:22,477 - modelscope - WARNING - Model revision not specified, use revision: v2.0.4
2024-02-05 09:08:22,825 - modelscope - INFO - initiate model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,826 - modelscope - INFO - initiate model from location /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-02-05 09:08:22,827 - modelscope - INFO - initialize model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,874 - modelscope - WARNING - No preprocessor field found in cfg.
2024-02-05 09:08:22,875 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-02-05 09:08:22,875 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch'}. trying to build by task and model information.
2024-02-05 09:08:22,875 - modelscope - WARNING - No preprocessor key ('funasr', 'voice-activity-detection') found in PREPROCESSOR_MAP, skip building preprocessor.
2024-02-05 09:08:22,876 - modelscope - INFO - cuda is not available, using cpu instead.
[INFO]: Start computing VAD...
rtf_avg: 0.043: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.22it/s]
Traceback (most recent call last):
  File "local/voice_activity_detection.py", line 90, in <module>
    main()
  File "local/voice_activity_detection.py", line 71, in main
    for vad_t in vad_time['text']:
TypeError: list indices must be integers or slices, not str
If I print vad_time, I get:
[{'key': 'rand_key_2yW4Acq9GFz6Y', 'value': [[5240, 29010], [29290, 37360], [37640, 67570], [67860, 78980]]}]
I don't understand where the 'text' key is supposed to come from.
Please look into this problem.
Thank you.
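The printed result above suggests the VAD pipeline's output shape changed between FunASR versions: instead of a dict with a `'text'` field, it is now a list of `{'key': ..., 'value': [[start_ms, end_ms], ...]}` entries. A small adapter handling both shapes could look like this (field names are taken from the printed output above; the old `'text'` layout is assumed from the failing code):

```python
def vad_segments(vad_result):
    """Return a flat list of [start_ms, end_ms] VAD segments from either
    the old dict-style or the new list-style FunASR output."""
    if isinstance(vad_result, dict):
        return vad_result['text']  # old-style: {'text': [[s, e], ...]}
    # new-style: one {'key': ..., 'value': [[s, e], ...]} dict per input wav
    return [seg for item in vad_result for seg in item['value']]
```

Replacing the `vad_time['text']` loop with `for vad_t in vad_segments(vad_time):` should work on both FunASR versions.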
Hello, thank you for the wonderful repository! It really helped.
Currently, our team is trying to fine-tune ERes2Net-200k published in modelscope
using a large amount of speech data. As I was not able to fine-tune properly, I think that several parameters within the configuration need to be modified for the task. Could you please share those details? If my fine-tuning is successful with good results, I will share the methodologies for the community.
Hello,
First of all, many thanks for the models and code you contributed on ModelScope.
I saw that ModelScope recently released the ERes2Net 250k model: "speech_eres2net_base_250k_sv_zh-cn_16k-common".
Below is the code for local inference with this model:
model_id=damo/speech_eres2net_base_250k_sv_zh-cn_16k-common
python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path
However, I found that this model is missing some required config entries, such as:
ERes2Net_Large_3D_Speaker = {
    'obj': 'speakerlab.models.eres2net.ResNet.ERes2Net',
    'args': {
        'feat_dim': 80,
        'embedding_size': 512,
        'm_channels': 64,
    },
}
and
supports = {...}
I hope you can help. Many thanks!
I ran the exact same script for the ERes2Net experiment on VoxCeleb. The EER and minDCF I got are 1.0105 and 0.1146, which are not comparable to the paper. The only difference is that I trained the model on 4 A100 machines, but I doubt that is the reason. Could you please provide the train.log and train_epoch.log files?
I also noticed that in prepare_data_csv.csv the default segment duration is 4 seconds, but in conf/eres2net.yaml it is 3 seconds. May I ask why that is?
When loading the model speech_eres2net_sv_zh-cn_16k-common with torch.load, I get _pickle.UnpicklingError: invalid load key, '\x08'. Has anyone run into this? Environment: Python 3.10.9, torch 1.12.1.
Loading speech_campplus_sv_zh-cn_16k-common with the same code works fine.
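An "invalid load key" from torch.load usually means the file on disk is not a real checkpoint at all, e.g. a Git LFS pointer or an HTML error page that was saved in place of the weights. A quick diagnostic (a generic sketch, not tied to this repo) is to inspect the first bytes before blaming the loader:

```python
def inspect_checkpoint(path: str) -> str:
    """Classify a would-be checkpoint file by its leading bytes."""
    with open(path, 'rb') as f:
        head = f.read(64)
    if head.startswith(b'PK'):
        return 'zip archive (modern torch.save format)'
    if head.startswith(b'version https://git-lfs'):
        return 'git-lfs pointer: the real weights were never downloaded'
    if head.lstrip().startswith(b'<'):
        return 'HTML/XML: likely an error page saved instead of the model'
    return 'unknown header: %r' % head[:16]
```

If the file turns out to be a pointer or error page, re-downloading the model (with LFS enabled, or via the ModelScope SDK) typically resolves the error.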
Some audio clips in 3D-Speaker have no corresponding transcription.
With the latest FunASR==1.0.4, model_revision has to be added and vad_pipeline(wpath) modified, but step 6 then fails with the error below; downgrading to the older 0.8.8 does not work either.
Stage 1: Prepare input wavs...
--2024-01-30 18:07:32-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30720078 (29M) [application/octet-stream]
Saving to: 'examples/example.wav'
examples/example.wav 100%[==========================================================================>] 29.30M 43.9MB/s in 0.7s
2024-01-30 18:07:34 (43.9 MB/s) - 'examples/example.wav' saved [30720078/30720078]
--2024-01-30 18:07:34-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1329 (1.3K) [application/octet-stream]
Saving to: 'examples/example.rttm'
examples/example.rttm 100%[==========================================================================>] 1.30K --.-KB/s in 0s
2024-01-30 18:07:34 (29.3 MB/s) - 'examples/example.rttm' saved [1329/1329]
Stage2: Do vad for input wavs...
2024-01-30 18:07:37,343 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:07:37,345 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:07:37,470 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
[2024-01-30 18:07:38,659] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
2024-01-30 18:07:44,757 - modelscope - INFO - Use user-specified model revision: v2.0.4
2024-01-30 18:07:45,018 - modelscope - INFO - initiate model from /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-01-30 18:07:45,018 - modelscope - INFO - initiate model from location /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-01-30 18:07:45,019 - modelscope - INFO - initialize model from /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-01-30 18:07:49,164 - modelscope - WARNING - No preprocessor field found in cfg.
2024-01-30 18:07:49,164 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-01-30 18:07:49,164 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch'}. trying to build by task and model information.
2024-01-30 18:07:49,164 - modelscope - WARNING - No preprocessor key ('funasr', 'voice-activity-detection') found in PREPROCESSOR_MAP, skip building preprocessor.
[INFO]: Start computing VAD...
rtf_avg: 0.225: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.69s/it]
rtf_avg: 594.604: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:11<00:00, 11.90s/it]
[INFO]: VAD json is prepared in exp/json/vad.json
Stage3: Prepare subsegments info...
[INFO]: Generate sub-segmetns...
[INFO]: Subsegments json is prepared in exp/json/subseg.json
Stage4: Extract speaker embeddings...
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2024-01-30 18:08:21,239 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,241 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,262 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,264 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,274 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,275 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,362 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,363 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,382 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,384 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,386 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,388 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,394 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,414 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,430 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,486 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,502 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,510 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,716 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,718 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,829 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,835 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,837 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,968 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
[2024-01-30 18:08:22,719] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,719] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,743] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,763] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,797] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,825] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:23,048] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:23,275] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-01-30 18:08:32,879 - modelscope - INFO - Use user-specified model revision: v1.0.0
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
[INFO] Start computing embeddings...
[INFO] Start computing embeddings...
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
[WARNING] Embeddings has been saved previously. Skip it.
[WARNING] Embeddings has been saved previously. Skip it.
WARNING: The number of threads exceeds the number of files
Stage5: Perform clustering and output sys rttms...
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[INFO] Start clustering...
[INFO] Start clustering...
[INFO] Start clustering...
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
Stage6: Get the final metrics...
Computing DER...
2024-01-30 18:08:53,245 - INFO: Concatenating individual RTTM files...
2024-01-30 18:08:53,285 - INFO: MS: 2.069159, FA: 0.203668, SER: 0.000000, DER: 2.272828
Computing ACC...
error,there is no fileid_sys in ref rttm: output
seg pur error,there is no fileid_sys in ref rttm: %s output
eval_elems_seg error,there is no fileid_sys in ref rttm: %s output
All metrics have been done.
We downloaded the train.tar.gz-part-{a-f}, but the md5 value of the merged file is wrong. We are not sure which file is the wrong one.
Hi, I am Nathan, and I am facing a problem with the training part.
My env
Centos7.5
#PIP
pytorch-wpe 0.0.1
rotary-embedding-torch 0.5.3
torch 1.12.1+cu113  # to use CUDA, I reinstalled torch and torchaudio
torch-complex 0.4.3
torchaudio 0.12.1+cu113
torchvision 0.13.1+cu113
#rpm
libcudnn8-devel-8.2.0.53-1.cuda11.3.x86_64
libcudnn8-8.2.0.53-1.cuda11.3.x86_64
libnccl-devel-2.9.9-1+cuda11.3.x86_64
libnccl-2.9.9-1+cuda11.3.x86_64
To run a script, I followed 'egs/voxceleb/sv-ecapa/run.sh'.
I set 4 GPUs (it doesn't work with a single GPU either),
but I got the error below.
Stage3: Training the speaker model...
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Root Cause (first observed failure):
[0]:
time : 2024-02-15_14:32:03
host : e7bcf3a85e2c
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 121547)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Using SV for speaker verification: one audio clip contains speech, the other is almost silent (no voice). The score should be below the 0.6 threshold, yet the result is above 0.6. Is there a way to see the basis for the model's decision? Also, what is a reasonable value for the threshold in general?
damo/speech_campplus_sv_cn_cnceleb_16k
{'score': 0.68535, 'text': 'yes'}
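Cosine scores between a speech clip and near-silence can be unreliable, since the embedding of a silent clip is essentially noise. A cheap pre-check before calling the verification pipeline is an energy gate; the sketch below is an illustrative workaround, and the RMS threshold is an assumed value that needs tuning per recording setup, not a repo default.

```python
import numpy as np

def has_speech(samples: np.ndarray, rms_threshold: float = 1e-3) -> bool:
    """Reject clips whose overall RMS energy is too low to contain voice.

    `samples` is a mono waveform scaled to [-1, 1]; the threshold is a
    rough placeholder, not a calibrated value.
    """
    rms = np.sqrt(np.mean(np.square(samples.astype(np.float64))))
    return rms > rms_threshold
```

Only if both clips pass the gate (or, better, an actual VAD model) is the SV score meaningful; the 0.6 threshold otherwise compares two arbitrary embeddings.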
Hello, the ERes2Net model consists of two parts, the embedding extractor and the classifier, but only the embedding-extraction pretrained model is provided. Would you consider releasing the classifier's pretrained weights as well?
1. The paper says 200k speakers were used for training, but 3D-Speaker only contains 10,000 speakers. Were additional data used?
2. I used this model to extract embeddings for the CNCeleb test and enrollment sets, then computed EER with the project's compute_score_metrics.py. My result is 4.08; is that expected? It is quite a bit higher than the reported 2.8.
from modelscope.pipelines import pipeline
sv_pipeline = pipeline(
    task='speaker-verification',
    model='damo/speech_campplus_sv_zh-cn_16k-common',
    model_revision='v1.0.0'
)
speaker1_a_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
speaker1_b_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
speaker2_a_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker2_a_cn_16k.wav'
# same-speaker audio
result = sv_pipeline([speaker1_a_wav, speaker1_b_wav])
print(result)
# different-speaker audio
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav])
print(result)
# a custom score threshold can be set; the higher the threshold,
# the stricter the same-speaker decision
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], thr=0.31)
print(result)
The old model extracting speaker features is no longer supported
Hello, I use the same model params as your config in https://github.com/alibaba-damo-academy/3D-Speaker/blob/6f6ed3189a4d1db040586a518c8e5d80f4fc0665/egs/3dspeaker/sv-eres2net/conf/eres2net.yaml, but I get 9.88M (yours is 4.6M).
Here is the way I compute the model params:
I'm wondering where the difference comes from.
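The poster's own counting snippet was not included above, so for reference this is only the standard way such numbers are usually computed, not necessarily theirs; discrepancies like 9.88M vs. 4.6M often come from counting frozen or classifier-head parameters that the reported figure excludes.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> float:
    """Return the number of trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```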
On a TV-drama audio clip with background sound, the 4 speakers are barely distinguished at all; everything is labeled speaker 0.
Is there a training environment deployed under Windows?
When applying the speaker classification module to hundreds of millions of utterances, how can VAD and embedding extraction be run as batch inference?
Thanks to the author for the replies and suggestions.
Hello, thank you for open-sourcing the CAM++ model. The results are impressive!
I tried to train CAM++ but found it a little slower than ResNet34. The same training configs are used for both models (2×A100).
Interestingly, after exporting both models to ONNX and running them with onnxruntime on CPU, I can still see that CAM++ is about 3 times faster than ResNet34 (about 1/3 the RTF), which is consistent with the conclusion in your recent PR from 20230420.
My question is: do you also observe that CAM++ trains slower than ResNet34? And how do you explain this phenomenon: lower inference RTF on CPU but lower training speed on GPU?
Stage5: Get the final metrics...
Refrttm.list is not detected. Can't calculate the result
I read the FAQ on the page, but I still find some transcripts missing; for example, speaker 3D_SPK_00001 does not exist in transcription/train_transcription or transcription/test_transcription.
Did I miss something, or are transcripts only provided for part of the data?
Should num_class in the CNCeleb CAM++ config be 2793?
https://github.com/alibaba-damo-academy/3D-Speaker/blob/main/egs/cnceleb/sv-cam%2B%2B/conf/cam%2B%2B.yaml
Hello, I have my own dataset. How can I run these models on it? My dataset does not include some of the files that the 3D-Speaker dataset has, such as the trials file. Any guidance would be appreciated!
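For a custom dataset, a trials file can be generated from the speaker-to-utterance mapping. The sketch below assumes a VoxCeleb-style format of `<label> <wav1> <wav2>` with label 1 for same-speaker and 0 for different-speaker pairs; check the format the chosen recipe actually expects before using it, since this is an assumption, not the repo's documented spec.

```python
import itertools
import random

def make_trials(spk2wavs: dict, num_nontarget: int, seed: int = 0) -> list:
    """Build (label, wav1, wav2) trials from a {speaker: [wav, ...]} map."""
    rng = random.Random(seed)
    trials = []
    # Target trials: all same-speaker pairs.
    for spk, wavs in spk2wavs.items():
        for a, b in itertools.combinations(wavs, 2):
            trials.append((1, a, b))
    # Non-target trials: random cross-speaker pairs.
    spks = list(spk2wavs)
    for _ in range(num_nontarget):
        s1, s2 = rng.sample(spks, 2)
        trials.append((0, rng.choice(spk2wavs[s1]), rng.choice(spk2wavs[s2])))
    return trials
```

Writing each tuple as one space-separated line then gives a trials list that scoring scripts in this style can consume.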
I've been trying to train sv-rdino, and my code reported the following error at runtime:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2048]] is at version 3;
How should I solve this problem?
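A usual first step for this error is `torch.autograd.set_detect_anomaly(True)`, which makes autograd report the forward op whose saved tensor was later modified in place. The toy function below (an illustration, unrelated to the sv-rdino code) reproduces the error class: sigmoid's backward re-uses its own output, so mutating that output in place invalidates the graph.

```python
import torch

def inplace_error_demo() -> bool:
    """Return True if the classic in-place autograd error is raised."""
    x = torch.ones(3, requires_grad=True)
    y = torch.sigmoid(x)  # autograd saves y for sigmoid's backward
    y.mul_(2)             # in-place edit bumps y's version counter
    try:
        y.sum().backward()
        return False
    except RuntimeError:
        # The fix is the out-of-place form: y = y * 2
        return True
```

Once anomaly detection points at the offending layer, replacing the in-place op (`add_`, `mul_`, `relu(inplace=True)`, or an in-place BatchNorm update in a custom head) with its out-of-place version typically resolves it.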
In the spectral clustering in speakerlab/process/cluster.py, the following code is used to estimate the number of speakers:
lambda_gap_list = self.getEigenGaps(
lambdas[self.min_num_spks - 1:self.max_num_spks + 1])
num_of_spk = np.argmax(lambda_gap_list) + self.min_num_spks
But in other related projects, the following code is used to estimate the number of speakers
num_spks = num_spks if num_spks is not None \
else cp.argmax(cp.diff(eig_values[:max_num_spks + 1])) + 1
num_spks = max(num_spks, min_num_spks)
# another
lambda_gap_list = self.getEigenGaps(lambdas[1 : self.max_num_spkrs])
num_of_spk = (
np.argmax(
lambda_gap_list[
: min(self.max_num_spkrs, len(lambda_gap_list))
]
)
if lambda_gap_list
else 0
) + 2
I would like to know the theoretical basis for your design. If the speakers' utterances are unevenly distributed, e.g. one speaker speaks very little, is this estimate still valid? Perhaps you can point me to relevant references? Thank you in advance.
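For context, both snippets above are variants of the standard eigengap heuristic from spectral clustering: the number of near-zero Laplacian eigenvalues approximates the number of well-separated clusters, so the largest gap between consecutive sorted eigenvalues marks the cluster count. A minimal sketch of the bounded variant discussed above (standard reasoning, not the repo's exact code):

```python
import numpy as np

def estimate_num_spks(lambdas: np.ndarray, min_spks: int, max_spks: int) -> int:
    """Eigengap estimate of the speaker count, bounded to [min, max]."""
    lambdas = np.sort(lambdas)  # Laplacian eigenvalues, ascending
    # Only consider gaps in the allowed speaker-count window; the bounds
    # keep the estimate sane when one speaker has very few segments.
    gaps = np.diff(lambdas[min_spks - 1:max_spks + 1])
    return int(np.argmax(gaps)) + min_spks
```

The concern about a speaker with very few utterances is valid: with few segments, that speaker's "component" may not produce a near-zero eigenvalue, and the gap can shrink below noise level, which is one reason implementations clamp the estimate with min/max bounds.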
I found that when preparing CNCeleb, the flac2wav step has no flac2wav.py file under local; when I used the one from sv-ecapa, some FLAC files failed to convert to WAV.
While processing the data with a script, I found a 0 KB file: 3dspeaker/train/3D_SPK_00014/3D_SPK_00014_008_Device06_Distance08_Dialect00.wav
I noticed that you have released the ERes2Net-Large-200k-Spkrs model on ModelScope; could you also release the ERes2Net base model trained with 200k speakers?
For example, in 3D_SPK_07854_005_Device03_Distance03_Dialect09.wav,
what do the numeric labels for Device, Distance, and Dialect stand for? The paper doesn't say, so I'd like to ask.
Does the Apache-2.0 license cover the code, the speech corpus, and the pretrained models?
I have recently been reproducing your project. I trained DINO on VoxCeleb2, but in the first few epochs the EER is only 14%. I'm not sure whether this is normal; could you share a copy of your training log? Many thanks.
In scenarios with overlapping speech from multiple speakers, is it possible to separate out a specified speaker's voice?