asv-subtools's People

Contributors

1017549629, boneyag, snowdar, sssyousen, tony-xie-182, wangers, zengchang233

asv-subtools's Issues

train the standard xvector model on VoxCeleb1 trainset

I am trying to train the standard x-vector model on the VoxCeleb1 training set using the script runVoxceleb.sh with 4 GPUs. I use the default parameters in runStandardXvector-voxceleb1.py, except that I changed the weight decay to 5e-1 (I also tried 3e-1). However, the resulting EER is only 3.531% for the epoch-21 far embedding with the PLDA backend, so I cannot reach the 3.028% reported at the bottom of runStandardXvector-voxceleb1.py. Is there something I overlooked, or something I need to modify?

Is utt2lang or utt2spk used for lid?

Hi,
I want to validate the recipe "ap-olr2020-baseline", but I have run into some issues at the scoring stage.
Which file is used as the language label file, utt2lang or utt2spk?
Regards,
Luke

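Multi-GPU run fails with a join() error

When training in online mode with multiple GPUs, I get the error: join() got an unexpected keyword argument 'throw_on_early_termination'. After removing the throw_on_early_termination argument the error disappears, but the model does not train.
The error message is shown below. Has anyone run into a similar problem? The author mentioned that there is a discussion group; could someone add me? Thanks, everyone :)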
image

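Label out-of-bounds error in online training: Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"`

First of all, thank you very much for such an excellent open-source project.
When training with the online training scripts subtools/pytorch/launcher/*_online.py, a label out-of-bounds error is raised.
After investigation, the cause is as follows:
The get_chunk_egs function in subtools/pytorch/pipeline/onestep/get_raw_wav_chunk.py, which is called from subtools/pytorch/pipeline/preprocess_wav_egs.sh, first generates the utt2spk_int file for the whole dataset (dataset.generate("utt2spk_int")) and only then splits it into the training and validation sets (trainset, valid = dataset.split(args.valid_num_utts, args.valid_split_type)). When a speaker has only one utterance and limit_utts=1 in runEcapaXvector_online.py, all of that speaker's utterances may end up in the validation set. The number of speakers actually present in the training set then decreases, but the maximum label value is still that of the whole dataset.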
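
One way to picture the problem described above is to re-index the labels after the split, so that the training labels are contiguous again. The following is only a minimal sketch under assumptions: `remap_labels` and the toy dictionary are hypothetical and are not part of asv-subtools.

```python
def remap_labels(train_utt2spk_int):
    """Map the speaker labels that survive the split to 0..N-1.

    Hypothetical helper for illustration; not the toolkit's code.
    """
    old_labels = sorted(set(train_utt2spk_int.values()))
    old_to_new = {old: new for new, old in enumerate(old_labels)}
    remapped = {utt: old_to_new[spk] for utt, spk in train_utt2spk_int.items()}
    return remapped, len(old_labels)


# Toy data: speaker 1 had a single utterance that went to the validation set,
# so label 1 is missing here while the largest label is still 3.
train = {"utt_a": 0, "utt_b": 0, "utt_c": 2, "utt_d": 3}
remapped, num_targets = remap_labels(train)
print(remapped, num_targets)
```

After remapping, the largest label is `num_targets - 1`, which is what a classifier with `num_targets` outputs expects. The validation labels would need the same mapping, and speakers that only occur in validation would have to be dropped or handled separately.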

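Question: why not use the softmax output directly for language classification?

Hello!
I see that olr2021-baseline performs language classification with x-vector embedding + LDA + LR. However, the x-vector network is trained with a softmax output that gives per-language probabilities, using cross-entropy (CE) loss. Why not use the x-vector network's softmax output directly at inference time?

Thanks!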

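Found a logic bug

subtools/pytorch/libs/supports/utils.py line 319
The method on this line compares the passed-in dictionary with the default dictionary, assigns the entries of the passed-in dictionary that differ from the defaults to the default dictionary, and then returns the default dictionary. However, if the method's last two boolean parameters are both false and the passed-in dictionary contains keys that do not exist in the default dictionary, it neither raises an error nor adds those keys to the returned default dictionary. You then get a dictionary that contains only the default fields, and the new fields you defined (perhaps because you modified some model or method and added new variables) fall back to the default values set in the method or model. The parameters you defined in the main program, such as run****.py, are never passed to those methods or models.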

At least one of this function's last two arguments, 'force_check' and 'support_unkown', should be true, so that the function either raises an error or adds the newly defined parameters to the default dictionary that is returned.
Otherwise, the parameters you define in the main program, such as run****.py, will not be passed to the methods or classes.

I hope the author sees this issue and updates the code.
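
The behaviour reported above can be reproduced with a small stand-in. This is a simplified sketch written for illustration, not the function in utils.py: the name `assign_params` and the argument spelling `support_unknown` are assumptions.

```python
def assign_params(default_params, params, force_check=False, support_unknown=False):
    """Merge user parameters into a copy of the defaults (illustrative stand-in)."""
    merged = dict(default_params)
    for key, value in params.items():
        if key in merged:
            merged[key] = value
        elif force_check:
            raise KeyError(f"unknown parameter: {key}")
        elif support_unknown:
            merged[key] = value
        # With both flags False, an unknown key is silently dropped here.
    return merged


defaults = {"margin": 0.2}
user = {"margin": 0.3, "new_option": True}

silent = assign_params(defaults, user)                       # new_option is lost
kept = assign_params(defaults, user, support_unknown=True)   # new_option is kept
print(silent, kept)
```

With both flags false, `new_option` disappears without any message, which matches the report. Enabling either flag makes the mismatch visible, by raising an error or by keeping the key.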

Multi-GPU run fails

Environment:
Pytorch version: 1.10.0
Cuda version: 11.1
nccl version: 2.10.3
driver version: 470.63.01
OS version: Ubuntu 18.04

Single-GPU training works normally, but multi-GPU training fails.

run Voxceleb Recipe [Speaker Recognition]

When running the VoxCeleb recipe [Speaker Recognition], I got the error shown below. I am not sure which part of the code in "runSnowdarXvector-extended-spec-am.py" causes this TypeError. Thank you for your help!

(xmuspeech) tcao7@c06:~/kaldi/egs/xmuspeech/voxceleb1$ subtools/runPytorchLauncher.sh runSnowdarXvector-extended-spec-am.py --stage=0
Traceback (most recent call last):
  File "runSnowdarXvector-extended-spec-am.py", line 282, in <module>
    utils.init_multi_gpu_training(args.gpu_id, args.multi_gpu_solution, args.port)
TypeError: init_multi_gpu_training() takes from 0 to 2 positional arguments but 3 were given

the num_targets and the max label in train.egs.csv are not equal

Hi,
I try to run the CNCeleb recipe, but a RuntimeError appears:

#### Training will run for 6 epochs.
Traceback (most recent call last):
  File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/training/trainer.py", line 283, in run
    loss, acc = self.train_one_batch(batch)
  File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/training/trainer.py", line 182, in train_one_batch
    loss = model.get_loss(model_forward(inputs), targets)
  File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/support/utils.py", line 157, in wrapper
    return function(self, *transformed)
  File "/home/ubuntu/kaldi/egs/xmuspeech/sre/exp/SEResnet34_am_train_fbank40/config/resnet-se-xvector.py", line 559, in get_loss
    return self.loss(inputs, targets)
  File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/nnet/loss.py", line 360, in forward
    return self.loss_function(outputs/self.t, targets) + self.ring_loss * ring_loss
  File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1150, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/opt/conda/conda-bld/pytorch_1634272172048/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [0,0,0], thread: [55,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.

This means that the number of speaker outputs of the FC classifier is smaller than the largest label.
I found that num_targets in exp/egs/train_sequential/info is 2687, while the max label in train.egs.csv is 2711.
Could you please tell me which script generates exp/egs/train_sequential/info/num_targets?
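
A quick sanity check for this kind of mismatch is to compare the largest label in the egs file with num_targets before training starts. The sketch below assumes a comma-separated file with a header column named `label`; the real layout of train.egs.csv may differ, and both function names are hypothetical.

```python
import csv
import io


def max_label(csv_text, label_column="label"):
    """Return the largest integer label found in the given CSV text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return max(int(row[label_column]) for row in rows)


def labels_fit(csv_text, num_targets, label_column="label"):
    """True when every label is a valid index into num_targets outputs."""
    return max_label(csv_text, label_column) < num_targets


# Toy file: labels go up to 3, so at least 4 outputs are needed.
toy_csv = "utt,label\nutt_a,0\nutt_b,2\nutt_c,3\n"
print(max_label(toy_csv), labels_fit(toy_csv, num_targets=3))
```

Applied to the numbers in this report, a max label of 2711 with num_targets of 2687 would fail the check, which is consistent with the device-side assert from the cross-entropy kernel.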

pytorch dataloader

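I use the DataLoader with multiple workers, but the workers are stuck in the D state. The CPU reads data slowly, which causes GPU utilization to drop to 0. How can this be resolved?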

Error when running subtools/scoreSets.sh

Hi,
Snowdar!
In ap-olr2020-baseline I generated x-vectors with run_pytorch_xvector.py. Now I want to score my enroll sets and test sets with subtools/scoreSets.sh (in my datasets every speaker has only one utterance), but I get the following error output:
[Auto find] Your vectortype is xvector

[Notice] It will set the default config task3_enroll[task3_enroll task3_enroll task3_test] for lda, submean and whiten, if used.
allsets:task3_enroll task3_test task3_enroll task3_enroll task3_enroll task3_test task3_enroll task3_enroll task3_enroll task3_test
[ lr ]
ivector-normalize-length --scaleup=false scp:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector.scp ark:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector_norm.ark
LOG (ivector-normalize-length[5.5.804~1-a8c6]:main():ivector-normalize-length.cc:90) Processed 21580 iVectors.
LOG (ivector-normalize-length[5.5.804~1-a8c6]:main():ivector-normalize-length.cc:94) Average ratio of iVector to expected length was 44.3168, standard deviation was 3.67065
ivector-compute-lda --dim=100 --total-covariance-factor=0.1 ark:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector_norm.ark ark:data/mfcc_20_5.0/task3_enroll/utt2spk exp/pytorch_xvector/far_epoch_21/task3_enroll/transform_100.mat
LOG (ivector-compute-lda[5.5.804~1-a8c6]:main():ivector-compute-lda.cc:288) Read 21580 utterances, 0 with errors.
LOG (ivector-compute-lda[5.5.804~1-a8c6]:main():ivector-compute-lda.cc:294) Computing within-class covariance.
LOG (ivector-compute-lda[5.5.804~1-a8c6]:main():ivector-compute-lda.cc:299) 2-norm of iVector mean is 0.771824
LOG (ivector-compute-lda[5.5.804~1-a8c6]:ComputeLdaTransform():ivector-compute-lda.cc:136) Stats have 21580 speakers, 21580 utterances.
ASSERTION_FAILED (ivector-compute-lda[5.5.804~1-a8c6]:ComputeLdaTransform():ivector-compute-lda.cc:137) Assertion failed: (!stats.Empty())

[ Stack-Trace: ]
/home/lanhaile/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x808) [0x7f4b1168e35c]
/home/lanhaile/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x59) [0x7f4b1168eda3]
ivector-compute-lda(kaldi::ComputeLdaTransform(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, kaldi::Vector, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, kaldi::Vector> > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > > > const&, float, float, kaldi::MatrixBase*)+0x705) [0x40d38f]
ivector-compute-lda(main+0xd6e) [0x40e6eb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f4b09e2a840]
ivector-compute-lda(_start+0x29) [0x40ca89]

ivector-transform exp/pytorch_xvector/far_epoch_21/task3_enroll/ scp:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector.scp ark:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector_lda100.ark
ERROR (ivector-transform[5.5.804~1-a8c6]:Read():kaldi-matrix.cc:1617) Failed to read matrix from stream. : Expected "[", got EOF File position at start is -1, currently -1

[ Stack-Trace: ]
/home/lanhaile/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x808) [0x7fbf6bdd735c]
ivector-transform(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40a709]
/home/lanhaile/kaldi/src/lib/libkaldi-matrix.so(kaldi::Matrix::Read(std::istream&, bool, bool)+0x1a82) [0x7fbf6c020f76]
/home/lanhaile/kaldi/src/lib/libkaldi-util.so(void kaldi::ReadKaldiObject<kaldi::Matrix >(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, kaldi::Matrix*)+0x239) [0x7fbf6c298bdc]
ivector-transform(main+0xeb) [0x409681]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbf64573840]
ivector-transform(_start+0x29) [0x4094c9]

kaldi::KaldiFatalError
awk: cmd. line:1: fatal: cannot open file `exp/pytorch_xvector/far_epoch_21/task3_test/lr_task3_enroll_task3_test_lda100_submean_norm.score' for reading (No such file or directory)
Tansforming score to table done.
I look forward to your answer. Thank you very much!

Do we have external reference about model-based CORAL?

Hi, thanks again for developing such an amazing toolkit. I am now looking at the PLDA backend work, including the report from Jiafeng.

I found very useful information about CORAL, which originated from here. But that is feature-based CORAL. I believe your implementation is model-based CORAL on PLDA. Is there any external reference for model-based CORAL?

Thanks in advance.

Errors when doing AS-Norm

Hi, thank you for this great tool. When I ran the voxcelebSRC recipe, I hit some errors when doing AS-Norm:

  1. The score file cannot be found (a naming error; I have fixed it).
  2. When taking a subset of the training set as the cohort set, the corresponding subset of the training set's x-vectors should also be taken (I have fixed it).
  3. When I choose cohort_method="mean", I get an error similar to item 2, but I cannot fix it because I do not understand how this method works. Could you fix it, or point me to some references? Then maybe I can implement it and open a PR.

Thanks!

Do we have a clear reference on ResNET34 settings?

Hi, and big kudos for asv-subtools, with both its academic and practical contributions!

I found that the ResNet34 setting in the toolkit does not have a clear reference. For the other x-vector networks the references are quite clear; could I have one for this model as well, please? Maybe from your group?

By ResNet34, I mean this implemented class.

how to prepare our own data?

Hello XMU Speech Lab, thank you so much for sharing this great work.
How should we prepare our own data? Should we prepare wav.scp, utt2spk and spk2utt in Kaldi format?
I could not find information about data preparation in the README. Looking forward to your answer.
Best wishes
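
For readers unfamiliar with the files named in this question: in Kaldi's data-directory convention, each utt2spk line is `<utterance-id> <speaker-id>` and each spk2utt line is `<speaker-id>` followed by that speaker's utterance ids. The sketch below derives one from the other. Whether the recipes expect exactly this layout is what the question asks and is not confirmed on this page; the function name and the toy ids are hypothetical.

```python
from collections import defaultdict


def utt2spk_to_spk2utt(utt2spk_lines):
    """Group utterance ids by speaker and return sorted spk2utt lines."""
    spk2utt = defaultdict(list)
    for line in utt2spk_lines:
        if not line.strip():
            continue  # skip blank lines
        utt, spk = line.split()
        spk2utt[spk].append(utt)
    return [f"{spk} {' '.join(sorted(utts))}" for spk, utts in sorted(spk2utt.items())]


utt2spk = ["spk1-utt2 spk1", "spk1-utt1 spk1", "spk2-utt1 spk2"]
spk2utt = utt2spk_to_spk2utt(utt2spk)
print("\n".join(spk2utt))
```

Prefixing each utterance id with its speaker id, as in the toy data, keeps the two files sorted consistently, which Kaldi's validation scripts generally expect.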

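Small dataset

The data to download from the internet is very large, and reproducing the pipeline takes too long. I work on NLP and have recently started on speaker recognition. For a newcomer, reproducing the pipeline is not very friendly and takes some effort. It would help if you could provide a small subset of VoxCeleb data so that the whole pipeline can be reproduced quickly, without waiting for the full download to finish.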

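About pretrained models

Some tasks only need the embeddings and do not care about the details. Could the author provide some trained models for use?

Also, is there a WeChat group where everyone can discuss and give feedback more promptly?

Thanks to the author. Excellent work.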

Problem when using the ResNet model for transfer learning

Hello, when I replaced the TDNN model with resnet-xvector.py for transfer learning, the following error occurred during scoring. All layers except the loss layer were transferred. I hope to get your answer and look forward to your reply. Thank you.

ERROR (ivector-compute-plda[5.5]:Cholesky():tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite.

[ Stack-Trace: ]
/home/yqc/kaldi-master/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0xb42) [0x7f68e936c732]
ivector-compute-plda(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x21) [0x564450bdc4e9]
/home/yqc/kaldi-master/src/lib/libkaldi-matrix.so(kaldi::TpMatrix::Cholesky(kaldi::SpMatrix const&)+0x1b1) [0x7f68e95d73d1]
/home/yqc/kaldi-master/src/lib/libkaldi-ivector.so(+0x1b99a) [0x7f68e9a7399a]
/home/yqc/kaldi-master/src/lib/libkaldi-ivector.so(kaldi::PldaEstimator::GetOutput(kaldi::Plda*)+0x1c6) [0x7f68e9a75e00]
/home/yqc/kaldi-master/src/lib/libkaldi-ivector.so(kaldi::PldaEstimator::Estimate(kaldi::PldaEstimationConfig const&, kaldi::Plda*)+0x195) [0x7f68e9a76617]
ivector-compute-plda(main+0xd13) [0x564450bdb86d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f68e894bc87]
ivector-compute-plda(_start+0x2a) [0x564450bdaa7a]

kaldi::KaldiFatalError(subtools)

[ Frank Discussion 1 ] Training Strategy

Welcome to discuss the training strategy here.

There are two typical training strategies, "SGD + Reduce Learning Rate on Plateau" and "Adam + Warm Restarts".

SGD + Reduce Learning Rate on Plateau

(1) Training is slow but can generalize well.
(2) The parameters of ReduceLROnPlateau should be set carefully, such as patience and learning rate scale.
......

Adam + Warm Restarts

(1) It is not clear how to set T for Warm Restarts.
(2) It is hard to decide how many restarts there should be.
......

In fact, I am still not sure how the value of weight decay influences the results when training with these two strategies. Are there any other factors that decide the final performance when comparing the two strategies?

Welcome to comment and share your experiments.
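
To make the role of T in the warm-restart strategy concrete, here is a minimal pure-Python sketch of a cosine schedule with warm restarts. The parameter names `t_0` and `t_mult` and all default values are assumptions chosen for illustration; they are not the toolkit's settings.

```python
import math


def warm_restart_lr(epoch, base_lr=1e-3, eta_min=1e-6, t_0=5, t_mult=2):
    """Cosine-annealed learning rate; cycle i lasts t_0 * t_mult**i epochs."""
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:  # walk forward to the cycle that contains this epoch
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Epochs at which the learning rate jumps back to base_lr.
restarts = [e for e in range(40) if math.isclose(warm_restart_lr(e), 1e-3)]
print(restarts)
```

With `t_0=5` and `t_mult=2`, the cycles last 5, 10 and 20 epochs, so both the first cycle length and the growth factor determine how many restarts fit into a fixed training budget. That is the tuning difficulty raised in points (1) and (2) above.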

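Question: allocating min-chunk for training-data egs

Hello,
Thank you for taking the time to read my email.
In Kaldi, before training the neural network, the features are turned into egs used for training. The chunk size defaults to a random number in [200, 400].
My understanding is that splitting the original training features into chunks of random length is meant to make the network more robust when extracting x-vectors from audio of different lengths.
In your asv-subtools, min-chunk seems to be set directly to a fixed value, so the egs used for training have a fixed size.
My questions:
1. How does the choice of egs size (including the random range, or whether it is a fixed value) affect network performance?
2. Why does asv-subtools use a fixed min-chunk size? Could you release a version with variable egs sizes similar to Kaldi's?
I look forward to your reply.
Best wishes!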
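
The variable-length idea described in this question can be sketched as follows. This is only an illustration of drawing a chunk length from [200, 400] frames. It is not Kaldi's egs allocation code, and `sample_chunk` is a hypothetical helper.

```python
import random


def sample_chunk(num_frames, min_chunk=200, max_chunk=400, rng=random):
    """Pick a random chunk length and a start frame inside the utterance."""
    chunk = min(rng.randint(min_chunk, max_chunk), num_frames)  # clip short utterances
    start = rng.randint(0, num_frames - chunk)
    return start, start + chunk


rng = random.Random(0)
chunks = [sample_chunk(1000, rng=rng) for _ in range(5)]
print(chunks)
```

In a real pipeline, all chunks in one minibatch would usually share the same length so that they can be stacked into a tensor, with the length changing only between minibatches.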

subtools

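When running the .sh files under a recipe, resolving relative paths causes many problems, most often that the . subtools/*.sh files cannot be found, because there is no subtools directory under the recipe. I could only add the subtools path to PATH and change the code to . *.sh to make it run. I am not sure whether I set something up incorrectly or whether the relative paths in your code have a problem.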

Issues about get_params_for_score()

Hi,

I'm trying to train my model based on recipe/voxcelebSRC, but I had a problem at the scoring stage.

In gather_results_from_epochs.sh, $enroll_cohort_name.score equals:

cosine_voxceleb1_O_enroll_voxceleb2_dev_submean_norm_voxceleb2_dev.score

But the generated score file appears to be:

cosine_voxceleb1_O_enroll_voxceleb2_devspk_xvector_submean_norm_mean_voxceleb2_dev.score

Looks like get_params_for_score() in score.sh failed to generate the correct $suffix:

final_file=spk_xvector_submean_norm_mean.ark
input_name=xvector

suffix=$(echo ${final_file%.*} | sed 's/^'"$inputname"'//g;'s/spk_xvector_mean//g'')

It only removes the contiguous string spk_xvector_mean and does not work for spk_xvector***_mean.

Best regards,
Ya-Qi Yu
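
The mismatch reported above can be reproduced outside the shell. The sketch below mirrors the two sed substitutions in Python and contrasts them with one possible alternative that strips the prefix and the trailing `_mean` separately. Both function names are hypothetical, and the alternative is an assumption about the intended result, not the maintainers' fix.

```python
import re


def suffix_literal(final_file, input_name):
    """Mimic the reported sed: strip a leading input_name, then the literal spk_<name>_mean."""
    stem = final_file.rsplit(".", 1)[0]
    stem = re.sub("^" + re.escape(input_name), "", stem)
    return stem.replace("spk_" + input_name + "_mean", "")


def suffix_split(final_file, input_name):
    """Alternative: strip an optional spk_ prefix plus input_name, then a trailing _mean."""
    stem = final_file.rsplit(".", 1)[0]
    stem = re.sub("^(spk_)?" + re.escape(input_name), "", stem)
    return re.sub("_mean$", "", stem)


name = "spk_xvector_submean_norm_mean.ark"
print(repr(suffix_literal(name, "xvector")))
print(repr(suffix_split(name, "xvector")))
```

The literal version leaves the whole stem untouched for this file name, which is consistent with the unexpected `spk_xvector_submean_norm_mean` fragment in the generated score file, while the split version yields `_submean_norm`.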

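Discussion on AM-softmax convergence

Hello 赵淼!
Thank you for taking the time to read my email.
Recently I have been using the open-source toolkit ASV-subtools for speaker recognition research, and I ran into a small problem I would like to ask about.
I am training a ResNet model with the runResnetXvector.py script. With the default parameters the model finished training, and both the loss and accuracy curves look normal. I then replaced the softmax loss with the AM-softmax loss, set the hyperparameter m to 0.3, and used the annealing schedule, so that the loss gradually transitions from softmax to AM-softmax. With this change, training accuracy dropped to 70% and the loss first decreased and then increased. If m is set to 0.1, the accuracy curve shows much faster convergence. Based on this, I have three questions:
(1) Judging from the results, model performance seems very sensitive to the value of m. Is this normal?
(2) In theory, AM-softmax separates classes more widely and should therefore give higher accuracy than the softmax loss. However, in the attached figures the accuracy is much lower, and the loss rises sharply after decreasing. Is this normal?
(3) Is there a good way to speed up convergence with the AM-softmax loss?
I look forward to your reply.
Best wishes!
--------------------------------------------------------------------------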

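Hello 楼一杰,
In general, there are a few parameters to watch when using AM-softmax. Besides enabling this loss, consider whether the last layer should keep BN and ReLU, and whether to use the gradually increasing margin strategy configured in the use_step parameter section. With that in mind, my understanding of your questions is as follows:
(1) As a penalty, m is fairly sensitive for model training. If it is too large it can cause convergence problems, and poor training hurts performance. In addition, if you did not remove the ReLU of the last layer, the classification space is much smaller (non-negative values mean only the first quadrant), in which case m should be even smaller. We usually use 0.2, for reference only.
(2) Regarding the accuracy comparison, there is no absolute proportional relationship; accuracy should be viewed mainly with overfitting in mind. Also, the comparison should be made on validation-set accuracy; comparing training-set accuracy is much less meaningful. The loss dropping and then rising sharply may be because you plotted the training-set loss. The training-set loss includes the penalty term (getting the true loss would require computing it a second time, which is usually not done because it costs time), and the penalty keeps increasing, so this loss is unreliable. You could look at the validation set instead.
(3) The AM-softmax loss itself can speed up training to some extent, but training with it directly may also lead to poor results, so the more robust gradual training strategy is chosen by default. In fixed-epoch training, if you find that AM-softmax converges worse than softmax in the later stage, it usually means your penalty is too large to converge well. Training speed also depends on the optimizer.
Best wishes!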
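
The effect of the margin on the training loss, and the idea of increasing it gradually, can be illustrated with a small pure-Python sketch. The scale of 30, the linear ramp and both function names are assumptions made for this example; the toolkit's actual schedule is not described on this page.

```python
import math


def am_softmax_loss(cosines, target, margin, scale=30.0):
    """Cross-entropy on scaled cosines, with the margin subtracted from the target class."""
    logits = [scale * (c - margin) if i == target else scale * c
              for i, c in enumerate(cosines)]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def ramped_margin(step, final_margin=0.2, ramp_steps=1000):
    """Linearly increase the margin from 0 to final_margin over ramp_steps."""
    return final_margin * min(1.0, step / ramp_steps)


cosines = [0.7, 0.2, -0.1]  # toy cosine similarities; class 0 is the target
losses = [am_softmax_loss(cosines, 0, m) for m in (0.0, 0.1, 0.3)]
print(losses, ramped_margin(500))
```

For the same cosine scores the loss grows with the margin. This is why a training loss computed with an increasing margin can rise even while the underlying classification is not getting worse, as the reply in point (2) explains.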

The way to directly get the recognition result.

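Hello, does your baseline include a script that takes a single audio file as input and directly decodes a recognition result? If so, could you tell me which one it is? If not, I hope you can outline the general process and approach. Thank you very much!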

Xuran

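Training problem with PhoneticXvector

Hello 赵淼,
I modified the network part myself, based on your runPhoneticXvector.sh training script.
There are 89 iterations in total, but an error occurred at iteration 28.
The error is similar to the one described in this forum thread: https://groups.google.com/g/kaldi-help/c/F7cud3lbDMo/m/VuNDG-qRBgAJ
My error message is as follows: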
WARNING (nnet3-train[5.5]:ConstrainOrthonormalInternal():nnet-utils.cc:1055) Ratio is nan (should be >= 1.0); component is tdnnf10.linear
ASSERTION_FAILED (nnet3-train[5.5]:ConstrainOrthonormalInternal():nnet-utils.cc:1057) Assertion failed: (ratio > 0.9)

How to prepare utt2spk utt2lang trials files for asv-subtools/recipe/ap-olr2020-baseline?

Hi @Snowdar ,

The recipe asv-subtools/recipe/ap-olr2020-baseline is designed for language recognition tasks.
So at the data preparation stage, should I put the language label in the utt2spk file or in the utt2lang file?

I am new to language recognition, so I am a little confused about the above code.

Which file should I use as input for subtools/getTrials.sh to generate the trials file?

Thanks

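Why does sre-fbank-81.conf configure 81-dimensional features rather than 80?

Hello, thank you for providing such an excellent tool. There is one thing I have never understood: why is the feature dimension configured by sre-fbank-81.conf 81 rather than 80? I see that num-mel-bins is set to 80. In the very similar config sre-fbank-40.conf, num-mel-bins is set to 40 and the feature dimension is 40.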

Issues about evaluation results (Cavg and EER) of baseline system of OLR2020 Challenge

Hi.
We want to ask two questions about the evaluation results of the OLR2020 Challenge baseline system.

1. Cavg and EER of task1 using test data AP19-OLR-channel

We noticed that in Table 2 the official task1 result using i-vector is Cavg 0.2965, EER 19.40%. But we get Cavg 0.2997, EER 29.91%. We suspect that the official results may have mistakenly put a wrong EER number into Table 2. As in the pictures below, the EER of 19.40% appears not only in Table 2 but also in Table 3 for the same task, and in Table 3 the Cavg and EER are not far apart, unlike in Table 2. In other literature that uses Cavg and EER as evaluation criteria, we also rarely see such a big difference between the two numbers. We hope the organizers can check the results. Thanks!
image

2. How computeCavg.py calculates Cavg in the open-set identification task

image
We use the formula in the picture above to calculate Cavg in each task. But there is something we do not understand in the Python code computeCavg.py: it yields a total prior probability greater than 1 in task2, which is an open-set task.
In task1, there are 6 languages in both the enrollment set and the test set, and the code gives a prior probability of 0.5 to the target language and 0.1 to each non-target language. There is no problem here. It computes Cavg like
image
In task2, there are 6 languages in the test set but only 3 in the enrollment set. The program takes lang_num as 3 and assigns each non-target language a prior probability of 0.25. What is odd is that the program treats the other 3 languages, which are not in the enrollment set, as one more non-target language and also gives it a prior probability of 0.25. I have followed each step of this code and obtained some of the parameters shown below. The code is like

line 113 p_nontarget = (1 - p_target) / (lang_num - 1)  # lang_num=3, p_nontarget=0.25
line 114 target_cavg[lang] = p_target * p_miss + p_nontarget*sum(p_fa)  # p_fa is a list, length is 3, p_fa[2] represents false alarm probability of the overall of three languages not in enrollset

Finally it computes Cavg like
image
where Ln (n=3) represents all of the languages not in the enrollment set taken together. So from the formula above, we get a total prior probability of 0.5 + 0.25*3 = 1.25, which is greater than 1. I don't know whether I have misunderstood the formula or how the program works. Could you please give us some hints on this?

Sincerely
Yizhou Peng
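
The arithmetic in this report can be checked with a few lines. The sketch below only reproduces the prior bookkeeping described above, using the quoted expression for p_nontarget. `total_prior` is a hypothetical helper, and the last case is one possible normalisation shown for comparison, not the challenge's definition.

```python
def total_prior(num_nontarget_terms, lang_num, p_target=0.5):
    """Sum of the target prior and all non-target priors used in one Cavg term."""
    p_nontarget = (1 - p_target) / (lang_num - 1)
    return p_target + p_nontarget * num_nontarget_terms


# task1 (closed set): 6 languages, 5 non-target terms.
closed_set = total_prior(num_nontarget_terms=5, lang_num=6)

# task2 as reported: lang_num is 3, but p_fa has 3 entries
# (2 enrolled non-targets plus 1 pooled out-of-set class).
open_set_reported = total_prior(num_nontarget_terms=3, lang_num=3)

# Assumption for comparison: count the pooled out-of-set class as a fourth language.
open_set_counted = total_prior(num_nontarget_terms=3, lang_num=4)

print(closed_set, open_set_reported, open_set_counted)
```

The reported configuration sums to 1.25, matching the figure in the question. Counting the pooled class in lang_num brings the sum back to 1, but whether that matches the intended evaluation plan is for the organizers to confirm.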

issues about speaker info and language info usage in OLR2020 Baseline

While training the baseline system, we are wondering what to use as the speaker information.
When you trained the i-vector system, did you change spkid to langid in the spk2utt and utt2spk files, or did you use the original speaker info to train the UBM and i-vector extractor? Either way, when we train the classifier, I think we should use languages as the labels. Is there any problem if we use speaker info to train the i-vector extractor and then classify the vectors into languages?

DistributedSampler

I noticed the warning in the source code of torch.utils.data.distributed.DistributedSampler:

    .. warning::
        In distributed mode, calling the :meth:`set_epoch(epoch) <set_epoch>` method at
        the beginning of each epoch **before** creating the :class:`DataLoader` iterator
        is necessary to make shuffling work properly across multiple epochs. Otherwise,
        the same ordering will be always used.

So should we add data.train_sampler.set_epoch(this_epoch) at the beginning of every epoch? zhihu
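
The reason behind this warning can be shown with a pure-Python stand-in that derives its shuffle from a seed plus the current epoch. `EpochSeededSampler` is a hypothetical class written for illustration; it mimics the idea and is not PyTorch's implementation.

```python
import random


class EpochSeededSampler:
    """Toy sampler whose shuffle order depends on seed + epoch."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        order = list(range(self.num_samples))
        random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)


sampler = EpochSeededSampler(100)
first = list(sampler)
repeat = list(sampler)      # set_epoch not called: identical ordering
sampler.set_epoch(1)
second = list(sampler)      # new epoch: different ordering
print(first == repeat, first == second)
```

Because the order is a deterministic function of the epoch number, forgetting `set_epoch` replays the same permutation every epoch, which is the situation the warning describes.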

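errorNum, the count of feature-extraction errors at line 191 of subtools/scoreSets.sh

image

Problem: if an uncleaned plda.log is left in the $vectordir/$set/log/ folder and it contains ERROR, then errorNum=$(grep ERROR $vectordir/$set/log/*.log | wc -l) counts the PLDA errors as well and wrongly reports "it means you lose many vectors which is so bad thing and I suggest you to extract vectors of this dataset again". Suggestion: if the goal is to count extraction errors, set errorNum=$(grep ERROR $vectordir/$set/log/extract.*.log | wc -l). (It seems that using the word "issue" written in the text causes an error; the picture below shows the original text.)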

image
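
The difference between the two glob patterns discussed above can be demonstrated on a throwaway directory. The sketch below is a Python rendering of the same counting logic, for illustration only. The file names follow the issue, while `count_error_lines` and the log contents are made up.

```python
import glob
import os
import tempfile


def count_error_lines(log_dir, pattern):
    """Count lines containing ERROR in the files of log_dir that match pattern."""
    total = 0
    for path in glob.glob(os.path.join(log_dir, pattern)):
        with open(path, encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if "ERROR" in line)
    return total


with tempfile.TemporaryDirectory() as log_dir:
    with open(os.path.join(log_dir, "extract.1.log"), "w", encoding="utf-8") as f:
        f.write("LOG ok\nERROR lost one vector\n")
    with open(os.path.join(log_dir, "plda.log"), "w", encoding="utf-8") as f:
        f.write("ERROR stale plda failure\nERROR another stale line\n")
    all_logs = count_error_lines(log_dir, "*.log")
    extract_only = count_error_lines(log_dir, "extract.*.log")

print(all_logs, extract_only)
```

Only the narrower pattern reflects extraction failures. The broad pattern also picks up the stale PLDA log, which is the false alarm described in the issue.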

Running on multiple GPUs

First of all, thank you for xmuspeech's subtools toolkit~
When I use the command subtools/runPytorchLauncher.sh run-resnet34-fbank-81-benchmark.py --gpu-id=0,1 --stage=3 --endstage=3, that is, python3 -m torch.distributed.launch --nproc_per_node=2 run-resnet34-fbank-81-benchmark.py --gpu-id=0,1 --port 2345 --stage=3 --endstage=3, the following warning and error appear. Could a problem with the environment or something else be causing multi-GPU initialization to fail?
image
image

Mixup object has no attribute 'lam'

Hi, thanks for the great job. I tried to use the Mixup learning strategy,
but I got an error that says:
Mixup object has no attribute 'lam'
I think this may be an implementation bug.
