Comments (14)
The EER of RDINO is above 10% early in training; just keep training.
from 3d-speaker.
I am using DINO, not RDINO.
The trend looks normal, so you can keep training. The EER usually drops below 5% after about 25 epochs.
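For readers new to the metric, the EER (equal error rate) is the operating point where the false acceptance rate equals the false rejection rate. The sketch below is a simplified threshold sweep in plain Python, not the scoring script used by the repository; the function name `compute_eer` and the toy scores are assumptions made for illustration.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return the equal error rate (as a fraction) for two lists of trial scores.

    target_scores: similarity scores of same-speaker trials.
    nontarget_scores: similarity scores of different-speaker trials.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores (hypothetical): one target trial scores lower than two non-target trials.
eer = compute_eer([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(f"EER = {eer:.2%}")
```

An EER above 10% in this sense means that, at the balanced threshold, more than one trial in ten is misclassified in each direction.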
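Thank you for the answer. I have two more questions. In the MLP, BN is added after each of the three FC layers, but I did not add it. Does that have a large impact? Also,
I did not add this part either. What is it doing?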
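- It is best to add BN after the first two FC layers. We have not rigorously compared how omitting it affects training.
- The code in the image adds support for multi-GPU batchnorm.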
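The two points above can be sketched in PyTorch as follows. This is an illustration, not the repository's actual head: the class name `ProjectionHead`, the layer sizes, and the GELU activation are assumptions, and the second part assumes the screenshot referred to PyTorch's standard `nn.SyncBatchNorm.convert_sync_batchnorm` conversion.

```python
import torch
import torch.nn as nn


class ProjectionHead(nn.Module):
    """Three-layer MLP head with BatchNorm after the first two FC layers (illustrative)."""

    def __init__(self, in_dim=192, hidden_dim=256, bottleneck_dim=64, use_bn=True):
        super().__init__()
        layers = []
        dims = [in_dim, hidden_dim, hidden_dim]
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers.append(nn.Linear(d_in, d_out))
            if use_bn:
                layers.append(nn.BatchNorm1d(d_out))
            layers.append(nn.GELU())
        # Third FC layer: no BN here in this sketch.
        layers.append(nn.Linear(hidden_dim, bottleneck_dim))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):
        return self.mlp(x)


head = ProjectionHead()

# Replace every BatchNorm layer with SyncBatchNorm so that batch statistics are
# computed across all GPUs once the model runs under DistributedDataParallel.
sync_head = nn.SyncBatchNorm.convert_sync_batchnorm(ProjectionHead())
```

The conversion only swaps the module types; the statistics are actually synchronized only when the model is run on GPUs inside an initialized distributed process group.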
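OK, thanks. I noticed that the MLP outputs a 65536-dimensional vector, which is then passed through a softmax to compute a distribution. Why is such a large output dimension chosen? Is there experimental evidence for it?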
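We follow the experimental configuration of the original DINO paper. In our experiments, reducing this dimension substantially caused a large drop in performance.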
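To make the role of the 65536-dimensional output concrete, here is a minimal sketch of a DINO-style loss in PyTorch: both networks' outputs are turned into distributions over the K output dimensions with a softmax, and the student is trained to match the teacher. The function name `dino_loss`, the temperature values, and the zero-initialized center are assumptions for illustration, not values confirmed by this thread.

```python
import torch
import torch.nn.functional as F


def dino_loss(student_logits, teacher_logits, center, student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student distributions over K output dimensions."""
    # Teacher: subtract the running center, sharpen with a low temperature, no gradient.
    teacher_probs = F.softmax((teacher_logits - center) / teacher_temp, dim=-1).detach()
    # Student: log-probabilities at a higher temperature.
    student_log_probs = F.log_softmax(student_logits / student_temp, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()


K = 65536  # output dimension mentioned in the thread
torch.manual_seed(0)
student_logits = torch.randn(2, K, requires_grad=True)  # stand-in for the student head output
teacher_logits = torch.randn(2, K)                      # stand-in for the teacher head output
center = torch.zeros(1, K)                              # running mean of teacher outputs
loss = dino_loss(student_logits, teacher_logits, center)
```

In the full method the loss is summed over pairs of different augmented views of the same utterance, and `center` is updated with a moving average of teacher outputs; both are omitted here for brevity.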
Thank you very much for your answer.
You're welcome. We look forward to your further research.
![image](https://private-user-images.githubusercontent.com/131787594/266195588-2dcdf43d-b438-4716-a1d8-95feef47f833.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIzMTQwMzMsIm5iZiI6MTcyMjMxMzczMywicGF0aCI6Ii8xMzE3ODc1OTQvMjY2MTk1NTg4LTJkY2RmNDNkLWI0MzgtNDcxNi1hMWQ4LTk1ZmVlZjQ3ZjgzMy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDczMFQwNDI4NTNaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iMTBlNGY1ZTI1ZDI2ODA2ZmNmODM2OWVhZTBlZmRmMTViZWI1NjNmZTAwYzdiZjhhMTRlM2UxZmQ3ZDEyNDJjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Qz3muVs93h9WcDWC8xYl8C_HdpX8ccqxve0gggedhYw)
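That is possible, or it may be caused by a mistake in your code modifications. We suggest running the complete, unmodified source code first.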
I am new to this field and have one more basic question: when DINO saves a checkpoint, does it save the student model or the teacher model? And at test time, is the teacher model or the student model used?
Both are saved, and testing uses the teacher model. See the code for details.
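As a rough illustration of that answer, the sketch below keeps a teacher as an exponential moving average (EMA) of the student, stores both in one checkpoint, and restores only the teacher for evaluation. The `nn.Linear` stand-in model, the helper name `ema_update`, the momentum value, and the checkpoint keys `"student"` and `"teacher"` are all assumptions; the repository's real training script may organize this differently.

```python
import copy
import io

import torch
import torch.nn as nn


@torch.no_grad()
def ema_update(student, teacher, momentum=0.996):
    """Move each teacher parameter a small step toward the student's."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s.detach(), alpha=1.0 - momentum)


student = nn.Linear(8, 4)          # stand-in for the speaker encoder plus head
teacher = copy.deepcopy(student)   # the teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)        # the teacher is never trained by backprop

with torch.no_grad():
    student.weight.add_(1.0)       # pretend one optimizer step changed the student
ema_update(student, teacher)

# Store both networks in one checkpoint (in memory here; use a file path in practice).
buffer = io.BytesIO()
torch.save({"student": student.state_dict(), "teacher": teacher.state_dict()}, buffer)

# At test time, restore only the teacher weights and switch to eval mode.
buffer.seek(0)
checkpoint = torch.load(buffer)
eval_model = nn.Linear(8, 4)
eval_model.load_state_dict(checkpoint["teacher"])
eval_model.eval()
```

Keeping the student weights in the checkpoint as well makes it possible to resume training, since the student is the network the optimizer actually updates.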