Comments (4)
Currently, our text annotations are available only for audio clips recorded with DIRECTIONAL devices. We focus on annotating clear, distinct audio rather than less clear data, such as far-field recordings or dialect speech. Our dataset is aimed mainly at speaker-related tasks. If further text annotations are released, we will update the information on our website.
from 3d-speaker.
Thanks for your explanation.
OK, this may be off topic; if it is not appropriate, please close the issue.
I read the LauraGPT paper. It says the training data for TTS is LibriTTS and 3D-Speaker, copied 2 times, so the number of samples is 5.0M.
The LibriTTS train set is about 206K samples, and the whole 3D-Speaker train set is about 643K; counting only the annotated clips, it would be fewer.
So is the number of samples for training TTS wrong? Should it be 500K?
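As a rough check of the figures quoted above, here is a minimal Python sketch. It assumes that "copied 2 times" means two copies in total and that every 3D-Speaker training clip is used, which the comment itself notes is an overestimate. The variable names are illustrative only.

```python
# Approximate sample counts as quoted in the comment (not verified here).
libritts_train = 206_000          # LibriTTS train set
three_d_speaker_train = 643_000   # full 3D-Speaker train set (upper bound)
copies = 2                        # assumption: "copied 2 times" = 2 copies total
reported_total = 5_000_000        # 5.0M, the figure attributed to the paper

# Upper bound on the TTS sample count under these assumptions.
estimated_total = (libritts_train + three_d_speaker_train) * copies

print(f"estimated upper bound: {estimated_total:,}")
print(f"gap to reported total: {reported_total - estimated_total:,}")
```

Under these assumptions the upper bound is well below 5.0M, so the two listed corpora alone do not account for the reported total.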
from 3d-speaker.
The LauraGPT experiments used 3D-Speaker data from the highest-quality recording device, with some data augmentation applied. For specific data details, please refer to the original paper.
from 3d-speaker.
After double-checking with the authors, it appears that the LibriTTS figure you quoted is smaller than expected. In addition, we also used data from AISHELL-1, AISHELL-2, and AISHELL-3 in the TTS tasks, which was inadvertently omitted from the current preprint version of our paper. We will correct this detail in subsequent revisions.
from 3d-speaker.
Related Issues (20)
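- The DER from speaker diarization combined with video is even worse than with audio alone; can this be fine-tuned? HOT 3
- The DER from speaker diarization combined with video is even worse than with audio alone; can this be fine-tuned?
- Question about splitting subseg HOT 1
- Question about the input channels of the face-related models. HOT 1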
- support real-time speaker diarization? HOT 1
- Dataset HOT 3
- Is there a released ERes2NetV2 model with m_channels = 32 trained on 200k-Spkrs? HOT 4
- The client does not have the required privileges
- For ERes2NetV2 performance on short-duration wavs HOT 2
- SELF-DISTILLATION NETWORK WITH ENSEMBLE PROTOTYPES: LEARNING ROBUST SPEAKER REPRESENTATIONS WITHOUT SUPERVISION HOT 2
- Can streaming speaker recognition be implemented? HOT 1
- Question about the performance of the ERes2Net_VOX model HOT 4
- Assertion error
- Inference index info in identification from trained model HOT 5
- Numbers of speakers HOT 1
- GPU requirement for sv-eres2netv2 HOT 3
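- Question about the number of hours in the language-identification corpus HOT 2
- Training question HOT 3
- Question about multimodal speaker diarization processing HOT 1
- Problem when training CAM++ HOT 1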