Comments (4)
For the speaker encoder, instead of using a lookup table, I use a d-vector so that it can work for unseen speakers. I will try this approach and update you soon.
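The d-vector idea above can be sketched as averaging per-utterance embeddings from a pretrained speaker encoder, so a new speaker needs no lookup-table row. This is an illustrative sketch, not the actual meta-tts or d-vector implementation:

```python
def d_vector(utterance_embeddings):
    """L2-normalized mean of per-utterance speaker embeddings.

    Any speaker, seen or unseen, gets a vector this way, which is why
    a d-vector encoder removes the need for a speaker lookup table.
    """
    dim = len(utterance_embeddings[0])
    n = len(utterance_embeddings)
    mean = [sum(e[i] for e in utterance_embeddings) / n for i in range(dim)]
    norm = sum(x * x for x in mean) ** 0.5 or 1.0
    return [x / norm for x in mean]
```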
Thank you!
from meta-tts.
Hi,
The red blocks are actually the same. This is handled automatically by pytorch-lightning with distributed training (we use "DDP" in this paper) or with gradient accumulation. The training_step() function returns the meta-loss of each task; DDP then synchronizes and averages the meta-losses of the tasks across GPUs. The number of tasks averaged per update is controlled by the number of GPUs and the gradient-accumulation steps.
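A minimal sketch of the arithmetic described above (illustrative names only, not the actual meta-tts code): DDP's gradient synchronization is mathematically the same as averaging the per-task meta-losses, and the effective number of tasks per meta-update is GPUs times accumulation steps.

```python
def ddp_average(per_task_losses):
    """Average per-task meta-losses — what DDP's gradient
    synchronization amounts to mathematically."""
    return sum(per_task_losses) / len(per_task_losses)

def tasks_per_meta_update(num_gpus, accumulate_grad_batches):
    """One task per GPU per step, accumulated over several steps."""
    return num_gpus * accumulate_grad_batches
```

For example, 4 GPUs with 2 accumulation steps gives 8 tasks per meta-update.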
The "out-of-place" operation is only required for iMAML training, which is implemented but not shown in the paper. MAML.adapt() in L2L does not return a new MAML object; instead it reuses the same MAML object (inherited from torch.nn.Module) and simply replaces the parameters with the updated tensors. This is called "in-place" in PyTorch, and it causes problems when implementing the iMAML loss. So I inherit my custom MAML from L2L, rename the original in-place adapt() to adapt_() (following the PyTorch naming convention for in-place operations), and create an out-of-place adapt() that returns a new MAML object.
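The in-place vs. out-of-place distinction can be shown with a toy class (a hypothetical sketch with scalar "parameters", not the L2L or meta-tts code):

```python
class ToyMAML:
    def __init__(self, params):
        self.params = dict(params)  # name -> scalar "parameter"

    def adapt_(self, grads, lr=0.1):
        """In-place update (PyTorch trailing-underscore convention):
        overwrites this object's parameters, losing the originals —
        which is what gets in the way of the iMAML loss."""
        for name in self.params:
            self.params[name] -= lr * grads[name]
        return self

    def adapt(self, grads, lr=0.1):
        """Out-of-place update: returns a NEW object and leaves
        self.params untouched, so the pre-adaptation parameters
        remain available."""
        new_params = {name: p - lr * grads[name]
                      for name, p in self.params.items()}
        return ToyMAML(new_params)
```

After calling adapt(), both the original and the adapted parameters exist; after adapt_(), only the adapted ones do.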
@SungFeng-Huang I am going to reimplement this without pytorch-lightning. In the training step, if I have a multi-speaker FastSpeech with 10 speakers, does that mean I must keep 10 cloned models for 10 tasks? Can I randomly use, say, 4 tasks per meta-update? :))
I am trying to understand the MAML training process. Is the goal of MetaTTS "to train a general model on multiple speakers and then quickly fine-tune it for a new speaker in the adaptation stage"?
Can the tuned model work for all speakers (training speakers + new speaker), or only for the new speaker?
My target is a model that works for all speakers.
@v-nhandt21 Sure, you can randomly choose 4 tasks per meta-update! In my case, my multi-speaker FastSpeech 2 has more than 200 speakers, while each meta-update uses only 8 tasks. If you reimplement without pytorch-lightning, note that distributed training is the only way to run tasks in parallel; you would either need to handle DDP yourself or fall back on gradient accumulation, which trades time for computational resources.
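The sampling-plus-accumulation scheme described above can be sketched as follows (illustrative names, not the meta-tts implementation): draw a random subset of speaker tasks, accumulate their meta-losses sequentially, and average — the serial equivalent of averaging across DDP processes.

```python
import random

def sample_tasks(speaker_ids, tasks_per_update, rng=None):
    """Pick a random subset of speaker tasks for one meta-update."""
    rng = rng or random.Random()
    return rng.sample(speaker_ids, tasks_per_update)

def meta_update_loss(speaker_ids, tasks_per_update, task_loss_fn, rng=None):
    """Accumulate per-task meta-losses one by one and average them —
    the gradient-accumulation equivalent of DDP averaging."""
    tasks = sample_tasks(speaker_ids, tasks_per_update, rng)
    return sum(task_loss_fn(t) for t in tasks) / tasks_per_update
```

For example, with 200 speakers and 8 tasks per update, only one cloned model is needed at a time; the tasks are processed sequentially rather than held in memory all at once.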
Yes, your understanding of MAML/MetaTTS is correct.
In the basic MetaTTS setup, we fine-tune the whole model for the new speaker, so the tuned model only works for that speaker.
You can get a tuned model that works for all speakers (training speakers + the new speaker) only if you extend the speaker-embedding lookup table and tune just the newly initialized speaker embedding, keeping the trained speaker embeddings unchanged. But this setting is less efficient and gives up the "fast-tuning" advantage of MetaTTS shown in our experiments.
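The extend-and-freeze scheme described above can be sketched with a toy table (a hypothetical stand-in for an embedding layer, not the actual meta-tts code): freeze every trained row, then append one fresh trainable row for the new speaker.

```python
class SpeakerEmbeddingTable:
    """Toy speaker-embedding lookup table: one row per speaker,
    plus a per-row trainable flag."""

    def __init__(self, n_speakers, dim):
        self.rows = [[0.0] * dim for _ in range(n_speakers)]
        self.trainable = [True] * n_speakers

    def add_new_speaker(self):
        """Freeze all trained rows and append a freshly initialized,
        trainable row for the new speaker; returns its speaker id."""
        self.trainable = [False] * len(self.rows)
        self.rows.append([0.0] * len(self.rows[0]))
        self.trainable.append(True)
        return len(self.rows) - 1
```

Only the new row receives gradient updates, so the existing speakers' voices are preserved, at the cost of the fast full-model adaptation that MetaTTS is built around.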