Comments (4)

v-nhandt21 commented on September 10, 2024

For the speaker encoder, instead of using a lookup table, I use a d-vector so that it can work for unseen speakers. I will try this approach and update you soon.
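A minimal sketch of the d-vector idea, assuming the open-source Resemblyzer encoder as one possible choice (not necessarily the final setup):

```python
# Hedged sketch: derive the speaker embedding from reference audio
# instead of indexing a lookup table, so unseen speakers work too.
from resemblyzer import VoiceEncoder, preprocess_wav  # pip install resemblyzer

encoder = VoiceEncoder()
wav = preprocess_wav("reference.wav")    # any short utterance of the speaker
d_vector = encoder.embed_utterance(wav)  # 256-dim numpy array

# Feed `d_vector` to the TTS model wherever the lookup-table embedding
# was consumed; a new speaker only needs a short reference clip.
```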

Thank you!

SungFeng-Huang commented on September 10, 2024

Hi,

The red blocks are actually the same. pytorch-lightning handles this automatically with distributed training (we use DDP in this paper) or gradient accumulation. The training_step() function returns the meta-loss of each task, then DDP synchronizes and averages the meta-losses across GPUs. The number of tasks averaged per meta-update is controlled by the number of GPUs and the gradient accumulation steps.
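A minimal sketch of that setup, with a hypothetical meta_learn() standing in for the inner loop:

```python
# Lightning handles the cross-task averaging: training_step() returns one
# task's meta-loss, DDP averages gradients across GPUs, and gradient
# accumulation averages across successive steps.
import pytorch_lightning as pl

class MetaTTSSystem(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        meta_loss = self.meta_learn(batch)  # hypothetical inner-loop helper
        return meta_loss                    # Lightning backpropagates this

# e.g. 4 GPUs x 2 accumulated batches -> 8 tasks averaged per meta-update
trainer = pl.Trainer(accelerator="gpu", devices=4,
                     strategy="ddp", accumulate_grad_batches=2)
```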

The "out of space" operation is only required for iMAML training, which is implemented but not shown in the paper. The MAML.adapt() in L2L does not return a new MAML object, instead it uses the same MAML object (inherited from torch.nn.Module) and simply replace the parameter with the calculated updated tensors. This is called "in-place" in pytorch, and would cause some problems in implementing the iMAML loss. So I inherit my custom MAML from L2L, change the original in-place adapt() into adapt_() (which is the pytorch coding style of in-place operation), and create an out-of-space adapt() (which would return a new MAML object).

v-nhandt21 commented on September 10, 2024

@SungFeng-Huang I am going to reimplement this without pytorch-lightning. In the training step, if I have a multi-speaker FastSpeech with 10 speakers, does that mean I must have 10 cloned models for 10 tasks? Can I randomly use 4 tasks at meta_update? :))

I am trying to understand the MAML training process. Is the goal of MetaTTS "to train a general multi-speaker model and then fast-tune it for a new speaker in the adaptation stage"?

Can the newly tuned model work for all speakers (training speakers + the new speaker), or only the new speaker?

My goal is for the new model to work for all speakers.

SungFeng-Huang commented on September 10, 2024

@v-nhandt21 Sure, you can randomly choose 4 tasks at each meta-update! In my case, my multi-speaker FastSpeech 2 has more than 200 speakers, while each meta-update uses only 8 tasks. If you want to reimplement without pytorch-lightning: since distributed training is the only way to run tasks in parallel, you might need to handle DDP yourself; otherwise you would need gradient accumulation, which trades time for computational resources.
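A minimal sketch of one meta-update with gradient accumulation, where get_task_batch() and compute_loss() are hypothetical helpers and `model` is your multi-speaker FastSpeech module:

```python
import random
import torch
import learn2learn as l2l

maml = l2l.algorithms.MAML(model, lr=1e-3)        # inner-loop learning rate
meta_opt = torch.optim.Adam(maml.parameters(), lr=1e-4)

tasks = random.sample(range(10), k=4)             # 4 of the 10 speakers
meta_opt.zero_grad()
for speaker in tasks:
    support, query = get_task_batch(speaker)      # hypothetical data helper
    learner = maml.clone()                        # task-specific copy
    learner.adapt(compute_loss(learner, support)) # inner-loop update
    meta_loss = compute_loss(learner, query)
    (meta_loss / len(tasks)).backward()           # accumulate averaged grads
meta_opt.step()                                   # one meta-update
```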

Yes, your understanding of MAML/MetaTTS is correct.

In the basic MetaTTS setup, we fine-tune the whole model for the new speaker, so the tuned model only works for that speaker.
You can also obtain a tuned model that works for all speakers (training speakers + the new speaker), but only if you extend the speaker embedding lookup table and tune just the newly initialized speaker embedding while keeping the trained speaker embeddings unchanged. However, this setting is less efficient and gives up the "fast-tuning" advantage of MetaTTS shown in our experiments.
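A sketch of that embedding-extension setting, assuming the model exposes its lookup table as model.speaker_emb (names are illustrative):

```python
import torch
import torch.nn as nn

old_emb = model.speaker_emb                  # nn.Embedding(n_spk, dim), assumed
n_spk, dim = old_emb.weight.shape
new_emb = nn.Embedding(n_spk + 1, dim)
with torch.no_grad():
    new_emb.weight[:n_spk] = old_emb.weight  # keep trained rows unchanged
model.speaker_emb = new_emb

# Freeze the rest of the model and tune only the embedding table; since
# fine-tuning batches contain only the new speaker's id, only the newly
# initialized row actually receives gradient.
for p in model.parameters():
    p.requires_grad_(False)
new_emb.weight.requires_grad_(True)
optimizer = torch.optim.Adam([new_emb.weight], lr=1e-4)
```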
