Comments (4)

v-nhandt21 commented on September 10, 2024

For the speaker encoder, instead of using a lookup table, I use a d-vector so that it can work for unseen speakers. I will try this approach and update you soon.
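A minimal sketch of the d-vector idea, assuming the open-source Resemblyzer encoder as one possible choice (not necessarily the final setup):

```python
# Hedged sketch: derive the speaker embedding from reference audio
# instead of indexing a lookup table, so unseen speakers work too.
from resemblyzer import VoiceEncoder, preprocess_wav  # pip install resemblyzer

encoder = VoiceEncoder()
wav = preprocess_wav("reference.wav")    # any short utterance of the speaker
d_vector = encoder.embed_utterance(wav)  # 256-dim numpy array

# Feed `d_vector` to the TTS model wherever the lookup-table embedding
# was consumed; a new speaker only needs a short reference clip.
```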

Thank you!

SungFeng-Huang commented on September 10, 2024

Hi,

The red blocks are actually the same. pytorch-lightning handles this automatically with distributed training (we use DDP in this paper) or gradient accumulation. The training_step() function returns the meta-loss of each task, then DDP synchronizes and averages the meta-losses across GPUs. The number of tasks averaged per meta-update is controlled by the number of GPUs and the gradient accumulation steps.
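A minimal sketch of that setup, with a hypothetical meta_learn() standing in for the inner loop:

```python
# Lightning handles the cross-task averaging: training_step() returns one
# task's meta-loss, DDP averages gradients across GPUs, and gradient
# accumulation averages across successive steps.
import pytorch_lightning as pl

class MetaTTSSystem(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        meta_loss = self.meta_learn(batch)  # hypothetical inner-loop helper
        return meta_loss                    # Lightning backpropagates this

# e.g. 4 GPUs x 2 accumulated batches -> 8 tasks averaged per meta-update
trainer = pl.Trainer(accelerator="gpu", devices=4,
                     strategy="ddp", accumulate_grad_batches=2)
```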

The "out of space" operation is only required for iMAML training, which is implemented but not shown in the paper. The MAML.adapt() in L2L does not return a new MAML object, instead it uses the same MAML object (inherited from torch.nn.Module) and simply replace the parameter with the calculated updated tensors. This is called "in-place" in pytorch, and would cause some problems in implementing the iMAML loss. So I inherit my custom MAML from L2L, change the original in-place adapt() into adapt_() (which is the pytorch coding style of in-place operation), and create an out-of-space adapt() (which would return a new MAML object).

v-nhandt21 commented on September 10, 2024

@SungFeng-Huang I am going to reimplement this without pytorch-lightning. In the training step, if I have a multi-speaker FastSpeech with 10 speakers, does that mean I must have 10 cloned models for 10 tasks? Can I randomly use 4 tasks at meta_update? :))

I am trying to understand the MAML training process. Is the goal of MetaTTS "to train a general multi-speaker model and then fast-tune it for a new speaker in the adaptation stage"?

Can the newly tuned model work for all speakers (training speakers + the new speaker), or only the new speaker?

My goal is for the new model to work for all speakers.

SungFeng-Huang commented on September 10, 2024

@v-nhandt21 Sure, you can randomly choose 4 tasks at each meta-update! In my case, my multi-speaker FastSpeech 2 has more than 200 speakers, while each meta-update uses only 8 tasks. If you want to reimplement without pytorch-lightning: since distributed training is the only way to run tasks in parallel, you might need to handle DDP yourself; otherwise you would need gradient accumulation, which trades time for computational resources.
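A minimal sketch of one meta-update with gradient accumulation, where get_task_batch() and compute_loss() are hypothetical helpers and `model` is your multi-speaker FastSpeech module:

```python
import random
import torch
import learn2learn as l2l

maml = l2l.algorithms.MAML(model, lr=1e-3)        # inner-loop learning rate
meta_opt = torch.optim.Adam(maml.parameters(), lr=1e-4)

tasks = random.sample(range(10), k=4)             # 4 of the 10 speakers
meta_opt.zero_grad()
for speaker in tasks:
    support, query = get_task_batch(speaker)      # hypothetical data helper
    learner = maml.clone()                        # task-specific copy
    learner.adapt(compute_loss(learner, support)) # inner-loop update
    meta_loss = compute_loss(learner, query)
    (meta_loss / len(tasks)).backward()           # accumulate averaged grads
meta_opt.step()                                   # one meta-update
```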

Yes, your understanding of MAML/MetaTTS is correct.

In the basic MetaTTS setup, we fine-tune the whole model for the new speaker, so the tuned model only works for that speaker.
You can also obtain a tuned model that works for all speakers (training speakers + the new speaker), but only if you extend the speaker embedding lookup table and tune just the newly initialized speaker embedding while keeping the trained speaker embeddings unchanged. However, this setting is less efficient and gives up the "fast-tuning" advantage of MetaTTS shown in our experiments.
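A sketch of that embedding-extension setting, assuming the model exposes its lookup table as model.speaker_emb (names are illustrative):

```python
import torch
import torch.nn as nn

old_emb = model.speaker_emb                  # nn.Embedding(n_spk, dim), assumed
n_spk, dim = old_emb.weight.shape
new_emb = nn.Embedding(n_spk + 1, dim)
with torch.no_grad():
    new_emb.weight[:n_spk] = old_emb.weight  # keep trained rows unchanged
model.speaker_emb = new_emb

# Freeze the rest of the model and tune only the embedding table; since
# fine-tuning batches contain only the new speaker's id, only the newly
# initialized row actually receives gradient.
for p in model.parameters():
    p.requires_grad_(False)
new_emb.weight.requires_grad_(True)
optimizer = torch.optim.Adam([new_emb.weight], lr=1e-4)
```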
