

yl4579 commented on August 25, 2024

The "w/o augmentation" setting in our ablation study is essentially training all components together. We didn't specifically test joint training, but we believe there should be no difference between training everything together and the w/o augmentation setting. The reason is that in most TTS systems the decoder does not depend on the gradient of the variance predictor: there is a no-grad (stop-gradient) operation on the text encoder output before it is fed to the predictor. Once the decoder converges, the predictor should also converge, just as in stage 2 of training, but we are not sure exactly what will happen.
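To make the role of the no-grad operation concrete, here is a minimal sketch using a hypothetical scalar toy model. The function `toy_gradients` and the weights `a` (encoder), `c` (decoder), and `v` (predictor) are illustrative names, not StyleTTS code. Gradients are derived by hand so the example needs no framework. In PyTorch the stop-gradient would typically be `h.detach()`, and in JAX `jax.lax.stop_gradient(h)`. With the stop-gradient in place, the encoder weight receives gradient only from the decoder loss. The predictor therefore has no influence on the encoder/decoder updates, which is why joint training and a separate predictor stage are expected to behave similarly.

```python
# Hypothetical scalar toy model (not StyleTTS code) showing how a stop-gradient
# on the encoder output keeps the predictor loss from updating the encoder.

def toy_gradients(a, c, v, x, y, t, stop_grad=True):
    """Hand-derived gradients for:
    encoder:   h = a * x
    decoder:   d = c * h,       loss_dec  = (d - y) ** 2
    predictor: p = v * sg(h),   loss_pred = (p - t) ** 2
    where sg() acts as a stop-gradient when stop_grad is True.
    """
    h = a * x
    d = c * h
    p = v * h

    dloss_dd = 2.0 * (d - y)  # d loss_dec / d d
    dloss_dp = 2.0 * (p - t)  # d loss_pred / d p

    grad_c = dloss_dd * h      # decoder weight: only the decoder loss reaches it
    grad_v = dloss_dp * h      # predictor weight: only the predictor loss reaches it
    grad_a = dloss_dd * c * x  # encoder weight: gradient path through the decoder
    if not stop_grad:
        # Without the stop-gradient, the predictor loss also flows into the encoder.
        grad_a += dloss_dp * v * x
    return {"a": grad_a, "c": grad_c, "v": grad_v}


if __name__ == "__main__":
    kwargs = dict(a=1.0, c=2.0, v=0.5, x=1.0, y=1.0, t=2.0)
    print("with stop-grad:   ", toy_gradients(stop_grad=True, **kwargs))
    print("without stop-grad:", toy_gradients(stop_grad=False, **kwargs))
```

Only the encoder gradient changes between the two runs; the decoder and predictor gradients are identical, matching the claim that nothing else depends on the predictor's gradient.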

You can also train the two stages together with augmentation. It is possible in principle to apply the duration-invariant data augmentation when training end-to-end, although you will then need the decoder to reconstruct the mel-spectrogram from the stretched or compressed representations. If your decoder is not yet well trained, however, this will derail the predictor and make it converge more slowly, or perhaps to a worse minimum. So I don't believe it would be better than 2-stage training: because of the no-grad operation in front of the predictor, no other component needs the gradient from the predictor, so there is nothing to gain from training them jointly.
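The comment does not say how the representations are stretched or compressed. The sketch below assumes simple linear interpolation along the time axis, which is one common way to resample frame-level features; the function name `stretch_frames` is hypothetical. In PyTorch a similar effect could be obtained with `torch.nn.functional.interpolate(..., mode="linear")`.

```python
# Hypothetical sketch: resample a (T x D) list of feature frames to a new length
# by linear interpolation along time. An assumed implementation, not StyleTTS code.

def stretch_frames(frames, scale):
    """Stretch (scale > 1) or compress (scale < 1) a sequence of frames in time."""
    t_in = len(frames)
    if t_in == 0:
        return []
    t_out = max(1, round(t_in * scale))
    if t_in == 1 or t_out == 1:
        # Degenerate case: repeat (or keep) the first frame.
        return [[float(v) for v in frames[0]] for _ in range(t_out)]

    out = []
    for i in range(t_out):
        # Map output index i onto the input time axis with both endpoints aligned.
        pos = i * (t_in - 1) / (t_out - 1)
        lo = int(pos)
        hi = min(lo + 1, t_in - 1)
        w = pos - lo
        # Blend the two neighbouring input frames, feature by feature.
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


if __name__ == "__main__":
    print(stretch_frames([[0, 10], [2, 20]], 1.5))       # longer sequence
    print(stretch_frames([[0], [1], [2], [3], [4]], 0.6))  # shorter sequence
```

Under the end-to-end scheme described above, the decoder would have to produce the mel-spectrogram from inputs resampled like this, which is why a poorly trained decoder can feed misleading targets back to the predictor.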

If you do not apply the no-grad operation, I don't know what will happen; you may try it and see. However, I do believe there is a reason why most TTS systems apply a no-grad operation between the variance predictor and the rest of the components. Most likely, it is because the F0 predicted by the variance predictor can never exactly match the ground-truth F0. If you force the decoder to reconstruct the mel-spectrogram both from the incorrect (predicted) F0 and from the correct F0, it will settle on a point in between as the optimal solution, which leads to worse sound quality, or it may stop using the F0 information at all.
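The "point in between" argument can be illustrated with a linear-regression analogy. This is a hypothetical toy, not a model of an actual TTS decoder: a one-parameter "decoder" `y ≈ w * f0 + b` is fitted by ordinary least squares, where the target depends exactly on the true F0. The F0 values and the noisy "predicted" F0 are made-up numbers. Fitting on the ground-truth F0 alone gives `w = 1`; adding the same targets paired with imperfect predicted F0 pulls the slope down to 4/7 ≈ 0.571. The fitted decoder relies less on F0, which mirrors the concern in the comment.

```python
# Linear-regression analogy (hypothetical numbers): fit a "decoder" y ≈ w * f0 + b
# on ground-truth F0 only, on predicted F0 only, and on both mixed together.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ≈ w * xs + b; returns (w, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    w = cov / var
    return w, my - w * mx


f0_true = [1.0, 2.0, 3.0, 4.0, 5.0]
f0_pred = [3.0, 0.0, 5.0, 2.0, 5.0]  # hypothetical imperfect predictions
target = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed to depend exactly on the true F0

w_gt, _ = fit_line(f0_true, target)
w_pred, _ = fit_line(f0_pred, target)
# Same targets reconstructed from both the correct and the incorrect F0.
w_mixed, _ = fit_line(f0_true + f0_pred, target + target)

print(f"GT F0 only:        w = {w_gt:.3f}")
print(f"predicted F0 only: w = {w_pred:.3f}")
print(f"mixed:             w = {w_mixed:.3f}")
```

The mixed fit lands between the two single-source fits, and the noisier the predictions, the closer the slope gets to zero, i.e. the less the "decoder" uses F0.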

from styletts.
