cschaefer26 commented on September 18, 2024

Hi, first of all, good luck with the multi-speaker implementation. I have actually played around with it myself (branch multispeaker, very old). Regarding your question: the restriction to mel_len is usually there in case one wants to do batched inference, to remove the padding. For batch size 1 the lengths should match.
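A minimal sketch of why the trimming matters in the batched case, assuming a padded batch of attention matrices of shape [T_mel_max][T_text_max] stored as nested Python lists. The function names are hypothetical, and durations are taken with a simple per-frame argmax over characters, which may differ from how ForwardTacotron actually extracts them. Rows past mel_len and columns past text_len are padding, so they are dropped before counting; otherwise padded frames would be credited to some character and the durations would overshoot the real mel length.

```python
from typing import List


def durations_from_alignment(attn: List[List[float]], mel_len: int, text_len: int) -> List[int]:
    """Count, for each input character, how many mel frames attend to it most.

    attn is a (padded) attention matrix of shape [T_mel_max][T_text_max].
    Only the first mel_len rows and text_len columns are real; the rest is padding.
    """
    durations = [0] * text_len
    for frame in attn[:mel_len]:          # drop padded mel frames
        row = frame[:text_len]            # drop padded characters
        best = max(range(text_len), key=lambda i: row[i])
        durations[best] += 1
    return durations


def batch_durations(attn_batch: List[List[List[float]]],
                    mel_lens: List[int],
                    text_lens: List[int]) -> List[List[int]]:
    """Apply the per-sample trimming to every item of a padded batch."""
    return [durations_from_alignment(a, m, t)
            for a, m, t in zip(attn_batch, mel_lens, text_lens)]


# Two samples padded to 4 mel frames and 3 characters.
attn_batch = [
    [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]],  # real: 3 frames, 2 chars
    [[0.6, 0.3, 0.1], [0.1, 0.7, 0.2], [0.0, 0.2, 0.8], [0.0, 0.1, 0.9]],  # real: 4 frames, 3 chars
]
print(batch_durations(attn_batch, mel_lens=[3, 4], text_lens=[2, 3]))  # [[1, 2], [1, 1, 2]]
```

With batch size 1 there is no padding, so mel_len equals the number of rows and the trimming is a no-op, which matches the answer above.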

from forwardtacotron.

thanhlong1997 commented on September 18, 2024

Oh, I see. Your code extracts the alignment from Tacotron with a forward pass, not by running inference. That's why we need to cut the alignment matrix to mel_len elements, to remove the padded values. When I replaced your Tacotron model with mine, I thought the alignment matrix had to be extracted by running inference, and I got the warning "Sum of durations did not match mel length". Now I know why that happens.
Thank you for the explanation.
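To illustrate the warning, here is a hypothetical sketch (not ForwardTacotron's actual code; function names are made up) of a duration sanity check. With a forward pass, the attention matrix has exactly one row per ground-truth mel frame, so the durations sum to the mel length. In free-running inference the decoder decides on its own when to stop, so the number of rows, and therefore the duration sum, can differ from the ground-truth mel length.

```python
import warnings
from typing import List


def argmax_durations(attn: List[List[float]]) -> List[int]:
    """Per-character frame counts from an unpadded [T_mel][T_text] attention matrix."""
    text_len = len(attn[0])
    durations = [0] * text_len
    for row in attn:
        durations[max(range(text_len), key=row.__getitem__)] += 1
    return durations


def check_durations(durations: List[int], mel_len: int) -> bool:
    """Warn if the durations do not cover the ground-truth mel spectrogram exactly."""
    total = sum(durations)
    if total != mel_len:
        warnings.warn(f"Sum of durations ({total}) did not match mel length ({mel_len})")
        return False
    return True


mel_len = 4  # frames in the ground-truth mel spectrogram
# Forward pass: one attention row per ground-truth frame.
forward_attn = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.7, 0.3], [0.0, 0.1, 0.9]]
print(check_durations(argmax_durations(forward_attn), mel_len))  # True
# Inference: the decoder ran one step longer than the ground truth.
inference_attn = forward_attn + [[0.0, 0.2, 0.8]]
print(check_durations(argmax_durations(inference_attn), mel_len))  # warns, then False
```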

thanhlong1997 commented on September 18, 2024

Can I leave the issue open until I finish the implementation?

Thanks, sir.

cschaefer26 commented on September 18, 2024

Surely, keep me updated!

thanhlong1997 commented on September 18, 2024

Hi, I have successfully implemented a multi-speaker version of ForwardTacotron, and the results sound good. We are still using a pretrained Tacotron (in my version, Mellotron) to extract the alignment between characters and the mel spectrogram, but I see that FastPitch and FastSpeech now use the Montreal Forced Aligner (MFA) to extract alignments. Have you tried it before? Is it better or worse than extracting with Tacotron? I have tried using it with FastPitch, but the results are still poor.

cschaefer26 commented on September 18, 2024

Hi, I tried using MFA for duration extraction before and found it to be slightly worse; there was also quite a bit of fiddling involved in mapping the phonemes. It wasn't totally bad, though.
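Part of that fiddling is converting MFA output into frame-level durations that line up with your mel spectrograms. MFA writes phone intervals with start and end times in seconds (in TextGrid files; parsing them is omitted here). A minimal sketch, assuming the intervals are already parsed into (start, end, phone) tuples and assuming a sample rate of 22050 Hz and a hop length of 256, which you should replace with your own audio settings; the function name is hypothetical.

```python
from typing import List, Tuple


def intervals_to_durations(intervals: List[Tuple[float, float, str]],
                           sample_rate: int = 22050,
                           hop_length: int = 256) -> List[Tuple[str, int]]:
    """Convert (start_sec, end_sec, phone) intervals into per-phone frame counts.

    Boundaries are rounded to frame indices first and durations are taken as
    differences, so rounding errors do not accumulate across the utterance.
    """
    frames_per_sec = sample_rate / hop_length
    result = []
    for start, end, phone in intervals:
        start_f = round(start * frames_per_sec)
        end_f = round(end * frames_per_sec)
        result.append((phone, end_f - start_f))
    return result


# 16 kHz audio with hop 160 gives exactly 100 frames per second.
phones = [(0.0, 0.12, "HH"), (0.12, 0.25, "AH"), (0.25, 0.4, "L")]
print(intervals_to_durations(phones, sample_rate=16000, hop_length=160))
# [('HH', 12), ('AH', 13), ('L', 15)]
```

Rounding the boundaries rather than each duration keeps the total equal to the rounded end time of the last phone, but that total can still differ slightly from the actual mel length depending on how your STFT pads the signal, and MFA's phone labels still have to be mapped onto the symbol set your TTS model uses.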

thanhlong1997 commented on September 18, 2024

Thank you, sir.
One more question: I am currently using graphemes to encode the text, since I have to work with multilingual data. Would phonemes be better? In FastPitch, too, I saw that they use both graphemes and phonemes.

cschaefer26 commented on September 18, 2024

In my experience, phonemes trump graphemes big time, because the bijective nature of phonemes makes them easier for the TTS model to learn. The result is much more stable pronunciation and prosody. For multilingual data, simply phonemize your metafile upfront and set use_phonemes=False in the config. You could check out https://github.com/as-ideas/DeepPhonemizer for a stable phonemizer (you might need to train your own phonemizer model for your needs, but it's probably worth it).
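A minimal sketch of phonemizing the metafile upfront, assuming a pipe-separated metafile whose last column holds the text (as in LJSpeech-style files); any other columns, such as a file id or a speaker id, are passed through unchanged. The `phonemize` argument is any callable mapping a text string to a phoneme string; in practice you could wrap a trained DeepPhonemizer model in it (see its README for how to load a checkpoint and pass the language). The function name and the toy lexicon are hypothetical.

```python
from typing import Callable, Iterable, List


def phonemize_metafile(lines: Iterable[str],
                       phonemize: Callable[[str], str],
                       sep: str = "|") -> List[str]:
    """Replace the text column (assumed to be the last one) with its phonemes."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        *fields, text = line.split(sep)
        out.append(sep.join(fields + [phonemize(text)]))
    return out


# Toy word-level lexicon standing in for a real phonemizer model.
toy_lexicon = {"hello": "həloʊ", "world": "wɜːld"}


def toy_phonemize(text: str) -> str:
    return " ".join(toy_lexicon.get(w, w) for w in text.lower().split())


meta = ["LJ001|Hello world\n", "", "LJ002|speaker_1|world\n"]
for line in phonemize_metafile(meta, toy_phonemize):
    print(line)
```

Since the text column then already contains phonemes, the model should consume it as-is, which is why use_phonemes=False is set in the config.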

thanhlong1997 commented on September 18, 2024

Thank you, sir, for your advice. Since I switched from graphemes to phonemes, the results are promising. Now I would like to deploy this model to devices such as mobile phones or cameras, but the model is large, so I have to compress it, and the results are not good. Have you done this before? What can I do to deploy to a device and still keep its quality?
Thanks, sir.
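One common way to shrink a model for on-device use is post-training 8-bit weight quantization (a generic technique, not one the participants say they used). The sketch below, in plain Python with hypothetical function names, shows symmetric per-tensor quantization: each float weight is mapped to an integer in [-127, 127] with a single scale, cutting float32 weight storage by 4x. The reconstruction error per weight is at most half the scale, so a few large outlier weights inflate the scale and the error for every other weight, which is one way quantization can degrade quality. In practice you would use your framework's tooling (for example PyTorch's dynamic quantization) rather than hand-rolled code.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate float weights."""
    return [scale * v for v in q]


def max_abs_error(weights: List[float], q: List[int], scale: float) -> float:
    """Largest per-weight reconstruction error after the round trip."""
    return max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))


weights = [0.5, -1.27, 0.0, 0.01]
q, scale = quantize_int8(weights)
print(q)  # [50, -127, 0, 1]
print(max_abs_error(weights, q, scale) <= scale / 2 + 1e-12)  # True
```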

thanhlong1997 commented on September 18, 2024

Do you think non-autoregressive models like ForwardTacotron or FastSpeech 2 are not as good as autoregressive models like Tacotron 2? When I tried both ForwardTacotron and FastSpeech on the same dataset, I found that the generated audio was not as good as Tacotron 2's, even though they outperform Tacotron 2 in speed. I am trying to improve your model and FastSpeech 2 so that they are comparable to Tacotron 2, but it seems very hard to do.
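For context on the architectural difference being asked about: FastSpeech-style models, including ForwardTacotron, replace the autoregressive attention decoder with a length regulator that repeats each encoder output by its duration (extracted from an aligner during training, predicted at inference), so all mel frames can be generated in parallel. A minimal sketch of that expansion step, with a hypothetical function name and plain list items standing in for encoder output vectors:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def length_regulate(encodings: Sequence[T], durations: Sequence[int]) -> List[T]:
    """Repeat each encoder output durations[i] times (FastSpeech-style length regulator)."""
    if len(encodings) != len(durations):
        raise ValueError("need exactly one duration per input symbol")
    out: List[T] = []
    for enc, d in zip(encodings, durations):
        out.extend([enc] * d)  # a duration of 0 drops the symbol entirely
    return out


print(length_regulate(["a", "b", "c"], [2, 0, 3]))  # ['a', 'a', 'c', 'c', 'c']
```

Because the durations directly determine the timing of every output frame, errors in the extracted duration targets carry straight through to the generated audio, which ties back to the alignment-extraction discussion earlier in this thread.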