liusongxiang / efficient_tts

114.0 114.0 22.0 62.23 MB

PyTorch implementation of "EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture"

License: MIT License

Shell 8.47% Python 56.80% Perl 34.59% Makefile 0.14%

efficient_tts's Issues

synthesizing with HiFi-GAN

Hi. Did you try fine-tuning HiFi-GAN (https://github.com/jik876/hifi-gan)? It can alleviate the metallic noise and greatly improve the quality of the synthesized voice.
(That is, fine-tuning on mels generated by EfficientTTS with teacher forcing.)

Inference speed

Hello everyone!
Great job! I see that it still has a metallic sound, but my question is about inference speed. How does it compare to Tacotron 2? Is it much faster, as the paper says? Could you please give an approximate real-time ratio on CPU (and the CPU model)?
Thank you a lot!
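
For anyone who wants to measure this themselves, here is a minimal sketch of a real-time ratio measurement (synthesis time divided by the duration of the generated audio). `synthesize` is a hypothetical stand-in for the full text-to-waveform pipeline, not a function from this repository, and the 22050 Hz sample rate is only an assumption for the dummy example.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return (rtf, audio_seconds) for a single call of `synthesize`.

    `synthesize` is a hypothetical callable that takes text and returns a
    sequence of audio samples (acoustic model + vocoder together).
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    # RTF below 1.0 means synthesis runs faster than real time.
    return elapsed / audio_seconds, audio_seconds


def dummy_synthesize(text):
    # Placeholder: two seconds of silence at an assumed 22050 Hz sample rate.
    return [0.0] * (22050 * 2)


rtf, seconds = real_time_factor(dummy_synthesize, "hello", 22050)
print(f"audio: {seconds:.2f}s, RTF: {rtf:.6f}")
```

Averaging over many sentences, after a warm-up call, should give a more stable number than a single run.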

questions about code

I wonder if the code in

efficient_tts/nntts/models/efficient_tts.py

Line 338 in d186a56

energies = -1 * ((imv.unsqueeze(1) - p.unsqueeze(-1))**2) * sigma

is correct? It seems to subtract two tensors of different sizes, [B,T2,1] and [B,T1,1].
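
A small sketch of the broadcasting rule that PyTorch applies to this kind of subtraction may help here. It works on shapes only, in plain Python. It assumes, without checking the repository, that `imv` and `p` are 2-D tensors of shape [B, N] and [B, M], so that `imv.unsqueeze(1)` is [B, 1, N] and `p.unsqueeze(-1)` is [B, M, 1]. The helper name `broadcast_shape` is made up for this illustration.

```python
def broadcast_shape(a, b):
    """Return the shape obtained by broadcasting shapes `a` and `b`."""
    result = []
    # Compare dimensions from the last one backwards; a missing leading
    # dimension is treated as size 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


B, N, M = 2, 5, 3  # assumed sizes, for illustration only

# imv.unsqueeze(1) -> [B, 1, N], p.unsqueeze(-1) -> [B, M, 1]
print(broadcast_shape((B, 1, N), (B, M, 1)))  # (2, 3, 5)

# Two [B, T, 1] tensors with different T cannot be subtracted.
try:
    broadcast_shape((B, M, 1), (B, N, 1))
except ValueError as err:
    print("error:", err)
```

Under that assumption the subtraction is valid and yields a [B, M, N] tensor of pairwise differences. If the tensors really had shapes [B,T2,1] and [B,T1,1] with T1 different from T2, the subtraction would raise an error instead of running.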

High eval mel loss when training on Mandarin datasets

Hi. Thank you for your implementation. I trained the model on some Mandarin datasets (12000 in the train set and 100 in the eval set) for about 695k steps. The train/mel loss is about 0.12 and the train/dur_loss is about 0.0158. The eval/dur_loss is about 0.07. However, the eval/mel loss is high (~0.84).
I also noticed that the model sometimes fails to synthesize reduplicated words (e.g. 嗯嗯, 叽叽喳喳) and tone-5 (neutral-tone) words (e.g. 哎呀).

If you don't mind, could you tell me how you processed the DataBaker dataset, what your input text looks like, and how to solve the problems mentioned above? Thank you very much for your help.

License

Hi!
Can you please add a license?

Thank you

Ondra

PS: MIT or Apache license would be much appreciated 😇

Metal noise in sample

Thanks for your great work. I have already listened to some samples at the link. There is some metallic noise at high pitch, and some words are mispronounced. Does longer training overcome this problem, or is it caused by the vocoder? I wonder what the audio quality of EfficientTTS with another vocoder (MelGAN) would be.

how to use your model?

Hello! I am particularly interested in your implementation. Can you provide a usage tutorial?

Chinese supported?

Does it support Chinese?

Question about optimizer

Hi. Thank you for your implementation. I have a question about the optimizer.
It seems that you use the Adam optimizer with lr=1e-3 and amsgrad=True.

Why did you choose these options, especially the learning rate,
even though the original paper says the model was trained with lr=1e-4?

Did the model fail to train with lr=1e-3 or amsgrad=False?

CUDA out of memory

pseudo code in the paper is a little different from the equations of the paper.

I find that the pseudocode in the paper is a little different from the paper's equations (e.g. Eq. 14 and Eq. 17). I wonder how the differences affect the results.

Can I fine-tune this model on one person's voice (my own voice)? And how?

Hi, can you tell me how to fine-tune the model on one person's voice, so that I can generate that specific person's speech? Can you tell me the steps? And how long should the fine-tuning data for this one person be?

Reproducing good results (as claimed in paper)

Somewhat related to issue #2 which was closed, but I think it's safe to say that the latest samples posted do not seem to be close to converging towards the strong results that were claimed by the paper's authors, and it would be good to have an issue tracking speech quality.

It's somewhat puzzling given that the implementation seems to be on point except for the missing hyperparameter sigma values that you mentioned. I'm doing my own experiments playing with hyperparameters but haven't been able so far to achieve something too competitive. If you have any ideas of what could be tried, let me know.
