Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021
Is there a way to fine-tune a model, or to train on two languages side by side, so that a very low-resource language can be trained with the voices of a high-resource language?
Hi, I noticed some undefined names around the code:
synthesize.py:495:67: F821 undefined name 'reference'
noise_mixer_refs.py:56:42: F821 undefined name 'eps'
noise_mixer_refs.py:59:40: F821 undefined name 'eps'
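To illustrate why these warnings matter: F821 means the name is not defined in any enclosing scope, so the line raises NameError as soon as it runs. Below is a minimal sketch of that failure and one possible fix. The function names and the default value of eps are hypothetical and are not taken from the repository.

```python
import math


def ratio_without_eps(signal_power, noise_power):
    # `eps` is not defined anywhere, which is what flake8 reports as F821.
    # Python only notices when this line actually executes.
    return signal_power / (noise_power + eps)


def ratio_with_eps(signal_power, noise_power, eps=1e-8):
    # Hypothetical fix: make eps an explicit parameter with an assumed default.
    return signal_power / (noise_power + eps)


try:
    ratio_without_eps(1.0, 0.0)
    raised_name_error = False
except NameError:
    raised_name_error = True

safe_ratio = ratio_with_eps(1.0, 0.0)
print(raised_name_error, math.isfinite(safe_ratio))
```

Whether eps should be a module constant, a function argument, or a config value is for the maintainers to decide; the sketch only shows that the name has to be defined somewhere before the line is reached.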
Hi, I want to ask whether the trimming operation is very important for training your model. Also, could you share the scripts for trimming the VCTK dataset?
1. In the paper, why is the output of the encoder (text_encoding) upsampled and downsampled?
2. What is the meaning of text_encoding_neck + pitch_encoding and text_encoding_neck + energy_encoding? Why are they added rather than concatenated (cat)?
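For context on the add-versus-cat question, here is a minimal NumPy sketch of the shape difference between the two operations. The array sizes and values are made up for illustration, and this is not the repository's code. Element-wise addition needs both encodings to have the same shape and keeps the feature dimension unchanged, while concatenation doubles it.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
seq_len, hidden = 5, 4

text_encoding = np.ones((seq_len, hidden))
pitch_encoding = np.full((seq_len, hidden), 0.5)

# Element-wise addition: shapes must match, feature width stays `hidden`.
added = text_encoding + pitch_encoding

# Concatenation along the feature axis: feature width becomes 2 * hidden.
concatenated = np.concatenate([text_encoding, pitch_encoding], axis=-1)

print(added.shape, concatenated.shape)
```

With addition, the layers that consume the result keep the same input width; with concatenation, they would need to accept twice the width.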