twidddj / tf-wavenet_vocoder



Wavenet and its applications with Tensorflow

License: MIT License

Jupyter Notebook 92.46% Python 7.54%

wavenet wavenet-vocoder speech-synthesis tensorflow vocoder


tf-wavenet_vocoder's Issues

Pre-trained Weights

Hi There!

Can you release the pre-trained weights used to generate the demo samples?

Thanks!

I am not sure what to put for log path

Trying to restore saved checkpoints from /Users//desktop/log_dir_path ... No checkpoint found.
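
"No checkpoint found" usually means the directory given as the log path does not contain TensorFlow checkpoint files. A TensorFlow saver normally writes a small state file named `checkpoint` next to the `*.index` and `*.data-*` files. The sketch below is a standard-library check of a candidate directory; `describe_log_dir` is a hypothetical helper for illustration and is not part of this repository.

```python
import os


def describe_log_dir(log_dir):
    """Report whether `log_dir` looks like a TensorFlow checkpoint directory.

    Hypothetical helper for illustration; not part of this repository.
    """
    log_dir = os.path.abspath(os.path.expanduser(log_dir))
    if not os.path.isdir(log_dir):
        return "missing directory: " + log_dir
    names = sorted(os.listdir(log_dir))
    # The saver's state file is literally named "checkpoint".
    has_state_file = "checkpoint" in names
    index_files = [n for n in names if n.endswith(".index")]
    if has_state_file and index_files:
        return "checkpoint files found: " + ", ".join(index_files)
    return "no checkpoint files in: " + log_dir
```

If the helper reports no checkpoint files, the log path most likely points at the wrong directory, or training has not saved anything there yet.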

Integration Tacotron

So far, I have not found a model whose attention works with a "reduction factor" of 1. With a factor > 1, the prediction looks like the image below, which is bad news for WaveNet performance.

Here, the original Mel-spectrum is

You can also find some discussion of this issue on @Rayhane-mamah's repo and @keithito's repo.
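
For readers unfamiliar with the term: in Tacotron, the reduction factor r is the number of mel frames the decoder emits per decoder step, so attention advances once per group of r frames. The sketch below only illustrates that grouping. The function names and the padding choice are assumptions for illustration, not code from this repository or from Tacotron.

```python
import math


def decoder_steps(num_frames, reduction_factor):
    """Number of decoder steps needed to emit `num_frames` mel frames."""
    return math.ceil(num_frames / reduction_factor)


def group_frames(frames, reduction_factor, pad_frame):
    """Split frames into consecutive groups of `reduction_factor` frames.

    The last group is padded with `pad_frame` (an assumed convention).
    """
    steps = decoder_steps(len(frames), reduction_factor)
    padded = list(frames) + [pad_frame] * (steps * reduction_factor - len(frames))
    return [padded[i * reduction_factor:(i + 1) * reduction_factor]
            for i in range(steps)]
```

With r = 1 every frame needs its own attention step, which is the setting reported as hard to train above.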

Hello, thanks for your great work!
I have seen your mixture code. How does the loss change during training?
I trained it in my vocoder project, but it can sample good $x$ in the sample code?
What did you change? I have been having trouble with it for weeks.

NotFoundError (see above for traceback): Key vocoder/wavenet/dilated_stack/layer0/gc_filter not found in checkpoint
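
A "Key ... not found in checkpoint" error means the graph being restored defines a variable that the checkpoint file does not contain. One way to narrow it down is to compare the two sets of names. In TensorFlow, `tf.train.list_variables(checkpoint_path)` returns the `(name, shape)` pairs stored in a checkpoint. The comparison itself is plain Python, sketched below. `diff_variable_names` is a hypothetical helper. The `gc_` prefix in the error suggests a global-conditioning weight, so a checkpoint trained without global conditioning is one possible cause, but that is a guess.

```python
def diff_variable_names(graph_names, checkpoint_names):
    """Return (missing_in_checkpoint, unused_in_checkpoint), both sorted.

    Hypothetical helper: both arguments are iterables of variable names
    without the ":0" suffix.
    """
    graph, ckpt = set(graph_names), set(checkpoint_names)
    return sorted(graph - ckpt), sorted(ckpt - graph)
```

Every name in the first list would trigger the same NotFoundError on restore.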

Title

Parallel Wavenet-Vocoder

Planned TODO

KL + Power - Single speaker

Properties not specified in the paper

Sampling number for the loss (we may be limited by the GPU)
Number of mixtures for IAF layers
Averaging method for Power loss
- e.g. just reduce_mean on the time axis, a moving average, or ...
... (please let us know if you have these details)
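
The two averaging options mentioned for the power loss can be sketched in plain Python as follows. This is only an illustration under stated assumptions: a naive unwindowed DFT stands in for the STFT, the frame length, hop, and moving-average width are arbitrary, and the loss is taken as the mean squared difference of the averaged power spectra. None of these choices come from the paper or from this repository.

```python
import cmath
import math


def power_spectrogram(signal, frame_len=8, hop=4):
    """|DFT|^2 per frame, using a naive O(n^2) DFT with no window."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        bins = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            bins.append(abs(coeff) ** 2)
        frames.append(bins)
    return frames  # indexed as [frame][frequency_bin]


def time_mean(spec):
    """Option 1: mean over the whole time axis (a single row remains)."""
    return [[sum(col) / len(col) for col in zip(*spec)]]


def moving_average(spec, width=3):
    """Option 2: moving average over time (a shorter time axis remains)."""
    return [[sum(col) / width for col in zip(*spec[t:t + width])]
            for t in range(len(spec) - width + 1)]


def power_loss(generated, target, average=time_mean):
    """Mean squared difference between the averaged power spectra."""
    a = average(power_spectrogram(generated))
    b = average(power_spectrogram(target))
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)
```

Passing `average=moving_average` switches to the second option. The full mean discards all timing information, while the moving average keeps coarse timing at the cost of a larger loss tensor.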

Other implementations

https://github.com/zhf459/P_wavenet_vocoder (used r9y9's wavenet in pytorch)

Synthesis results of vocoder

Single speaker

We stopped training at step 680K.
You can find some results at https://twidddj.github.io/docs/vocoder.

We tested the vocoder on two groups of samples: 1) samples from the dataset and 2) samples generated by Tacotron.

This is because of my own mistake (sorry, I did not set aside separate data for testing).

However, I believe the results still show the performance to some extent. See the first section of the page.

The other section also gives a sense of the vocoder's performance.

Using only the target's mel-spectrum, it can generate audio that is close to the target.

Moreover, some parts of the result have better quality than the target (I hope you think so too). Note that Tacotron was trained on audio with a sample rate of 24K, whereas our vocoder was trained at a sample rate of 22K. This means the vocoder has never seen frequencies above 11K. Therefore, if you match the sample rates, your results should be better than the ones we reported.
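
The 11K figure follows from the Nyquist limit: audio sampled at a given rate can only represent frequencies up to half that rate. A small worked sketch, taking "22K" and "24K" as the nominal round numbers 22,000 Hz and 24,000 Hz (the exact rates are not stated here); the function names are for illustration only.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def unseen_band_hz(train_rate_hz, input_rate_hz):
    """Frequency band present in the input but never seen in training.

    Returns (low, high) in Hz, or None if the input has no extra band.
    """
    low, high = nyquist_hz(train_rate_hz), nyquist_hz(input_rate_hz)
    return (low, high) if high > low else None


VOCODER_RATE_HZ = 22_000   # nominal "22K", an assumption
TACOTRON_RATE_HZ = 24_000  # nominal "24K", an assumption
```

Under these nominal rates, `unseen_band_hz(VOCODER_RATE_HZ, TACOTRON_RATE_HZ)` covers 11,000 Hz to 12,000 Hz, the band the vocoder was never trained on.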

By the way, we believe the pre-trained model can be used as a teacher model for parallel wavenet.

Parallel Wavenet - Single speaker

Not yet tested.

Multi speaker