Hi, I have trained a model for our language using your repository, with melgan as the vocoder.

Vocoder ParallelWaveGAN about forwardtacotron HOT 5 CLOSED

as-ideas commented on August 15, 2024

Vocoder ParallelWaveGAN

Comments (5)

lysektomas commented on August 15, 2024

I have tried the default melgan vocoder (from https://github.com/seungwonpark/melgan). I synthesized "hello world" four times using my model.

The output from the melgan vocoder sounds as if it were reversed: https://soundcloud.com/tom-lysek-832824742/hello-world-melgan/s-liaMpH3Ijx5

The output from the Griffin-Lim vocoder is great.

This is the spectrogram of the mels:

The conversion from mels to wav was performed with this code:

```python
m = torch.tensor(mels).unsqueeze(0)
with torch.no_grad():
    if len(m.shape) == 2:
        m = m.unsqueeze(0)
    wav = model.inference(m).cpu().numpy()

ipd.Audio(wav, rate=hp.sample_rate)
```

Here, mels is the m returned by the synthesize function in notebook_utils/synthesize.py.

Do you have any thoughts?
Tom

cschaefer26 commented on August 15, 2024

Hi, the synthesize.py from notebook_utils already outputs mels in the correct format for melgan, so maybe you are expanding the dimensions too much? The colab notebook uses the default melgan model for LJSpeech. PWGAN should work similarly, I believe (it has the same preprocessing as melgan). Also, if you call gen_forward.py, use the melgan flag to produce .mel files that are already in the correct format for melgan (see README).

That's the synthesize.py mel processing for melgan:

m = torch.tensor(m).unsqueeze(0).cuda()
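The "expanding too much" concern can be checked before the mel reaches the vocoder. The following is a minimal sketch, not code from either repository: the helper name `to_batched_shape` is hypothetical, and it assumes the vocoder expects a `(batch, n_mels, frames)` layout with 80 mel bins, which should be verified against the actual configuration.

```python
def to_batched_shape(shape, n_mels=80):
    """Return the (batch, n_mels, frames) shape a mel of `shape` should end up with.

    Assumption: the vocoder takes (batch, n_mels, frames) input and
    n_mels is 80. Both are illustrative defaults, not confirmed values.
    """
    dims = tuple(shape)
    if len(dims) == 2:
        # (n_mels, frames): exactly one batch axis is missing
        dims = (1,) + dims
    if len(dims) != 3:
        # e.g. (1, 1, 80, T) after unsqueezing an already batched mel
        raise ValueError(f"expected 2 or 3 dimensions, got shape {tuple(shape)}")
    if dims[1] != n_mels:
        # a transposed mel, e.g. (1, T, 80), would be caught here
        raise ValueError(f"axis 1 should hold {n_mels} mel bins, got shape {dims}")
    return dims
```

Calling `to_batched_shape(m.shape)` right before `model.inference(m)` would raise if an extra `unsqueeze(0)` had produced a 4-dimensional tensor, or if the mel-bin and frame axes were swapped.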

lysektomas commented on August 15, 2024

I get the same result when I use gen_forward.py to produce a .mel file and then convert it to wav with inference.py from melgan as when I use Jupyter.

I have also tried your synthesize function with this setup

and the result is the same.

If I swap this bad vocoder for the default vocoder (from your example), the result is this: https://soundcloud.com/tom-lysek-832824742/hello-world-default-melgan/s-zpnjVYgxrAu

So I assume that this vocoder was trained with some error.

I was using these hyperparameters (the default hyperparameters):

On the left are the hparams from ForwardTacotron; on the right is default.yaml from the melgan repository.

Did I miss something? Is there some misconfiguration? The only difference is filter_length vs. fft_bins, but I don't think that is the cause :/
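A side-by-side comparison like this can also be done programmatically, which avoids overlooking a value. The sketch below is only an illustration: `diff_audio_params` is a hypothetical helper, the key names in the alias map are assumptions about how the two configs name the same setting, and all numbers are placeholders rather than the values used in this thread.

```python
def diff_audio_params(a, b, aliases=None):
    """Return {key: (value_in_a, value_in_b)} for every setting of `a`
    that is missing from `b` (reported as None) or has a different value.

    `aliases` maps a key name used in `a` to the name used in `b`.
    """
    aliases = aliases or {}
    mismatches = {}
    for key, value in a.items():
        other_key = aliases.get(key, key)
        if other_key not in b:
            mismatches[key] = (value, None)
        elif b[other_key] != value:
            mismatches[key] = (value, b[other_key])
    return mismatches


# Placeholder values for illustration only, not the poster's settings.
tts_params = {"sample_rate": 22050, "hop_length": 256, "win_length": 1024}
vocoder_params = {"sampling_rate": 22050, "hop_length": 256, "win_length": 1100}

mismatches = diff_audio_params(
    tts_params,
    vocoder_params,
    aliases={"sample_rate": "sampling_rate"},  # assumed naming difference
)
print(mismatches)
```

Pairs such as filter_length and fft_bins should only be added to the alias map after confirming that they denote the same quantity; an FFT size and a count of FFT bins, for example, would differ even when the configs agree.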

I used sox to check whether there is a problem with the input files, but they have the same parameters:

1.wav is from my own dataset; LJXXX.wav is from LJ Speech.
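The same header check can be scripted with Python's standard `wave` module, which is handy for scanning a whole dataset. This is a minimal sketch assuming uncompressed PCM WAV files; the helper names `wav_format` and `same_format` are hypothetical.

```python
import wave


def wav_format(path):
    """Read the basic format fields from a PCM WAV file header."""
    with wave.open(path, "rb") as f:
        return {
            "sample_rate": f.getframerate(),
            "channels": f.getnchannels(),
            "bits_per_sample": 8 * f.getsampwidth(),
        }


def same_format(path_a, path_b):
    """True if both files share sample rate, channel count and bit depth."""
    return wav_format(path_a) == wav_format(path_b)
```

For the two files above, `same_format("1.wav", "LJXXX.wav")` would be expected to return `True` if the sox output matches. Note that this only compares header fields; it says nothing about loudness, silence trimming, or other properties of the audio itself.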

cschaefer26 commented on August 15, 2024

Hmm, I don't see a mismatch. Does the pretrained ForwardTacotron model work for you with the standard melgan?

lysektomas commented on August 15, 2024

I have used vocgan and everything was fine. Maybe the model was trained badly.
