
Comments (5)

lysektomas commented on August 15, 2024

I have tried the default melgan vocoder (from https://github.com/seungwonpark/melgan). I synthesized "hello world" four times using my model.

The output from the melgan vocoder sounds as if it were reversed: https://soundcloud.com/tom-lysek-832824742/hello-world-melgan/s-liaMpH3Ijx5

The output from the Griffin-Lim vocoder is great.

This is the spectrogram from the mels:
[image: spectrogram]

The mels were converted to wav using this code:

`
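import torch
import IPython.display as ipd

# mels comes from synthesize(); add a batch dimension for the vocoder
m = torch.tensor(mels).unsqueeze(0)
with torch.no_grad():
    # only triggers if m is still 2-D at this point
    if len(m.shape) == 2:
        m = m.unsqueeze(0)
    wav = model.inference(m).cpu().numpy()

ipd.Audio(wav, rate=hp.sample_rate)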
`

and mels is m from the synthesize function in notebook_utils/synthesize.py

Do you have any thoughts?
Tom

from forwardtacotron.

cschaefer26 commented on August 15, 2024

Hi, the synthesize.py from notebook_utils already outputs mels in the correct format for melgan, so maybe you are expanding the dimensions too much? The colab notebook uses the default melgan model for LJSpeech. PWGAN should work similarly, I believe (it has the same preprocessing as melgan). Also, if you call gen_forward.py, use the melgan flag to produce .mel files that are already in the correct format for melgan (see README).

That's the synthesize.py mel processing for melgan:

m = torch.tensor(m).unsqueeze(0).cuda()
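
One way to rule out an extra batch dimension is to normalize the mel shape before calling the vocoder. The sketch below is only an illustration: `ensure_batched` is a hypothetical helper, NumPy stands in for torch so it runs without a GPU, and the `(1, n_mels, frames)` target shape with 80 mel channels is an assumption about what the vocoder expects.

```python
import numpy as np


def ensure_batched(mel):
    """Return mel with shape (1, n_mels, frames).

    Hypothetical helper: adds a batch axis to 2-D input, passes through
    3-D input with batch size 1, and rejects anything else.
    """
    mel = np.asarray(mel)
    if mel.ndim == 2:
        return mel[np.newaxis, ...]
    if mel.ndim == 3 and mel.shape[0] == 1:
        return mel
    raise ValueError(f"unexpected mel shape {mel.shape}")


# Dummy mel with assumed dimensions (80 mel channels, 120 frames).
mel = np.zeros((80, 120), dtype=np.float32)
once = ensure_batched(mel)
twice = ensure_batched(once)  # calling it again must not add another axis
print(once.shape, twice.shape)
```

Because the helper is idempotent, it can be applied to the output of synthesize.py or to a loaded .mel file without needing to know whether a batch axis is already present.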


lysektomas commented on August 15, 2024

The result is the same when I use gen_forward.py to produce a .mel file and then convert it to wav with inference.py from melgan as when I use Jupyter.

I have tried to use your synthesize function with this setup:

[image]

And the result is the same.

If I swap this bad vocoder for the default vocoder (from your example), the result is this: https://soundcloud.com/tom-lysek-832824742/hello-world-default-melgan/s-zpnjVYgxrAu

So I assume that this vocoder was trained with some error.

I was using these hyperparameters (the default hyperparameters):

[image]

Left is hparams from ForwardTacotron, right is default.yaml from the melgan repository.

Did I miss something? Is there some misconfiguration? The only difference is filter_length vs. fft_bins, but I don't think that is the cause :/
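
A side-by-side comparison like this can also be done programmatically. The following is a minimal sketch, not either repository's code: the key names and the alias mapping are assumptions, and the values are illustrative placeholders rather than the ones in the screenshots. It also shows why an FFT size and a bin count are not directly comparable: a one-sided spectrum of an FFT of size `n_fft` has `n_fft // 2 + 1` bins.

```python
def fft_bins_from_n_fft(n_fft):
    """Number of one-sided frequency bins for an FFT of size n_fft."""
    return n_fft // 2 + 1


def diff_audio_params(tts, vocoder, aliases):
    """Return {tts_key: (tts_value, vocoder_value)} for every mismatch."""
    mismatches = {}
    for tts_key, voc_key in aliases.items():
        a, b = tts.get(tts_key), vocoder.get(voc_key)
        if a != b:
            mismatches[tts_key] = (a, b)
    return mismatches


# Illustrative values only; key names are assumed, not copied from the repos.
tts_hparams = {"sample_rate": 22050, "n_fft": 1024, "num_mels": 80,
               "hop_length": 256, "win_length": 1024}
vocoder_cfg = {"sampling_rate": 22050, "filter_length": 1024,
               "n_mel_channels": 80, "hop_length": 256, "win_length": 1024}

# Assumed mapping from TTS key names to vocoder key names.
aliases = {"sample_rate": "sampling_rate", "n_fft": "filter_length",
           "num_mels": "n_mel_channels", "hop_length": "hop_length",
           "win_length": "win_length"}

matching = diff_audio_params(tts_hparams, vocoder_cfg, aliases)
changed = diff_audio_params(dict(tts_hparams, n_fft=2048), vocoder_cfg, aliases)
print(matching)                   # {}
print(changed)                    # {'n_fft': (2048, 1024)}
print(fft_bins_from_n_fft(1024))  # 513
```

If one config stores the FFT size and the other stores the bin count, converting one into the other first makes the comparison meaningful.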

I have used sox to check whether there is a problem with the input files, but they have the same parameters:

[image]

1.wav is from my own dataset, LJXXX.wav is from LJSpeech.
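
The same header check can be scripted with Python's standard `wave` module, which is handy for scanning a whole dataset. This is a rough sketch under assumptions: `wav_params` and `write_silence` are hypothetical helpers, the two files are generated on the fly as stand-ins for the real recordings, and `wave` only handles uncompressed PCM WAV files.

```python
import os
import tempfile
import wave


def wav_params(path):
    """Return the header fields that should match between datasets."""
    with wave.open(path, "rb") as f:
        return {
            "channels": f.getnchannels(),
            "sample_rate": f.getframerate(),
            "bits_per_sample": f.getsampwidth() * 8,
        }


def write_silence(path, sample_rate=22050, seconds=0.1):
    """Write a short mono 16-bit silent WAV file for the demo."""
    n_frames = int(sample_rate * seconds)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(b"\x00\x00" * n_frames)


with tempfile.TemporaryDirectory() as tmp:
    own = os.path.join(tmp, "own.wav")              # stand-in for own data
    reference = os.path.join(tmp, "reference.wav")  # stand-in for LJSpeech
    write_silence(own)
    write_silence(reference)
    params_own = wav_params(own)
    params_ref = wav_params(reference)

print(params_own == params_ref)
```

Matching headers only confirm the container format; they say nothing about loudness, trimming, or other properties of the audio itself.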


cschaefer26 commented on August 15, 2024

Hmm I don't see a mismatch. Does the pretrained ForwardTacotron model work for you with the standard melgan?


lysektomas commented on August 15, 2024

I have used vocgan and everything was fine. Maybe the model was trained badly.

