About the inference speed (fftnet) · Closed

azraelkuan commented on August 23, 2024

About the inference speed

Comments (9)

Maxxiey commented on August 23, 2024

I went through the code again and I figured out the reason. Problem solved, closing this issue now.

Maxxiey commented on August 23, 2024

I have trained this model for over 100k iterations, and training is surprisingly fast, but synthesis is much slower than I expected: a 17 s wav takes about 20 minutes to generate. Has anyone gotten better performance?
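
To put those numbers in perspective, here is a quick real-time-factor calculation. The 16 kHz sample rate is an assumption (it is CMU Arctic's native rate); substitute your own `hparams.sample_rate` if it differs.

```python
def realtime_factor(audio_seconds, synth_seconds):
    """Seconds of compute per second of audio (1.0 means real time)."""
    return synth_seconds / audio_seconds


def samples_per_second(audio_seconds, synth_seconds, sample_rate):
    """Throughput of sample-by-sample autoregressive generation."""
    return audio_seconds * sample_rate / synth_seconds


# Figures from the comment: a 17 s clip took roughly 20 minutes.
# ASSUMPTION: 16 kHz audio.
audio_s, synth_s, sr = 17.0, 20 * 60.0, 16000
print(f"real-time factor: {realtime_factor(audio_s, synth_s):.1f}x")
print(f"throughput: {samples_per_second(audio_s, synth_s, sr):.0f} samples/s")
print(f"per sample: {1000 * synth_s / (audio_s * sr):.2f} ms")
```

Under that assumption, generation runs about 70x slower than real time, roughly 227 samples per second or about 4.4 ms per generated sample.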

azraelkuan commented on August 23, 2024

Sorry for the late reply. I also found that the buffer does not speed up generation; maybe we need to write a CUDA op? I also tested other repos, and they are very slow too.
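
The thread does not show how the repo's buffer works, so the following is only a toy sketch of the general queue-caching idea (as in Fast WaveNet-style generation), with hypothetical names (`SHIFTS`, `naive_step`, `CachedGenerator`) and scalar channels. Each layer keeps a FIFO of its last `shift` inputs, so producing one new sample costs O(layers) instead of recomputing the whole history. The inputs here are a fixed sequence (teacher forcing) just to check that both paths agree.

```python
from collections import deque

import numpy as np

# Toy, hypothetical FFTNet-style stack: layer i combines its input at time t
# with its input `shift` steps earlier (scalar channels for readability).
SHIFTS = [8, 4, 2, 1]
rng = np.random.default_rng(0)
W_L = rng.normal(size=len(SHIFTS))  # weight on the delayed input
W_R = rng.normal(size=len(SHIFTS))  # weight on the current input


def layer(i, past, current):
    return np.tanh(W_L[i] * past + W_R[i] * current)


def naive_step(history):
    """Recompute every layer over the whole history to get the newest output."""
    h = np.asarray(history, dtype=float)
    for i, s in enumerate(SHIFTS):
        padded = np.concatenate([np.zeros(s), h])  # causal zero padding
        h = layer(i, padded[:-s], h)  # padded[:-s][t] == h[t - s]
    return h[-1]


class CachedGenerator:
    """One FIFO per layer holding that layer's last `shift` inputs."""

    def __init__(self):
        self.queues = [deque([0.0] * s, maxlen=s) for s in SHIFTS]

    def step(self, x):
        h = x
        for i, q in enumerate(self.queues):
            past = q[0]  # this layer's input from `shift` steps ago
            q.append(h)  # maxlen drops the oldest entry automatically
            h = layer(i, past, h)
        return h


# Feed a fixed input sequence and compare the cached and naive paths.
x = rng.normal(size=40)
gen = CachedGenerator()
max_err = max(abs(gen.step(v) - naive_step(x[: t + 1])) for t, v in enumerate(x))
print(f"max difference: {max_err:.2e}")
```

Caching removes redundant computation but not the sequential dependency: each sample still needs its own pass through the stack, so in Python every step pays interpreter and per-op kernel-launch overhead. That fixed per-step cost is one plausible reason a fused custom CUDA op comes up.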

Maxxiey commented on August 23, 2024

@azraelkuan Thank you very much.

I tried some other repos too and got the same low speed; I guess we are missing the trick for fast generation. Oh, by the way, could you tell me why you use lws to preprocess the waveform in your repo? What's the difference between lws.stft and librosa.stft? I tried to train your model on mels extracted with librosa, but the results were bad (nothing but noise), and I suspect it has something to do with the data preprocessing.

Thanks~
max

azraelkuan commented on August 23, 2024

There is not much difference between the lws and librosa STFTs; lws is just a fast way to compute the STFT. Maybe you should check the frame length and hop length?
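
The frame/hop check can be done mechanically. This sketch uses numpy only (not lws or librosa) and mirrors the frame-count logic of a centered STFT, which pads `n_fft // 2` samples on each side. The 16 kHz audio, `n_fft = 1024`, the upsampling factor of 256, and the helper `covers_audio` are all assumptions for illustration; the point is that if the mels are computed with a different hop than the one the model upsamples by, the conditioning no longer lines up with the waveform.

```python
import numpy as np


def num_frames(num_samples, n_fft, hop_length, center=True):
    """Frames produced by a sliding-window STFT; center=True pads n_fft // 2 per side."""
    if center:
        num_samples += 2 * (n_fft // 2)
    return 1 + (num_samples - n_fft) // hop_length


def frame_signal(wav, n_fft, hop_length):
    """Reflect-pad and slice `wav` into overlapping windows (center=True style)."""
    padded = np.pad(wav, n_fft // 2, mode="reflect")
    n = num_frames(len(wav), n_fft, hop_length)
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n)[:, None]
    return padded[idx]  # shape (n_frames, n_fft)


def covers_audio(num_samples, n_frames, upsample_factor):
    """Hypothetical check: does repeating each frame `upsample_factor` times
    cover the waveform to within one hop?"""
    return abs(n_frames * upsample_factor - num_samples) < upsample_factor


# ASSUMPTIONS: 1 s of 16 kHz audio, n_fft = 1024, and a model that
# upsamples conditioning frames by 256 samples.
wav = np.random.default_rng(0).normal(size=16000)
for hop in (256, 200):
    frames = frame_signal(wav, n_fft=1024, hop_length=hop)
    print(hop, frames.shape, covers_audio(len(wav), len(frames), upsample_factor=256))
```

With a hop of 256 the 63 frames cover the 16000 samples to within one hop; with a hop of 200 you get 81 frames, which no longer matches an upsampling factor of 256.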

Maxxiey commented on August 23, 2024

Okay, I will check the hyperparameters. Closing this issue now; thanks for the quick reply.

Maxxiey commented on August 23, 2024

@azraelkuan Hello again, I am a little confused about the following code in cmu_arctic.py:

```python
if hparams.use_injected_noise:
    noise = np.random.normal(0.0, 1.0 / hparams.quantize_channels, wav.shape)
    wav += noise
...
if hparams.rescaling:
    wav = wav / np.abs(wav).max() * hparams.rescaling_max
```

As I understand it, the first part injects noise into the raw wav, and the second part normalizes the peak so that the samples fall in [-rescaling_max, rescaling_max], i.e. within [-1, 1] when rescaling_max ≤ 1.

However, if I understand correctly, np.abs(wav).max() varies from clip to clip, since two different clips are very likely to have different peak values. So if we add noise first and then normalize, the injected noise is scaled by the clip-dependent factor rescaling_max / max|wav + noise|, and its distribution changes from N(0, (1/256)^2) (standard deviation 1/256) to something else.

I think the right order is to normalize the wav first and then inject noise, so that the noise distribution is not changed.
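
A small numerical check of the argument above. The constants are assumptions (`QUANTIZE_CHANNELS = 256` to match the 1/256 quoted, `RESCALING_MAX = 0.999` as a placeholder for `hparams.rescaling_max`), and the sine "clips" stand in for real recordings. Each function returns the noise as it actually appears in the output, so its standard deviation can be compared across a quiet and a loud clip.

```python
import numpy as np

QUANTIZE_CHANNELS = 256  # ASSUMPTION: matches the 1/256 quoted above
RESCALING_MAX = 0.999    # ASSUMPTION: check hparams.rescaling_max


def noise_then_rescale(wav, rng):
    """Current order: inject noise, then peak-normalize. Returns (out, noise_in_out)."""
    noise = rng.normal(0.0, 1.0 / QUANTIZE_CHANNELS, wav.shape)
    noisy = wav + noise
    scale = RESCALING_MAX / np.abs(noisy).max()  # clip-dependent factor
    return noisy * scale, noise * scale


def rescale_then_noise(wav, rng):
    """Proposed order: peak-normalize, then inject noise. Returns (out, noise)."""
    scaled = wav / np.abs(wav).max() * RESCALING_MAX
    noise = rng.normal(0.0, 1.0 / QUANTIZE_CHANNELS, wav.shape)
    return scaled + noise, noise


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
for peak in (0.1, 0.9):  # a quiet clip and a loud clip
    wav = peak * np.sin(2 * np.pi * 220.0 * t)
    _, n_old = noise_then_rescale(wav, rng)
    _, n_new = rescale_then_noise(wav, rng)
    print(f"peak {peak}: noise std {n_old.std():.4f} (noise first), "
          f"{n_new.std():.4f} (rescale first), target {1 / QUANTIZE_CHANNELS:.4f}")
```

With noise injected first, the quiet clip's noise gets amplified by roughly the same factor as the signal, so its effective standard deviation ends up several times larger than 1/256; with rescaling first, both clips keep a standard deviation close to 1/256.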

What is your opinion?

Thanks in advance~
max

azraelkuan commented on August 23, 2024

Yes, I think you are right. Thanks.

Maxxiey commented on August 23, 2024

Hey, quick update: I tried changing the order of processing, but, embarrassingly, the loss went to NaN immediately. I'm focused on something else right now and don't have time to figure out why; if you have any idea, please let me know. Thanks~
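
The thread does not say what caused the NaN, so this is only a guess at something worth checking. One thing that changes when normalization moves before noise injection: a clip whose samples are all zero (e.g. fully silent after trimming) now hits 0/0 in the peak normalization, whereas the injected noise used to guarantee a nonzero peak. A second, smaller difference is that noise added after rescaling can push samples slightly past `rescaling_max`. The helper names and constants below are hypothetical.

```python
import numpy as np

QUANTIZE_CHANNELS = 256  # ASSUMPTION
RESCALING_MAX = 0.999    # ASSUMPTION: check hparams.rescaling_max


def rescale_then_noise_naive(wav, rng):
    scaled = wav / np.abs(wav).max() * RESCALING_MAX  # 0 / 0 -> NaN on silence
    return scaled + rng.normal(0.0, 1.0 / QUANTIZE_CHANNELS, wav.shape)


def rescale_then_noise_safe(wav, rng, eps=1e-8):
    peak = np.abs(wav).max()
    # Skip normalization for (near-)silent clips instead of dividing by ~0.
    scaled = wav * (RESCALING_MAX / peak) if peak > eps else np.zeros_like(wav)
    noisy = scaled + rng.normal(0.0, 1.0 / QUANTIZE_CHANNELS, wav.shape)
    # Noise can push peaks past rescaling_max, so keep samples in [-1, 1].
    return np.clip(noisy, -1.0, 1.0)


rng = np.random.default_rng(0)
silence = np.zeros(1000)
with np.errstate(invalid="ignore"):  # suppress the 0/0 RuntimeWarning
    print("naive has NaN:", np.isnan(rescale_then_noise_naive(silence, rng)).any())
print("safe has NaN:", np.isnan(rescale_then_noise_safe(silence, rng)).any())
```

A quick way to test the hypothesis on real data is to assert `np.isfinite(wav).all()` on every preprocessed clip before training.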
