Hi,
I am trying to integrate the GOLF vocoder into the NNSVS toolkit.
I wrote an inference script based on the notebook, along with a class that takes a spectrogram, scales it to between 0 and 1, and takes the log so that it matches the format the model expects. Unfortunately, I seem to be missing something about the input format, which prevents me from using GOLF on acoustic features from another model.
For the acoustic model, I am using WORLD ("WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications"), which is already integrated in the toolkit.
I am working on Ubuntu under the Windows Subsystem for Linux (WSL).
Here are the scripts I am using for inference.
If you have any clue about what I may be missing, it would be greatly appreciated.
Here is the full error I am getting:
Traceback (most recent call last):
File "/home/linkdow/svs/recipes/opencpop/dev-48k-world/../../..//nnsvs/bin/anasyn_golf.py", line 228, in my_app
wav = anasyn(
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/linkdow/svs/recipes/opencpop/dev-48k-world/../../..//nnsvs/bin/anasyn_golf.py", line 87, in anasyn
wav = generate_audio(spectrogram,vocoder_config,sample_rate)
File "/home/linkdow/svs/nnsvs/golf/spec_infer_fun.py", line 70, in generate_audio
) = model.encoder(feats)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/linkdow/svs/nnsvs/golf/models/enc.py", line 158, in forward
) = super().forward(h)
File "/home/linkdow/svs/nnsvs/golf/models/enc.py", line 69, in forward
f0_logits, *_ = self.backbone(h).split(self.split_size, dim=-1)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/linkdow/svs/nnsvs/golf/models/mel.py", line 34, in forward
x = self.stack(mels.transpose(1, 2)).transpose(1, 2)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 310, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 306, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [96, 80, 3], expected input[1, 818, 1025] to have 80 channels, but got 818 channels instead
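
If I read the error correctly, the first Conv1d reached from mel.py has a weight of shape [96, 80, 3], so it expects 80 input channels, while my features arrive there as [1, 818, 1025] after the transpose(1, 2) on line 34 of mel.py. That would mean the tensor I pass to the encoder has shape [1, 1025, 818]. The small plain-Python sketch below only restates that reading of the shapes. The helper name is hypothetical, and my guesses that 80 is a number of mel bands and that 1025 is a number of frequency bins are assumptions, not something I have confirmed in the GOLF code.

```python
def describe_conv1d_mismatch(weight_shape, input_shape):
    """Compare a Conv1d weight shape with the shape of the tensor it receives.

    weight_shape: (out_channels, in_channels, kernel_size), with groups=1
    input_shape:  (batch, channels, length)
    Hypothetical helper, used only to read the error message.
    """
    out_channels, in_channels, kernel_size = weight_shape
    batch, channels, length = input_shape
    return {
        "expected_channels": in_channels,
        "got_channels": channels,
        "length": length,
        "matches": channels == in_channels,
    }


# Shape of the tensor handed to the encoder, inferred from the traceback:
# mel.py calls mels.transpose(1, 2) before the conv stack, and the conv
# reports an input of [1, 818, 1025].
encoder_input_shape = (1, 1025, 818)

# Apply the same transpose(1, 2) by swapping the last two dimensions.
conv_input_shape = (
    encoder_input_shape[0],
    encoder_input_shape[2],
    encoder_input_shape[1],
)

# Weight shape copied from the RuntimeError message.
report = describe_conv1d_mismatch((96, 80, 3), conv_input_shape)
print(report)
```

So neither of my two feature dimensions (818 or 1025) is 80, which makes me think the problem is not only the axis order but also the kind of features the model expects.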