GlOttal-flow LPC Filter (GOLF)

The source code of the paper Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables, accepted at ISMIR 2023.

Training

Install the Python requirements.

pip install -r requirements.txt

Download the MPop600 dataset. The dataset is distributed on request only; please contact the third author of the MPop600 paper, Yi-Jhe Lee, to obtain the raw files.
Resample the data to 24 kHz.

python scripts/resample_dir.py **/f1/ output_dir --sr 24000

Generate F0 labels (stored as .pv files).

python scripts/wav2f0.py output_dir

Train with the configuration file config.yaml used in the paper (the configurations are available under ckpts/).

python main.py fit --config config.yaml --dataset.init_args.wav_dir output_dir

Evaluation

Objective Evaluation

python main.py test --config config.yaml --ckpt_path checkpoint.ckpt --data.init_args.duration 6 --data.init_args.overlap 0 --data.init_args.batch_size 16

Real-Time Factor

python test_rtf.py config.yaml checkpoint.ckpt test.wav
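The real-time factor (RTF) is conventionally the processing time divided by the duration of the audio produced, so values below 1 mean faster than real time. The following is a minimal sketch of that measurement, not the contents of test_rtf.py; `real_time_factor` and `dummy_synthesize` are hypothetical names, and the dummy function stands in for an actual vocoder call.

```python
import time


def real_time_factor(synthesize, num_samples, sample_rate, repeats=3):
    """Return the best-of-`repeats` processing time divided by audio duration."""
    audio_seconds = num_samples / sample_rate
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        synthesize(num_samples)
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds


def dummy_synthesize(num_samples):
    # Hypothetical stand-in for a vocoder: just allocates a silent buffer.
    return [0.0] * num_samples


# Six seconds of audio at 24 kHz, matching the sample rate used for training.
rtf = real_time_factor(dummy_synthesize, num_samples=24000 * 6, sample_rate=24000)
print(f"RTF = {rtf:.6f}")
```

Taking the best of several runs reduces the influence of one-off start-up costs; averaging is an equally common choice.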

Notebooks

MOS: computes the mean opinion score (MOS) from the rating file exported by GO Listen.
time-domain l2 experiment: the notebook used to conduct the time-domain L2 loss ablation study in the paper.
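For orientation, a mean opinion score is the arithmetic mean of the listener ratings for a system, usually reported with a confidence interval. The sketch below illustrates that calculation only; it is not the notebook's code. The rating values and system names are made up, the GO Listen file format is not parsed here, and the 95% interval uses a normal approximation, which is an assumption.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean opinion score, half-width of a normal-approximation 95% CI)."""
    mean = statistics.fmean(ratings)
    if len(ratings) < 2:
        # A single rating has no sample standard deviation.
        return mean, 0.0
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings on a 1-5 scale, keyed by hypothetical system names.
ratings_by_system = {
    "system_a": [5, 4, 4, 3, 4],
    "system_b": [3, 2, 4, 3, 3],
}

for name, ratings in ratings_by_system.items():
    mos, ci = mos_with_ci(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```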

Pre-trained Checkpoints

Female(f1)

Male(m1)

Trying to load spectrogram for inference

Hi,
I am trying to integrate the GOLF vocoder into the NNSVS toolkit.
I created an inference script based on the notebook, and wrote a class that takes a spectrogram, scales it to between 0 and 1, and takes the log so that it matches the format the model expects. Unfortunately, I seem to be missing something about the input format, which prevents me from using GOLF on acoustic features from another model.
For the acoustic model, I am using WORLD ("WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications"), which is already integrated in the toolkit.
I am working on Ubuntu under the Windows Subsystem for Linux (WSL).
Here are the scripts I am using for inference.
If you have any clue on what I may be missing, it would be greatly appreciated.

Here is the full error I am getting:
Traceback (most recent call last):
File "/home/linkdow/svs/recipes/opencpop/dev-48k-world/../../..//nnsvs/bin/anasyn_golf.py", line 228, in my_app
wav = anasyn(
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/linkdow/svs/recipes/opencpop/dev-48k-world/../../..//nnsvs/bin/anasyn_golf.py", line 87, in anasyn
wav = generate_audio(spectrogram,vocoder_config,sample_rate)
File "/home/linkdow/svs/nnsvs/golf/spec_infer_fun.py", line 70, in generate_audio
) = model.encoder(feats)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/linkdow/svs/nnsvs/golf/models/enc.py", line 158, in forward
) = super().forward(h)
File "/home/linkdow/svs/nnsvs/golf/models/enc.py", line 69, in forward
f0_logits, *_ = self.backbone(h).split(self.split_size, dim=-1)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/linkdow/svs/nnsvs/golf/models/mel.py", line 34, in forward
x = self.stack(mels.transpose(1, 2)).transpose(1, 2)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 310, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 306, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [96, 80, 3], expected input[1, 818, 1025] to have 80 channels, but got 818 channels instead
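A note on reading the last line of this traceback: the convolution weight of size [96, 80, 3] has 80 input channels, and mel.py calls mels.transpose(1, 2) before the convolution, so the encoder appears to expect features shaped (batch, frames, 80). The tensor that reached the convolution was [1, 818, 1025], which means the features passed in were shaped (1, 1025, 818). That 1025 is a frequency axis (for example, FFT size 2048 gives 1025 bins) rather than 80 mel bands is an assumption based on the number alone. The sketch below reproduces only this shape bookkeeping in plain Python; `shape_seen_by_conv` and `diagnose_mel_input` are hypothetical helpers, not part of GOLF or NNSVS.

```python
def shape_seen_by_conv(shape):
    """Mimic mels.transpose(1, 2): (batch, a, b) becomes (batch, b, a)."""
    batch, a, b = shape
    return (batch, b, a)


def diagnose_mel_input(shape, n_mels=80):
    """Describe how an encoder expecting (batch, frames, n_mels) sees `shape`."""
    if len(shape) != 3:
        return f"mismatch: expected 3 dimensions, got {len(shape)}"
    channels = shape_seen_by_conv(shape)[1]
    if channels == n_mels:
        return "ok"
    if shape[1] == n_mels:
        # The mel axis is present but sits in the wrong position.
        return "transposed: swap the last two axes to get (batch, frames, n_mels)"
    return f"mismatch: conv sees {channels} channels, needs {n_mels}"


for shape in [(1, 1025, 818), (1, 80, 818), (1, 818, 80)]:
    print(shape, "->", diagnose_mel_input(shape))
```

If this reading is right, transposing alone would not be enough for the (1, 1025, 818) case: the features would also need to be reduced to the 80 bands the checkpoint was trained on.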
