yoyololicon / golf

A DDSP-based neural vocoder.

Home Page: https://yoyololicon.github.io/golf-demo/

License: MIT License

Python 2.49% Jupyter Notebook 97.51%
ddsp glottal-flow-model iir-filters linear-predictive-coding pytorch-implementation vocoder

golf's Introduction

GlOttal-flow LPC Filter (GOLF)

The source code of the paper Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables, accepted at ISMIR 2023.

Training

  1. Install the Python requirements.
pip install -r requirements.txt
  2. Download the MPop600 dataset. The dataset is available on request only; please contact the dataset's third author, Yi-Jhe Lee, to get the raw files.

  3. Resample the data to 24 kHz.

python scripts/resample_dir.py **/f1/ output_dir --sr 24000
  4. Generate F0 labels (stored as .pv files).
python scripts/wav2f0.py output_dir
  5. Train with the configuration config.yaml we used in the paper (available under ckpts/).
python main.py fit --config config.yaml --dataset.init_args.wav_dir output_dir
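
The resampling, F0 extraction, and training commands above can also be chained from a small driver script. The following is only a sketch: the function names build_pipeline and run_pipeline are hypothetical and not part of the repository, and the script paths and flags are copied from the commands listed above. It assumes the raw recordings are passed as an explicit directory, because the `**/f1/` shell glob is not expanded when a command is run without a shell.

```python
import subprocess


def build_pipeline(raw_dir, wav_dir, config="config.yaml", sr=24000):
    """Return the preprocessing and training commands as argument lists.

    raw_dir and wav_dir are placeholders for the caller's own paths.
    """
    return [
        # Resample the raw recordings to the target sampling rate.
        ["python", "scripts/resample_dir.py", str(raw_dir), str(wav_dir),
         "--sr", str(sr)],
        # Extract F0 labels (.pv files) for the resampled audio.
        ["python", "scripts/wav2f0.py", str(wav_dir)],
        # Train with the given configuration file.
        ["python", "main.py", "fit", "--config", str(config),
         "--dataset.init_args.wav_dir", str(wav_dir)],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_pipeline("raw_dir", "output_dir"))` only prints the three commands; passing `dry_run=False` runs them in order.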

Evaluation

Objective Evaluation

python main.py test --config config.yaml --ckpt_path checkpoint.ckpt --data.init_args.duration 6 --data.init_args.overlap 0 --data.init_args.batch_size 16

Real-Time Factor

python test_rtf.py config.yaml checkpoint.ckpt test.wav
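
For reference, the real-time factor is the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below illustrates that definition only; it is not the repository's test_rtf.py, and the function name, the `synthesize` callable, and the choice to keep the fastest of several runs are assumptions made for illustration.

```python
import time


def real_time_factor(synthesize, num_samples, sample_rate, repeats=3):
    """Return processing time divided by audio duration (lower is faster).

    synthesize is any zero-argument callable that produces num_samples
    samples of audio at sample_rate; the fastest of `repeats` runs is used.
    """
    audio_seconds = num_samples / sample_rate
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        synthesize()
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds
```

With a model that generates six seconds of 24 kHz audio, this would be called as `real_time_factor(lambda: model(features), 6 * 24000, 24000)`, where `model` and `features` stand in for the caller's own objects.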

Notebooks

  • MOS: compute MOS score given the rating file from GO Listen.
  • time-domain l2 experiment: the notebook used to conduct the time-domain L2 loss ablation study in the paper.
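
As a rough illustration of what a MOS computation involves, the sketch below averages a list of ratings and reports a normal-approximation 95% confidence interval. It is not the notebook's code: the function name is hypothetical, the layout of the GO Listen rating file is not assumed, and the ratings are taken to be already loaded into a plain Python list.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean opinion score, half-width of the confidence interval).

    Uses the sample standard deviation and a normal approximation;
    z=1.96 corresponds to a 95% interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * stderr
```

The result is usually reported as mean ± half-width for each system under test.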

Pre-trained Checkpoints

Female(f1)

Male(m1)

golf's Issues

inference script

Hi,

Thank you for the work. Is there any inference script to generate wav?
Thanks

Trying to load spectrogram for inference

Hi,
I am trying to integrate the GOLF vocoder into the NNSVS toolkit.
I created an inference script based on the notebook, and wrote a class that takes in a spectrogram, scales it to between 0 and 1, and takes the log so that it matches the input format of the model. Unfortunately, I seem to be missing something about the input format, which prevents me from using GOLF on acoustic features from another model.
For the acoustic model, I am using WORLD ("WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications"), which is already integrated in the toolkit.
I am working on Ubuntu under the Windows Subsystem for Linux.
Here are the scripts I am using for inference.
If you have any clue about what I may be missing, it would be greatly appreciated.

Here is the full error I am getting:
Traceback (most recent call last):
  File "/home/linkdow/svs/recipes/opencpop/dev-48k-world/../../..//nnsvs/bin/anasyn_golf.py", line 228, in my_app
    wav = anasyn(
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/linkdow/svs/recipes/opencpop/dev-48k-world/../../..//nnsvs/bin/anasyn_golf.py", line 87, in anasyn
    wav = generate_audio(spectrogram,vocoder_config,sample_rate)
  File "/home/linkdow/svs/nnsvs/golf/spec_infer_fun.py", line 70, in generate_audio
    ) = model.encoder(feats)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/linkdow/svs/nnsvs/golf/models/enc.py", line 158, in forward
    ) = super().forward(h)
  File "/home/linkdow/svs/nnsvs/golf/models/enc.py", line 69, in forward
    f0_logits, *_ = self.backbone(h).split(self.split_size, dim=-1)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/linkdow/svs/nnsvs/golf/models/mel.py", line 34, in forward
    x = self.stack(mels.transpose(1, 2)).transpose(1, 2)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/linkdow/miniconda3/envs/nnsvs/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [96, 80, 3], expected input[1, 818, 1025] to have 80 channels, but got 818 channels instead
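
Reading the last frames of the traceback: the first convolution has weights of size [96, 80, 3], so it expects 80 input channels, and models/mel.py transposes the last two axes before calling it. The encoder therefore appears to expect features shaped (batch, frames, 80), while the tensor passed in was [1, 1025, 818] before the transpose. One possible reading, which is an assumption based on the shapes alone, is that the input is a 1025-bin spectral envelope over 818 frames rather than an 80-band mel spectrogram. A small shape check like the hypothetical helper below (not part of GOLF or NNSVS) could catch this before the model is called:

```python
def check_mel_shape(shape, n_mels=80):
    """Describe how a (batch, frames, n_mels) input requirement is violated.

    Returns "ok" when the last dimension already equals n_mels.
    """
    if len(shape) != 3:
        return f"expected 3 dimensions (batch, frames, {n_mels}), got {len(shape)}"
    _, dim1, dim2 = shape
    if dim2 == n_mels:
        return "ok"
    if dim1 == n_mels:
        # Right number of bins, but stored channels-first.
        return "channels-first layout: transpose the last two dimensions"
    return (f"neither axis has {n_mels} bins (got {dim1} and {dim2}): "
            "project the features to mel bands first")
```

For the tensor in this report, `check_mel_shape((1, 1025, 818))` falls into the last branch, which suggests that a transpose alone would not be enough and that the features would need to be converted to the mel representation the checkpoint was trained on.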
