
yatingmusic / ddsp-singing-vocoders


Official implementation of SawSing (ISMIR'22)

License: GNU Affero General Public License v3.0

Python 94.17% Jupyter Notebook 5.83%
ismir singing-synthesis singing-voice vocoders


DDSP Singing Vocoders

Authors: Da-Yi Wu*, Wen-Yi Hsiao*, Fu-Rong Yang*, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang

*equal contribution

Paper | Demo

Official PyTorch implementation of the ISMIR 2022 paper "DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation".

In this repository:

  • We propose SawSing, a novel singing vocoder based on a subtractive synthesizer.
  • We present a collection of different DDSP singing vocoders.
  • We demonstrate that DDSP singing vocoders have relatively small model sizes yet generate satisfying results with limited resources (1 GPU, 3 hours of training data). We also report results for an even more stringent case: training the vocoders on only 3 minutes of recordings for only 3 hours.

A. Installation

pip install -r requirements.txt 

B. Dataset

Please refer to dataset.md for more details.

C. Training

Train vocoders from scratch.

  1. Modify the configuration file ./configs/<model_name>.yaml
  2. Run the following command:
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml \
               --stage  training \
               --model SawSinSub
  3. Change the --model argument to try different vocoders. Currently, we have 5 models: SawSinSub (SawSing), Sins (DDSP-Add), DWS (DWTS), Full, SawSub. For more details, please refer to our documentation - DDSP Vocoders.
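
The steps above can be scripted when you want to try several vocoders in turn. The following is a minimal sketch, not part of the repository: the helper `training_command` is hypothetical, and the rule that each config file is named after the lowercased model name is an assumption that the README only confirms for SawSinSub.

```python
import shlex

# Model names as listed in the README.
MODELS = ["SawSinSub", "Sins", "DWS", "Full", "SawSub"]


def training_command(model, config_dir="./configs"):
    """Return the argv list that trains one vocoder (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    # ASSUMPTION: config file name = lowercased model name.
    # Only ./configs/sawsinsub.yaml is confirmed by the README.
    config = f"{config_dir}/{model.lower()}.yaml"
    return ["python", "main.py",
            "--config", config,
            "--stage", "training",
            "--model", model]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for name in MODELS:
        print(shlex.join(training_command(name)))
```

Check each printed config path against the files that actually exist in ./configs before running anything.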

Our training resources: single Nvidia RTX 3090 Ti GPU

D. Validation

Run validation: compute loss and real-time factor (RTF).

  1. Modify the configuration file ./configs/<model_name>.yaml
  2. Run the following command:
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml  \
              --stage validation \
              --model SawSinSub \
              --model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
              --output_dir ./test_gen
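
For readers unfamiliar with the metric, the real-time factor is commonly defined as synthesis time divided by the duration of the synthesized audio, so values below 1 mean faster than real time. The sketch below illustrates that common definition only; the function names are hypothetical and the repository's own measurement code may differ in detail.

```python
import time


def real_time_factor(synth_seconds, num_samples, sampling_rate):
    """RTF = time spent synthesizing / duration of the audio produced."""
    if num_samples <= 0 or sampling_rate <= 0:
        raise ValueError("num_samples and sampling_rate must be positive")
    audio_seconds = num_samples / sampling_rate
    return synth_seconds / audio_seconds


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Illustrative numbers, not results from the paper:
    # 0.5 s of compute for 2 s of audio (48000 samples at 24000 Hz).
    print(real_time_factor(0.5, 48000, 24000))
```

When timing a model on a GPU, synchronize the device before reading the clock, otherwise the elapsed time may only cover the kernel launches.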

E. Inference

Synthesize audio files from existing mel-spectrograms. The code and specification for extracting mel-spectrograms can be found in preprocess.py.

# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml  \
              --stage inference \
              --model SawSinSub \
              --model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
              --input_dir  ./path/to/mel \
              --output_dir ./test_gen

F. Post-Processing

In SawSing, we found buzzing artifacts in the harmonic part of the signal, so we developed post-processing code to remove them. The method is simple yet effective: applying a voiced/unvoiced mask. For more details, please refer to here.
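
The idea of a voiced/unvoiced mask can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's post-processing code: it assumes unvoiced frames are marked by an f0 of 0, that each frame covers `block_size` samples, that the mask is a hard 0/1 gate, and that the harmonic and noise parts are available as separate signals. The function name `apply_uv_mask` is hypothetical.

```python
def apply_uv_mask(harmonic, noise, f0_frames, block_size):
    """Silence the harmonic part in unvoiced frames, then add the noise part.

    harmonic, noise: per-sample sequences of equal length.
    f0_frames: per-frame f0 in Hz; 0 marks an unvoiced frame (ASSUMPTION).
    block_size: number of samples per frame.
    """
    expected = len(f0_frames) * block_size
    if len(harmonic) != expected or len(noise) != expected:
        raise ValueError("signal length must equal len(f0_frames) * block_size")
    output = []
    for i, (h, n) in enumerate(zip(harmonic, noise)):
        voiced = f0_frames[i // block_size] > 0  # frame that sample i belongs to
        output.append((h if voiced else 0.0) + n)
    return output


if __name__ == "__main__":
    # Two frames of two samples each: the first voiced, the second unvoiced.
    print(apply_uv_mask([1.0] * 4, [0.5] * 4, [220.0, 0.0], block_size=2))
```

A hard gate like this can click at voiced/unvoiced boundaries; a short crossfade around each boundary is a common refinement.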

G. More Information

H. Citation

@article{sawsing,
  title={DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation},
  author={Da-Yi Wu and Wen-Yi Hsiao and Fu-Rong Yang and Oscar Friedman and Warren Jackson and Scott Bruzenak and Yi-Wen Liu and Yi-Hsuan Yang},
  journal = {Proc. International Society for Music Information Retrieval},
  year    = {2022},
}

ddsp-singing-vocoders's People

Contributors

oscarfree, wayne391


ddsp-singing-vocoders's Issues

Why does my 16 kHz speech data get poor performance?

Thanks for the good work.
We are looking for a vocoder for TTS and tried SawSinSub. Our audio sample rate is 16000 Hz. To match our acoustic model, I changed the parameters in preprocess.py (hop_length=200, win_length=800, n_mel_channels=80) and changed sawsinsub.yaml accordingly (block_size = 200), leaving the other settings unchanged.

I have trained the model for nearly 1 million steps, but the quality of the generated waveform is far worse than that of HiFi-GAN at the same number of steps. Are there any parameter adjustments I have missed?

[Figure: the blurred spectrogram of speech generated by SawSinSub]

Make the project runnable on older versions of PyTorch

Thanks for the good work.
I am trying to run an example training process, but it cannot run on my PyTorch version, 1.7.1. Some functions are missing,

e.g. ddsp/core.py:264 torch.fft.irfft()

How can I adjust the project to run on my old torch? It seems that torch.irfft is not the answer.

Hard-coded frequency

Hi,
I found that at vocoder.py:134 there are two parameters, "hz_min" and "hz_max". Are they the same as mel_fmin and mel_fmax in preprocess.py (lines 62 and 63)? Why are they hard-coded?

How to modify mel parameters

How to change mel parameters including num_mel_bins, hop_size, win_size, fmin, fmax to fit my own acoustic model?
