Giter Site home page Giter Site logo

cdiffuse's Introduction

CDiffuSE

Apache License 2.0

CDiffuSE leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model named conditional diffusion probabilistic model that, in its reverse process, can adapt to non-Gaussian real noises in the estimated speech signal. Conditional Diffusion Probabilistic Model for Speech Enhancement.

Audio samples

16 kHz audio samples

Pretrained models

CDiffuSE Pretrained models are released. The large model is the same as described in the paper. The base model was trained without the mel-spectrum pre-training phase, which is a little better than the ones on the paper.

Training

Before you start training, you'll need to prepare a training dataset. The default dataset is VOICEBANK-DEMAND dataset. You can download them from VOICEBANK-DEMAND and resample it to 16k Hz. By default, this implementation assumes a sample rate of 16 kHz. If you need to change this value, edit params.py.

You need to set the output path and data path under path.sh

output_path=[path_to_output_directory]
voicebank=[path_to_voicebank_directory]

Usage: Train SE model

./train.sh [stage] [model_directory]

Train SE model based on a pre-trained model.

./train.sh [stage] [model_directory] [pretrained_model_directory]/weights-[ckpt].pt

Note that the pre-training step with clean Mel-Spectrum conditioners is no longer needed in CDiffuSE. A randomly initialized CDiffuSE performs as well as one initialized from pre-trained parameters.

Multi-GPU training

By default, this implementation uses as many GPUs in parallel as returned by torch.cuda.device_count(). You can specify which GPUs to use by setting the CUDA_DEVICES_AVAILABLE environment variable before running the training module.

Validatoin and Inference API

Usage:

./valid.sh [stage] [model name] [checkpoint id] 
./inference.sh [stage] [model name] [checkpoint id]

The code of CDiffuSE is developed based on the code of Diffwave

References

cdiffuse's People

Contributors

neillu23 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.