
A neural network for end-to-end music source separation

License: MIT License

machine-learning deep-learning neural-networks source-separation wavenet audio-processing karaoke vocal-remover


A Wavenet for Music Source Separation

A neural network for end-to-end music source separation, as described in End-to-end music source separation: is it possible in the waveform domain?

Listen to separated samples here

What is a Wavenet for Music Source Separation?

The Wavenet for Music Source Separation is a fully convolutional neural network that directly operates on the raw audio waveform.

It is an adaptation of Wavenet that turns the original causal model (which is generative and slow) into a non-causal model (which is discriminative and parallelizable). This idea was originally proposed by Rethage et al. for speech denoising and is adapted here for monaural music source separation. Their code is reused.

The main difference between the original Wavenet and the non-causal adaptation used here is that some samples from the future can be used to predict the present one. By removing the autoregressive, causal nature of the original Wavenet, this fully convolutional model is able to predict a whole target field instead of one sample at a time; thanks to this parallelization, it is possible to run the model in real time on a GPU.

See the diagram below for a summary of the network architecture.
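For readers who prefer code to diagrams, below is a minimal sketch of the core idea: a stack of dilated 1-D convolutions with 'same' (non-causal) padding plus residual and skip connections, so every output sample can see both past and future input. It is an illustration only and does not reproduce the repository's exact layer configuration; the layer sizes are made up.

```python
from keras.layers import Input, Conv1D, Add, Activation
from keras.models import Model

def noncausal_wavenet_sketch(input_length=16384, filters=32, dilation_depth=9):
    x_in = Input(shape=(input_length, 1))
    x = Conv1D(filters, 3, padding='same')(x_in)
    skips = []
    for d in [2 ** i for i in range(dilation_depth)]:
        # 'same' padding lets the dilated filter look at future samples as well.
        y = Conv1D(filters, 3, dilation_rate=d, padding='same', activation='relu')(x)
        x = Add()([x, y])   # residual connection
        skips.append(y)     # skip connection
    out = Activation('relu')(Add()(skips))
    # One output sample per input sample: the network predicts a whole target field at once.
    out = Conv1D(1, 1, padding='same')(out)
    return Model(inputs=x_in, outputs=out)

model = noncausal_wavenet_sketch()
model.summary()
```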

Installation

  1. git clone https://github.com/francesclluis/source-separation-wavenet.git
  2. Install conda
  3. conda env create -f environment.yml
  4. source activate sswavenet

The project currently requires Keras 2.1 and Theano 1.0.1; the large dilations present in the architecture are not supported by the current version of TensorFlow.
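As a quick sanity check that the conda environment is set up as expected, you can print the installed versions and the active Keras backend (this snippet is a suggestion, not part of the repository):

```python
import keras
import theano

print('Keras version:  ', keras.__version__)        # expected: 2.1.x
print('Theano version: ', theano.__version__)       # expected: 1.0.1
print('Keras backend:  ', keras.backend.backend())  # should print 'theano'
```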

Usage

A pre-trained multi-instrument model (best-performing model described in the paper) can be found in sessions/multi-instrument/checkpoints and is ready to be used out-of-the-box. The parameterization of this model is specified in sessions/multi-instrument/config.json

A pre-trained singing-voice model (best-performing model described in the paper) can be found in sessions/singing-voice/checkpoints and is ready to be used out-of-the-box. The parameterization of this model is specified in sessions/singing-voice/config.json
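To see which parameters a session was trained with, you can simply load and print its config file. The path below is the multi-instrument session mentioned above; the exact keys depend on the file contents (see config.md):

```python
import json

# Load the parameterization of the pre-trained multi-instrument model.
with open('sessions/multi-instrument/config.json') as f:
    config = json.load(f)

# Print the top-level entries to see which parameters are available.
for key, value in config.items():
    print(key, ':', value)
```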

Download the dataset as described below

Source Separation:

Example (multi-instrument): THEANO_FLAGS=device=cuda python main.py --mode inference --config sessions/multi-instrument/config.json --mixture_input_path audio/

Example (singing-voice): THEANO_FLAGS=device=cuda python main.py --mode inference --config sessions/singing-voice/config.json --mixture_input_path audio/

Speedup

To achieve faster source separation, one can increase the target-field length via the optional --target_field_length argument. This defines the number of samples that are separated in a single forward propagation, saving redundant calculations. In the following example, it is increased to 10x the value used during training, and the batch_size is reduced to 4.

Faster Example: THEANO_FLAGS=device=cuda python main.py --mode inference --target_field_length 16001 --batch_size 4 --config sessions/multi-instrument/config.json --mixture_input_path audio/
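A rough back-of-the-envelope calculation shows why this helps: the number of forward passes needed for a mixture is roughly total_samples / target_field_length. The trained target-field length of 1601 samples and the 16 kHz sample rate below are assumptions used only for illustration:

```python
import math

sample_rate = 16000      # assumption for illustration; use the model's actual rate
song_seconds = 240       # a four-minute mixture
total_samples = sample_rate * song_seconds

for target_field_length in (1601, 16001):  # assumed trained value vs. the 10x example above
    passes = math.ceil(total_samples / target_field_length)
    print('target_field_length = %5d  ->  %4d forward passes' % (target_field_length, passes))
```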

Training:

Example (multi-instrument): THEANO_FLAGS=device=cuda python main.py --mode training --target multi-instrument --config config_multi_instrument.json

Example (singing-voice): THEANO_FLAGS=device=cuda python main.py --mode training --target singing-voice --config config_singing_voice.json

Configuration

A detailed description of all configurable parameters can be found in config.md

Optional command-line arguments:

| Argument | Valid Inputs | Default | Description |
| --- | --- | --- | --- |
| mode | [training, inference] | training | |
| target | [multi-instrument, singing-voice] | multi-instrument | Target of the model to train |
| config | string | config.json | Path to JSON-formatted config file |
| print_model_summary | bool | False | Prints verbose summary of the model |
| load_checkpoint | string | None | Path to hdf5 file containing a snapshot of model weights |

Additional arguments during source separation:

| Argument | Valid Inputs | Default | Description |
| --- | --- | --- | --- |
| one_shot | bool | False | Separates each audio file in a single forward propagation |
| target_field_length | int | as defined in config.json | Overrides the config value to separate with a different target-field length than used in training |
| batch_size | int | as defined in config.json | Number of samples per batch |

Dataset

The MUSDB18 dataset is used for training the model. It is provided by the community-based Signal Separation Evaluation Campaign (SiSEC).

  1. Download here
  2. Decode dataset to WAV format as explained here
  3. Extract to data/MUSDB
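Once the dataset has been decoded to WAV and extracted, it can be loaded through the musdb package. The snippet below is a sketch that follows the root/is_wav call shown in the "musDB interface change" issue further down, and assumes a musdb version exposing tracks, track.audio and track.targets:

```python
import musdb

# Load the WAV-decoded MUSDB18 dataset from the path used above.
mus = musdb.DB(root='data/MUSDB', is_wav=True)

for track in mus.tracks:
    mixture = track.audio                    # stereo mixture, shape (n_samples, 2)
    vocals = track.targets['vocals'].audio   # isolated vocal stem
    print(track.name, mixture.shape, vocals.shape)
    break
```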


source-separation-wavenet's Issues

musDB interface change

The correct line to retrieve the data from the dataset after extraction is

mus = musdb.DB(root=self.path, is_wav=True)

instead of

mus = musdb.DB(root_dir=self.path, is_wav=True)

theano not using GPU

Hi.

I was wondering how I can use the GPU for training?

I checked with the Theano tutorial Python code and confirmed that it uses the GPU.

However, when I run the Wavenet, it does not use the GPU and instead uses the CPU.

Could you share how I can use the GPU?

Thanks

Running this in real time on an audio stream

Hey!
First of all I want to thank you for publishing this awesome work.

In the readme you mention that this is capable of doing real-time source separation when run on a GPU. I'm really, really interested in that use case.
However, I can only find ways to give it input wav files, and no way to tell it to use e.g. an audio device of the PC.

I'm assuming that this functionality hasn't been implemented yet. Would it be trivial to do so?

Judging by https://github.com/francesclluis/source-separation-wavenet/blob/master/separate.py#L82 there is some minimal number of samples one would have to give the network.
Would it therefore be feasible to, say, pass it every new frame of sound data with the last few frames also attached, to make it work on an audio stream?

Or is there a way to feed the network frame by frame? That would be awesome, because then one wouldn't have to deal with stitching the different results back together, which would probably result in some quirkiness.

I would be happy to contribute this feature, but want to make sure it's possible first :D
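For reference, one possible buffering scheme, sketched under assumptions: pad each incoming block with enough past and future context to cover the receptive field, separate the whole window, and keep only the centre region. The separate_fn callable and the two lengths below are hypothetical placeholders, not part of the repository.

```python
import numpy as np
from collections import deque

half_receptive_field = 3000   # hypothetical; depends on the trained model's dilations
block_size = 1601             # hypothetical; one target field per incoming block

# Rolling buffer: past context + current block + future context.
buffer = deque(maxlen=2 * half_receptive_field + block_size)

def push_block(new_block, separate_fn):
    """Append a new block of samples; once enough context has accumulated,
    separate the buffered window and return only the freshly separated block."""
    buffer.extend(new_block)
    if len(buffer) < buffer.maxlen:
        return None                          # still filling the buffer
    window = np.asarray(buffer, dtype=np.float32)
    separated = separate_fn(window)          # placeholder for the model call
    # Only the centre of the window has full left/right context.
    return separated[half_receptive_field:half_receptive_field + block_size]
```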
