

samplernn_torch's Introduction

SampleRNN_torch

A Torch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.

[Figure: a visual representation of the SampleRNN architecture]

Samples

Listen to a selection of generated output at the following links:

Feel free to submit links to any interesting output you generate or dataset creation scripts as a pull request.

Dependencies

The following packages are required to run SampleRNN_torch:

  • nn
  • cunn
  • cudnn
  • rnn
  • optim
  • audio
  • xlua
  • gnuplot

NOTE: Update nn and cudnn even if they are already installed, as fixes have been submitted which affect this project.
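
If any of these are missing (or need updating per the note above), they can usually be installed or upgraded through luarocks, assuming a standard Torch distribution with its rocks server configured; the audio package additionally requires libsox:

luarocks install nn
luarocks install cunn
luarocks install cudnn
luarocks install rnn
luarocks install optim
luarocks install audio
luarocks install xlua
luarocks install gnuplot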

Datasets

To retrieve and prepare the piano dataset, as used in the reference implementation, run:

cd datasets/piano/
./create_dataset.sh

Other dataset preparation scripts may be found under datasets/.

Custom datasets may be created by using scripts/generate_dataset.lua to slice multiple audio files into segments for training; the source audio must be placed in datasets/[dataset]/data/.
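
For example, for a hypothetical dataset named mydataset, place the source audio as shown below and then run scripts/generate_dataset.lua with the appropriate arguments (such as -seg_len; inspect the script for the full list, as the exact flags are not documented here):

mkdir -p datasets/mydataset/data/
cp /path/to/audio/*.wav datasets/mydataset/data/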

Training

To start a training session run th train.lua -dataset piano. To view a description of all accepted arguments run th train.lua -help.

To view the progress of training, run th generate_plots; the loss and gradient norm curves will be saved in sessions/[session]/plots/.
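
Putting these steps together, an end-to-end run on the piano dataset bounded to 10 epochs (via the -max_epoch flag, which also appears in the issues below) might look like this:

cd datasets/piano/ && ./create_dataset.sh && cd ../..
th train.lua -dataset piano -max_epoch 10
th generate_plots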

Sampling

By default, samples are generated at the end of every training epoch, but they can also be generated separately by running th train.lua -generate_samples with the session parameter to specify the model.

Multiple samples are generated in batch mode for efficiency; however, generating a single audio sample is faster with th fast_sample.lua. See -help for a description of the arguments.
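
For example, assuming the session argument is simply named -session to match the sessions/[session]/ directory layout (confirm the actual argument name with -help):

th train.lua -generate_samples -session [session]
th fast_sample.lua -help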

Models

A model pretrained on the piano dataset is available here. Download it, copy it into your sessions/ directory, and then extract it in place.

More models will be uploaded soon.

Theano version

This code is based on the reference implementation in Theano.

https://github.com/soroushmehr/sampleRNN_ICLR2017

samplernn_torch's People

Contributors

richardassar, robinsloan


samplernn_torch's Issues

lots of noise and dc-offset output

i can see here https://github.com/richardassar/SampleRNN_torch/blob/master/scripts/generate_dataset.lua#L39 that you're removing the mean from the input audio, and then normalizing to the absolute max. this seems correct. but then in the output i often see results with dc offset.

[screenshot: generated output showing a DC offset, 2017-05-16]

frequently the non-dc-offset region comes first, and is followed by a dc offset region, sometimes with a burst of noise in between the two. this is using the pretrained model.

is this to be expected, or is there something i'm doing wrong?
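
(For reference, the preprocessing described at the top of this issue amounts to something like the following minimal Torch sketch; this is an illustration of the idea, not the repository's exact code.)

require 'torch'
local x = torch.randn(16000)        -- stand-in for one loaded mono audio segment
x = x - x:mean()                    -- subtract the mean, i.e. remove any DC offset
x = x / torch.abs(x):max()          -- scale by the absolute maximum so the result lies in [-1, 1]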

edit: just realized that norm_type is one of the options during training, too. i switched it to abs-max and will try training for a few days. it would be good to know what options the pretrained model was trained with for comparison.

update: i tried training for 5 days on a single K80 with the following parameters:

th train.lua \
-dataset piano \
-multigpu \
-cudnn_rnn \
-rnn_type LSTM \
-norm_type abs-max \
-dropout \
-max_epoch 10

the final output samples i listened to were from 181896 iterations. the samples seemed to oscillate between being quiet white noise/silence, and sounding a bit like someone was throwing hammers at a piano in the best case:

[screenshots: generated output waveforms, 2017-05-21]

i hope this is useful to somebody else -- at least to know what settings don't work :)

Pretrained model link

Hi, I know this is quite an old repo, but is there any chance of fixing the link to the pretrained model? Thanks for any help.

The condition to stop

The parameter 'max_epoch' was left at 'math.huge', as I did not change it before training. Will training run indefinitely? Are there any other stopping conditions? If not, how can I stop it?
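
Note: a finite bound can be passed on the command line, for example:

th train.lua -dataset piano -max_epoch 10

Otherwise training continues until it is interrupted manually; samples are still generated at the end of each completed epoch, as noted in the Sampling section above.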

Issue generating dataset from many short audio files

First of all, excellent work on the port!

I noticed a problem today when I attempted to generate a new dataset. The given data was 530 short .ogg files (each one around 1-3 seconds in length). When attempting to process this data with generate_dataset.lua, the resulting data folder would be empty, although no error would be shown in the terminal to indicate that something had gone awry.

I combined my data into one long audio file instead, and with this I could generate my dataset with no issues at all. My hunch is that if the files that make up the dataset are all shorter in length than the -seg_len value, this causes problems.

Reproducing the issue should be simple; create a few short audio files and attempt to generate a dataset using them.
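
Note: one possible workaround until short files are handled gracefully is to concatenate them into a single long file before slicing, for example with sox (assuming it is installed and the files share a sample rate and channel count; sox concatenates multiple inputs into the single output):

sox /path/to/short_files/*.ogg datasets/[dataset]/data/combined.wav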

Impressive port!

Impressive port! I'm new to both of these frameworks (Theano and Torch), but I can tell significant work went into it. Thanks for sharing!

Curious, what was your motivation? For example, do you expect better perf, or maybe easier multi-GPU support, etc?

I've thought of porting it to TensorFlow, but that's low priority for me; I guess it would be a good learning project.

running fast_sample

thanks for this port. i've tried the original and wanted to compare in terms of performance. i successfully got the training running (~20 hours for 33 minibatches on a K80 -- which i think is one epoch? i couldn't find the epoch indication in the output and haven't dug through the code yet).

it looks like th train.lua -generate_samples will take ~1 hour to generate 5x 20 second samples. that's about 36 seconds to generate 1 second of audio. does that sound right?

i want to get fast_sample running but it gives me errors. first it said that it couldn't find LinearWeightNorm. i made sure that i had the most recent nn installed. that didn't help, but when i changed the line to require 'nn.LinearWeightNorm' then it worked.

now it is giving me errors about not finding SeqGRU_WN, SeqLSTM_WN and SeqLSTMP_WN. is there something else i need to do with these files or should lua be able to identify them since they're adjacent to the script?

edit: that should be 5x 55 second samples since this is at 16kHz instead of 44.1kHz. making it closer to 13 seconds to generate 1 second of audio.
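
Note: a quick way to check whether the weight-normalized modules are visible, assuming the snippet is run with th from the repository root so that SeqGRU_WN.lua and friends sit on the default Lua path, and that their own dependencies (rnn, cudnn) load cleanly:

require 'nn'
print(nn.LinearWeightNorm)  -- should be non-nil with an up-to-date nn
require 'SeqGRU_WN'         -- these weight-normalized modules live alongside the training scripts
require 'SeqLSTM_WN'
require 'SeqLSTMP_WN'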

Reason for big number of threads?

I frequently ran into the issue that training does not start but gets stuck in the create_thread_pool() method. Looking at the number of threads it tries to allocate, I got 128, which seems like quite a lot. It looks like the thread count is set equal to the minibatch size:

local minibatch_size = args.minibatch_size
local n_threads = minibatch_size

So when I manually set the thread count to 8 (based on the number of cores in my CPU), I do not notice any slowdown in training, and the script no longer gets stuck in initialization.

So I wonder what the reason for this high thread count is - maybe I am missing something important here?
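
Note: for anyone wanting to try the same workaround, the change amounts to something like the following (8 here is just the poster's core count, not a recommended constant):

local minibatch_size = args.minibatch_size
local n_threads = math.min(minibatch_size, 8)  -- cap the pool near the CPU core count instead of one thread per minibatch element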

Training on piano dataset, loss floats around 2.5

Hi there!

First of all: thanks for a great code release! This project is incredibly well-packaged and easy to get started with.

Using your pretrained model, I was able to generate some really nice piano samples. However, when I retrieve the piano dataset and attempt to train a model myself, I consistently get a run that looks like this --

[image: training loss curve hovering around 2.5]

-- with commensurately bad/noisy samples.

I'm training with all the default settings.

Any ideas or tips?

(One miscellaneous thing: I'm running on a single GPU, but train.lua remains flagged for multi-GPU. If the multi-GPU flag is set to false, the code bombs out with an error. I think I can see where to fix it, but haven't tried to do so yet 😅)
