
Pretty amazing results from a training run on classical guitar music (single instrument), at epoch 5 & 7 (samplernn_iclr2017, closed)


Comments (9)

soroushmehr commented on August 16, 2024

Thanks for sharing. Good to know! :)
People have tried it on different datasets, including Korean, classical music, and ambient. @richardassar even got interesting results from training on a couple of hours of Tangerine Dream works. See: https://soundcloud.com/psylent-v/tracks


LinkOne1A commented on August 16, 2024

Do you know if the generated sound (of Tangerine Dream) is purely from the network, or mixed with some supporting passages (such as drums) added by a human?


richardassar commented on August 16, 2024


richardassar commented on August 16, 2024

@LinkOne1A These guitar samples are really nice


LinkOne1A commented on August 16, 2024

Credit to the network and the folks behind the SampleRNN paper!

I'm surprised that training on multi-instrument data worked so well, and I'm puzzled why that is. My intuition was that multiple instruments playing at the same time would limit the space of valid ("pleasing" may be a better word) output combinations; I'm not sure how I would go about proving (or disproving) this.

What has been your experience in this area?


richardassar commented on August 16, 2024

For Tangerine Dream the validation loss (in bits) was near 3.2 after 300k iterations, versus below 1.0 for solo piano. Note that I used the three_tier.py model in both cases.

It seems weight normalisation, both in the linear layers and the transformations inside the GRUs, helps with generalisation. I'm conducting some experiments to verify this for myself.
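
To be concrete, weight normalisation just reparameterises each weight matrix as W = g * V / ||V||, with one learned gain per output unit. A minimal sketch for a single linear layer (shapes and names here are illustrative, not taken from three_tier.py; for the GRUs the same reparameterisation would be applied to the input-to-hidden and hidden-to-hidden matrices):

```python
import numpy as np

def weight_norm_linear(x, V, g, b):
    """Linear layer with weight normalisation (Salimans & Kingma, 2016).

    x: (batch, in_dim), V: (out_dim, in_dim), g: (out_dim,), b: (out_dim,)
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)  # one norm per output unit
    W = g[:, None] * V / norms                        # W = g * V / ||V||
    return x @ W.T + b

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
V = rng.standard_normal((8, 16))
g = np.ones(8)        # typically set via the paper's data-dependent initialisation
b = np.zeros(8)
print(weight_norm_linear(x, V, g, b).shape)  # (4, 8)
```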

Some of SampleRNN's capacity to generalise might be due to the quantization noise introduced in going to 8 bits. It may be interesting to try something like https://arxiv.org/pdf/1701.06548.pdf to further improve generalisation; however, I've yet to observe an increase in validation loss, so we're probably slightly underparameterised on these datasets.
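
The regulariser in that paper is a confidence penalty: subtract a small multiple of the entropy of the output distribution from the cross-entropy loss, which discourages overly confident predictions. A minimal sketch, assuming a 256-way softmax over the 8-bit output classes and an illustrative beta:

```python
import numpy as np

def confidence_penalty_loss(logits, targets, beta=0.1):
    """Cross-entropy minus beta * entropy of the predictive distribution.

    logits: (batch, 256), targets: (batch,) integer class indices.
    """
    logits = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    p = np.exp(log_p)
    ce = -log_p[np.arange(len(targets)), targets].mean()           # cross-entropy
    entropy = -(p * log_p).sum(axis=1).mean()                      # H(p)
    return ce - beta * entropy

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 256))
targets = rng.integers(0, 256, size=4)
print(confidence_penalty_loss(logits, targets))
```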

Ignoring independent_preds, the three_tier model has around 18 million parameters, which is evidently sufficient to capture the temporal dynamics of the signal to an acceptable degree. If you think about it from an information-theoretic point of view (Kolmogorov complexity / minimum description length), there's a lot of redundancy in the signal that can be "compressed" away by the network.

The model seems to capture various synthesiser sounds, crowd noise from the live recordings, and both synthetic and real drums, including correct percussive patterns; however, it could not maintain a consistent tempo. This could be helped by conditioning on an auxiliary time series.

If you used preprocess.py in datasets/music/ then you may want to run your experiments again. See #14.


LinkOne1A commented on August 16, 2024

Thanks for the details! It's interesting and surprising that a validation loss of 3.2 produces the Tangerine Dream segment.

What was the length of the original (total) audio?

How long did it take to get to 300k steps, and what kind of GPU do you have?

The quantization noise (going to 8 bits): are you referring to the mu-law encode/decode? I ran a standalone test of a mu-law-processed wav file versus the original wav; I could not hear the difference, and inverting and summing the two sources in Audacity showed very little residual amplitude, mostly, I think, at the high end of the spectrum.
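
For what it's worth, that Audacity test can be reproduced in a few lines by mu-law encoding/decoding a signal and looking at the residual x - Q(x). A minimal sketch using the standard mu-law companding formulas and an illustrative 440 Hz test tone rather than a real recording:

```python
import numpy as np

MU = 255.0  # 8-bit mu-law

def mulaw_encode(x):
    """x in [-1, 1] -> 256 discrete levels (0..255)."""
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1) / 2 * MU).astype(np.uint8)

def mulaw_decode(q):
    """256 discrete levels -> reconstruction in [-1, 1]."""
    y = 2 * (q.astype(np.float64) / MU) - 1
    return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

t = np.linspace(0, 1, 16000, endpoint=False)
x = 0.5 * np.sin(2 * np.pi * 440 * t)           # 440 Hz test tone
residual = x - mulaw_decode(mulaw_encode(x))     # the quantization noise x - Q(x)
print("residual RMS:", np.sqrt(np.mean(residual ** 2)))
```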

I haven't thought about the descriptive complexity of the source signal. Now that you mention it, I'd say that if a network had to deal with already compressed data and had to figure out a predictive solution for the next likely outcome in a series... I don't know, but my hunch is that it would be more difficult for the network, which means we would need more complexity (layers) in the network. This is my off-the-cuff thought!

I'll check it out (#14).

I have not yet looked into the 3 tier training.


richardassar commented on August 16, 2024

Total audio was about 32 hours, although due to the bug I didn't end up training on all of it!

It seems that the loss required for acceptable samples is really relative to the dataset: multiple instruments increase the entropy of the signal, and unlike with piano, the model seems to get "lost" far less frequently because it has a more varied space in which to recover. Before fully converging, the piano samples sometimes go unstable; this effect was almost non-existent when training on Tangerine Dream.

No, I'm referring to quantization noise as x - Q(x) for any quantization scheme. Introducing noise of any kind acts as a regulariser, e.g. https://en.wikipedia.org/wiki/Tikhonov_regularization

It's true that compressed data has more entropy, but if you decompress the signal again, assuming lossy compression, the resulting entropy is lower than in the original signal and it should be easier to model. I was referring to the compressibility of the signal; it seems there's plenty of scope for that.

Something that would be interesting to try, akin to the speaker conditioning mentioned in the Char2Wav paper, is conditioning on instrument or genre with an embedded one-hot signal. This might allow interpolation between styles, etc. This is an area of research I'll be looking into over the next while.
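
A minimal sketch of what that conditioning could look like: embed the one-hot label and concatenate the embedding onto every input frame. The label set, embedding size and frame size below are illustrative assumptions, not values from the SampleRNN code:

```python
import numpy as np

N_LABELS, EMB_DIM, FRAME_DIM = 4, 16, 64     # e.g. 4 genres/instruments

rng = np.random.default_rng(0)
embedding = rng.standard_normal((N_LABELS, EMB_DIM)) * 0.01  # learned jointly in practice

def condition_frames(frames, label_id):
    """frames: (n_frames, FRAME_DIM) -> (n_frames, FRAME_DIM + EMB_DIM)."""
    emb = embedding[label_id]                        # (EMB_DIM,) vector for this label
    emb_tiled = np.tile(emb, (frames.shape[0], 1))   # repeat the embedding per frame
    return np.concatenate([frames, emb_tiled], axis=1)

frames = rng.standard_normal((10, FRAME_DIM))
conditioned = condition_frames(frames, label_id=2)   # e.g. "ambient"
print(conditioned.shape)  # (10, 80)
```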

It took a couple of days to get to 300k steps; I'm training on a GTX 1080. The machine I have has two, but the script does not split minibatches over both GPUs. I have implemented SampleRNN myself and it can train 4x faster (without GRU weight norm); it will be released soon.


richardassar commented on August 16, 2024

Although I have avoided it so far, it's probably worth filtering out low-amplitude audio segments from the training set. These get amplified during normalization, which pulls up the noise floor, introducing lots of high-energy noise that can only disrupt or slow down training.

A plot of the RMS power over each segment in the piano dataset shows the distribution; you can see the tail of low-energy signals on the right, which could probably be pruned (especially the one segment with zero energy).

[Plot: RMS power over each segment in the piano dataset]
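
A minimal sketch of that pruning step: compute the RMS power of each segment and drop anything below a threshold. The threshold and the toy segments are illustrative, not taken from the repository's preprocessing:

```python
import numpy as np

def rms(segment):
    """Root-mean-square power of a 1-D audio segment."""
    return np.sqrt(np.mean(segment.astype(np.float64) ** 2))

def prune_quiet_segments(segments, threshold=1e-3):
    """Keep only segments whose RMS power is at or above the threshold."""
    powers = np.array([rms(s) for s in segments])
    kept = [s for s, p in zip(segments, powers) if p >= threshold]
    return kept, powers

rng = np.random.default_rng(0)
segments = [0.3 * rng.standard_normal(8000),    # normal segment
            1e-4 * rng.standard_normal(8000),   # near-silent segment
            np.zeros(8000)]                     # the zero-energy segment
kept, powers = prune_quiet_segments(segments)
print(len(kept), powers)   # 1 segment kept; the three RMS values
```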

