Comments (7)

gb96 commented on August 11, 2024

Today I found that my model, trained on different music, generated what sounded like white noise.

My problem appeared to be due to some of the converted wav files (generated in datasets/YourMusicLibrary/wave/) being mono 32-bit PCM audio at 8 kHz, whereas the GRUV conversion functions assume mono 16-bit PCM audio at 8 kHz.

If you find that some of your wav files have the wrong bit depth, you can convert them with sox, e.g.:
sox oldfile.wav -b 16 newfile.wav
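
If you want to scan the whole directory first, here is a minimal sketch using Python's standard wave module (the glob path is just the GRUV default; adjust it to your setup):

import glob
import wave

# Flag converted files that are not mono 16-bit PCM at 8 kHz,
# which is what the GRUV conversion functions assume.
for path in glob.glob("datasets/YourMusicLibrary/wave/*.wav"):
    with wave.open(path, "rb") as w:
        channels, width, rate = w.getnchannels(), w.getsampwidth(), w.getframerate()
        if (channels, width, rate) != (1, 2, 8000):  # width is in bytes: 2 == 16-bit
            print(path, channels, 8 * width, rate)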

This might be the cause of the second issue you mentioned.

The first issue is probably due to over-fitting: your trained model fits the training data well but does not generalize to the validation data. However, you would still expect the validation loss to decrease at some point during the earlier epochs. Some people have reported that for LSTM networks the validation loss can move up and down unpredictably during training before the optimal minimum is reached.
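
One way to ride out that up-and-down movement is an early-stopping callback with a generous patience. A minimal sketch in Keras (which GRUV builds on), with a toy stand-in for the GRUV model; the layer sizes, data shapes, and patience value are all illustrative:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense
from keras.callbacks import EarlyStopping

# Toy sequence-to-sequence LSTM standing in for the GRUV model.
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(40, 128)))
model.add(TimeDistributed(Dense(128)))
model.compile(loss='mean_squared_error', optimizer='rmsprop')

# Random stand-in data: (examples, timesteps, frequency bins).
X = np.random.randn(32, 40, 128)
y = np.random.randn(32, 40, 128)

# patience tolerates epochs where val_loss temporarily rises, stopping
# only after it has failed to improve for 50 epochs in a row.
early_stop = EarlyStopping(monitor='val_loss', patience=50)
model.fit(X, y, epochs=5000, validation_split=0.25, callbacks=[early_stop])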

nixingyang commented on August 11, 2024

@gb96 Thanks for your reply. Have you trained a model which is capable of producing meaningful sound?
I re-implemented the code, and I forgot to normalize the raw audio data. That might be the reason for these two issues.
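
For reference, the forward normalization is just centering and scaling; the essential point is that it must exactly match the inverse transform applied after generation (see the end of generate_from_seed below). A minimal sketch, with variable names mirroring that function's arguments and random stand-in data:

import numpy as np

# Stand-in training tensor: (examples, timesteps, frequency bins).
X = np.random.randn(100, 40, 128).astype(np.float32)

data_mean = X.mean(axis=(0, 1))      # per-bin mean over examples and timesteps
data_variance = X.var(axis=(0, 1))   # per-bin variance (not std; match the inverse!)

# Forward transform; generate_from_seed undoes exactly this at the end.
X_norm = (X - data_mean) / data_variance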

gb96 commented on August 11, 2024

@nixingyang I have trained models that produce sound (e.g., https://soundcloud.com/gb96/stairway-to-gruv-hd512-epoch48000-loss067-seed3x3).

Have you tried running the audio_unit_test or equivalent? (See https://github.com/MattVitelli/GRUV/blob/master/data_utils/parse_files.py#L190)

That test verifies the methods for loading and saving sound files, converting between wave and NumPy formats, and converting between time-domain and frequency-domain representations (via the Fast Fourier Transform and its inverse).
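
The same idea as a standalone check (not the GRUV test itself, just a minimal sketch of the FFT round trip it exercises):

import numpy as np

# Time domain -> frequency domain -> time domain should reproduce
# the original block up to floating-point error.
block = np.random.randn(2048)
spectrum = np.fft.fft(block)
reconstructed = np.real(np.fft.ifft(spectrum))
assert np.allclose(block, reconstructed)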

nixingyang commented on August 11, 2024

I have defined a function which is similar to audio_unit_test, and I can confirm that the transformation process is lossless.
The audio you shared contains informative sound at the beginning; however, the model simply repeats useless sound after that. My prediction does not contain informative sound at all. Did you modify the generate_from_seed function, and did you train your model solely on 65 seconds of audio?

gb96 commented on August 11, 2024

It looks like I have made some significant modifications to the generate_from_seed function.
The main idea of my changes is to keep a fixed seed-sequence length: new predicted values are appended to the end, and the initial values are deleted from the beginning to maintain a constant length.

import numpy as np

# Extrapolates from a given seed sequence
def generate_from_seed(model, seed, sequence_length, data_variance, data_mean):
    seedSeq = seed.copy()
    output = []
    # The generation algorithm is simple:
    # Step 1 - Given A = [X_0, X_1, ..., X_n], generate X_{n+1}
    # Step 2 - Append X_{n+1} to A and drop X_0, keeping the length fixed
    # Step 3 - Repeat sequence_length times
    for it in range(sequence_length):
        seedSeqNew = model.predict(seedSeq)  # Step 1. Generate X_{n+1}
        # Step 2. Take the last predicted frame and record it
        newSeq = seedSeqNew[0][seedSeqNew.shape[1] - 1]
        output.append(newSeq.copy())
        # Construct the new seedSeq: append the prediction and delete the
        # oldest frame so the sequence length stays constant
        newSeq = np.reshape(newSeq, (1, 1, newSeq.shape[0]))
        seedSeq = np.concatenate((seedSeq, newSeq), axis=1)
        seedSeq = np.delete(seedSeq, 0, 1)
    # Finally, post-process the generated sequence so that we have valid frequencies
    # We're essentially just undoing the data centering process
    for i in range(len(output)):
        output[i] *= data_variance
        output[i] += data_mean
    return output
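
For reference, a hypothetical call, assuming model, seed (shape (1, seq_len, freq_dims), already normalized), and the normalization statistics come from the GRUV pipeline:

# Generate 4000 new frequency frames from the seed (the count is illustrative).
output = generate_from_seed(model, seed, sequence_length=4000,
                            data_variance=data_variance, data_mean=data_mean)
# output is a list of de-normalized frequency frames, ready for the
# inverse-FFT / wav conversion step.

Keeping the seed length fixed also keeps the model's input shape, and therefore the per-step prediction cost, constant.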

gb96 commented on August 11, 2024

To answer your question about the training data: I used the first 65 seconds of audio from each channel of a stereo source, for a total of 130 seconds. The reason I did that was that the source music had quite distinct sounds in each channel (e.g. guitar notes in one and vocals in the other), and I figured it would be easier to train an LSTM network on the separate sounds rather than on the combined mono version.
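
A minimal sketch of that channel split with SciPy (file names are hypothetical):

from scipy.io import wavfile

rate, data = wavfile.read('source_stereo.wav')  # data shape: (frames, 2)
n = 65 * rate                                   # first 65 seconds
left, right = data[:n, 0], data[:n, 1]          # split the stereo channels

wavfile.write('channel_left.wav', rate, left)
wavfile.write('channel_right.wav', rate, right)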

nixingyang commented on August 11, 2024

The modification of generate_from_seed is reasonable. From my point of view, the algorithm devised in GRUV is not capable of handling real-world audio signals. Google has unveiled WaveNet, which is probably the state of the art.
