Comments (7)
Today I found that my model, trained on different music, generated what sounded like white noise.
My problem appeared to be due to some of the converted wav files (generated in datasets/YourMusicLibrary/wave/) being mono 32-bit PCM audio at 8kHz, whereas the GRUV conversion functions assume mono 16-bit PCM audio at 8kHz.
If you find that some of your wav files have the wrong bit depth, you can convert them with sox, e.g.:
sox oldfile.wav -b 16 newfile.wav
This might be the cause of the second issue you mentioned.
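To find out which files are affected before reaching for sox, a quick check along these lines may help. It is only a sketch: it uses Python's standard `wave` module, assumes the files are integer PCM (the module rejects other encodings), and the helper names `wav_format` and `needs_conversion` are made up for illustration and are not part of GRUV.

```python
import wave


def wav_format(path):
    """Return (channels, bits_per_sample, sample_rate) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def needs_conversion(path, expected_bits=16):
    """True if the file's bit depth differs from the expected one (16 by default)."""
    _, bits, _ = wav_format(path)
    return bits != expected_bits
```

Any file for which `needs_conversion` returns True is a candidate for the sox command above.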
The first issue is probably due to over-fitting. Your trained model fits the training data well, but does not generalize to the validation data. However, you would expect the validation loss to decrease at some point during the earlier epochs. Some people have reported that for LSTM networks the validation loss can move up and down unpredictably during training before the optimal minimum is reached.
from gruv.
@gb96 Thanks for your reply. Have you trained a model which is capable of producing meaningful sound?
I re-implemented the code and I forgot to normalize the raw audio data. That might be the reason for these two issues.
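For context, the kind of normalization I mean can be sketched roughly as below. This assumes simple per-feature centering and scaling with NumPy; the function names are hypothetical, and the original GRUV code may compute its statistics differently.

```python
import numpy as np


def normalize(data):
    """Center and scale each feature; also return the statistics needed to undo it."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant features unscaled
    return (data - mean) / std, mean, std


def denormalize(normalized, mean, std):
    """Undo normalize() so generated values are back on the original scale."""
    return normalized * std + mean
```

The same `mean` and `std` have to be kept and applied to the generated output, otherwise the prediction stays on the normalized scale.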
@nixingyang I have trained models that produce sound (e.g., https://soundcloud.com/gb96/stairway-to-gruv-hd512-epoch48000-loss067-seed3x3 )
Have you tried running the audio_unit_test or equivalent? (see https://github.com/MattVitelli/GRUV/blob/master/data_utils/parse_files.py#L190 )
That verifies the methods for loading/saving sound files, converting between wave and NumPy formats, and converting between time-domain and frequency-domain representations (via the Fast Fourier Transform and its inverse).
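If you want an equivalent check for just the time/frequency part, a round trip like the one below is roughly what to look for. This is a sketch and not the GRUV test itself: it assumes each block is transformed with NumPy's FFT and stored as real parts followed by imaginary parts, and the names `to_frequency_blocks` and `to_time_blocks` are invented for this example.

```python
import numpy as np


def to_frequency_blocks(blocks):
    """FFT each time-domain block and store it as [real parts, imaginary parts]."""
    spectrum = np.fft.fft(blocks, axis=-1)
    return np.concatenate((spectrum.real, spectrum.imag), axis=-1)


def to_time_blocks(freq_blocks):
    """Rebuild the complex spectrum and invert it back to the time domain."""
    half = freq_blocks.shape[-1] // 2
    spectrum = freq_blocks[..., :half] + 1j * freq_blocks[..., half:]
    return np.fft.ifft(spectrum, axis=-1).real


def round_trip_is_lossless(blocks):
    """True if time -> frequency -> time reproduces the input within float tolerance."""
    return np.allclose(to_time_blocks(to_frequency_blocks(blocks)), blocks)
```

If `round_trip_is_lossless` fails on your own data, the problem is in the conversion code rather than in the network.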
I have defined a function which is similar to audio_unit_test and I can confirm that the transformation process is lossless.
The audio you shared contains informative sound at the beginning. However, the model simply repeats useless sound after that. My prediction does not contain informative sound at all. Did you modify the generate_from_seed function, and did you train your model solely on 65 seconds of audio?
Looks like I have made some significant modifications to the generate_from_seed function.
The main idea of my changes is to keep a fixed seed-sequence length. New predicted values are appended to the end and initial values are deleted from the beginning to maintain constant length.
import numpy as np

# Extrapolates from a given seed sequence
def generate_from_seed(model, seed, sequence_length, data_variance, data_mean):
    seedSeq = seed.copy()  # shape: (1, n_steps, n_features)
    output = []
    # The generation algorithm is simple:
    # Step 1 - Given A = [X_0, X_1, ... X_n], generate X_{n+1}
    # Step 2 - Append X_{n+1} to A and drop X_0, so A keeps a constant length
    # Step 3 - Repeat sequence_length times
    for it in range(sequence_length):  # range instead of xrange, so this runs on Python 3
        seedSeqNew = model.predict(seedSeq)  # Step 1. Generate X_{n+1}
        # Step 2. Take the prediction for the last time step
        newSeq = seedSeqNew[0][seedSeqNew.shape[1] - 1]
        output.append(newSeq.copy())
        # Construct new seedSeq: append the new value, then delete the oldest one
        newSeq = np.reshape(newSeq, (1, 1, newSeq.shape[0]))
        seedSeq = np.concatenate((seedSeq, newSeq), axis=1)
        seedSeq = np.delete(seedSeq, 0, axis=1)
    # Finally, post-process the generated sequence so that we have valid frequencies
    # We're essentially just undoing the data centering process
    for i in range(len(output)):
        output[i] *= data_variance
        output[i] += data_mean
    return output
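To see the fixed-length idea in isolation, here is a toy run of just the window update. It is a sketch only: `PlusOneModel` is a hypothetical stand-in for a trained network (it returns its input plus one), and `slide_window` is a helper name invented for this example.

```python
import numpy as np


class PlusOneModel:
    """Hypothetical stand-in for a trained model: predicts input + 1 at every step."""

    def predict(self, batch):
        return batch + 1.0


def slide_window(seed_seq, new_step):
    """Drop the oldest time step and append new_step, keeping the length constant."""
    new_step = np.reshape(new_step, (1, 1, new_step.shape[0]))
    return np.concatenate((seed_seq[:, 1:, :], new_step), axis=1)


model = PlusOneModel()
seed = np.zeros((1, 4, 2))  # (batch, time steps, features)
window = seed.copy()
for _ in range(3):
    prediction = model.predict(window)
    window = slide_window(window, prediction[0, -1])  # use the last predicted step

print(window.shape)
```

The window keeps the seed's shape on every iteration, so the input to `model.predict` never grows, however many steps are generated.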
To answer your question about training data, I used the first 65 seconds of audio from each channel of a stereo source, for a total of 130 seconds. I did that because the source music had quite distinct sounds in each channel (e.g. guitar notes in one and vocals in the other), and I figured it would be easier to train an LSTM network on the separate sounds rather than on the combined mono version.
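Splitting the channels can be sketched as follows. This is an illustration rather than the exact code I ran: it assumes the stereo audio is already loaded as a NumPy array of shape (n_samples, 2), which is the layout `scipy.io.wavfile.read` returns for stereo files, and `split_channels` is a name made up for this example.

```python
import numpy as np


def split_channels(stereo, sample_rate, seconds=65):
    """Return the first `seconds` of the left and right channels as two mono arrays."""
    if stereo.ndim != 2 or stereo.shape[1] != 2:
        raise ValueError("expected an array of shape (n_samples, 2)")
    n_samples = int(seconds * sample_rate)
    clip = stereo[:n_samples]
    return clip[:, 0].copy(), clip[:, 1].copy()
```

Each returned array can then be saved as its own mono wav file and treated as a separate training track.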
The modification of generate_from_seed is reasonable. From my point of view, the algorithm devised in GRUV is not capable of handling real-world audio signals. Google DeepMind has published WaveNet, which is probably the state of the art.