Comments (4)
Any chance you could post your v2 or at least the implementation of the chop function you put in conversion.py, please?
from acapellabot.
It's an out-of-memory error; the code currently tries to process songs all at once (rather than splitting them up and processing segments individually), and that's a problem for long songs!
A quick-and-dirty workaround is to split your input file into separate files, run it on each of those separately, then join the outputs:
https://unix.stackexchange.com/questions/280767/how-do-i-split-an-audio-file-into-multiple
https://superuser.com/questions/571463/how-do-i-append-a-bunch-of-wav-files-while-retaining-not-zero-padded-numeric
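If you'd rather stay in Python than use the command-line tools from those links, the split-and-join workaround could be sketched with the standard-library wave module roughly as follows. This assumes uncompressed WAV input; split_wav, join_wavs, and the chunk file naming are illustrative names, not part of acapellabot.

```python
import wave

def split_wav(path, seconds_per_chunk, prefix):
    """Write consecutive chunks of `path` to prefix000.wav, prefix001.wav, ..."""
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(seconds_per_chunk * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{prefix}{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # The frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths

def join_wavs(paths, out_path):
    """Concatenate WAV files that all share one format into `out_path`."""
    with wave.open(out_path, "wb") as dst:
        for i, path in enumerate(paths):
            with wave.open(path, "rb") as src:
                if i == 0:
                    dst.setparams(src.getparams())
                dst.writeframes(src.readframes(src.getnframes()))
```

Between the two calls you would run the model on each chunk however you normally do, then pass the output files to join_wavs in the same order. The zero-padded index keeps the chunks in order when sorted by name.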
The good way is to have the code automatically slice up the input until it can fit a slice in memory, and then run each slice through and reassemble them for you.
Try adding a predict method to acapellabot.py at line 88, something like:
    def predict(self, spectrogram):
        # Pad the spectrogram so it fits the network's downscaling grid.
        expandedSpectrogram = conversion.expandToGrid(spectrogram, self.peakDownscaleFactor)
        sliceSizeTime = 6144
        predictedSpectrogramWithBatchAndChannels = None
        # Halve the slice width until every slice fits in memory.
        while sliceSizeTime >= self.peakDownscaleFactor and predictedSpectrogramWithBatchAndChannels is None:
            try:
                slices = conversion.chop(expandedSpectrogram, sliceSizeTime, expandedSpectrogram.shape[0])
                outputSlices = []
                for s in slices:
                    sWithBatchAndChannels = s[np.newaxis, :, :, :]
                    outputSlices.append(self.model.predict(sWithBatchAndChannels))
                # With the batch axis in front, axis 2 is time.
                predictedSpectrogramWithBatchAndChannels = np.concatenate(outputSlices, axis=2)
            except (RuntimeError, MemoryError):
                console.info(sliceSizeTime, "is too large; trying", sliceSizeTime // 2)
                sliceSizeTime = sliceSizeTime // 2
        if predictedSpectrogramWithBatchAndChannels is None:
            raise MemoryError("could not fit even the smallest slice in memory")
        predictedSpectrogram = predictedSpectrogramWithBatchAndChannels[0, :, :, :]
        newSpectrogram = predictedSpectrogram[:spectrogram.shape[0], :spectrogram.shape[1]]
        return newSpectrogram
and replacing lines 95-103 with:
        newSpectrogram = self.predict(spectrogram)
        newAudio = conversion.spectrogramToAudioFile(newSpectrogram, fftWindowSize=fftWindowSize, phaseIterations=phaseIterations)
(This is copy-pasted from my v2 code, which uses stereo instead of mono, so there might be some issues actually getting it to run. Will try to test later...)
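The chop helper called in predict isn't shown in this thread. As a rough guess at what it needs to do, here is a minimal sketch that assumes axis 0 is frequency and axis 1 is time, and that the function simply returns consecutive time slices. This is a hypothetical stand-in written to match the call signature above, not the actual v2 implementation.

```python
import numpy as np

def chop(spectrogram, timeSize, freqSize):
    """Hypothetical stand-in for conversion.chop; not the author's code.

    Assumes axis 0 is frequency and axis 1 is time. Returns consecutive
    slices of at most `timeSize` columns, each cropped to `freqSize` rows.
    """
    slices = []
    for start in range(0, spectrogram.shape[1], timeSize):
        slices.append(spectrogram[:freqSize, start:start + timeSize])
    return slices

# Toy check: 8 frequency bins, 20 time frames, 2 channels.
spec = np.arange(8 * 20 * 2, dtype=np.float32).reshape(8, 20, 2)
pieces = chop(spec, 8, spec.shape[0])
widths = [p.shape[1] for p in pieces]        # [8, 8, 4]
rebuilt = np.concatenate(pieces, axis=1)     # identical to spec
```

Note that the last slice can be narrower than the others. Whether the model accepts it depends on that width still being a multiple of the downscale factor, which should hold if the padded spectrogram width and the slice size both are.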
I tried splitting the song into much smaller pieces, and it worked perfectly! Thanks a lot for your quick reply!
I'm also wondering what would happen if I ran the full song on a server with more memory.
What I'm actually trying to do is separate two or more people's voices from each other. I think training this model on my own training set might be necessary. Do you have any tips for that?
The network is fully convolutional, so there's not much of a difference between running it on segments and running it on the whole thing (the one possible difference is artifacts at the boundaries between sections).
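To see why only the boundaries are affected, here is a toy illustration in which a single zero-padded 1-D convolution stands in for the network (an assumption made for brevity; the real model is a stack of 2-D convolutions with a larger receptive field, so the affected region around each cut would be wider).

```python
import numpy as np

signal = np.arange(1, 65, dtype=float)
kernel = np.array([0.25, 0.5, 0.25])   # stand-in for one convolutional layer

# Run the "network" on the whole input at once.
whole = np.convolve(signal, kernel, mode="same")

# Run it on two halves separately and stitch the outputs together.
halves = [signal[:32], signal[32:]]
stitched = np.concatenate([np.convolve(h, kernel, mode="same") for h in halves])

# Positions where the two results disagree.
mismatch = np.flatnonzero(~np.isclose(whole, stitched))
print(mismatch)   # → [31 32]
```

Only the two samples on either side of the cut differ, because each one was computed with zero padding in place of its real neighbour.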
Multi-speaker source separation (from a monophonic source) is something this architecture will probably do poorly at. The way this architecture is designed right now, it only really makes judgments about individual harmonics, which isn't enough to separate speakers. For example, with a vocal over sine/square-wave chords, it's incredibly easy for the network to identify the vocals, since all you need to do is filter out all of the straight lines.
I would suggest using a deeper U-net architecture (to take a larger context into account) if you want to do multi-speaker separation. Even that will only be able to succeed by memorizing facts about specific speakers, though... a better implementation might have two input spectrograms to a large U-net: the multi-speaker spectrogram, and a "reference" spectrogram of one of the speakers, with the target output being that speaker's separated audio. Generating data for that is still pretty easy (you can probably even use my same script, just run it on lots of single-speaker recordings) but getting a good network is the tricky part. It might be worthwhile to start on a simpler case like decomposing saw waves/square waves, where it's more obvious what the network is (and should be) doing.
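For the simpler saw/square case, training pairs can be synthesized directly. The sketch below is one possible way to do that, assuming naive (non-band-limited) waveforms, a 22050 Hz sample rate, and the saw wave as the separation target; saw, square, and make_example are illustrative names, and the resulting audio would still need to go through the same spectrogram conversion as any other training data.

```python
import numpy as np

def saw(freq, t):
    """Naive (non-band-limited) sawtooth in [-1, 1)."""
    return 2.0 * ((freq * t) % 1.0) - 1.0

def square(freq, t):
    """Naive (non-band-limited) square wave taking the values 1 and -1."""
    return np.where((freq * t) % 1.0 < 0.5, 1.0, -1.0)

def make_example(rng, sample_rate=22050, seconds=1.0):
    """Return (mixture, target): a saw + square mix and the saw alone."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    saw_freq, square_freq = rng.uniform(110.0, 880.0, size=2)
    target = 0.5 * saw(saw_freq, t)
    mixture = target + 0.5 * square(square_freq, t)
    return mixture, target

rng = np.random.default_rng(0)
mixture, target = make_example(rng)
```

Each waveform is scaled by 0.5 so the mixture stays within [-1, 1]. Drawing the two frequencies independently means the harmonics sometimes overlap, which is the interesting case for checking what the network actually learned.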
I found visualizing spectrograms (Audacity works well for this, as does Sonic Visualiser) was really helpful in understanding what the network should and shouldn't be able to do–if you can't tell the two speakers apart in spectrogram view (again, on a monophonic file), then it's unlikely that an image-to-image network in spectrogram space will be able to either.
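If you want the same view in code rather than in Audacity, a magnitude spectrogram can be computed with plain NumPy. This is a minimal sketch, assuming a Hann window of 1024 samples and a hop of 256 (both arbitrary choices here, not acapellabot's settings), with two steady tones standing in for two sources.

```python
import numpy as np

def magnitude_spectrogram(audio, window_size=1024, hop=256):
    """Return |STFT| with shape (window_size // 2 + 1, num_frames)."""
    window = np.hanning(window_size)
    starts = range(0, len(audio) - window_size + 1, hop)
    frames = np.stack([audio[s:s + window_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
# Two steady tones standing in for two sources.
audio = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 2000 * t)
spec = magnitude_spectrogram(audio)

# The two strongest frequency rows, averaged over time, converted to Hz.
top_bins = np.sort(np.argsort(spec.mean(axis=1))[-2:])
top_hz = top_bins * sample_rate / 1024
```

Here the two sources land in clearly separate rows, so they are trivial to tell apart. Two real speakers overlap far more, and plotting spec (for example with an image plot on a log scale) is a quick way to check how much.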