algorithmic-music-exploration / amen
A toolbox for algorithmic remixing, after Echo Nest Remix
License: BSD 2-Clause "Simplified" License
1 ⌂ py35 master × ~/git/amen/amen/examples
→ python reverse.py ~/git/librosa/tests/data/test1_44100.wav
/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/compressed.py:739: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
Traceback (most recent call last):
File "reverse.py", line 19, in <module>
out = synthesize(beats)
File "/home/bmcfee/git/amen/amen/synthesize.py", line 92, in synthesize
sparse_array[1, right_start:right_end] += resampled_audio[1]
File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 272, in __getitem__
return self._get_row_slice(row, col)
File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 353, in _get_row_slice
row_slice = self._get_submatrix(i, cslice)
File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 420, in _get_submatrix
check_bounds(j0, j1, N)
File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 415, in check_bounds
" %d <= %d" % (i0, num, i1, num, i0, i1))
IndexError: index out of bounds: 0 <= 52919990 <= 52920000, 0 <= 88360 <= 52920000, 52919990 <= 88360
As per @Cortexelus on the timbre thread, it would be nice to have Echo Nest style segments as a Timing.
Will doing this make anything weird, in terms of how we handle Feature data with pandas? I don't think so.
Librosa has defaults of a 22050 Hz sample rate and a hop length of 512.
Brian commented that we may eventually use other feature generation / analysis tools that have different defaults, and that we should consider that.
I feel like we can worry about that when we get there, myself?
FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/2c/99n_4g3n0ml7d1gz40y77jc00000gn/T/pip-build-5nik37q_/amen/examples'
Oops. Working on it.
For prioritizing upstream acceleration work.
Just a heads up:
I took the liberty of adding ReviewNinja here, in an effort to better keep track of what's already been reviewed via PR status.
If people hate it, we can shut it off.
This is related to #66, but is a bit different: if I want to put an effect (a delay, a compressor, or a pitch shifter) on a certain chunk of audio, how do I do that?
I think a decent answer is to build up effect chains that can then be applied to a given TimeSlice, and that are not applied until `synthesize` is called. So track-level deformations trigger a new analysis, but you can also just put an EQ on some signal without making a new Audio object.
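As a minimal sketch of the idea (EffectChain and all of its method names are hypothetical, not existing amen API):

class EffectChain:
    def __init__(self):
        self.effects = []  # each effect is a function: samples -> samples

    def append(self, effect):
        self.effects.append(effect)
        return self

    def apply(self, samples):
        # Deferred until synthesize() calls this, so attaching an effect
        # never triggers a re-analysis on its own.
        for effect in self.effects:
            samples = effect(samples)
        return samples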
/home/bmcfee/git/amen/amen/synthesize.py in synthesize(inputs)
66 for i, (time_slice, start_time) in enumerate(inputs):
67 # if we have a mono file, we return stereo here.
---> 68 resampled_audio, left_offset, right_offset = time_slice.get_samples()
69
70 # set the initial offset, so we don't miss the start of the array
/home/bmcfee/git/amen/amen/timing.py in get_samples(self)
35 left_offsets, right_offsets = self._get_offsets(starting_sample,
36 ending_sample,
---> 37 self.audio.num_channels)
38
39 samples = self._offset_samples(starting_sample, ending_sample,
/home/bmcfee/git/amen/amen/timing.py in _get_offsets(self, starting_sample, ending_sample, num_channels)
60 ending_offset = 0
61 else:
---> 62 ending_crossing = zero_index[bisect_right(zero_index, ending_sample)]
63 ending_offset = ending_crossing - ending_sample
64
IndexError: index 355701 is out of bounds for axis 0 with size 355701
I think the problem here is that `bisect_right(arr, x)` can return `len(arr)` if `x > arr[i]` for all `i`. We can detect this case and fall back to `bisect_left` (or just set it to the last element).
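A sketch of the fallback, assuming `zero_index` is the sorted array of zero-crossing positions (the helper name is mine):

from bisect import bisect_right

def ending_crossing_index(zero_index, ending_sample):
    # bisect_right can return len(zero_index) when ending_sample lies past
    # the last zero crossing; clamp to the last element in that case.
    i = bisect_right(zero_index, ending_sample)
    return min(i, len(zero_index) - 1)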
We already get this when computing beats - we just need to add it to Audio, so we can do `audio.tempo`.
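For reference, librosa returns the tempo as a by-product of beat tracking:

import librosa

y, sr = librosa.load('some_audio_file.wav')
# beat_track returns the estimated global tempo (in BPM) alongside the
# beat frame positions, so exposing audio.tempo is essentially free.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)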
In implementing a few remix hacks yesterday, I kept finding myself wanting to construct a new `Audio` object from a time slice. Currently, the only way to do this is to extract the waveform via `get_samples()` and then instantiate a new `Audio` object.
This is undesirable for a few reasons:
Reason 1 is okay, but reason 2 is a deal-breaker if you're extracting short clips (e.g. beats), which may be too short for certain analyses to make sense.
What do folks think about making a shortcut for this kind of operation that propagates features (and timings) from the source audio of a time slice? This way, we can also preserve things like beat timings within a sliced interval, which might come out differently if the interval is analyzed independently of the full track.
If we're careful about things, the audio buffer could also be shared between audio objects by slicing.
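A rough sketch of what the shortcut could look like; `from_time_slice`, `features.slice`, and the attribute names here are all assumptions, not existing amen API:

class Audio:
    @classmethod
    def from_time_slice(cls, time_slice):
        source = time_slice.audio
        sr = source.sample_rate
        start = int(time_slice.time.total_seconds() * sr)
        end = start + int(time_slice.duration.total_seconds() * sr)
        new = cls.__new__(cls)
        # Share the underlying buffer via a numpy view instead of copying.
        new.raw_samples = source.raw_samples[..., start:end]
        new.sample_rate = sr
        # Propagate features and timings restricted to the slice, rather
        # than re-analyzing a clip that may be too short to analyze well.
        new.features = source.features.slice(time_slice)
        return new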
@bmcfee had some thoughts about this - I feel like we said that we would not use librosa for this?
One of my favorite things about Remix was the ability to do `for beat in beats`, and so on. We don't currently have an iterator over features, e.g. `for amp in amplitudes`.
Likewise, we don't have an easy way to get TimeSlices and the features to use to manipulate them, unless we do something like:
amps = audio.features['amplitude'].at(audio.timings['beats'])
for feature, beat in zip(amps, audio.timings['beats']):
    # do things to each beat based on feature
I feel like we should:
a) make the data in the dataframe of a feature iterable.
b) allow a feature to reference its timings. Something like `feature.with_time()`, maybe?
To contrast:
amps = audio.features['amplitude'].at(audio.timings['beats'])
for feature, beat in amps.with_time():
    # do things to each beat based on feature
A problem with that is that feature objects that have not been resampled do not have any TimeSlices to reference.
Thoughts?
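A quick sketch of how (b) could work; `with_time`, `data`, and `time_slices` are hypothetical names, and the ValueError handles the un-resampled case above:

class Feature:
    def with_time(self):
        # Pair each observation with the TimeSlice it was resampled at.
        if self.time_slices is None:
            raise ValueError('Feature has not been resampled; no TimeSlices to reference')
        for value, time_slice in zip(self.data.values, self.time_slices):
            yield value, time_slice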
See http://developer.echonest.com/docs/v4/_static/AnalyzeDocumentation.pdf
I suspect we can get this with librosa.onset.onset_detect
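For example, something along these lines (the exact parameters would need tuning):

import librosa

y, sr = librosa.load('some_audio_file.wav')
# Detected onsets as frame indices, converted to seconds; these could
# seed Echo Nest style segment boundaries.
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)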
You guys are preventing it from being used for something more interesting 😄
I guess the parent Audio object should have a key?
Or should we be cute and make it computable per-TimeSlice?
Let's start scoping this thing out!
What functionality does the Analysis object need to provide, and what should the interface look like?
For now, let's not limit ourselves to compatibility with EN remix. Backwards compatibility can always be tacked on with a translation layer. I'm more interested in making sure the core is well designed and extensible in the right kinds of ways.
I'll start off a check-list of features it should expose, but the interface can come later.
[EDIT: 2015-06-06, restructured the feature list by type]
Blue-sky feature wish list:
Some general design principles:
...I feel like @bmcfee had some thoughts about how to do this, but I may be misremembering.
[documenting an offline conversation with @blacker]
Some quick thoughts about how the interface for synthesizing waveforms should look.
>>> def my_generator(track):
...     start = track.duration
...     for beat in track.beats[::-1]:
...         start = start - beat.duration
...         yield start, beat
>>> syn = synthesize(my_generator(track), duration=track.duration)
The `synthesize` function iterates over the generator and adds samples into the output stream. It returns a new audio object (I guess, an audio container object itself). Stereo/resampling/zc-alignment are all handled within `synthesize`.
This makes it easy to do concatenative synthesis (as above). You can also do additive mixing by having overlapping target times.
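For instance, additive mixing under the same interface, via overlapping target times (`track.beats` as above; `beat.time` is an assumed start-time attribute):

def layered(track):
    # Each beat stays in place, with the first beat mixed on top of it;
    # synthesize simply sums samples wherever target times overlap.
    first = track.beats[0]
    for beat in track.beats:
        yield beat.time, beat
        yield beat.time, first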
Just jotting this down before I forget.
librosa 0.5 will add dynamic time warping (not totally relevant for amen), and as a side-effect, optional numba jit compilation for certain methods.
This should make it much easier to accelerate certain bottleneck ops like zero-crossing alignment.
As per #4. Let's do these after we do all the other ones.
Blue-sky feature wish list:
I am currently giving us `amplitude` by using `librosa.feature.rmse`. Please close if I am doing the right thing!
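For reference, the underlying call is roughly this (note that rmse was renamed to librosa.feature.rms in later librosa releases):

import librosa

y, sr = librosa.load('some_audio_file.wav')
# Frame-wise root-mean-square energy, shape (1, n_frames).
amplitude = librosa.feature.rmse(y=y)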
As per the comments in #86, we should get to this soon!
http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#resample-api
At a cursory glance, this appears to resample at constant intervals (`df.resample('2s', how='sum')`, for example), whereas we need to resample at varying intervals. Will try to read more at the hack this weekend.
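In the meantime, one possible workaround for varying intervals is `pd.cut` plus `groupby`; a toy sketch, with made-up beat boundaries:

import numpy as np
import pandas as pd

times = np.arange(0, 1.5, 0.1)  # feature frame times, in seconds
df = pd.DataFrame({'amplitude': np.random.rand(len(times))}, index=times)
beat_times = [0.0, 0.47, 0.93, 1.42]  # varying interval edges
# Bucket each frame into the beat interval that contains it, then aggregate.
per_beat = df.groupby(pd.cut(df.index, beat_times, include_lowest=True)).sum()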
Starting a new thread to expand on the feature object design that I vaguely started in #4.
In the previous jam, we had features like 'timbre' and 'pitch' connected to segments. I'd like to abstract out features from timing in amen, since different features may come at different sampling rates. Here, I'm distinguishing feature observations like 'pitch' and 'timbre' from time-index observations, like 'beat' and 'segment'.
To make this all work, I'm thinking of the following design. First, features are stored as pandas dataframes with a time-valued index. This gives us a few nice features right off the bat:
For example, we can index columns by label: `pitch['D#']` instead of `pitch[4]`.
Then, we can define a `Feature` class which wraps the dataframe, and provides a few extra operations, such as indexing by a `TimeSlice` collection. This way, we can index a feature object (say, `pitch_class`) by any type of time-interval indices (e.g. 'beats' or 'segments'). At the end of the day, the old style of
>>> [beat.pitches for beat in track.beats]
would look more like
>>> track.features.pitch[beats]
with the added benefit that indexing the `Feature` object `pitch` will return a new `Feature` object (with time indexing and column headers), rather than a flat list.
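A condensed sketch of that design; all names are provisional, and `ts.time` is an assumed TimeSlice attribute:

class Feature:
    def __init__(self, data):
        # data: pandas DataFrame with a time-valued index and labeled
        # columns (e.g. pitch classes 'C', 'C#', ..., 'B').
        self.data = data

    def __getitem__(self, key):
        # Indexing returns a new Feature (time index and column headers
        # intact), not a flat list.
        return Feature(self.data[[key]] if isinstance(key, str) else self.data[key])

    def at(self, time_slices):
        # Index by a TimeSlice collection, keeping the first observation
        # at or after the start of each slice (clamped to the last row).
        rows = [min(self.data.index.searchsorted(ts.time), len(self.data) - 1)
                for ts in time_slices]
        return Feature(self.data.iloc[rows])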
...brought to you by a frustrating day reinstalling my Ubuntu partition.
For the moment, I think we just worry about loading WAV and MP3 - Brian's said that librosa can deal with both of them.
I feel like we want to do `analysis = amen.load('some_audio_file.wav')`. Does this generate all analyses that are possible? Or do we do something like:
audio = amen.load('some_audio_file')
audio.get_analysis('pitches')
audio.get_time_slices('beats') # this needs a better name
audio['pitches'].at(audio.beats)
# do something with beats and pitch analysis
The former has the advantage of being simple - even people who don't know what they're looking for can get analysis data with a single line of code. The latter has the advantage of being faster, more modular, and more specific.
Or, maybe we generate some basic things when we do `amen.load` (beats, pitches, etc.?) - but if you want, say, some black-magic timbre analysis, you can run `audio.get_analysis('black_magic_timbre')`.
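A sketch of that middle ground, with lazy caching (all names hypothetical):

ANALYZERS = {}  # hypothetical registry: analysis name -> function(Audio)

class Audio:
    def __init__(self, file_path):
        self.file_path = file_path
        self._analyses = {}

    def get_analysis(self, name):
        # Compute on first request and cache, so load() stays fast but
        # one-liner users still get everything through a single method.
        if name not in self._analyses:
            self._analyses[name] = ANALYZERS[name](self)
        return self._analyses[name]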
I'm not sure this logic is entirely correct for mapping slice points to zero crossings.
The bisection search finds the insertion index `i` of a value `v` into a sorted list `a`, but does not tell you which of `a[i-1]`, `a[i]` is closer to `v`.
For example, if `a = [10, 20]`, both `11` and `19` have insertion index 1, but the closest value positions are different.
This is easily fixable, and probably doesn't matter much in practice anyway.
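The fix is the usual compare-both-neighbors pattern after bisection, sketched here:

from bisect import bisect_left

def nearest_crossing(a, v):
    i = bisect_left(a, v)
    if i == 0:
        return a[0]
    if i == len(a):
        return a[-1]
    # Compare the neighbors on either side of the insertion point.
    return a[i] if a[i] - v < v - a[i - 1] else a[i - 1]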
I was playing with the reverse.py example, and noticed that things were sounding ... strange.
Looking into the code, I noticed this, which is incorrect. The problem here is that `fix_frames` is intended for use with frame indices, which must be integer-typed. When you call it after mapping back to the time representation, everything gets rounded to the nearest second.
I'll fix and PR.
This package is awesome: https://pypi.python.org/pypi/vamp
I think it would be pretty easy to build some glue that converts vamp outputs into feature objects ala #6 .
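If I'm reading the vamp docs right, the glue could be as small as this (the plugin key is just an example and has to be installed locally):

import librosa
import vamp

y, sr = librosa.load('some_audio_file.wav')
# Run a Vamp plugin over the buffer; collect() returns the output with
# its timestamps, ready to be wrapped into a Feature dataframe.
result = vamp.collect(y, sr, "nnls-chroma:nnls-chroma")
step, matrix = result["matrix"]  # hop duration and a 2D feature array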
...with py.test? (https://docs.pytest.org/en/latest/)
This is ultra-boring tech debt stuff, right here.
https://github.com/marl/medleydb/blob/master/medleydb/sox.py
^^ Are there things we can do with this? Or does the fact that sox mostly works on files, not np.arrays, make it not worth it?
Tagging @bmcfee to do this, because he's done it for librosa.
I am open to not doing this, but it feels sort of nice to give such things to people.
On the other hand, they'll be down in some awkward place, and everyone can just copy the example code from here.
Is there a way to use https://github.com/echonest/remix-examples/tree/master/waltzify and https://github.com/echonest/remix-examples/tree/master/swinger with amen?
Or do they need some rewriting?
What Remix called "sections". Once again, I feel like @bmcfee had special magic.
When running the installation test script as described in the README, I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/amen/synthesize.py", line 100, in synthesize
truncated_array = sparse_array[:, 0:max_samples].toarray()
File "/Library/Python/2.7/site-packages/scipy/sparse/lil.py", line 289, in __getitem__
return self._get_row_ranges(i, j)
File "/Library/Python/2.7/site-packages/scipy/sparse/lil.py", line 320, in _get_row_ranges
j_start, j_stop, j_stride = col_slice.indices(self.shape[1])
TypeError: only integer scalar arrays can be converted to a scalar index
I was able to trace this down to the `max_samples` array not being parsed properly to an array index when used in `sparse_array[:, 0:max_samples].toarray()`. I was able to fix the issue on my local system by changing this to `sparse_array[:, 0:max_samples[0]].toarray()`.
Not sure if this is an error associated with my installation or with the code here. This is a fresh install of Anaconda on Python 2.7.
As per @bmcfee:
Ok, my original bad test case works now, but this one fails
→ python reverse.py ~/data/CAL500/mp3/art_tatum-willow_weep_for_me.mp3
Traceback (most recent call last):
File "reverse.py", line 19, in <module>
out = synthesize(beats)
File "/home/bmcfee/git/amen/amen/synthesize.py", line 64, in synthesize
resampled_audio, left_offsets, right_offsets = time_slice.get_samples()
File "/home/bmcfee/git/amen/amen/time.py", line 32, in get_samples
left_offsets, right_offsets = self._get_offsets(starting_sample, ending_sample)
File "/home/bmcfee/git/amen/amen/time.py", line 46, in _get_offsets
zero_crossings = librosa.zero_crossings(channel)
File "/home/bmcfee/git/librosa/librosa/core/audio.py", line 526, in zero_crossings
y[np.abs(y) <= threshold] = 0
TypeError: 'numpy.float32' object does not support item assignment
Can we expand the test fixtures here to have both stereo and mono examples?
(Yes, yes we can.)
Thank you! We're porting a web app from Echonest that is basically a fork of P. Sobot's Forever.fm and a couple of features we require are:
We were using a few other Capsule functions as well and I'm wondering if there are any plans to incorporate any of these remix helpers into Amen.
If it does make sense to include AudioData (and even AudioStream) I can add them to my fork and submit a PR.
Librosa gives us MFCCs - we should use those.
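Presumably something like this, wrapped into a Feature (n_mfcc=20 is just a common default, not a decision):

import librosa

y, sr = librosa.load('some_audio_file.wav')
# 20 MFCCs per frame, shape (20, n_frames); a 'timbre'-style feature to
# sit alongside amplitude and pitch.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)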
I was kinda disappointed when I saw this as the successor to the Remix API, only to find it didn't have a feature that I wanted to try out. So far, importing an mp4 works and everything, all up until the export process. I'd love to see this feature implemented.
...is my computer amazingly slow? How long should it take for librosa to analyze a five-minute-long wav file?
I am getting 45 seconds to a minute to create an Audio object, and even longer for my apparently awful synthesis code to run. Has anyone had comparable experiences?
Old Remix had lots of people on it, and we should mention them, 'cause they're great.
pip install is failing to pick up new library changes because the version never changes; it is always 0.0.0.
To install, I cannot use `pip install amen`; I instead need to use `pip install git+git://github.com/algorithmic-music-exploration/amen`.
How do we want/expect people to manipulate audio within amen?
The `synthesize` function is great for re-arranging a clip by timing, but doesn't give us a handle on how to do things like, say, vocal subtraction or time-stretching.
Do we want to provide an object interface for this kind of thing? Or just let folks hack functions themselves? Either way, I think we should not support/allow in-place modification of the audio buffers, since it would either trigger an (expensive) feature analysis or have inconsistent results.
For example, a time-stretcher might look something like:
import pyrubberband as pyrb

def amen_time_stretch(audio, rate=1.0):
    # Stretch the raw samples, then build a fresh Audio object,
    # propagating the sample rates from the source.
    y_stretch = pyrb.time_stretch(audio.raw_samples, audio.sample_rate, rate=rate)
    return Audio(raw_samples=y_stretch,
                 sample_rate=audio.sample_rate,
                 analysis_sample_rate=audio.analysis_sample_rate)
This is pretty simple, but it bothers me that you have to access the `Audio` object's internals directly and propagate them manually. Maybe that's the only way, though?
More generally, I could imagine effects that return multiple clips (eg, source separation), so a consistent object interface might be tricky to pull off here.
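One possible uniform interface, purely hypothetical: every effect maps an Audio to a list of Audios, so a time-stretcher returns one clip and a source separator returns several.

class Effect:
    def __call__(self, audio):
        # Every effect returns a list of new Audio objects; buffers are
        # never modified in place.
        raise NotImplementedError

class TimeStretch(Effect):
    def __init__(self, rate=1.0):
        self.rate = rate

    def __call__(self, audio):
        # Reuses amen_time_stretch from the sketch above.
        return [amen_time_stretch(audio, rate=self.rate)]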
@bmcfee says "cgt", which I can't find in librosa. Any wisdom?
How far out from this are we? We clearly don't have feature extraction comparable to the old remix, but is it worth announcing it / putting it on PyPI anyway?
Related to this is that the Monthly Music Hackathon for February is Automatic Music, so we could announce it as one of the talks.
Thoughts?
Started in #40. I think the only open question is what we should name the keys in the FeatureCollection.