algorithmic-music-exploration / amen
A toolbox for algorithmic remixing, after Echo Nest Remix
License: BSD 2-Clause "Simplified" License
1 ⌂ py35 master × ~/git/amen/amen/examples
→ python reverse.py ~/git/librosa/tests/data/test1_44100.wav
/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/compressed.py:739: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
Traceback (most recent call last):
File "reverse.py", line 19, in <module>
out = synthesize(beats)
File "/home/bmcfee/git/amen/amen/synthesize.py", line 92, in synthesize
sparse_array[1, right_start:right_end] += resampled_audio[1]
File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 272, in __getitem__
return self._get_row_slice(row, col)
File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 353, in _get_row_slice
row_slice = self._get_submatrix(i, cslice)
File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 420, in _get_submatrix
check_bounds(j0, j1, N)
File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 415, in check_bounds
" %d <= %d" % (i0, num, i1, num, i0, i1))
IndexError: index out of bounds: 0 <= 52919990 <= 52920000, 0 <= 88360 <= 52920000, 52919990 <= 88360
As per @Cortexelus on the timbre thread, it would be nice to have Echo Nest style segments as a Timing.
Will doing this make anything weird, in terms of how we handle Feature data with pandas? I don't think so.
Librosa has defaults of a 22050 Hz sample rate and a hop length of 512.
Brian commented that we may eventually use other feature generation / analysis tools that have different defaults, and that we should consider that.
I feel like we can worry about that when we get there, myself?
FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/2c/99n_4g3n0ml7d1gz40y77jc00000gn/T/pip-build-5nik37q_/amen/examples'
Oops. Working on it.
For prioritizing upstream acceleration work.
Just a heads up:
I took the liberty of adding ReviewNinja here, in an effort to better keep track of what's already been reviewed via PR status.
If people hate it, we can shut it off.
This is related to #66, but is a bit different: if I want to put an effect (a delay, a compressor, or a pitch shifter) on a certain chunk of audio, how do I do that?
I think a decent answer is to build up effect chains that can then be applied to a given TimeSlice, and that are not applied until `synthesize` is called. So track-level deformations trigger a new analysis, but you can also just put an EQ on some signal without making a new Audio object.
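As a minimal sketch of the idea (EffectChain and all of its method names are hypothetical, not existing amen API):

class EffectChain:
    def __init__(self):
        self.effects = []  # each effect is a function: samples -> samples

    def append(self, effect):
        self.effects.append(effect)
        return self

    def apply(self, samples):
        # Deferred until synthesize() calls this, so attaching an effect
        # never triggers a re-analysis on its own.
        for effect in self.effects:
            samples = effect(samples)
        return samples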
/home/bmcfee/git/amen/amen/synthesize.py in synthesize(inputs)
66 for i, (time_slice, start_time) in enumerate(inputs):
67 # if we have a mono file, we return stereo here.
---> 68 resampled_audio, left_offset, right_offset = time_slice.get_samples()
69
70 # set the initial offset, so we don't miss the start of the array
/home/bmcfee/git/amen/amen/timing.py in get_samples(self)
35 left_offsets, right_offsets = self._get_offsets(starting_sample,
36 ending_sample,
---> 37 self.audio.num_channels)
38
39 samples = self._offset_samples(starting_sample, ending_sample,
/home/bmcfee/git/amen/amen/timing.py in _get_offsets(self, starting_sample, ending_sample, num_channels)
60 ending_offset = 0
61 else:
---> 62 ending_crossing = zero_index[bisect_right(zero_index, ending_sample)]
63 ending_offset = ending_crossing - ending_sample
64
IndexError: index 355701 is out of bounds for axis 0 with size 355701
I think the problem here is that `bisect_right(arr, x)` can return `len(arr)` if `x > arr[i]` for all `i`. We can detect this case and fall back to `bisect_left` (or just set it to the last element).
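A sketch of the fallback, assuming `zero_index` is the sorted array of zero-crossing positions (the helper name is mine):

from bisect import bisect_right

def ending_crossing_index(zero_index, ending_sample):
    # bisect_right can return len(zero_index) when ending_sample lies past
    # the last zero crossing; clamp to the last element in that case.
    i = bisect_right(zero_index, ending_sample)
    return min(i, len(zero_index) - 1)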
We already get this when computing beats - we just need to add it to Audio, so we can do `audio.tempo`.
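For reference, librosa returns the tempo as a by-product of beat tracking:

import librosa

y, sr = librosa.load('some_audio_file.wav')
# beat_track returns the estimated global tempo (in BPM) alongside the
# beat frame positions, so exposing audio.tempo is essentially free.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)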
In implementing a few remix hacks yesterday, I kept finding myself wanting to construct a new `Audio` object from a time slice. Currently, the only way to do this is to extract the waveform via `get_samples()` and then instantiate a new `Audio` object.
This is undesirable for a few reasons:
Reason 1 is okay, but reason 2 is a deal-breaker if you're extracting short clips (e.g. beats), which may be too short for certain analyses to make sense.
What do folks think about making a shortcut for this kind of operation that propagates features (and timings) from the source audio of a time slice? This way, we can also preserve things like beat timings within a sliced interval, which might come out differently if the interval is analyzed independently of the full track.
If we're careful about things, the audio buffer could also be shared between audio objects by slicing.
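A rough sketch of what the shortcut could look like; `from_time_slice`, `features.slice`, and the attribute names here are all assumptions, not existing amen API:

class Audio:
    @classmethod
    def from_time_slice(cls, time_slice):
        source = time_slice.audio
        sr = source.sample_rate
        start = int(time_slice.time.total_seconds() * sr)
        end = start + int(time_slice.duration.total_seconds() * sr)
        new = cls.__new__(cls)
        # Share the underlying buffer via a numpy view instead of copying.
        new.raw_samples = source.raw_samples[..., start:end]
        new.sample_rate = sr
        # Propagate features and timings restricted to the slice, rather
        # than re-analyzing a clip that may be too short to analyze well.
        new.features = source.features.slice(time_slice)
        return new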
@bmcfee had some thoughts about this - I feel like we said that we would not use librosa for this?
One of my favorite things about Remix was the ability to do `for beat in beats`, and so on. We don't currently have an iterator over features, e.g. `for amp in amplitudes`.
Likewise, we don't have an easy way to get TimeSlices and the features to use to manipulate them, unless we do something like:
amps = audio.features['amplitude'].at(audio.timings['beats'])
for feature, beat in zip(amps, audio.timings['beats']):
    # do things to each beat based on feature
I feel like we should:
a) make the data in the dataframe of a feature iterable.
b) allow a feature to reference its timings. Something like `feature.with_time()`, maybe?
To contrast:
amps = audio.features['amplitude'].at(audio.timings['beats'])
for feature, beat in amps.with_time():
    # do things to each beat based on feature
A problem with that is that feature objects that have not been resampled do not have any TimeSlices to reference.
Thoughts?
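A quick sketch of how (b) could work; `with_time`, `data`, and `time_slices` are hypothetical names, and the ValueError handles the un-resampled case above:

class Feature:
    def with_time(self):
        # Pair each observation with the TimeSlice it was resampled at.
        if self.time_slices is None:
            raise ValueError('Feature has not been resampled; no TimeSlices to reference')
        for value, time_slice in zip(self.data.values, self.time_slices):
            yield value, time_slice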
See http://developer.echonest.com/docs/v4/_static/AnalyzeDocumentation.pdf
I suspect we can get this with librosa.onset.onset_detect
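For example, something along these lines (the exact parameters would need tuning):

import librosa

y, sr = librosa.load('some_audio_file.wav')
# Detected onsets as frame indices, converted to seconds; these could
# seed Echo Nest style segment boundaries.
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)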
You guys are preventing it from being used for something more interesting 😄
I guess the parent Audio object should have a key?
Or should we be cute and make it computable per-TimeSlice?
Let's start scoping this thing out!
What functionality does the Analysis object need to provide, and what should the interface look like?
For now, let's not limit ourselves to compatibility with EN remix. Backwards compatibility can always be tacked on with a translation layer. I'm more interested in making sure the core is well designed and extensible in the right kinds of ways.
I'll start off a check-list of features it should expose, but the interface can come later.
[EDIT: 2015-06-06, restructured the feature list by type]
Blue-sky feature wish list:
Some general design principles:
...I feel like @bmcfee had some thoughts about how to do this, but I may be misremembering.
[documenting an offline conversation with @blacker]
Some quick thoughts about how the interface for synthesizing waveforms should look.
>>> def my_generator(track):
...     start = track.duration
...     for beat in track.beats[::-1]:
...         start = start - beat.duration
...         yield start, beat
>>> syn = synthesize(my_generator(track), duration=track.duration)
The `synthesize` function iterates over the generator and adds samples into the output stream. It returns a new audio object (I guess, an audio container object itself). Stereo/resampling/zc-alignment are all handled within `synthesize`.
This makes it easy to do concatenative synthesis (as above). You can also do additive mixing by having overlapping target times.
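For instance, additive mixing under the same interface, via overlapping target times (`track.beats` as above; `beat.time` is an assumed start-time attribute):

def layered(track):
    # Each beat stays in place, with the first beat mixed on top of it;
    # synthesize simply sums samples wherever target times overlap.
    first = track.beats[0]
    for beat in track.beats:
        yield beat.time, beat
        yield beat.time, first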
Just jotting this down before I forget.
librosa 0.5 will add dynamic time warping (not totally relevant for amen), and as a side-effect, optional numba jit compilation for certain methods.
This should make it much easier to accelerate certain bottleneck ops like zero-crossing alignment.
As per #4. Let's do these after we do all the other ones.
Blue-sky feature wish list:
I am currently giving us `amplitude` by using `librosa.feature.rmse`. Please close if I am doing the right thing!
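For reference, the underlying call is roughly this (note that rmse was renamed to librosa.feature.rms in later librosa releases):

import librosa

y, sr = librosa.load('some_audio_file.wav')
# Frame-wise root-mean-square energy, shape (1, n_frames).
amplitude = librosa.feature.rmse(y=y)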
As per the comments in #86, we should get to this soon!
http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#resample-api
At a cursory glance, this appears to resample at constant intervals (`df.resample('2s', how='sum')`, for example), whereas we need to resample at varying intervals. Will try to read more at the hack this weekend.
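In the meantime, one possible workaround for varying intervals is `pd.cut` plus `groupby`; a toy sketch, with made-up beat boundaries:

import numpy as np
import pandas as pd

times = np.arange(0, 1.5, 0.1)  # feature frame times, in seconds
df = pd.DataFrame({'amplitude': np.random.rand(len(times))}, index=times)
beat_times = [0.0, 0.47, 0.93, 1.42]  # varying interval edges
# Bucket each frame into the beat interval that contains it, then aggregate.
per_beat = df.groupby(pd.cut(df.index, beat_times, include_lowest=True)).sum()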
Starting a new thread to expand on the feature object design that I vaguely started in #4.
In the previous jam, we had features like 'timbre' and 'pitch' connected to segments. I'd like to abstract out features from timing in amen, since different features may come at different sampling rates. Here, I'm distinguishing feature observations like 'pitch' and 'timbre' from time-index observations, like 'beat' and 'segment'.
To make this all work, I'm thinking of the following design. First, features are stored as pandas dataframes with a time-valued index. This gives us a few nice features right off the bat:
For example, we can index columns by label: `pitch['D#']` instead of `pitch[4]`.
Then, we can define a `Feature` class which wraps the dataframe, and provides a few extra operations, such as indexing by a `TimeSlice` collection. This way, we can index a feature object (say, `pitch_class`) by any type of time-interval indices (e.g. 'beats' or 'segments'). At the end of the day, the old style of
>>> [beat.pitches for beat in track.beats]
would look more like
>>> track.features.pitch[beats]
with the added benefit that indexing the `Feature` object `pitch` will return a new `Feature` object (with time indexing and column headers), rather than a flat list.
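A condensed sketch of that design; all names are provisional, and `ts.time` is an assumed TimeSlice attribute:

class Feature:
    def __init__(self, data):
        # data: pandas DataFrame with a time-valued index and labeled
        # columns (e.g. pitch classes 'C', 'C#', ..., 'B').
        self.data = data

    def __getitem__(self, key):
        # Indexing returns a new Feature (time index and column headers
        # intact), not a flat list.
        return Feature(self.data[[key]] if isinstance(key, str) else self.data[key])

    def at(self, time_slices):
        # Index by a TimeSlice collection, keeping the first observation
        # at or after the start of each slice (clamped to the last row).
        rows = [min(self.data.index.searchsorted(ts.time), len(self.data) - 1)
                for ts in time_slices]
        return Feature(self.data.iloc[rows])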
...brought to you by a frustrating day reinstalling my Ubuntu partition.
For the moment, I think we just worry about loading WAV and MP3 - Brian's said that librosa can deal with both of them.
I feel like we want to do `analysis = amen.load('some_audio_file.wav')`. Does this generate all analyses that are possible? Or do we do something like:
audio = amen.load('some_audio_file')
audio.get_analysis('pitches')
audio.get_time_slices('beats') # this needs a better name
audio['pitches'].at(audio.beats)
# do something with beats and pitch analysis
The former has the advantage of being simple - even people who don't know what they're looking for can get analysis data with a single line of code. The latter has the advantage of being faster, more modular, and more specific.
Or, maybe we generate some basic things when we do `amen.load` (beats, pitches, etc.?) - but if you want, say, some black-magic timbre analysis, you can run `audio.get_analysis('black_magic_timbre')`.
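A sketch of that middle ground, with lazy caching (all names hypothetical):

ANALYZERS = {}  # hypothetical registry: analysis name -> function(Audio)

class Audio:
    def __init__(self, file_path):
        self.file_path = file_path
        self._analyses = {}

    def get_analysis(self, name):
        # Compute on first request and cache, so load() stays fast but
        # one-liner users still get everything through a single method.
        if name not in self._analyses:
            self._analyses[name] = ANALYZERS[name](self)
        return self._analyses[name]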
I'm not sure this logic is entirely correct for mapping slice points to zero crossings.
The bisection search finds the insertion index `i` of a value `v` into a sorted list `a`, but does not tell you which of `a[i-1]`, `a[i]` is closer to `v`.
For example, if `a = [10, 20]`, both `11` and `19` have insertion index 1, but the closest value positions are different.
This is easily fixable, and probably doesn't matter much in practice anyway.
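The fix is the usual compare-both-neighbors pattern after bisection, sketched here:

from bisect import bisect_left

def nearest_crossing(a, v):
    i = bisect_left(a, v)
    if i == 0:
        return a[0]
    if i == len(a):
        return a[-1]
    # Compare the neighbors on either side of the insertion point.
    return a[i] if a[i] - v < v - a[i - 1] else a[i - 1]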
I was playing with the reverse.py example, and noticed that things were sounding ... strange.
Looking into the code, I noticed this, which is incorrect. The problem here is that `fix_frames` is intended for use with frame indices, which must be integer-typed. When you call it after mapping back to the time representation, everything gets rounded to the nearest second.
I'll fix and PR.
This package is awesome: https://pypi.python.org/pypi/vamp
I think it would be pretty easy to build some glue that converts vamp outputs into feature objects ala #6 .
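If I'm reading the vamp docs right, the glue could be as small as this (the plugin key is just an example and has to be installed locally):

import librosa
import vamp

y, sr = librosa.load('some_audio_file.wav')
# Run a Vamp plugin over the buffer; collect() returns the output with
# its timestamps, ready to be wrapped into a Feature dataframe.
result = vamp.collect(y, sr, "nnls-chroma:nnls-chroma")
step, matrix = result["matrix"]  # hop duration and a 2D feature array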
...with py.test? (https://docs.pytest.org/en/latest/)
This is ultra-boring tech debt stuff, right here.
https://github.com/marl/medleydb/blob/master/medleydb/sox.py
^^ Are there things we can do with this? Or does the fact that sox mostly works on files, not np.arrays, make it not worth it?
Tagging @bmcfee to do this, because he's done it for librosa.
I am open to not doing this, but it feels sort of nice to give such things to people.
On the other hand, they'll be down in some awkward place, and everyone can just copy the example code from here.
Is there a way to use https://github.com/echonest/remix-examples/tree/master/waltzify and https://github.com/echonest/remix-examples/tree/master/swinger with amen?
Or do they need some rewriting?
What Remix called "sections". Once again, I feel like @bmcfee had special magic.
When running the installation test script as described in the README, I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/amen/synthesize.py", line 100, in synthesize
truncated_array = sparse_array[:, 0:max_samples].toarray()
File "/Library/Python/2.7/site-packages/scipy/sparse/lil.py", line 289, in __getitem__
return self._get_row_ranges(i, j)
File "/Library/Python/2.7/site-packages/scipy/sparse/lil.py", line 320, in _get_row_ranges
j_start, j_stop, j_stride = col_slice.indices(self.shape[1])
TypeError: only integer scalar arrays can be converted to a scalar index
I was able to trace this down to the `max_samples` array not being parsed properly to an array index when used in `sparse_array[:, 0:max_samples].toarray()`. I was able to fix the issue on my local system by changing this to `sparse_array[:, 0:max_samples[0]].toarray()`.
Not sure if this is an error associated with my installation or with the code here. This is a fresh install of Anaconda on Python 2.7.
As per @bmcfee:
Ok, my original bad test case works now, but this one fails
→ python reverse.py ~/data/CAL500/mp3/art_tatum-willow_weep_for_me.mp3
Traceback (most recent call last):
File "reverse.py", line 19, in <module>
out = synthesize(beats)
File "/home/bmcfee/git/amen/amen/synthesize.py", line 64, in synthesize
resampled_audio, left_offsets, right_offsets = time_slice.get_samples()
File "/home/bmcfee/git/amen/amen/time.py", line 32, in get_samples
left_offsets, right_offsets = self._get_offsets(starting_sample, ending_sample)
File "/home/bmcfee/git/amen/amen/time.py", line 46, in _get_offsets
zero_crossings = librosa.zero_crossings(channel)
File "/home/bmcfee/git/librosa/librosa/core/audio.py", line 526, in zero_crossings
y[np.abs(y) <= threshold] = 0
TypeError: 'numpy.float32' object does not support item assignment
Can we expand the test fixtures here to have both stereo and mono examples?
(Yes, yes we can.)
Thank you! We're porting a web app from Echonest that is basically a fork of P. Sobot's Forever.fm and a couple of features we require are:
We were using a few other Capsule functions as well and I'm wondering if there are any plans to incorporate any of these remix helpers into Amen.
If it does make sense to include AudioData (and even AudioStream) I can add them to my fork and submit a PR.
Librosa gives us MFCCs - we should use those.
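Presumably something like this, wrapped into a Feature (n_mfcc=20 is just a common default, not a decision):

import librosa

y, sr = librosa.load('some_audio_file.wav')
# 20 MFCCs per frame, shape (20, n_frames); a 'timbre'-style feature to
# sit alongside amplitude and pitch.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)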
I was kinda disappointed when I saw this as the successor to the Remix API, only to find it didn't have a feature that I wanted to try out. So far, importing an mp4 works and everything, all up until the export process. I'd love to see this feature implemented.
...is my computer amazingly slow? How long should it take for librosa to analyze a five-minute-long wav file?
I am getting 45 seconds to a minute to create an Audio object, and even longer for my apparently awful synthesis code to run. Has anyone had comparable experiences?
Old Remix had lots of people on it, and we should mention them, 'cause they're great.
pip install is failing to pick up new library changes because the version never changes; it is always 0.0.0.
To install, I cannot use `pip install amen`; I instead need to use `pip install git+git://github.com/algorithmic-music-exploration/amen`.
How do we want/expect people to manipulate audio within amen?
The `synthesize` function is great for re-arranging a clip by timing, but doesn't give us a handle on how to do things like, say, vocal subtraction or time-stretching.
Do we want to provide an object interface for this kind of thing? Or just let folks hack functions themselves? Either way, I think we should not support/allow in-place modification of the audio buffers, since it would either trigger an (expensive) feature analysis or have inconsistent results.
For example, a time-stretcher might look something like:
import pyrubberband as pyrb

def amen_time_stretch(audio, rate=1.0):
    # Stretch the raw samples, then build a fresh Audio object,
    # propagating the sample rates from the source.
    y_stretch = pyrb.time_stretch(audio.raw_samples, audio.sample_rate, rate=rate)
    return Audio(raw_samples=y_stretch,
                 sample_rate=audio.sample_rate,
                 analysis_sample_rate=audio.analysis_sample_rate)
This is pretty simple, but it bothers me that you have to access the `Audio` object's internals directly and propagate them manually. Maybe that's the only way, though?
More generally, I could imagine effects that return multiple clips (eg, source separation), so a consistent object interface might be tricky to pull off here.
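One possible uniform interface, purely hypothetical: every effect maps an Audio to a list of Audios, so a time-stretcher returns one clip and a source separator returns several.

class Effect:
    def __call__(self, audio):
        # Every effect returns a list of new Audio objects; buffers are
        # never modified in place.
        raise NotImplementedError

class TimeStretch(Effect):
    def __init__(self, rate=1.0):
        self.rate = rate

    def __call__(self, audio):
        # Reuses amen_time_stretch from the sketch above.
        return [amen_time_stretch(audio, rate=self.rate)]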
@bmcfee says "cgt", which I can't find in librosa. Any wisdom?
How far out from this are we? We clearly don't have feature extraction comparable to the old remix, but is it worth announcing it / putting it on PyPI anyway?
Related to this is that the Monthly Music Hackathon for February is Automatic Music, so we could announce it as one of the talks.
Thoughts?
Started in #40. I think the only open question is what we should name the keys in the FeatureCollection.