craffel / pretty-midi

Utility functions for handling MIDI data in a nice/intuitive way.

License: MIT License

pretty-midi's Introduction

pretty_midi contains utility functions and classes for handling MIDI data, so that it's in a format which is easy to modify and extract information from.

Documentation is available here. You can also find a Jupyter notebook tutorial here (click here to load in Colab).

pretty_midi is available via pip or via the setup.py script. In order to synthesize MIDI data using fluidsynth, you need the fluidsynth program and pyfluidsynth.

If you end up using pretty_midi in a published research project, please cite the following report:

Colin Raffel and Daniel P. W. Ellis. Intuitive Analysis, Creation and Manipulation of MIDI Data with pretty_midi. In Proceedings of the 15th International Conference on Music Information Retrieval Late Breaking and Demo Papers, 2014.

Example usage for analyzing, manipulating and synthesizing a MIDI file:

import pretty_midi
# Load MIDI file into PrettyMIDI object
midi_data = pretty_midi.PrettyMIDI('example.mid')
# Print an empirical estimate of its global tempo
print(midi_data.estimate_tempo())
# Compute the relative amount of each semitone across the entire song, a proxy for key
total_velocity = sum(sum(midi_data.get_chroma()))
print([sum(semitone)/total_velocity for semitone in midi_data.get_chroma()])
# Shift all notes up by 5 semitones
for instrument in midi_data.instruments:
    # Don't want to shift drum notes
    if not instrument.is_drum:
        for note in instrument.notes:
            note.pitch += 5
# Synthesize the resulting MIDI data using sine waves
audio_data = midi_data.synthesize()

Example usage for creating a simple MIDI file:

import pretty_midi
# Create a PrettyMIDI object
cello_c_chord = pretty_midi.PrettyMIDI()
# Create an Instrument instance for a cello instrument
cello_program = pretty_midi.instrument_name_to_program('Cello')
cello = pretty_midi.Instrument(program=cello_program)
# Iterate over note names, which will be converted to note number later
for note_name in ['C5', 'E5', 'G5']:
    # Retrieve the MIDI note number for this note name
    note_number = pretty_midi.note_name_to_number(note_name)
    # Create a Note instance for this note, starting at 0s and ending at .5s
    note = pretty_midi.Note(velocity=100, pitch=note_number, start=0, end=.5)
    # Add it to our cello instrument
    cello.notes.append(note)
# Add the cello instrument to the PrettyMIDI object
cello_c_chord.instruments.append(cello)
# Write out the MIDI data
cello_c_chord.write('cello-C-chord.mid')

pretty-midi's Issues

Timing issues in fluidsynth

For some MIDI files, some instruments have slightly different timing and get out of sync over time when using the fluidsynth synthesis function, due to rounding errors when accumulating the current sample (observed by @dkario)
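
The kind of drift described above can be illustrated with a minimal sketch. It does not reproduce pretty_midi's fluidsynth code; the function names and the event spacing are assumptions, chosen only to show how truncating each gap before accumulating loses a fraction of a sample per event, while converting each absolute time on its own does not.

```python
def accumulated_positions(event_times, fs):
    """Advance a running sample counter by the truncated gap between events."""
    position = 0
    previous_time = 0.0
    positions = []
    for time in event_times:
        # int() drops the fractional sample, and the loss adds up over events
        position += int((time - previous_time) * fs)
        previous_time = time
        positions.append(position)
    return positions


def absolute_positions(event_times, fs):
    """Convert each absolute event time to a sample index independently."""
    return [int(round(time * fs)) for time in event_times]


# Hypothetical input: one event every 1/11 s for 100 s, rendered at 44.1 kHz
event_times = [i / 11.0 for i in range(1, 1101)]
drift = (absolute_positions(event_times, 44100)[-1]
         - accumulated_positions(event_times, 44100)[-1])
print(drift)  # number of samples the accumulated counter lags behind
```

Two instruments with different event spacings would lose different amounts, which is one way they could end up out of sync.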

Don't create a time signature by default when time signatures are present

As reported by @douglaseck here, we should not add a time signature event at tick 0 when there are time signature events present, as we currently do here.

Unit/regression tests

We need tests.

Drums don't get synthesized when sf2 doesn't have a preset

Sometimes a .mid file asks for a preset in the drum channel which is not 0. The built-in .sf2 file only has a single (?) preset in the drum bank, so fluidsynth throws an error and no drums are synthesized. Found by @dkario

get_pitch_class_transition_matrix returns unexpected results for some sequences.

It's hard to do simple next-note transition probabilities using get_pitch_class_transition_matrix for certain midi files.

This sequence:
Note(start=0.000000, end=0.200000, pitch=60, velocity=100)
Note(start=0.250000, end=0.450000, pitch=61, velocity=100)
Note(start=0.500000, end=0.700000, pitch=60, velocity=100)
Note(start=0.750000, end=0.950000, pitch=62, velocity=100)
Note(start=1.000000, end=1.200000, pitch=67, velocity=100)
Yields get_pitch_class_transition_matrix()
[[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

By only changing the offset time (making the notes less staccato) I get different results:
Note(start=0.000000, end=0.237500, pitch=60, velocity=100)
Note(start=0.250000, end=0.487500, pitch=61, velocity=100)
Note(start=0.500000, end=0.737500, pitch=60, velocity=100)
Note(start=0.750000, end=0.987500, pitch=62, velocity=100)
Note(start=1.000000, end=1.237500, pitch=67, velocity=100)
Yields get_pitch_class_transition_matrix()
[[ 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

This is due to the use of a hardcoded time_thresh:
# use 20hz(0.05s) as the maximum time threshold for transitions
time_thresh = 0.05

I imagine this is desired behavior for more complicated files. I believe the right thing to do is to compute vertical grouping first (chunking nearby notes into chords based on a time threshold) and then, in a second step, calculate next-note transition probabilities by moving horizontally through the chunked sequence. I guess you still might want to drop a transition based on a long pause. But in many kinds of music this is an odd claim to make. It amounts to claiming that a staccato performance of a piece should yield different note transition probabilities than the same piece played legato.

I don't think this warrants a fix(!) I just wanted to communicate my thoughts. Feel free to close out "Working as intended"
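
The two-step idea above (group vertically, then step horizontally) could be sketched roughly as follows. This is not the library's implementation: the Note tuple stands in for pretty_midi.Note, chord_thresh and both function names are assumptions, and dropping transitions across long pauses is left out.

```python
from collections import namedtuple

# Stand-in for pretty_midi.Note, holding only the fields this sketch needs
Note = namedtuple('Note', ['start', 'end', 'pitch'])


def group_by_onset(notes, chord_thresh=0.05):
    """Chunk notes whose onsets fall within chord_thresh of a chunk's first onset."""
    groups = []
    for note in sorted(notes, key=lambda n: n.start):
        if groups and note.start - groups[-1][0].start <= chord_thresh:
            groups[-1].append(note)
        else:
            groups.append([note])
    return groups


def transition_counts(notes, chord_thresh=0.05):
    """Count pitch-class transitions between consecutive onset groups."""
    counts = [[0] * 12 for _ in range(12)]
    groups = group_by_onset(notes, chord_thresh)
    for current, following in zip(groups, groups[1:]):
        for a in current:
            for b in following:
                counts[a.pitch % 12][b.pitch % 12] += 1
    return counts


pitches = [60, 61, 60, 62, 67]
staccato = [Note(0.25 * i, 0.25 * i + 0.2, p) for i, p in enumerate(pitches)]
legato = [Note(0.25 * i, 0.25 * i + 0.2375, p) for i, p in enumerate(pitches)]
# Note offsets are never consulted, so both performances give the same counts
same = transition_counts(staccato) == transition_counts(legato)
```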

Remove midi dependency

Make pretty_midi able to do the MIDI data parsing, too

_tick_to_time is too slow

Instead, just make a big array which is indexed by tick and returns a time.
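
A minimal sketch of such a lookup table, under assumptions: tick_scales is taken to be a sorted list of (start_tick, seconds_per_tick) pairs beginning at tick 0, and a plain list stands in for the array. The function name is hypothetical.

```python
def build_tick_to_time(tick_scales, max_tick):
    """Precompute the time in seconds of every tick from 0 to max_tick.

    tick_scales is assumed to be a sorted list of
    (start_tick, seconds_per_tick) pairs whose first entry starts at tick 0.
    """
    table = [0.0] * (max_tick + 1)
    scale_idx = 0
    for tick in range(1, max_tick + 1):
        # The step from tick - 1 to tick uses the scale in effect at tick - 1
        while (scale_idx + 1 < len(tick_scales)
               and tick - 1 >= tick_scales[scale_idx + 1][0]):
            scale_idx += 1
        table[tick] = table[tick - 1] + tick_scales[scale_idx][1]
    return table


# Hypothetical tempo map: 0.5 s per tick, then 0.25 s per tick from tick 4 on
tick_to_time = build_tick_to_time([(0, 0.5), (4, 0.25)], max_tick=8)
```

After the one-time build, converting a tick is a single index operation, `tick_to_time[tick]`, at the cost of memory proportional to the largest tick.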

Tempo change on non-zero track warning causes an error now

This warning causes an error:

warnings.warn(("Tempo change events found on non-zero tracks."
    "  This is not a valid type 0 or type 1 MIDI "
    "file.  Timing may be wrong.", RuntimeWarning))

Should be

warnings.warn(("Tempo change events found on non-zero tracks."
    "  This is not a valid type 0 or type 1 MIDI "
    "file.  Timing may be wrong."), RuntimeWarning)

Thanks to @hkmogul for finding this
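
The corrected call can be checked in isolation with the standard library's warning recorder. The wrapper function name below is hypothetical; only the warnings.warn call is taken from the fix above.

```python
import warnings


def warn_tempo_on_nonzero_track():
    # The message is one parenthesized string; the category is a separate argument
    warnings.warn(("Tempo change events found on non-zero tracks."
                   "  This is not a valid type 0 or type 1 MIDI "
                   "file.  Timing may be wrong."), RuntimeWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_tempo_on_nonzero_track()

categories = [w.category for w in caught]
```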

get_all_events method

Create a method which returns a list of all events, both for Instrument and PrettyMIDI objects. It should probably return a list of lists, where each inner list has the event's time, data, and probably the event itself.

Time is not a float. Note timings are sometimes float, sometimes np.float64

https://github.com/craffel/pretty-midi/blob/master/pretty_midi/containers.py#L17
In fact there's an explicit test which raises an error if time is a float. It seems to be an np.float64. This is just a documentation issue. However it raises a question: do you intend for note.start and note.end to be np.float64 or to be float? The documentation says float, but in fact you get np.float64 values. This has caused us some problems in going back and forth between protos and PrettyMIDI instances. I believe the transformation from float to np.float64 happens when timing is manipulated via self.__tick_to_time = np.zeros(max_tick + 1) (this default constructor does yield an np array of type np.float64).

In [1]: import pretty_midi

In [2]: x = pretty_midi.Note(10, 10, 1.00, 1.00)

In [3]: type(x.start)
Out[3]: float

In [4]: mf = pretty_midi.PrettyMIDI('/tmp/example.mid')
Reading /tmp/example.mid

In [5]: type(mf.instruments[0].notes[0].start)
Out[5]: numpy.float64

Initialize PrettyMIDI with a path to a midi file

or a file pointer instead of a midi.FileReader. This will make the midi module a dependency until pretty_midi implements MIDI parsing. It will also break backwards compatibility.

Utility functions

Most functions should have an inverse:

midi note number to hz
midi note number to note name
percussion note number to drum name
program number to instrument name for general MIDI
program number to instrument class for general MIDI (should not have inverse)
pitch bend value to absolute pitch in semitones
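
As a sketch of what a function/inverse pair could look like for the first two items, assuming A4 = 440 Hz, the C4 = 60 octave convention, and sharp-only note names. These are illustrative definitions, not necessarily the signatures pretty_midi ends up with.

```python
import math
import re

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def note_number_to_hz(note_number):
    """Frequency of a MIDI note number, assuming A4 (note 69) is 440 Hz."""
    return 440.0 * 2.0 ** ((note_number - 69) / 12.0)


def hz_to_note_number(frequency):
    """Inverse of note_number_to_hz; the result may be fractional."""
    return 12.0 * math.log2(frequency / 440.0) + 69


def note_number_to_name(note_number):
    """Name of a MIDI note number, e.g. 60 -> 'C4' (sharps only)."""
    octave = note_number // 12 - 1
    return '{}{}'.format(NOTE_NAMES[note_number % 12], octave)


def note_name_to_number(name):
    """Inverse of note_number_to_name; accepts only the names it produces."""
    match = re.match(r'^([A-G]#?)(-?\d+)$', name)
    if match is None:
        raise ValueError('Unrecognized note name: {}'.format(name))
    pitch, octave = match.group(1), int(match.group(2))
    return (octave + 1) * 12 + NOTE_NAMES.index(pitch)
```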

Don't allow the user to adjust the tempo over time

...without messing things up

Make tick_to_time and tick_scales private
add tick_to_time function

Notes go missing when they have overlapping start/end times

This may be a playback bug, not a pretty_midi bug, but Douglas reports

"This plays back really freaky if the multiplier is >= 1.0, fine if
it's < 1.0. Although at faster tempos it still gets kinda freaky and
has volume variations where it shouldn't. But that's probably a
playback problem, not an encoding problem?"

import pretty_midi

pm = pretty_midi.PrettyMIDI()
pm.instruments.append(pretty_midi.Instrument(56))

beat_dur = 0.2
# With a multiplier of 1.0, each note ends exactly when the next one starts
note_dur = beat_dur * 1.0
time = 0.0

for beat in range(0, 100):
    note = 60
    pm.instruments[0].notes.append(
        pretty_midi.Note(90, note, time, time + note_dur))
    time += beat_dur

midi_filename = "borked.mid"
pm.write(midi_filename)

synthesize() clips notes when they overlap in time and pitch

If the MIDI file contains two notes (in the same instrument) that have the same pitch and overlap in time, the second of the two will not get synthesized. I.e., say I have a C4 from 1-3s, and another C4 from 2.5-4s: the second note gets "killed" during synthesis.

Admittedly two overlapping notes by the same instrument is not physically possible for all instruments (it is for a guitar for example though), but I think that the correct behavior should be to give the onset of the second note priority over the offset of the first note?
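
One way to give the second onset priority, sketched as a preprocessing step rather than a change to synthesize() itself: shorten the earlier note so that it ends where the later note of the same pitch begins. Notes are plain (start, end, pitch) tuples here and the function name is hypothetical.

```python
def truncate_overlaps(notes):
    """Return notes sorted by start, with same-pitch overlaps removed.

    notes is a list of (start, end, pitch) tuples. When two notes of the
    same pitch overlap, the earlier one is cut off at the later one's onset.
    """
    last_index = {}  # pitch -> position in result of its most recent note
    result = []
    for start, end, pitch in sorted(notes):
        previous = last_index.get(pitch)
        if previous is not None and result[previous][1] > start:
            previous_start = result[previous][0]
            result[previous] = (previous_start, start, pitch)
        last_index[pitch] = len(result)
        result.append((start, end, pitch))
    return result


# The C4 example from above: 1-3 s overlapping with 2.5-4 s
fixed = truncate_overlaps([(1.0, 3.0, 60), (2.5, 4.0, 60)])
```

This changes the note data, so it only suits cases where losing the tail of the first note is acceptable.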

Pitch bends on first note are ignored

Any pitch bends which occur before the first note off for some instrument are not stored because the instrument doesn't exist yet.

Don't collapse instruments by program number

When loading in a MIDI file, a single Instrument instance is created for all events on all channels and all tracks which have the same program number. This is problematic primarily because pitch bend and other control events will get collapsed onto a single instrument, when they shouldn't be. For example:

import pretty_midi
pm = pretty_midi.PrettyMIDI()
i = pretty_midi.Instrument(0)
i.notes.append(pretty_midi.Note(100, 36, 0.0, 1.0))
i.notes.append(pretty_midi.Note(100, 40, 0.0, 1.0))
i.notes.append(pretty_midi.Note(100, 45, 0.0, 1.0))
pm.instruments.append(i)
i = pretty_midi.Instrument(0)
i.notes.append(pretty_midi.Note(100, 66, 0.0, 1.0))
for n in range(8000):
    i.pitch_bends.append(pretty_midi.PitchBend(n, (n + 10)/8000.))
pm.instruments.append(i)
pm.write('test.mid')
pm2 = pretty_midi.PrettyMIDI('test.mid')
print(len(pm2.instruments))
print(len(pm2.instruments[0].notes))
print(len(pm2.instruments[0].pitch_bends))

yields

1
4
8000

but

import midi
print(len(midi.read_midifile('test.mid')))

yields 3. I.e., this MIDI file has 3 tracks (one timing track, one "chord" track with no pitch bends, and one "single note" track with a single note with many pitch bends), and when it is loaded, the two instrument tracks get merged into one Instrument which has all four notes and all pitch bends, meaning that the "chord" gets pitch bent when it shouldn't. So, we should create separate Instrument instances for all channels and tracks.
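
The difference between the two grouping strategies can be sketched with plain tuples. The event layout (track, channel, program, payload) and the function names are assumptions for illustration, not the loader's actual data structures.

```python
def group_by_program(events):
    """Current behavior: one group per program number."""
    groups = {}
    for track, channel, program, payload in events:
        groups.setdefault(program, []).append(payload)
    return groups


def group_by_program_channel_track(events):
    """Proposed behavior: one group per (program, channel, track) combination."""
    groups = {}
    for track, channel, program, payload in events:
        groups.setdefault((program, channel, track), []).append(payload)
    return groups


# Hypothetical events: a chord on track 1 and a bent note on track 2,
# both using program 0
events = [
    (1, 0, 0, 'note 36'), (1, 0, 0, 'note 40'), (1, 0, 0, 'note 45'),
    (2, 1, 0, 'note 66'), (2, 1, 0, 'pitch bend'),
]
merged = group_by_program(events)
separate = group_by_program_channel_track(events)
```

With the finer key, the pitch bend stays with the single note and never reaches the chord.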

MIDI files with corrupt pitch values are not handled correctly

Sometimes, midi reads in a MIDI file which has NoteOnEvents with data[0] > 127:

In [1]: import midi
In [2]: m = midi.read_midifile("data/clean_midi/mid/Celine Dion/That's The Way It Is.mid")
In [3]: for t in m:
   ...:     for e in t:
   ...:         if type(e) == midi.NoteOnEvent:
   ...:            if e.data[0] > 127:
   ...:                print e
   ...:
midi.NoteOnEvent(tick=4656, channel=7, data=[253, 75])
midi.NoteOnEvent(tick=0, channel=7, data=[254, 75])
midi.NoteOnEvent(tick=48, channel=7, data=[253, 0])
midi.NoteOnEvent(tick=0, channel=7, data=[254, 0])

This is because midi just loads in data via ord https://github.com/vishnubob/python-midi/blob/master/src/fileio.py#L94, so as long as the argument is an 8-bit char it will happily set a data value to a number greater than 127. So, this likely happens for other events too, not just for data[0] of NoteOnEvents. This can create issues later on. Either midi should raise an exception, we should raise an exception, or we should issue a warning and ignore those events with invalid data values.
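
The third option (warn and ignore) might look roughly like this. Events are modeled as plain (tick, channel, data) tuples rather than midi event objects, and both function names are hypothetical.

```python
import warnings


def is_valid_data_byte(value):
    """MIDI data bytes are 7-bit, so valid values are 0 through 127."""
    return 0 <= value <= 127


def drop_invalid_events(events):
    """Keep only events whose data bytes are all valid, warning about the rest."""
    valid = []
    for tick, channel, data in events:
        if all(is_valid_data_byte(value) for value in data):
            valid.append((tick, channel, data))
        else:
            warnings.warn(
                "Ignoring event at tick {} with out-of-range data {}".format(
                    tick, data), RuntimeWarning)
    return valid
```

Checking every data byte, not only data[0], covers the other event types mentioned above.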

hooks to modify/save metadata?

I'm finding myself wanting to be able to generate midi files with embedded metadata (author, software used, version number, etc). Is that possible, and/or within the scope of pretty-midi?

Unused function arguments in get_pitch_class_histogram()

https://github.com/craffel/pretty-midi/blob/master/pretty_midi/pretty_midi.py#L617
The arguments use_velocity and use_duration aren't passed through from pretty_midi.py to each instrument.

Allow the user to adjust the tempo over time

Currently, tick_scales and tick_to_time are functionally read-only. Making this change will require a good deal of legwork.

Pitch bend semitones->value

Allow PrettyMIDI objects to be initialized with resolution, tempo change, etc

Instead of just setting the values by default.

Handle pitch bends

Come on, just handle pitch bends already!!! Sheesh...

get_beats returns different beats on different machines

If you run get_beats() on different machines, it can return different beat times.

Writing and reading back in should be roughly lossless

I.e. if you write a MIDI file out and read it back in, you should get at least the same collection of notes, time signature changes, tempo changes, pitch bends, key signatures, etc. (all of the data that pretty_midi stores)

Convert back to midi.Pattern for .mid writing

Currently the PrettyMIDI class can't convert BACK to a writeable midi.Pattern.

First note is sometimes not synthesized

For some tracks (e.g. duran_duran-come_undone.mid from cal500 midis), the first note is not synthesized. Spotted by @dkario

IndexError when calling get_beats on MIDI objects with no time signatures

At this line:
https://github.com/craffel/pretty-midi/blob/master/pretty_midi/pretty_midi.py#L444

Though rare, there are MIDI files with no time signature change events. We could auto-populate self.time_signature_changes with a 4/4 time signature for those files, but I think a better fix is to wrap a conditional around this line that just sets bpm to tempi[tempo_idx] when len(self.time_signature_changes) == 0.

Control changes

Allow for control changes to be stored, just like pitch bends.

Get "track" names for instruments

@craffel is it possible to obtain the MIDI "track" names for the midi_data.instruments ?

I need to convert a set of MIDI files to JAMS files, but I only want to extract the notes of one "track" (or instrument) from each MIDI file, and I know the name of that track. I've been playing around with pretty_midi but haven't been able to get at this information. As an example, if I import one of these MIDI files into GarageBand I can see the track names: http://i.imgur.com/ziGf3I7.png

Is there a way to extract this information using pretty_midi?

New logic in midi module breaks .write

There's a new flag in midi objects:
vishnubob/python-midi@0964c0b
It denotes whether the ticks are abs or rel. It defaults to True, which means the ticks are rel. In PrettyMIDI.write, we construct a midi object using absolute ticks, but midi thinks the ticks are rel. Then, when we try to make ticks rel
https://github.com/craffel/pretty-midi/blob/master/pretty_midi/pretty_midi.py#L517
the new code doesn't allow it.

Additional features for key/time signatures

cc @rafaelvalle

I'm writing unit tests for the whole library, and found a few things I think should be changed/fixed with the time/key signature changes.

They currently aren't being written out in PrettyMIDI.write; they should be included here. I started to do this myself, but I was unsure how to convert from our KeySignature/TimeSignature to python-midi data/events.
Related - I think it would be useful, for writing out and maybe in general, if midi_key_to_key_number got moved to a separate function in utilities.py and called something like mode_accidentals_to_key_number; instead of taking in a midi.event.KeySignature it would just take in num_accidentals and mode. Then, we should also have a key_number_to_mode_accidentals, which will be handy for writing out, I think.
midi_key_to_key_number and key_name_to_key_number don't have a Returns section in their docstrings; I overlooked this when merging.
When constructing a PrettyMIDI object without a MIDI file, key_changes and time_signature_changes aren't getting created because they're created in the _load_metadata function which isn't called in that case. So, we need to create empty lists for them manually, as is done for instruments; https://github.com/craffel/pretty-midi/blob/master/pretty_midi/pretty_midi.py#L96

I'll update if I find anything else!
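
A rough sketch of the proposed pair of functions, under several assumptions that the text above does not confirm: key numbers 0-11 are major keys starting at C and 12-23 are minor keys, mode is 0 for major and 1 for minor, positive accidentals count sharps and negative ones count flats, and the inverse returns the spelling with between 5 flats and 6 sharps. It walks the circle of fifths, where each added sharp moves the tonic up 7 semitones.

```python
def mode_accidentals_to_key_number(mode, num_accidentals):
    """Map (mode, number of sharps or flats) to an assumed 0-23 key number."""
    tonic = (7 * num_accidentals) % 12  # tonic of the major key
    if mode == 1:
        tonic = (tonic + 9) % 12  # relative minor is 9 semitones above
    return tonic + 12 * mode


def key_number_to_mode_accidentals(key_number):
    """Inverse mapping, returning accidentals in the range -5 to 6."""
    mode, tonic = divmod(key_number, 12)
    if mode == 1:
        tonic = (tonic + 3) % 12  # back to the relative major
    # 7 is its own inverse modulo 12, so this undoes the multiplication above
    num_accidentals = (7 * tonic) % 12
    if num_accidentals > 6:
        num_accidentals -= 12
    return mode, num_accidentals
```

Keys with 7 sharps or 6 or 7 flats share a key number with an enharmonic spelling, so the inverse cannot recover them.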

bend_range causes out of bounds error in get_piano_roll

When an instrument has a pitch bend which is past the last note, this causes an out of bounds error:

# Column indices effected by the bend
bend_range = np.r_[int(start_bend.time*fs):int(end_bend.time*fs)]
# Construct the bent part of the piano roll
bent_roll = np.zeros(piano_roll[:, bend_range].shape)

because the piano roll is initialized to only be big enough for notes (ignores pitch bends). Should use get_end_time instead.
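
The sizing fix can be sketched without the piano roll itself. The two helpers below are illustrative stand-ins (this get_end_time takes explicit lists and is not the library method), showing how including pitch bend times keeps every bend column index inside the roll.

```python
def get_end_time(note_ends, pitch_bend_times):
    """Latest time of any event, considering pitch bends as well as notes."""
    times = list(note_ends) + list(pitch_bend_times)
    return max(times) if times else 0.0


def piano_roll_columns(note_ends, pitch_bend_times, fs=100):
    """Number of piano roll columns needed to cover every event."""
    return int(fs * get_end_time(note_ends, pitch_bend_times))


# Hypothetical instrument: one note ending at 1 s, pitch bends at 0.5 s and 2 s
n_columns_notes_only = piano_roll_columns([1.0], [])
n_columns_all_events = piano_roll_columns([1.0], [0.5, 2.0])
# Column indices a bend from 0.5 s to 2 s would touch at fs = 100
bend_columns = range(int(0.5 * 100), int(2.0 * 100))
```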

Add synthesis method

Mostly for fun comparison, also beeps.

Add fs parameter to get_piano_roll/get_chroma

Right now, piano rolls (and therefore chroma matrices) are created by first sampling at 100 Hz, then computing the mean to aggregate over the supplied time intervals. If you want a higher sampling rate, or even just a different one, it's simpler and more principled to just use a different sampling rate and not do averaging. The averaging is mostly for longer, non-uniform intervals (like beats).

instrument.events -> instrument.notes

A better name. Will break backwards compatibility.

Instrument.fluidsynth fails when len(self.notes) == 0

Due to this line:
https://github.com/craffel/pretty-midi/blob/master/pretty_midi/instrument.py#L452
It should be handled as a special case and return an empty array.

Sphinx docs

estimate_tempo can give an index out of bounds error

In [1]: import pretty_midi

In [2]: a = pretty_midi.PrettyMIDI()

In [3]: a.estimate_tempo()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-aa51704cfe91> in <module>()
----> 1 a.estimate_tempo()

/usr/local/lib/python2.7/site-packages/pretty_midi-0.0.1-py2.7.egg/pretty_midi/pretty_midi.pyc in estimate_tempo(self)
    342                 Estimated tempo, in bpm
    343         '''
--> 344         return self.estimate_tempii()[0][0]
    345
    346     def get_beats(self):

IndexError: index 0 is out of bounds for axis 0 with size 0

Make classes use utility functions for better printing

In their reprs.

Writing out and reading back in can lead to max tick error

With this file: http://www.jsbach.net/midi/bwv988/988-v04.mid
This code:

import pretty_midi
midi_data = pretty_midi.PrettyMIDI('988-v04.mid')
midi_data.write('/tmp/test.mid')
new_midi_data = pretty_midi.PrettyMIDI('/tmp/test.mid')

results in

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

pretty_midi/pretty_midi.pyc in __init__(self, midi_file, resolution, initial_tempo)
     66             if max_tick > MAX_TICK:
     67                 raise ValueError(('MIDI file has a largest tick of {},'
---> 68                                   ' it is likely corrupt'.format(max_tick)))
     69 
     70             # Create list that maps ticks to time in seconds

ValueError: MIDI file has a largest tick of 268435458, it is likely corrupt

Reported by @douglaseck.

Add example usage to README or tests/examples

For the seeds

Notes disappear when writing

Seems to be an issue either with how midi.write_midifile is being called or midi.write_midifile itself, because the notes are being collected correctly beforehand.

In [4]: for i in xrange(10):
    pm = pretty_midi.PrettyMIDI('test.mid')
    print pm.get_onsets().size
    pm.write('test.mid')
   ...:
4359
3518
3490
3483
3480
3478
3476
3475
3475
3474

Spotted by @hkmogul

Implement get_beats

Currently it just returns None.

Add intro to docs

The docs could use an intro and usage examples, also a reference to a paper to cite if/when it's available.

PrettyMIDI.fluidsynth fails when all of self.instruments have no notes

In [1]: import pretty_midi
In [2]: pm = pretty_midi.PrettyMIDI()
In [3]: pm.instruments.append(pretty_midi.Instrument(0, 0))
In [4]: pm.instruments.append(pretty_midi.Instrument(0, 0))
In [5]: pm.fluidsynth()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-8fb00d507fbe> in <module>()
----> 1 pm.fluidsynth()

pretty_midi/pretty_midi.pyc in fluidsynth(self, fs, sf2_path)
    717             synthesized[:waveform.shape[0]] += waveform
    718         # Normalize
--> 719         synthesized /= np.abs(synthesized).max()
    720         return synthesized
    721

numpy/core/_methods.pyc in _amax(a, axis, out, keepdims)
     24 # small reductions
     25 def _amax(a, axis=None, out=None, keepdims=False):
---> 26     return umr_maximum(a, axis, None, out, keepdims)
     27
     28 def _amin(a, axis=None, out=None, keepdims=False):

ValueError: zero-size array to reduction operation maximum which has no identity

Should return np.array([]).
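
A guarded normalization step could be sketched as follows. Plain lists stand in for NumPy arrays so the example runs without dependencies, the function name is hypothetical, and the all-zero branch is an extra assumption beyond what the traceback shows.

```python
def normalize(samples):
    """Scale samples so the largest absolute value is 1.

    Returns an empty list for empty input instead of taking max() of
    nothing, and leaves an all-zero signal unchanged to avoid dividing by 0.
    """
    if len(samples) == 0:
        return []
    peak = max(abs(sample) for sample in samples)
    if peak == 0:
        return list(samples)
    return [sample / float(peak) for sample in samples]
```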

Empty array dtype issue in latest numpy

Due to this change in numpy 1.10:

https://github.com/numpy/numpy/blob/master/doc/release/1.10.0-notes.rst#default-casting-rule-change

empty arrays are cast as type np.float64 and cannot be added to type np.int16, which is what all calls to get_piano_roll return (in both pretty_midi.py and instrument.py). I think this can be solved by forcing all arrays to explicitly be of type np.int16 (even the empty ones). MWE to reproduce the error:

import jams
import pretty_midi
import numpy as np

print(jams.__version__)
print(pretty_midi.__version__)
print(np.__version__)

jam = jams.load('./jams/TRAAAZF12903CCCF6B.jams')
ann = jam.search(namespace='beat')[0]

midi_md5 = ann.annotation_metadata.annotator.midi_md5

midi_object = pretty_midi.PrettyMIDI(
    'mid_aligned/TRAAAZF12903CCCF6B/{}.mid'.format(midi_md5))

piano_roll = midi_object.get_piano_roll()

Feature extraction

Some enthusiastic individual might be interested in implementing all of the features described in section 4.5 here:

http://jmir.sourceforge.net/publications/PhD_Dissertation_2010.pdf

is_drum referenced before assignment

  File "build/bdist.macosx-10.9-x86_64/egg/pretty_midi/pretty_midi.py", line 179, in _load_instruments
UnboundLocalError: local variable 'is_drum' referenced before assignment

Should be moved up, was broken in 4a6a714
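
The failure mode is generic Python scoping and can be reproduced outside the library. The two functions below are hypothetical stand-ins for _load_instruments, and the channel 9 check assumes the usual General MIDI convention of percussion on the tenth channel (zero-indexed as 9).

```python
def load_instrument_broken(channel, program):
    # is_drum is read here, before the assignment below has run
    instrument = {'program': program, 'is_drum': is_drum}
    is_drum = (channel == 9)
    return instrument


def load_instrument_fixed(channel, program):
    # Assignment moved up so the name is bound before its first use
    is_drum = (channel == 9)
    instrument = {'program': program, 'is_drum': is_drum}
    return instrument
```

Calling load_instrument_broken raises UnboundLocalError, because the later assignment makes is_drum local to the whole function body.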