
crepe_notes's Introduction

A Github Pages template for academic websites. This was forked (then detached) by Stuart Geiger from the Minimal Mistakes Jekyll Theme, which is © 2016 Michael Rose and released under the MIT License. See LICENSE.md.

I think I've got things running smoothly and fixed some major bugs, but feel free to file issues or make pull requests if you want to improve the generic template / theme.

Note: if you are using this repo and now get a notification about a security vulnerability, delete the Gemfile.lock file.

Instructions

  1. Register a GitHub account if you don't have one and confirm your e-mail (required!)
  2. Fork this repository by clicking the "fork" button in the top right.
  3. Go to the repository's settings (rightmost item in the tabs that start with "Code", should be below "Unwatch"). Rename the repository "[your GitHub username].github.io", which will also be your website's URL.
  4. Set site-wide configuration and create content & metadata (see below -- also see this set of diffs showing what files were changed to set up an example site for a user with the username "getorg-testacct")
  5. Upload any files (like PDFs, .zip files, etc.) to the files/ directory. They will appear at https://[your GitHub username].github.io/files/example.pdf.
  6. Check status by going to the repository settings, in the "GitHub pages" section
  7. (Optional) Use the Jupyter notebooks or python scripts in the markdown_generator folder to generate markdown files for publications and talks from a TSV file.

See more info at https://academicpages.github.io/

To run locally (not on GitHub Pages, to serve on your own computer)

  1. Clone the repository and make updates as detailed above
  2. Make sure you have ruby-dev, bundler, and nodejs installed: sudo apt install ruby-dev ruby-bundler nodejs
  3. Run bundle clean to clean up the directory (no need to run --force)
  4. Run bundle install to install ruby dependencies. If you get errors, delete Gemfile.lock and try again.
  5. Run bundle exec jekyll liveserve to generate the HTML and serve it from localhost:4000; the local server will automatically rebuild and refresh the pages on change.

Changelog -- bugfixes and enhancements

There is one logistical issue with a ready-to-fork template theme like academicpages that makes it a little tricky to get bug fixes and updates to the core theme. If you fork this repository, customize it, and then pull again, you'll probably get merge conflicts. If you want to keep your various .yml configuration files and markdown files, you can delete the repository and fork it again, or you can manually patch.

To support this, all changes to the underlying code appear as a closed issue with the tag 'code change' -- get the list here. Each issue thread includes a comment linking to the single commit or a diff across multiple commits, so those with forked repositories can easily identify what they need to patch.

crepe_notes's People

Contributors

xavriley


crepe_notes's Issues

Questions regarding the input audio and output MIDI.

Thanks for creating this awesome package.

I have a few questions during my trials:

  1. I tried transcribing two separated tracks from one song (bass/vocal) and found that the tempo of the predicted MIDI differs between them. The bass transcription seems to give a better tempo estimate, with the beats on the first note of the bars. The transcribed vocal track has a BPM of 120, which looks like a default setting. How is the tempo estimated? Can I manually designate the tempo in the output MIDIs while keeping the same absolute times?
  2. Are there any effects if I use the same input track with different amplitudes, e.g., an unnormalized one and a normalized one?

Thanks!

Integrate onset detection for repeated notes more gracefully

At present we refer to a file generated by madmom in a separate process, which is not ideal.

Also the current method of checking for high probability onsets within existing segments is slightly hacky.

It would be better to integrate the onset activation time series into the combined signal used for segmentation. The issue with repeated notes is that the pitch gradient is basically flat so we can't multiply anything with it at those points.

An idea is to extract the maximally flat sections, turn those into a stream of 1s, and then multiply that with the onset activations. That should give us only the onset activations that occur during long segments (i.e. repeated notes).
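The idea above can be sketched in NumPy; the function name and gradient threshold are illustrative, and it assumes the pitch track and onset activations are sampled on the same time grid (e.g. CREPE's 10 ms hop):

```python
import numpy as np

def onsets_in_flat_regions(pitch_hz, onset_activation, grad_thresh=0.1):
    """Keep onset activations only where the pitch contour is nearly flat.

    pitch_hz and onset_activation are assumed to share one time grid;
    the threshold value here is a placeholder, not a tuned parameter.
    """
    # Gradient of the pitch track: near-zero means a flat (sustained) region
    gradient = np.gradient(pitch_hz)
    flat_mask = (np.abs(gradient) < grad_thresh).astype(float)
    # Multiplying zeroes out onsets outside flat regions, leaving only
    # candidate re-articulations of repeated notes
    return flat_mask * onset_activation
```

This keeps the segmentation logic in one combined signal rather than cross-referencing a separately generated onset file.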

Evaluate performance in the presence of background noise

Following the original CREPE paper, testing with white, pink, brown and pub noise would be good. Allow testing on outputs that have been through a source separation model.

The hypothesis is that CREPE's robustness to noise will carry through to this method and make it better in these contexts than other similar methods.

Handle slides and scoops better - possibly with pitch bends

During the initial segmentation, a segment with a wide variance is likely to be a slide, e.g. at the very start of a note. Currently we take the median of this segment, resulting in a short note with an essentially random pitch.

Anecdotally this doesn't sound too bad but it does harm the accuracy metrics for precision, recall and f-measure.

Other methods treat these slides as note transitions (e.g. https://www.mdpi.com/2076-3417/12/15/7391) which makes sense in the vocal context, but I'm not sure that it helps if the target output is MIDI. Either a note is on or off.

We could also model the pitch contour more accurately by using MIDI pitch bends. This is the approach taken by Basic Pitch. Working with pitch bends in ground truth annotations is cumbersome though. It also becomes more difficult when the recordings are not tuned to the A440Hz standard: if the recording is, say, a quarter tone out, are you bending to a standard MIDI note from above or from below? This needs more thought.
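For illustration, the frequency-to-note-plus-bend arithmetic might look like the sketch below. The ±2 semitone bend range is a common synth default, not something the MIDI file itself guarantees, and a quarter-tone input sits exactly between two notes, which is where the from-above/from-below ambiguity shows up:

```python
import math

def hz_to_midi_with_bend(freq_hz, bend_range_semitones=2.0):
    """Map a frequency to the nearest MIDI note plus a pitch-bend amount.

    Assumes the synth's bend range is +/- bend_range_semitones (a common
    default, but not universal); bend is a 14-bit offset in [-8192, 8191].
    """
    midi_float = 69 + 12 * math.log2(freq_hz / 440.0)
    note = round(midi_float)
    # Fractional offset in semitones, e.g. -0.5 when bending down from above
    offset = midi_float - note
    bend = round(offset / bend_range_semitones * 8192)
    return note, bend
```

A quarter-tone pitch yields an offset of ±0.5 semitones, so the choice of rounding direction decides which standard note carries the bend.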

Add CSV output as an option

Something that outputs MIDI notes and Hz directly to CSV would be useful for tools like mir_eval, avoiding the conversion steps to/from MIDI.

This would also help in cases where the tuning standard is not A440Hz as we could output the median for the note in Hz directly and avoid having to rely on pitch bends in MIDI.

Add unit tests

The Filosax dataset has plenty of examples that challenge this method. Fast-moving passages, repeated notes, slow semitone shifts: these are all things it would be good to have automated testing for. It just needs the time and effort to curate some example audio files.
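As a starting point, a unit test could pin down expected behaviour on synthetic pitch contours before any audio is curated; median_pitch_to_midi here is a hypothetical stand-in for the real segmentation code, not part of the package:

```python
import math
import unittest

def median_pitch_to_midi(freqs_hz):
    """Hypothetical helper: quantize the median of a pitch track to MIDI."""
    freqs = sorted(freqs_hz)
    mid = len(freqs) // 2
    median = freqs[mid] if len(freqs) % 2 else (freqs[mid - 1] + freqs[mid]) / 2
    return round(69 + 12 * math.log2(median / 440.0))

class TestPitchQuantization(unittest.TestCase):
    def test_steady_note_with_vibrato(self):
        # A steady A4 with slight vibrato should still map to MIDI 69
        self.assertEqual(median_pitch_to_midi([438.0, 440.0, 442.0]), 69)

    def test_slow_semitone_shift(self):
        # A slow shift toward Bb4 should resolve to the median, not the start
        contour = [440.0, 450.0, 460.0, 466.0, 466.0]
        self.assertEqual(median_pitch_to_midi(contour), 70)
```

The same pattern would extend to repeated-note contours once real segmentation output is available to test against.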

Include comparison table with other methods, including MIDI DDSP

It would be nice to include an up-to-date table of results for common datasets, possibly updated by GitHub Actions on deploy (or similar)

In particular, I'd still like to get a proper evaluation for the MIDI segmentation code in MIDI DDSP. They have some quite sophisticated code, using strided checks for pitch changes etc., but it isn't readily available as a library, so it needs some work to extract into a command.
