Comments (11)
I have tried the same on one of the MIREX 2009 datasets. There too I could observe the overshoots when plotting the output against the ground truth provided with the MIREX data.
from essentia.
Have you looked at the vector of confidence values? If the values are negative, it means the algorithm has estimated these frames as unvoiced, and their pitch values should be discarded.
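A minimal sketch of that post-processing in plain Python, assuming pitch and confidence are the two equal-length, frame-wise outputs of PredominantPitchMelodia and that discarded frames are marked as 0 Hz (a convention assumed here). The helper name discard_unvoiced and its optional threshold argument are hypothetical. With the default threshold of 0.0, only negative-confidence frames are dropped; as noted later in the thread, negative values only appear when guessUnvoiced is enabled.

```python
def discard_unvoiced(pitch, confidence, threshold=0.0):
    """Hypothetical helper: return a copy of pitch with low-confidence frames set to 0.0 Hz.

    A frame is kept only if its confidence is >= threshold. With the default
    threshold of 0.0, exactly the negative-confidence (unvoiced) frames are
    discarded. Accepts plain lists or NumPy arrays of equal length.
    """
    if len(pitch) != len(confidence):
        raise ValueError("pitch and confidence must have the same length")
    return [float(f0) if c >= threshold else 0.0
            for f0, c in zip(pitch, confidence)]


pitch = [220.0, 231.5, 0.0, 246.9]
confidence = [0.42, -0.15, 0.0, 0.08]
print(discard_unvoiced(pitch, confidence))       # [220.0, 0.0, 0.0, 246.9]
print(discard_unvoiced(pitch, confidence, 0.1))  # [220.0, 0.0, 0.0, 0.0]
```

Raising the threshold above 0.0 turns this into plain confidence thresholding, which can also remove frames that belong to genuinely voiced contours.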
Hi Justin Salamon,
Thanks for your reply. I have tried what you suggested. Discarding the negative pitch confidence values eliminated some of the overshoot portions, but some still remain, as plotted below (with reference to the first figure in this thread).
On further analysis, these seem to be silent portions, yet they return positive pitch confidence values. I also tried applying a threshold to the pitch confidence, but that removes parts of some voiced contours as well.
Now I am wondering whether it has anything to do with the voice vibrato handling in the voicing detection stage.
Once again thanks for the suggestions.
Actually, a likely cause is that, because the algorithm was originally designed for polyphonic music (and not for a monophonic melody with silences), the salience function generates some fake contours from the background noise (even if it is very low) during the silent segments of the recording. You could play with the voicing tolerance parameter (setting it lower than the default 0.2) to see if that helps. If you know the expected frequency range of the melody, you can limit the min/max frequencies as well. I don't think vibrato detection is the cause: the segments you've indicated don't show a stable vibrato, and in any case vibrato detection is deactivated by default in the implementation unless you enable it manually.
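As a sketch of that tuning, the hypothetical helper below only collects keyword arguments for Essentia's PredominantPitchMelodia. minFrequency, maxFrequency and voicingTolerance are that algorithm's parameter names; the 80 to 1000 Hz range and the 0.1 tolerance in the example are placeholder values for a singing voice, not recommendations. The Essentia calls are left as comments so the block runs without Essentia installed.

```python
def melodia_kwargs(min_frequency, max_frequency, voicing_tolerance=0.2):
    """Hypothetical helper: build keyword arguments for PredominantPitchMelodia.

    min_frequency / max_frequency bound the pitch search range in Hz;
    voicing_tolerance maps to the voicingTolerance parameter (default 0.2).
    """
    if not 0.0 < min_frequency < max_frequency:
        raise ValueError("expected 0 < min_frequency < max_frequency")
    return {
        "minFrequency": float(min_frequency),
        "maxFrequency": float(max_frequency),
        "voicingTolerance": float(voicing_tolerance),
    }


# Placeholder values for a singing voice; tune them for your own material.
params = melodia_kwargs(80, 1000, voicing_tolerance=0.1)

# With Essentia installed ("vocal.wav" is a placeholder file name):
# import essentia.standard as es
# audio = es.MonoLoader(filename="vocal.wav", sampleRate=44100)()
# pitch, confidence = es.PredominantPitchMelodia(**params)(audio)
```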
Since quite a few people are using the algorithm for f0 estimation of monophonic sources in addition to polyphonic sources, we may look into adding some functionality to handle this issue specifically, but it's not implemented at the moment.
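Until something like that exists, one possible user-side workaround is to gate the pitch track by signal level, zeroing frames whose RMS falls below a silence threshold. This energy gate is a swapped-in technique, not part of Essentia or of MELODIA. The sketch assumes that pitch frame i covers samples i*hop_size to i*hop_size + frame_size; that alignment, the defaults of 2048/128 samples and the -50 dBFS threshold are assumptions to check against your own setup, and gate_silent_frames is a hypothetical name.

```python
import math


def gate_silent_frames(signal, pitch, hop_size=128, frame_size=2048,
                       threshold_db=-50.0):
    """Hypothetical helper: set pitch to 0.0 Hz where the frame RMS is below threshold_db (dBFS).

    Assumes pitch frame i covers signal[i * hop_size : i * hop_size + frame_size];
    verify this against the framing actually used to compute the pitch track.
    """
    gated = []
    for i, f0 in enumerate(pitch):
        start = i * hop_size
        frame = signal[start:start + frame_size]
        if len(frame) == 0:
            # No audio left for this frame: treat it as silent.
            gated.append(0.0)
            continue
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20.0 * math.log10(rms) if rms > 0.0 else -math.inf
        gated.append(float(f0) if level_db >= threshold_db else 0.0)
    return gated
```

The gate threshold has the same trade-off as a confidence threshold: set it too high and quiet but voiced passages are cut as well.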
Hi Justin Salamon,
Thanks again for your inputs. Changing the voicing tolerance does not give any improvement. :-(
After reading the reference paper ([1] J. Salamon and E. Gómez, "Melody extraction from polyphonic music signals using pitch contour characteristics," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, 2012), I noticed the following:
- The pitch continuity cue (maximum allowed pitch change per 1 ms) is 27.5625 in the implementation, but the paper seems to suggest a value of 80 cents.
- The pitch confidence values only become negative after setting "guessUnvoiced" to "true" in predominantmelody.h.
- "minDuration", the minimum allowed contour duration, defaults to 100 ms. Do we have to play with this?
- I downloaded the MELODIA Vamp plug-in from the MTG website and, after choosing the polyphonic settings (with voicing tolerance 0.2), got the following plot for the same input. Here the silent portions seem to be handled well, so I think the problem is not the voicing tolerance or the monophonic input. :-)
Could this be an issue with the filtering of non-melody contours in the final melody selection, or with the silence detection?
Thanks for your inputs.
Hi,
I have also been trying the same with polyphonic songs, and I also got lots of overshoots, which I believe are incorrect. It seems there is some problem with melody contour creation in Essentia.
Thanks
C. Smith
@Philipsciby regarding your observations:
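- That's because the paper expresses the limit per hop, and its hop size is 2.9 ms (128 samples at 44.1 kHz), whereas the implementation expresses it per 1 ms: 80 × 44.1 / 128 = 27.5625, so it's the same value.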
- yes, because the negative values are only used to indicate unvoiced contours. If the algorithm is not estimating any unvoiced contours, then there won't be any negative confidence values.
- You can, but I don't think it's related.
- The vamp plugin and the Essentia version are different implementations of the same algorithm, hence the differences in their output.
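The unit conversion in the first point can be checked in a few lines of Python. The function name is illustrative; it converts a limit given in cents per analysis hop into cents per millisecond, taking the paper's 2.9 ms hop as 128 samples at 44.1 kHz.

```python
def cents_per_ms(cents_per_hop, hop_size, sample_rate):
    """Convert a maximum pitch change per hop (cents) into cents per millisecond.

    The hop lasts 1000 * hop_size / sample_rate ms, so
    cents per ms = cents_per_hop * sample_rate / (1000 * hop_size).
    """
    return cents_per_hop * sample_rate / (1000.0 * hop_size)


print(cents_per_ms(80, 128, 44100))  # 27.5625
```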
@camillussmith remember that melody extraction is still an open research problem and you can't expect perfect results for every file. The MELODIA algorithm is state-of-the-art, but it will still make some estimation errors (e.g. the overshoots may be octave errors).
Thanks for your inputs. I have now tried "pitch continuity cue" = 13.5625 and "minDuration" = 50 ms. After discarding the negative pitch confidence values, I get an almost satisfactory contour for the monophonic song I tried, though I am not sure whether this is the correct way to handle the issue. I hope Essentia will soon include an implementation that behaves like melodia.dll (the Vamp plug-in), whose melody extraction seems the most accurate.
Thanks
Philips
I am still getting issues for some other inputs with the above configuration.
While essentia allows it, it is not recommended to change the pitch continuity cue and the minDuration parameters as these are internal to the algorithm and changing them can lead to undesired effects.
Again, the algorithm is designed for polyphonic signals, and while it often works well for monophonic signals, these require slightly different processing, which, as you have noted, is included in the Vamp plug-in but not yet in the Essentia implementation. You could also try the PitchYinFFT algorithm in Essentia, which is designed for monophonic signals.
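For reference, the sketch below shows the basic time-domain YIN idea that PitchYinFFT builds on. PitchYinFFT itself works on a spectrum (it is the frequency-domain YinFFT variant), so this is not its implementation. It is plain Python for a single frame; the function name, the 80 to 1000 Hz search range and the 0.1 threshold are illustrative assumptions, and 0.0 is returned when no period is found.

```python
import math


def yin_f0(frame, sample_rate, f_min=80.0, f_max=1000.0, threshold=0.1):
    """Estimate f0 of one monophonic frame with basic time-domain YIN (0.0 = unvoiced)."""
    tau_min = max(1, int(sample_rate / f_max))
    tau_max = int(sample_rate / f_min)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for f_min")
    # Difference function d(tau) = sum_j (x[j] - x[j + tau])^2.
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
    # Cumulative mean normalised difference d'(tau), with d'(0) = 1.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0.0 else 1.0
    # First lag below the threshold, then walk down to the local minimum.
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            # Parabolic interpolation around the minimum for sub-sample accuracy.
            a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
            denom = a - 2.0 * b + c
            shift = 0.5 * (a - c) / denom if denom != 0.0 else 0.0
            return sample_rate / (tau + shift)
    return 0.0


sr = 8000
tone = [math.sin(2 * math.pi * 220.0 * n / sr) for n in range(1024)]
print(round(yin_f0(tone, sr), 1))
```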
Anyway, while this functionality may be added in the future, it's more of a feature request than an actual issue (the code works fine). The issue has been labeled as 'enhancement'.
Hi Justin Salamon,
I understand. What I observed was that, for the same monophonic song recorded through a microphone, the MELODIA Vamp plug-in gives very good results while the Essentia output seems slightly corrupted. My point was that the Vamp plug-in you implemented is correct and something may have been missed in the Essentia implementation.
Thanks
Philips
Related Issues (20)
- PredominantPitchMelodia requires reinstantiation for sequential audio processing
- Installation issues
- documentation website is down
- Essentia and Unity
- Is there a model to determine the time-stamps to beat mapping?
- RhythmExtractor2013 ticks doesn't accurately provide the information of when the first beat occurs
- TempoCNN minimum expected audio length
- Problem: cannot import name [TensorflowPredict...] from essentia.streaming
- Problem connecting the Pipeline for DiscogsEffnet ('StreamingAlgo' object has no attribute 'poolIn')
- Extractor Docs: "See detailed documentation" link 404's.
- Import and runtime error on M1 Mac. Is Essentia Python wheels compatible with older version like Big Sur?
- cannot import name 'TensorflowPredictMusiCNN' from 'essentia.standard'
- Adding CMake and VS2019/2022 support
- Update Vamp plugin to the latest Vamp SDK
- MSVC Vamp plugin builds for Windows
- speech vs music algorithms
- Unexpected performance in melody tracking
- PitchMelodia requires 8192 samples
- Build fails with "error: ‘av_register_all’ was not declared in this scope"
- Pip install fail on Linux