Comments (3)
Hi, this is expected: the CREPE model is trained only for pitch accuracy, not for voice activity detection.
Although you could devise a heuristic based on the confidence metrics, please note that the model has never seen silence during training, so it may not assign low confidence to all silent frames.
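Such a heuristic could look like the sketch below, assuming you already have the `frequency` and `confidence` arrays that `crepe.predict` returns (the toy arrays here stand in for them, and the 0.5 threshold is an illustrative assumption to tune on your own data, not a recommended value):

```python
import numpy as np

def mask_low_confidence(frequency, confidence, threshold=0.5):
    """Zero out pitch estimates whose confidence falls below a threshold.

    A simple heuristic for suppressing unreliable predictions during
    silence; the threshold is a starting point, not a value the CREPE
    authors recommend.
    """
    frequency = np.asarray(frequency, dtype=float).copy()
    confidence = np.asarray(confidence, dtype=float)
    frequency[confidence < threshold] = 0.0
    return frequency

# Toy stand-ins for crepe.predict() outputs (one value per 10 ms frame):
freq = np.array([220.0, 230.5, 180.2, 175.9, 240.0])
conf = np.array([0.9, 0.8, 0.1, 0.2, 0.95])
print(mask_low_confidence(freq, conf))  # low-confidence frames become 0.0
```

Because it never fully suppresses silence on its own, treat this as a first pass and inspect the confidence histogram of your own recordings before fixing a threshold.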
Detecting exactly when a pitched sound is present is nontrivial; it is studied under keywords such as voice activity detection (related: #47).
Regarding the pitch inaccuracy: while it could be an issue with the sample rate (the model expects 16 kHz audio), or simply a wrong prediction, I believe the 220-260 Hz pitch range you see for a female voice is normal, as it can depend on the individual and the prosody of the utterance. I, for example, am male and can produce anywhere between 70 Hz and 400 Hz. To be sure, you can cross-validate using other pitch tracking methods such as pYIN, SWIPE, or SPICE.
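Even without installing another tracker, a crude numpy cross-check is possible; the sketch below is not pYIN or SWIPE, just a minimal autocorrelation pitch estimate you could compare against CREPE's output on clearly voiced frames:

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=50.0, fmax=500.0):
    """Crude autocorrelation pitch estimate for a single frame.

    Picks the autocorrelation peak in the lag range corresponding to
    [fmin, fmax]; only meant as a sanity check on strongly voiced audio.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000                                # CREPE's expected sample rate
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220.0 * t)     # 220 Hz test tone
print(autocorr_pitch(frame, sr))          # prints a value near 220 Hz
```

If this disagrees strongly with CREPE on a loud, clearly pitched segment, the sample rate or slicing is the first thing to suspect.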
thanks for the explanation!
I understand that the pitch should not be zero during silent periods. But it still doesn't make sense to me that the pitch estimated during silence is not constant. I tried the sweep.wav included in the repo, and the pitch seems constant during silence (though the confidence is also 0?). But for the audio I used, crepe gives a non-constant pitch during silent periods, and I am not sure whether that is correct. You said the model has never seen silence during training, so I am wondering whether removing the silent periods would give me a better pitch estimation.
I also notice that the same audio segment can get different f0 estimates: if I include it in a very long audio file, estimate f0 on just that small segment, or delete some silent regions before estimating, I get different results. I am a little confused by this.
(I also notice a lag in the pitch signal compared with the audio signal; is that common? [example plot omitted])
> But for the audio I used, crepe will give a non constant pitch during silent period. I am not sure if it is the correct thing.
During silence the model will still do its best to detect whatever is in the silent segment, which might just be static noise from the microphone, so its output may not always be the same during silence. Again, the model wasn't trained on silent audio and will just try to extrapolate what it knows about pitched signals, so its output during silence is not reliable.
> You said that the model has never seen silence during training, so I am considering that if removing the silent period could give me a better pitch estimation.
If you have voicing labels, or a good heuristic to obtain them, you can post-process the outputs to suppress whatever was predicted during silence. You can also cut the silent audio out before prediction: since the model works in a fully convolutional way, slicing out silence won't make a big difference.
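When no voicing labels exist, one possible heuristic is a frame-energy gate; the helper below is hypothetical (not part of crepe), and the -40 dB floor, window, and hop are assumptions loosely mirroring CREPE's 1024-sample frames at a 10 ms hop on 16 kHz audio:

```python
import numpy as np

def energy_voicing_mask(audio, sr, hop_s=0.010, win_s=0.064, db_floor=-40.0):
    """Derive rough voicing labels from per-frame RMS energy.

    Frames whose RMS falls more than `db_floor` dB below the loudest
    frame are treated as silence. Purely a heuristic sketch.
    """
    hop, win = int(sr * hop_s), int(sr * win_s)
    n_frames = 1 + (len(audio) - win) // hop
    rms = np.array([
        np.sqrt(np.mean(audio[i * hop:i * hop + win] ** 2))
        for i in range(n_frames)
    ])
    db = 20 * np.log10(rms + 1e-10)       # epsilon avoids log10(0)
    return db > db.max() + db_floor       # True = voiced-ish frame

sr = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s of tone
silence = np.zeros(sr)                               # 1 s of silence
mask = energy_voicing_mask(np.concatenate([tone, silence]), sr)
# To suppress CREPE's output, you could then do something like:
# frequency[~mask[:len(frequency)]] = 0.0
```

An energy gate will of course also suppress quiet voiced frames, so it is best combined with the confidence output rather than used alone.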
> I also notice that the same audio segment could have different f0 estimation. for this segment, if I include it in a very long audio and estimate f0, or just use this small segment to estimate f0, or delete some silent region to estimate f0, I could get different estimation. I am a little confused about the result.
Depending on how the signal was sliced, each 1024-sample segment may start at a different location, and the model's output can be sensitive to how the frames are sliced, especially in the silent portions. It would help diagnose the issue if you zeroed out the predictions during silence.
Also, I think it would help to plot all graphs with respect to time (in seconds) rather than samples; that makes it easier to spot any misalignment between the audio and the annotations.
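Building the two time axes is just index arithmetic; the sketch below assumes a 10 ms hop (the `step_size` you actually passed to `crepe.predict` may differ) and a placeholder signal in place of your recording:

```python
import numpy as np

sr = 16000
audio = np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr)  # 2 s placeholder

# Per-sample time axis for the waveform, per-frame axis for the pitch
# track. The 10 ms hop is an assumption; crepe.predict also returns its
# own `time` array you can use directly.
t_audio = np.arange(len(audio)) / sr
hop_s = 0.010
n_frames = int(len(audio) / sr / hop_s)
t_pitch = np.arange(n_frames) * hop_s

# plt.plot(t_audio, audio) and plt.plot(t_pitch, frequency) would then
# share one x-axis in seconds, making any lag easy to read off.
print(t_audio[-1], t_pitch[-1])
```

Once both plots share a seconds axis, a constant offset between waveform onsets and pitch onsets points to a framing/alignment issue rather than a model error.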