Comments (7)
Only doing checksums on mp3s is pretty limiting.
I think something like this is what we want. I'd prefer sha1, but it doesn't matter too much. Basically what is important is that the hashing is done on small parts of the file rather than the entire thing to avoid memory issues.
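For reference, hashing in small pieces could look roughly like the sketch below. It uses only Python's standard `hashlib`; the function name `file_checksum` and the 64 KiB chunk size are assumptions made for illustration, not anything that exists in dejavu.

```python
import hashlib


def file_checksum(path, algorithm="sha1", chunk_size=64 * 1024):
    """Hash a file chunk by chunk so the whole file is never held in memory.

    Hypothetical helper: the name and default chunk size are illustrative.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Memory use stays bounded by `chunk_size` regardless of file size, and switching between sha1 and md5 is just a different `algorithm` argument.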
from dejavu.
What about skipping the metadata? Two files with different ID3 tags are still the same audio file.
from dejavu.
Ah yes, good point. Perhaps for now, if the file is mp3, we use the library you mentioned above (it appears as though it processes only small bits in memory at a time), and otherwise just a straight up md5/sha-1 hash of file contents? Not perfect, but a good way to start. Further contributions would be to add support for other audio file types (.wav, .ogg, etc).
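A rough sketch of that split, assuming we handle the mp3 case ourselves instead of using the linked library: skip an ID3v2 tag at the start of the file and an ID3v1 tag at the end, hash what is left, and fall back to a plain hash of the whole file for everything else. The names `audio_byte_range` and `content_checksum` are made up for this example, and it deliberately ignores other tag formats (APEv2, ID3v2 tags appended at the end of the file), so it is a starting point and not a complete solution.

```python
import hashlib
import os


def audio_byte_range(path):
    """Return (start, end) offsets that exclude ID3v2 (front) and ID3v1 (back) tags."""
    size = os.path.getsize(path)
    start, end = 0, size
    with open(path, "rb") as f:
        header = f.read(10)
        if len(header) == 10 and header[:3] == b"ID3":
            # The ID3v2 tag size is a "synchsafe" integer: 7 useful bits per byte.
            b = header[6:10]
            tag_size = (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]
            start = 10 + tag_size
            if header[5] & 0x10:  # footer flag: 10 more bytes after the tag body
                start += 10
        if size - start >= 128:
            # An ID3v1 tag is always the last 128 bytes and begins with "TAG".
            f.seek(size - 128)
            if f.read(3) == b"TAG":
                end = size - 128
    return min(start, end), end


def content_checksum(path, chunk_size=64 * 1024):
    """SHA-1 of the audio bytes for .mp3 files, of the whole file otherwise."""
    if path.lower().endswith(".mp3"):
        start, end = audio_byte_range(path)
    else:
        start, end = 0, os.path.getsize(path)
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        f.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

With this, two copies of the same mp3 that differ only in their ID3v1/ID3v2 tags get the same checksum, while .wav, .ogg and other formats are hashed byte for byte until someone adds format-specific handling.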
from dejavu.
Sounds good. The project I linked is a standalone C app, but if we can use it for the .mp3s, I'll write a Python wrapper for it.
from dejavu.
I don't really feel dejavu should be the one keeping track of which files have been fingerprinted.
At best I would just use an md5 of the full file's content. Metadata changes will of course change the result, but is that really something dejavu should be worrying about?
from dejavu.
It's an interesting idea, perhaps adding a checksum field to the songs table? Then before you fingerprint you check that. It wouldn't add any appreciable disk usage.
The good news is that since we enforce the uniqueness constraint, the exact same file being hashed won't take up space unnecessarily (inserts will be ignored). It would however waste CPU cycles. Given that, I can see the argument for including checksums.
Again, as long as we don't affect the performance of other aspects, I think this is a worthy feature.
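To make the idea concrete, here is a minimal sketch of a checksum lookup before fingerprinting. It uses an in-memory `sqlite3` database purely so the example is self-contained; apart from the `songs` table and `checksum` field named above, every name here (`song_id`, `song_name`, `fingerprint_if_new`, the `fingerprint` callable) is an assumption and not dejavu's actual schema or API.

```python
import sqlite3


def connect():
    """Create a throwaway database with a songs table that carries a checksum."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE songs ("
        " song_id INTEGER PRIMARY KEY,"
        " song_name TEXT NOT NULL,"
        " checksum TEXT NOT NULL UNIQUE)"  # UNIQUE also gives us an index for lookups
    )
    return conn


def already_fingerprinted(conn, checksum):
    """Return True if a song with this checksum is already stored."""
    row = conn.execute(
        "SELECT 1 FROM songs WHERE checksum = ?", (checksum,)
    ).fetchone()
    return row is not None


def fingerprint_if_new(conn, name, checksum, fingerprint):
    """Call the expensive fingerprint step only for checksums not seen before."""
    if already_fingerprinted(conn, checksum):
        return False
    fingerprint(name)  # placeholder for the real fingerprinting work
    conn.execute(
        "INSERT INTO songs (song_name, checksum) VALUES (?, ?)", (name, checksum)
    )
    conn.commit()
    return True
```

The lookup is a single indexed query per file, so the CPU spent on fingerprinting an exact duplicate is replaced by one checksum computation and one cheap query.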
from dejavu.
Let's weigh the options.
If we add the feature:
+ we will not re-fingerprint the same songs
+ we will not need to look up the same hashes in the DB during recognition
- we will have to do additional CPU work, but how much depends only on how often we add new songs
from dejavu.