Comments (9)
I haven't actually tried it with that many songs, so it could very well be.
Could you perhaps do some SQL profiling and see where the issue is? I'm somewhat surprised since the hash is indexed.
How many GB is your fingerprints table?
from dejavu.
I'm sorry, I have checked and the SQL query itself is fast. But a 5-second audio clip has about 4,000 fingerprints, and those fetch about 2,000,000 rows from MySQL, so processing them takes a very long time.
Also, after #26 was fixed, hash is no longer the primary key, which made the data much bigger.
I hope you can read my poor English ^_^
from dejavu.
When I read the code, I found that the fingerprints are hashed from two points. I think this produces many repeated fingerprints. Could I use three points instead, so that the fingerprints are more dispersed and recognition fetches fewer rows from MySQL?
from dejavu.
The fingerprints table has a UNIQUE constraint, so there should never be a duplicate (hash, offset, song_id) tuple, and thus no repeated fingerprints. The hash itself, yes, there will be many repeats of that. This is important, as songs often repeat themselves.
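To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table layout is an assumption modeled on the (hash, offset, song_id) tuple named above; it is not copied from Dejavu's actual schema.

```python
import sqlite3

# Illustrative schema only: columns follow the (hash, offset, song_id) tuple
# mentioned above. Dejavu's real MySQL schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fingerprints (
        hash     TEXT    NOT NULL,
        song_id  INTEGER NOT NULL,
        "offset" INTEGER NOT NULL,
        UNIQUE (hash, "offset", song_id)
    )
    """
)
conn.execute("CREATE INDEX idx_hash ON fingerprints (hash)")

rows = [
    ("abc123", 1, 10),  # song 1
    ("abc123", 1, 55),  # same hash, same song, later offset: allowed
    ("abc123", 2, 10),  # same hash, different song: allowed
]
conn.executemany("INSERT INTO fingerprints VALUES (?, ?, ?)", rows)

# An exact duplicate of an existing tuple violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO fingerprints VALUES (?, ?, ?)", ("abc123", 1, 10))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A lookup by hash alone still returns every row that shares the hash.
hash_matches = conn.execute(
    'SELECT song_id, "offset" FROM fingerprints WHERE hash = ?', ("abc123",)
).fetchall()
print(duplicate_rejected, len(hash_matches))
```

The constraint rejects only exact duplicate tuples, so the number of rows returned for a popular hash can still grow with the size of the corpus.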
from dejavu.
Different songs have the same hash. When recognizing, the query uses only the hash as its condition, so it returns too many rows (about 2,000,000), which takes a long time with 400 or more songs.
from dejavu.
The query can't use the song_id or true offset; they are unknown.
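A rough sketch of why the hash is the only usable lookup key: the song and the position within it are exactly what recognition is trying to find. The sketch assumes the common approach of voting on the difference between stored and query offsets. The `index` dictionary and `best_match` function are hypothetical stand-ins for the database and are not Dejavu's API.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the fingerprints table:
# hash -> list of (song_id, offset_in_song)
index = {
    "h1": [(1, 100), (2, 40)],
    "h2": [(1, 105), (2, 300)],
    "h3": [(1, 112)],
}

def best_match(query):
    """query: list of (hash, offset_in_clip).

    Returns (song_id, offset_difference, votes), or None if nothing matched.
    """
    votes = Counter()
    for h, clip_offset in query:
        # Only the hash is known at query time, so every row sharing it is returned.
        for song_id, song_offset in index.get(h, []):
            votes[(song_id, song_offset - clip_offset)] += 1
    if not votes:
        return None
    (song_id, diff), count = votes.most_common(1)[0]
    return song_id, diff, count

clip = [("h1", 0), ("h2", 5), ("h3", 12)]
print(best_match(clip))
```

Every row that shares a hash has to be examined before the votes can be counted, which is why hashes shared across many songs make recognition slower.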
Dejavu's constants in fingerprint.py aren't magic. They are tunable parameters that control how many hashes are made and how they are made.
You can try lowering DEFAULT_FAN_VALUE, and perhaps the overlap ratio too. A larger PEAK_NEIGHBORHOOD_SIZE may also produce fewer fingerprints, though possibly at the cost of accuracy. A larger DEFAULT_WINDOW_SIZE will produce more frequency bins, and thus likely fewer collisions.
The problem you are seeing is not too many hashes, but specifically too many hashes shared between songs.
Experiment. See what works for you. And if you find good parameters for your particular use case and corpus size, do let us know.
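As a back-of-the-envelope aid for that experimentation, the sketch below estimates how two of these knobs scale. It is a simplification, not Dejavu's code: `count_pairs` and `frequency_bins` are hypothetical helpers, the peak count of 1000 is made up, and the parameter values tried are illustrative rather than recommended.

```python
def count_pairs(n_peaks, fan_value):
    """Count (anchor, target) pairs when each peak is paired with up to
    `fan_value` of the peaks that follow it."""
    return sum(min(fan_value, n_peaks - 1 - i) for i in range(n_peaks))

def frequency_bins(window_size):
    """Number of one-sided FFT frequency bins for a real-valued signal."""
    return window_size // 2 + 1

# Fewer targets per anchor means roughly proportionally fewer hashes.
for fan in (15, 10, 5, 3):
    print("fan_value", fan, "->", count_pairs(1000, fan), "pairs")

# A larger window gives more distinct frequency values to hash on.
for window in (2048, 4096, 8192):
    print("window_size", window, "->", frequency_bins(window), "bins")
```

The pair count grows roughly linearly with the fan value, while the number of frequency bins grows linearly with the window size, so the two parameters pull in opposite directions on how many rows a single hash lookup returns.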
from dejavu.
Thank you very much for your reply, I'll try.
from dejavu.
for i in range(len(peaks)):
    for j in range(fan_value):
I found this code in fingerprint.py. When j equals zero, the hash is made from the same point paired with itself, which causes many repeated hashes. Is this a bug? I have now changed fan_value to 3 and made j start from 1, which works very well!
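The effect described here can be checked with a small sketch that mirrors the shape of the quoted loop. `pair_indices` is a hypothetical helper, not a function from fingerprint.py, and the peak count of 100 is arbitrary. It also assumes the hash is built from the two peak frequencies and their time difference, so a peak paired with itself would hash on the same frequency twice with a time difference of zero.

```python
def pair_indices(n_peaks, fan_value, start):
    """Yield (i, i + j) index pairs in the same shape as the quoted loop."""
    for i in range(n_peaks):
        for j in range(start, fan_value):
            if i + j < n_peaks:
                yield i, i + j

n_peaks, fan_value = 100, 3
with_self = list(pair_indices(n_peaks, fan_value, start=0))
without_self = list(pair_indices(n_peaks, fan_value, start=1))

# Pairs where the anchor and the target are the same peak.
self_pairs = [p for p in with_self if p[0] == p[1]]
print(len(with_self), len(self_pairs), len(without_self))
```

With the loop starting at 0, every peak contributes one self-pair. Under the assumption above, such a hash depends only on a single frequency value, so it would collide with any other peak at that frequency in any song.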
from dejavu.