Comments (9)

worldveil commented on July 22, 2024

I haven't actually tried it with that many songs, so it could very well be.

Could you perhaps do some SQL profiling and see where the issue is? I'm somewhat surprised since the hash is indexed.

How many GB is your fingerprints table?

from dejavu.

suxianbaozi commented on July 22, 2024

Sorry, I have checked, and the SQL query itself is fast. But a 5-second audio clip has about 4,000 fingerprints, and those fetch about 2,000,000 rows from MySQL, so computing the match takes a very long time.
Also, after the fix for #26, hash is no longer the primary key, which made the data much bigger.
I hope you can read my poor English ^_^

suxianbaozi commented on July 22, 2024

When I read the code, I found that the fingerprints are hashed from two points. I think this produces many repeated fingerprints. Could I use three points instead, so that the fingerprints are more dispersed and recognition fetches fewer rows from MySQL?

worldveil commented on July 22, 2024

The fingerprints table has a UNIQUE constraint, so there should never be a duplicate (hash, offset, song_id) tuple, and thus no repeated fingerprints. The hash itself, yes, will repeat many times; this is important, as songs often repeat themselves.

suxianbaozi commented on July 22, 2024

Different songs have the same hash. When recognizing, the query uses only the hash as its condition, so it returns too many rows (about 2,000,000), which takes a long time with 400 or more songs.

worldveil commented on July 22, 2024

The query can't use the song_id or the true offset; both are unknown at query time.
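
To make that concrete, here is a minimal sketch of a hash-only lookup followed by offset alignment. It is not Dejavu's actual code: the in-memory `db` dict stands in for the fingerprints table, and `best_match` and the sample data are made up for illustration. Each clip hash pulls back every stored row with that hash, whatever the song, and the song is only identified afterwards by voting on the difference between the stored offset and the clip offset.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the fingerprints table: hash -> [(song_id, offset)].
db = defaultdict(list)
for h, song_id, offset in [
    ("aa", 1, 10), ("bb", 1, 12), ("cc", 1, 15),
    ("aa", 2, 40), ("dd", 2, 41),
]:
    db[h].append((song_id, offset))


def best_match(sample):
    """sample: list of (hash, offset_in_clip) pairs.

    Returns (song_id, offset_diff, votes, rows_fetched), or None if no
    hash in the sample is present in the table.
    """
    votes = Counter()
    rows_fetched = 0
    for h, clip_offset in sample:
        # The lookup key is the hash alone; song_id and the true offset
        # are exactly what we are trying to find out.
        for song_id, db_offset in db.get(h, []):
            rows_fetched += 1
            votes[(song_id, db_offset - clip_offset)] += 1
    if not votes:
        return None
    (song_id, diff), count = votes.most_common(1)[0]
    return song_id, diff, count, rows_fetched


clip = [("aa", 0), ("bb", 2), ("cc", 5)]
result = best_match(clip)
print(result)
```

Every row for a shared hash such as "aa" is fetched and counted even though only one song can win, so the work grows with the number of songs sharing each hash.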

Dejavu's constants in fingerprint.py aren't magic. They are tunable parameters that control how many hashes are made and how they are made.

You could try lowering DEFAULT_FAN_VALUE, and perhaps the overlap ratio too. A larger PEAK_NEIGHBORHOOD_SIZE may also produce fewer fingerprints, though possibly at the cost of accuracy. A larger DEFAULT_WINDOW_SIZE will give more frequency bins, and thus likely fewer collisions.

The problem you are seeing is not too many hashes, but specifically too many hashes shared between songs.

Experiment. See what works for you. And if you find good parameters for your particular use case and corpus size, do let us know.
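
As a starting point for that experimentation, here is a rough sketch of how two of these knobs scale. It assumes each peak is paired with up to `fan_out` later peaks and that the spectrogram is one-sided. The function names `max_hashes` and `one_sided_bins` are hypothetical and not part of Dejavu, and the numbers passed in are example values only.

```python
def max_hashes(num_peaks, fan_out):
    """Upper bound on hashes when each peak is paired with up to
    `fan_out` later peaks (any time-delta filtering only lowers it)."""
    return sum(min(fan_out, num_peaks - 1 - i) for i in range(num_peaks))


def one_sided_bins(window_size):
    """Number of frequency bins in a one-sided FFT of `window_size` samples."""
    return window_size // 2 + 1


# Fewer pairs per peak means roughly proportionally fewer hashes.
for fan_out in (15, 5):
    print("fan_out", fan_out, "->", max_hashes(1000, fan_out), "hashes at most")

# A larger window spreads peaks over more frequency bins.
for window_size in (4096, 8192):
    print("window", window_size, "->", one_sided_bins(window_size), "bins")
```

The hash count falls roughly linearly with the fan-out, while doubling the window size roughly doubles the number of distinct frequency values a hash can contain.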

suxianbaozi commented on July 22, 2024

Thank you very much for your reply, I'll try it.

suxianbaozi commented on July 22, 2024

    for i in range(len(peaks)):
        for j in range(fan_value):

I found this code in fingerprint.py. When j equals zero, the hash is made from the same point twice, which causes many repeated hashes. Is this a bug? I have now changed fan_value to 3 and made j start from 1, which works very well!
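
A simplified sketch of the effect, not the actual fingerprint.py code: `pair_hashes`, its `start` parameter, and the toy `peaks` list are made up for illustration, and the hash is assumed to be built from the two frequencies and the time difference. With `start=0` every peak is also paired with itself, so that hash depends on a single frequency and a time difference of zero.

```python
import hashlib


def pair_hashes(peaks, fan_value, start):
    """Hash each peak together with the peaks that follow it.

    peaks: list of (freq, time) tuples sorted by time.
    start: first value of j; 0 includes the self-pair, 1 skips it.
    """
    out = []
    for i in range(len(peaks)):
        for j in range(start, fan_value):
            if i + j < len(peaks):
                f1, t1 = peaks[i]
                f2, t2 = peaks[i + j]
                key = f"{f1}|{f2}|{t2 - t1}".encode()
                out.append((hashlib.sha1(key).hexdigest()[:20], t1))
    return out


peaks = [(10, 0), (25, 3), (10, 7), (40, 9)]
with_self = pair_hashes(peaks, fan_value=3, start=0)
without_self = pair_hashes(peaks, fan_value=3, start=1)

print(len(with_self), len(without_self))
# Peaks 0 and 2 share a frequency, so their self-pair hashes are identical.
print(with_self[0][0] == with_self[6][0])
```

In this toy input the self-pairs account for 4 of the 9 hashes, and any two peaks at the same frequency, in any song, produce the same self-pair hash, so those rows add lookups without helping to tell songs apart.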
