Giter Site home page Giter Site logo

Comments (18)

Wessie avatar Wessie commented on July 22, 2024

The primary key isn't the only constraint there, UNIQUE KEY song_id (song_id,offset,hash) tells the database that there can't ever be a duplicate row that contains the exact same song_id, offset and hash pairing.

from dejavu.

Wessie avatar Wessie commented on July 22, 2024

On second look, no that is indeed an error in the schema. We should have a separate id as PRIMARY KEY and keep the UNIQUE constraint for the other fields.

This is a backwards incompatible change though, we need to check this out with @worldveil.

from dejavu.

pguridi avatar pguridi commented on July 22, 2024

Yes, I can see that. But doesn't apply both constraints?.
I think, that with the current schema, is impossible to save a same hash, with different offset and song_id. Because the composite unique constraint will not apply, but the primary key on "hash" will.

from dejavu.

pguridi avatar pguridi commented on July 22, 2024

because the insert query uses "INSERT IGNORE" , all the constraint errors will be silent, not only the ones from the UNIQUE KEY song_id (song_id,offset,hash), also the failing rows from the "hash primary key"..

from dejavu.

Wessie avatar Wessie commented on July 22, 2024

@pguridi indeed, this is a bug. Although not one that will reduce accuracy most likely. Would still be good to fix it. You should use the correct method in the ORM.

from dejavu.

pguridi avatar pguridi commented on July 22, 2024

yes. Ill add an id primary key field, remove the primary key from hash and keep the composite constraint.

from dejavu.

worldveil avatar worldveil commented on July 22, 2024

The primary key on hash is for performance reasons, why do you want to change it?

from dejavu.

Wessie avatar Wessie commented on July 22, 2024

The change is needed because it is impossible to enter both (examples) (hash=10, offset=0, song_id=0) and (hash=10, offset=100, song_id=0). But this applies for anything where hash is the same but with a different offset or song_id combination.

The fix is to add an id column as new primary key and add an index to hash to keep the performance up.

from dejavu.

pguridi avatar pguridi commented on July 22, 2024

indeed, with the primary key in the "hash" field, the composite unique constraint UNIQUE KEY song_id (song_id,offset,hash) would never be used. Because there will not be a single repeated hash in the whole table.

from dejavu.

worldveil avatar worldveil commented on July 22, 2024

Ah, yes I overlooked that. Is there any key/indexing configuration where we can avoid adding an id yet still have 1) index on hash, and 2) unique constraint on hash, offset, song_id? The inclusion of an id field will cause an increase in storage size.

from dejavu.

Wessie avatar Wessie commented on July 22, 2024

@worldveil we could try using a composite primary key, but I'm not sure how well supported it is on older MySQL versions or when using ORMs.

from dejavu.

pguridi avatar pguridi commented on July 22, 2024

@Wessie AFAIK, SQLAlchemy supports composite foreign keys. Also sqlite supports it. "Both single column and composite (multiple column) primary keys are supported." [1]

[1] http://www.sqlite.org/lang_createtable.html

from dejavu.

worldveil avatar worldveil commented on July 22, 2024

I'd say let's go for composite key if we can then. What would the syntax be for that? My SQL is a little rusty.

If that won't work then we can add an id but I'd like to avoid that as it adds a lot of disk space since we'd add one to each fingerprint.

from dejavu.

worldveil avatar worldveil commented on July 22, 2024

Should be a simple change to this:

CREATE TABLE IF NOT EXISTS `fingerprints` (
  `hash` binary(10) NOT NULL,
  `song_id` mediumint(8) unsigned NOT NULL,
  `offset` int(10) unsigned NOT NULL,
  INDEX (`hash`),
  UNIQUE KEY `uniq_constraint` (`song_id`,`offset`,`hash`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

What's odd is as @pguridi says, MySQL for me allows different song_id, hash combinations even with the primary key. However, just using INDEX should be the correct way of doing it.

from dejavu.

Wessie avatar Wessie commented on July 22, 2024
CREATE TABLE IF NOT EXISTS `fingerprints` (
  `hash` binary(10) NOT NULL,
  `song_id` mediumint(8) unsigned NOT NULL,
  `offset` int(10) unsigned NOT NULL,
  PRIMARY KEY (`hash`, `song_id`, `offset`)
  UNIQUE KEY `uniq_constraint` (`hash`, `song_id`, `offset`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Should be the correct version, please use utf8 standard as well.

from dejavu.

worldveil avatar worldveil commented on July 22, 2024

Yes good on utf8. We need the index on hash, though. INDEX doesn't enforce uniqueness, it's just for lookup speed. Also the primary and unique there are redundant?

from dejavu.

Wessie avatar Wessie commented on July 22, 2024

I'm not sure the INDEX is required since it is also a primary key. But adding it shouldn't hurt.

from dejavu.

worldveil avatar worldveil commented on July 22, 2024

The SQL statement to retrieve hashes works on a WHERE hash = <value> query. Thus we need an index that hashes on only the hash field. The primary key here is redundant to the UNIQUE one, it only ensures a fast lookup on all three fields, which we don't need (and don't think we use). We need only two things:

  1. Fast lookup on hash field
  2. Unique constraint on (hash, song_id, offset)

Thus, the minimal amount of indexes to achieve that is:

CREATE TABLE IF NOT EXISTS `fingerprints` (
  `hash` binary(10) NOT NULL,
  `song_id` mediumint(8) unsigned NOT NULL,
  `offset` int(10) unsigned NOT NULL,
  INDEX (`hash`)
  UNIQUE KEY `uniq_constraint` (`hash`, `song_id`, `offset`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

from dejavu.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.