Comments (18)
The primary key isn't the only constraint there, UNIQUE KEY
song_id (
song_id,
offset,
hash)
tells the database that there can't ever be a duplicate row that contains the exact same song_id
, offset
and hash
pairing.
from dejavu.
On second look, no that is indeed an error in the schema. We should have a separate id
as PRIMARY KEY
and keep the UNIQUE
constraint for the other fields.
This is a backwards incompatible change though, we need to check this out with @worldveil.
from dejavu.
Yes, I can see that. But doesn't apply both constraints?.
I think, that with the current schema, is impossible to save a same hash, with different offset and song_id. Because the composite unique constraint will not apply, but the primary key on "hash" will.
from dejavu.
because the insert query uses "INSERT IGNORE" , all the constraint errors will be silent, not only the ones from the UNIQUE KEY
song_id (
song_id,
offset,
hash)
, also the failing rows from the "hash primary key"..
from dejavu.
@pguridi indeed, this is a bug. Although not one that will reduce accuracy most likely. Would still be good to fix it. You should use the correct method in the ORM.
from dejavu.
yes. Ill add an id
primary key field, remove the primary key from hash
and keep the composite constraint.
from dejavu.
The primary key on hash
is for performance reasons, why do you want to change it?
from dejavu.
The change is needed because it is impossible to enter both (examples) (hash=10, offset=0, song_id=0)
and (hash=10, offset=100, song_id=0)
. But this applies for anything where hash
is the same but with a different offset
or song_id
combination.
The fix is to add an id
column as new primary key
and add an index
to hash
to keep the performance up.
from dejavu.
indeed, with the primary key in the "hash" field, the composite unique constraint UNIQUE KEY
song_id (
song_id,
offset,
hash)
would never be used. Because there will not be a single repeated hash in the whole table.
from dejavu.
Ah, yes I overlooked that. Is there any key/indexing configuration where we can avoid adding an id
yet still have 1) index on hash
, and 2) unique constraint on hash
, offset
, song_id
? The inclusion of an id
field will cause an increase in storage size.
from dejavu.
@worldveil we could try using a composite primary key, but I'm not sure how well supported it is on older MySQL versions or when using ORMs.
from dejavu.
@Wessie AFAIK, SQLAlchemy supports composite foreign keys. Also sqlite supports it. "Both single column and composite (multiple column) primary keys are supported." [1]
[1] http://www.sqlite.org/lang_createtable.html
from dejavu.
I'd say let's go for composite key if we can then. What would the syntax be for that? My SQL is a little rusty.
If that won't work then we can add an id
but I'd like to avoid that as it adds a lot of disk space since we'd add one to each fingerprint.
from dejavu.
Should be a simple change to this:
CREATE TABLE IF NOT EXISTS `fingerprints` (
`hash` binary(10) NOT NULL,
`song_id` mediumint(8) unsigned NOT NULL,
`offset` int(10) unsigned NOT NULL,
INDEX (`hash`),
UNIQUE KEY `uniq_constraint` (`song_id`,`offset`,`hash`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
What's odd is as @pguridi says, MySQL for me allows different song_id
, hash
combinations even with the primary key. However, just using INDEX
should be the correct way of doing it.
from dejavu.
CREATE TABLE IF NOT EXISTS `fingerprints` (
`hash` binary(10) NOT NULL,
`song_id` mediumint(8) unsigned NOT NULL,
`offset` int(10) unsigned NOT NULL,
PRIMARY KEY (`hash`, `song_id`, `offset`)
UNIQUE KEY `uniq_constraint` (`hash`, `song_id`, `offset`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Should be the correct version, please use utf8
standard as well.
from dejavu.
Yes good on utf8
. We need the index on hash
, though. INDEX
doesn't enforce uniqueness, it's just for lookup speed. Also the primary and unique there are redundant?
from dejavu.
I'm not sure the INDEX
is required since it is also a primary key. But adding it shouldn't hurt.
from dejavu.
The SQL statement to retrieve hashes works on a WHERE hash = <value>
query. Thus we need an index that hashes on only the hash
field. The primary key here is redundant to the UNIQUE one, it only ensures a fast lookup on all three fields, which we don't need (and don't think we use). We need only two things:
- Fast lookup on
hash
field - Unique constraint on (
hash
,song_id
,offset
)
Thus, the minimal amount of indexes to achieve that is:
CREATE TABLE IF NOT EXISTS `fingerprints` (
`hash` binary(10) NOT NULL,
`song_id` mediumint(8) unsigned NOT NULL,
`offset` int(10) unsigned NOT NULL,
INDEX (`hash`)
UNIQUE KEY `uniq_constraint` (`hash`, `song_id`, `offset`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
from dejavu.
Related Issues (20)
- Which field in the results indicates the best match with the original song HOT 3
- run_tests.py: error: the following arguments are required src HOT 4
- Trying to fingerprint about 200 000 files. After 15000 files INSERT operation is very slow. HOT 1
- Do maximum_filter with cupy instead of scipy
- Failed to solve HOT 1
- Comparing short audio files
- Python Docker image bloated
- A directory of potentially duplicate audio files?
- it works well with python3.6 ~
- Not working for recordings
- Some errors when I use python3.7
- multiple concurrency
- Fingerprinting Audio With Differing Sample Rates HOT 1
- Jscght
- figerprinting file HOT 1
- Any way to generate the result image of the plots? HOT 1
- Does this package supports Python 3 HOT 4
- does this project work well on recognizing human speak? HOT 5
- Problem running the sample project HOT 1
- erors using fingerprint_file HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dejavu.