unique-image-scan's Issues
Investigate switching from ProgressBar to tqdm
https://github.com/tqdm/tqdm claims to have much less time overhead per iteration and appears to have a better interface. We should look at this alternative.
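One way to check the overhead claim before switching is to time an empty loop with and without a progress wrapper. The sketch below is only an assumption about how such a comparison might look: `per_iteration_overhead` is a hypothetical helper, and the wrapper is passed in as an argument so that `tqdm.tqdm` or the current ProgressBar could be measured the same way.

```python
import time
from typing import Callable, Iterable


def per_iteration_overhead(wrap: Callable[[Iterable], Iterable], n: int = 100_000) -> float:
    """Return the extra seconds per iteration that `wrap` adds to a bare loop."""
    # Time the bare loop first to get a baseline.
    start = time.perf_counter()
    for _ in range(n):
        pass
    bare = time.perf_counter() - start

    # Time the same loop with the progress wrapper around the iterable.
    start = time.perf_counter()
    for _ in wrap(range(n)):
        pass
    wrapped = time.perf_counter() - start

    return (wrapped - bare) / n


# Hypothetical usage, assuming tqdm is installed:
#   from tqdm import tqdm
#   print(per_iteration_overhead(tqdm))
```

The number is noisy for small `n`, so it is worth running a few times and comparing orders of magnitude rather than exact values.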
Switch from click to typer
Typer is much more developer-friendly.
Plan features that we want
Stepping back from what the project is at this point in time, let's break down what we want and what we can do.
This ticket describes goals for an MVP, not myriad future integrations.
Goals
These goals are subject to change during discussion.
The primary goal is to determine unique media (images and movie files), discover duplicates, and build a list of them.
Sub-goals:
- Provide a list of duplicates and include listing of metadata differences.
- Provide a list of images where only the image data differs and the metadata is identical (potentially resized versions or a different file type, such as a DNG export to JPG).
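As a rough illustration of the first sub-goal, listing metadata differences between two duplicates could be a plain dictionary comparison. This is a minimal sketch, assuming the metadata has already been read into dictionaries; `metadata_differences` and the field values are hypothetical, not part of the project.

```python
from typing import Any, Dict, Tuple


def metadata_differences(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing field to its (a, b) values.

    A field that is absent from one side is reported as None, so a field
    explicitly set to None is not distinguished from a missing one.
    """
    diffs = {}
    for key in sorted(a.keys() | b.keys()):
        left, right = a.get(key), b.get(key)
        if left != right:
            diffs[key] = (left, right)
    return diffs


# Hypothetical records for two files that hold the same image.
original = {"Model": "ExampleCam", "ISO": 200}
edited = {"Model": "ExampleCam", "ISO": 200, "GPSPosition": "52.0 N, 4.3 E"}
print(metadata_differences(original, edited))
```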
Non-goals
- Alter any metadata in any source media files
- Remove or link any source media files on disk
Definitions
Unique Media
"Unique image" is defined as just the image data itself, not metadata. That is, we only care about the raw pixels in the images. If any picture data is different, it's not an identical image.
The following are examples of unique images:
- Images that have identical DateCreated type fields but have different image data. (EG: an iPhone HDR photo may have this scenario.)
- Images that have different image data, but have identical ShutterCount. (EG: Images that were exported from DNG to JPG may have identical metadata.)
- Images that have identical mtime/ctime but different image data.
- Images that have identical image data but have a different ShutterCount or ImageNumber or equivalent field.
Duplicate Media
Duplicate images may or may not have different metadata.
The following are examples of duplicate images:
- Two files that differ only because one has facial recognition boxes added.
- Two files that differ only by mtime and/or ctime.
- Two files where one has no EXIF data and the other has a variety of metadata.
- Two files where one has an audio and video stream, and another has the same audio and video stream but also has a subtitle stream.
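The examples above suggest grouping files by a digest of their media payload alone, so that metadata differences do not matter. The following is a sketch under stated assumptions: the payload bytes (decoded pixels or a raw stream) are assumed to be extracted elsewhere, SHA-256 is an arbitrary choice, and `group_by_payload` and `split_unique_and_duplicates` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_payload(items: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Group file names by the SHA-256 digest of their payload bytes."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, payload in items:
        groups[hashlib.sha256(payload).hexdigest()].append(name)
    return dict(groups)


def split_unique_and_duplicates(groups: Dict[str, List[str]]) -> Tuple[List[str], List[List[str]]]:
    """Return (names with a payload seen once, sorted groups of names sharing a payload)."""
    unique = sorted(names[0] for names in groups.values() if len(names) == 1)
    duplicates = sorted(sorted(names) for names in groups.values() if len(names) > 1)
    return unique, duplicates


# Placeholder payloads stand in for decoded image data.
items = [("a.jpg", b"\x01\x02"), ("a_copy.jpg", b"\x01\x02"), ("b.jpg", b"\x03")]
unique, duplicates = split_unique_and_duplicates(group_by_payload(items))
print(unique, duplicates)
```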
Methods for determining duplicate media
- Read the image data or video stream and hash all or part of that data.
- Compare EXIF/IPTC/etc. data for fields that should be unique and read-only per file. (EG: GPSPosition may be added, removed or altered during editing, but Model should not change.)
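Both methods can be sketched with the standard library alone. The code below is an illustration, not the project's implementation: `hash_stream` hashes all or only the first part of a binary stream, and `stable_fields_match` compares fields that are assumed to be read-only. Both function names, the SHA-256 choice, and the default field list are assumptions.

```python
import hashlib
import io
from typing import BinaryIO, Mapping, Optional, Sequence


def hash_stream(stream: BinaryIO, max_bytes: Optional[int] = None, chunk_size: int = 65536) -> str:
    """Hash all of `stream`, or only its first `max_bytes` bytes when given."""
    digest = hashlib.sha256()
    remaining = max_bytes
    while remaining is None or remaining > 0:
        size = chunk_size if remaining is None else min(chunk_size, remaining)
        chunk = stream.read(size)
        if not chunk:
            break  # End of stream reached before the limit.
        digest.update(chunk)
        if remaining is not None:
            remaining -= len(chunk)
    return digest.hexdigest()


def stable_fields_match(
    a: Mapping[str, object],
    b: Mapping[str, object],
    fields: Sequence[str] = ("Model",),
) -> bool:
    """True when every stable field present in both records has the same value."""
    return all(a[f] == b[f] for f in fields if f in a and f in b)


# io.BytesIO stands in for an opened image or video stream.
print(hash_stream(io.BytesIO(b"example payload"), max_bytes=7))
print(stable_fields_match({"Model": "ExampleCam", "GPSPosition": "x"}, {"Model": "ExampleCam"}))
```

Hashing only a prefix is cheaper on large video files but can produce false matches, so a partial-hash hit would presumably need to be confirmed with a full hash.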
Next steps
- Rename the scan verb to exif-scan or similar #9
- Add a media-scan or similar verb to be used with hashing the image/video stream
Add summary of metadata that was searched
It would be great to see some stats about the metadata that was searched.
For example:
- Camera models, lens models, shutter speed, f-stop, iso and how many photos had each value
- Count of photos with no EXIF data
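A summary like this could be built with `collections.Counter`. The sketch below assumes each photo's metadata is already available as a dictionary and that an empty dictionary means no EXIF data; `summarize_metadata` and the field names are hypothetical placeholders for whatever tags the scan actually reads.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping, Sequence, Tuple


def summarize_metadata(
    records: Iterable[Mapping[str, Any]],
    fields: Sequence[str] = ("Model", "LensModel", "ShutterSpeed", "FNumber", "ISO"),
) -> Tuple[Dict[str, Counter], int]:
    """Count how many photos had each value per field, plus photos with no EXIF data."""
    counts: Dict[str, Counter] = {field: Counter() for field in fields}
    no_exif = 0
    for record in records:
        if not record:
            no_exif += 1  # An empty record is treated as "no EXIF data".
            continue
        for field in fields:
            if field in record:
                counts[field][record[field]] += 1
    return counts, no_exif


# Hypothetical scan results.
records = [{"Model": "ExampleCam", "ISO": 100}, {"Model": "ExampleCam", "ISO": 200}, {}]
counts, no_exif = summarize_metadata(records)
print(counts["Model"].most_common(), no_exif)
```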