vid_dup_finder's Issues

`--print-unique` seems to not work

vid_dup_finder --files /home/rafal/Pictures --print-unique
error: Found argument '--print-unique' which wasn't expected, or isn't valid in this context

USAGE:
    vid_dup_finder [OPTIONS] --files <Directories/files to search>...

For more information try --help

Videos less than 30 sec not working

I am aware you mentioned that it does not work for videos under 30 seconds, but I wondered if there were any updates on this. Also, if there is anything I can help with, just let me know.

Create crates.io release and move library to different workspace

Hi,
It looks like your project can find duplicate videos, but so far I couldn't find a way to integrate it with my app, Czkawka.

I suggest moving the library into a separate workspace and then publishing it on crates.io.

Also, when I tried to compile the app, I found that it requires nightly Rust, which sadly prevents me from using it.

Ability to adjust 30 second limit?

Can we have a way to change the 30 second limit? I have a bunch of videos with an identical intro sequence and I'd like to adjust the limit to 2 minutes or more. I understand that this increases the processing time, so keeping 30sec as default makes sense, but adding the ability to adjust this on a per-job basis would be very useful.

--include-exts

I've been exploring this tool further to identify duplicate videos. It appears to run significantly faster than videohash. However, I've noticed that the tool has a few issues:

  • It initiates an FFmpeg process for all discovered files, even if they clearly aren't video files, such as "par2, txt, sfv, png, jpg."
  • Additionally, it stores references to these non-video files in the cache.

A workaround is to use the "--exclude-exts" option to blacklist specific file types. But it would be more convenient to provide users with more flexibility in deciding the strategy. Here are some suggestions:

  • Evaluate every file to determine if it's actually a video file.
  • Whitelist: Only scan items specified in "--include-exts."
  • Blacklist: Scan everything except the ones listed in "--exclude-exts."

Of course, these last two options should be mutually exclusive.
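The whitelist/blacklist idea above could be sketched roughly as follows. This is only an illustration of the suggestion, not vid_dup_finder's actual code: `ExtFilter`, `build_filter` and `should_scan` are hypothetical names, and "--include-exts" is the proposed option, not an existing one.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical extension filter; not part of vid_dup_finder's real API.
#[derive(Debug)]
enum ExtFilter {
    /// No list given: scan every file.
    ScanAll,
    /// Whitelist: scan only files with one of these extensions.
    Include(HashSet<String>),
    /// Blacklist: scan everything except these extensions.
    Exclude(HashSet<String>),
}

/// Normalise extensions: drop a leading dot and lowercase them.
fn to_set(exts: &[&str]) -> HashSet<String> {
    exts.iter()
        .map(|e| e.trim_start_matches('.').to_ascii_lowercase())
        .collect()
}

/// Build a filter, rejecting the case where both lists are given.
fn build_filter(include: &[&str], exclude: &[&str]) -> Result<ExtFilter, String> {
    match (include.is_empty(), exclude.is_empty()) {
        (false, false) => Err("include and exclude lists are mutually exclusive".to_string()),
        (false, true) => Ok(ExtFilter::Include(to_set(include))),
        (true, false) => Ok(ExtFilter::Exclude(to_set(exclude))),
        (true, true) => Ok(ExtFilter::ScanAll),
    }
}

/// Decide whether a path should be handed to the (expensive) video check.
fn should_scan(filter: &ExtFilter, path: &Path) -> bool {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match (filter, ext) {
        (ExtFilter::ScanAll, _) => true,
        (ExtFilter::Include(set), Some(e)) => set.contains(&e),
        // A whitelist cannot match a file without an extension.
        (ExtFilter::Include(_), None) => false,
        (ExtFilter::Exclude(set), Some(e)) => !set.contains(&e),
        // A blacklist cannot exclude a file without an extension.
        (ExtFilter::Exclude(_), None) => true,
    }
}

fn main() {
    let filter = build_filter(&["mp4", "mkv"], &[]).expect("only one list was given");
    for name in ["a.mp4", "b.par2", "c.MKV", "notes.txt"] {
        println!("{name}: scan = {}", should_scan(&filter, Path::new(name)));
    }
}
```

Filtering before any FFmpeg process is started would also keep the non-video files out of the cache, assuming cache entries are only written for files that were actually scanned.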

Inconsistent results ("Too short" messages)

I noticed some invalid "too short" messages. I have a directory with a lot of non-movie files,
including the source and compiled files of vid_dup_finder.

I ran the scan several times, incrementing the log file number each time, like this:

rm /home/test/.cache/vid_dup_finder/vid_dup_finder_cache.bin
./target/release/vid_dup_finder --files ~/TEMP/fingerprint/ 2>&1 |tee -a log5

Then I checked the output messages (the output is redacted; I used the same replacements for each file):
number=5; cat log${number} |grep short |sed 's|.*Too short : ||' |sort > log${number}_parsed.log

For example, see "log5_compared_to_6_and_7.png".
In run 5 I got a "too short" message for "[rarbg]/.mp4". This did not happen in run 6;
there I got an "inserting" message for the same file.

The same goes for "log8_compared_to_9_and_10.png":
"cache2/.mp4" failed in run 8 but was inserted in run 9.

Since the order of the scanned files is different on every run, I think some variable in the loop is to blame, or ffmpeg gets confused by all the non-movie files.
log5_compared_to_6_and_7
log8_compared_to_9_and_10
log5_parsed.log
log6_parsed.log
log7_parsed.log
log8_parsed.log
log9_parsed.log
log10_parsed.log
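
The comparison between runs described above can also be done programmatically. The following is a minimal sketch, assuming each relevant log line contains the marker "Too short : " followed by the file path (as the sed expression in the report implies). The function names and the sample log lines are made up for illustration.

```rust
use std::collections::BTreeSet;

/// Marker taken from the sed expression in the report.
const MARKER: &str = "Too short : ";

/// Collect the paths reported as "too short" in one run's log.
fn too_short_files(log: &str) -> BTreeSet<String> {
    log.lines()
        // Keep only the text after the last marker on the line,
        // mirroring the greedy `.*` in the sed expression.
        .filter_map(|line| line.rsplit_once(MARKER))
        .map(|(_, path)| path.trim().to_string())
        .collect()
}

/// Paths flagged in the first run but not in the second.
fn only_in_first(first: &BTreeSet<String>, second: &BTreeSet<String>) -> Vec<String> {
    first.difference(second).cloned().collect()
}

fn main() {
    // Made-up log excerpts; real log lines may look different.
    let run_a = "WARN Too short : a.mp4\nWARN Too short : b.mp4\n";
    let run_b = "WARN Too short : b.mp4\nINFO inserting : a.mp4\n";
    let flaky = only_in_first(&too_short_files(run_a), &too_short_files(run_b));
    println!("too short in the first run only: {flaky:?}");
}
```

Any path that shows up in the difference between two runs over the same directory is a candidate for the inconsistent behaviour reported here.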
