rotsniff

rotsniff is a tool to catalog files and their hashes in order to detect corrupted or missing files.

It was inspired by scorch, and the database format is similar, but do not expect any kind of compatibility at this time.

Usage: rotsniff [OPTIONS] <COMMAND>

Commands:
  append  Add files not found in the database
  remove  Remove entries from the database that no longer exists
  update  Update entries in the database for files that have changed
  verify  Verify that all files in the database are intact, and that all files have entries in the database
  help    Print this message or the help of the given subcommand(s)

Options:
      --db <FILE>            Path to the database file [default: ./rotsniff.db]
  -v, --verbose              Make `command` more verbose. Actual behavior depends on the command
  -f, --fnfilter <FNFILTER>  Restrict commands to files which match regex
  -F, --negate-fnfilter      Negate the fnfilter regex match
  -h, --help                 Print help
  -V, --version              Print version

Installing

You can install the latest tagged version directly from crates.io by running the following command.

cargo install rotsniff

Examples

% mkdir foo
% echo 'Hello, World!' > foo/hello

% rotsniff -v append foo
foo/hello: blake2b:94D8520FE182ADD62BEC85B531A17A779FCD39F23248CFABD18347B86CE9F8B73A0C151DD7CE171843DD8A14E5329DDE6B73149D26D6638E94EF4C634F3F1A7B

% rotsniff -v verify foo
MATCH: foo/hello

% echo 'Goodbye!' > foo/hello

% rotsniff -v verify foo
MODIFIED: foo/hello

% rotsniff -v update
UPDATED: foo/hello

% rm foo/hello
% touch foo/new

% rotsniff -v verify foo
FILE NOT FOUND: foo/hello
NOT FOUND IN DB: foo/new

% rotsniff -v append foo
foo/new: blake2b:786A02F742015903C6C6FD852552D272912F4740E15847618A86E217F71F5419D25E1031AFEE585313896444934EB04B903A685B1448B755D56F701AFE9BE2CE

% rotsniff -v verify foo
FILE NOT FOUND: foo/hello
MATCH: foo/new

% rotsniff -v remove
REMOVED: foo/hello

% rotsniff -v verify foo
MATCH: foo/new
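
The four statuses printed by verify in the session above can be reproduced with a short Python sketch. This is not rotsniff's implementation: the function names `blake2b_digest` and `verify`, the dictionary standing in for the database, and the 1 MiB read size are all assumptions made for illustration. The digest format (the `blake2b:` prefix followed by uppercase hex of a 64-byte BLAKE2b digest) is inferred from the output shown above; the digest printed for the empty file foo/new is the standard BLAKE2b-512 digest of empty input.

```python
import hashlib
import os


def blake2b_digest(path, buf_size=1 << 20):
    """Return 'blake2b:<UPPERCASE HEX>' for the file at path, reading in chunks."""
    h = hashlib.blake2b()  # default 64-byte digest, i.e. 128 hex characters
    with open(path, "rb") as f:
        while chunk := f.read(buf_size):
            h.update(chunk)
    return "blake2b:" + h.hexdigest().upper()


def verify(db, root):
    """Compare db (a path -> digest mapping, hypothetical) with the files under root.

    Returns a list of (status, path) tuples sorted by path.
    """
    on_disk = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            on_disk.add(os.path.join(dirpath, name))

    results = []
    for path, expected in db.items():
        if path not in on_disk:
            results.append(("FILE NOT FOUND", path))
        elif blake2b_digest(path) == expected:
            results.append(("MATCH", path))
        else:
            results.append(("MODIFIED", path))
    # Files that exist on disk but have no database entry.
    for path in on_disk - set(db):
        results.append(("NOT FOUND IN DB", path))
    return sorted(results, key=lambda r: r[1])
```

Calling `verify(db, "foo")` with a mapping built from earlier digests yields tuples such as `("MATCH", "foo/new")`, mirroring the lines printed by `rotsniff -v verify foo`.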

Database

The database is a simple CSV text file compressed with gzip, chosen so that it is future-proof and easy for other software to parse if required.

% rotsniff -v append foo
foo/test: blake2b:7DFDB888AF71EAE0E6A6B751E8E3413D767EF4FA52A7993DAA9EF097F7AA3D949199C113CAA37C94F80CF3B22F7D9D6E4F5DEF4FF927830CFFE4857C34BE3D89
% zcat < rotsniff.db
foo/test,blake2b:7DFDB888AF71EAE0E6A6B751E8E3413D767EF4FA52A7993DAA9EF097F7AA3D949199C113CAA37C94F80CF3B22F7D9D6E4F5DEF4FF927830CFFE4857C34BE3D89

The format is currently file,hash:digest, but this may change to include more data in the future. The only supported hash function for now is BLAKE2b.
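
As a rough illustration of how other software might read this file, here is a minimal Python sketch using only the standard library. The function name `read_db` is hypothetical, and the sketch assumes UTF-8 text, standard CSV quoting, and no header row, none of which the description above guarantees. Since the format may change, treat this as a starting point only.

```python
import csv
import gzip


def read_db(path):
    """Load a gzip-compressed 'file,hash:digest' CSV into {file: (hash, digest)}."""
    entries = {}
    # Assumption: the decompressed content is UTF-8 text with one entry per row.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            filename, tagged = row[0], row[1]
            # Split 'blake2b:7DFD...' into the hash name and the hex digest.
            algo, _, digest = tagged.partition(":")
            entries[filename] = (algo, digest)
    return entries
```

For the database shown above, `read_db("rotsniff.db")` would return a dictionary with the single key `foo/test`, mapped to the pair of hash name and hex digest.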

rotsniff's People

Contributors

kwarf

rotsniff's Issues

question: surprising performance

Hi,

I'm looking for a hash verification tool for my photos/videos collection, and came across rotsniff.

I've already had a look at some other projects, including https://github.com/laktak/chkbit-py, which is written in Python and uses MD5. On my folder (400+ GB, 50k+ files) it finished building a database in 7 minutes using 6 workers (the number of physical cores I have); with a single worker, it needed 30 minutes.

As far as I understand, BLAKE2 is faster than MD5, and Rust is faster than Python.
However, rotsniff actually took much, much longer on my data: 52 minutes.
The command I'm running is just rotsniff append .

And that is the question: is it supposed to take that much longer?
My uninformed gut feeling is that something is wrong :)

My observations:

  • rotsniff starts with rather high CPU usage, so I assume it is multi-threaded by default; it looks like 12 threads (= the number of virtual cores) are created.
  • SSD read speeds are much lower than with the Python tool; throughput seems to be capped at around 150-160 MB/s (while the Python tool definitely had 500+ MB/s peaks). While read speed is low, SSD activity (the number of requests) is very high. I wonder if there is a read buffer somewhere that is too small? (This is further suggested by threads spending lots of time in a waiting state...)
  • Very high RAM usage: up to 30 GB VIRT and 15 GB RES (based on a random htop observation for a minute or so); it does decrease at times, but seems to stay in the multi-gigabyte (4-6+ or so?) RES range at all times.
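
To make the buffer-size and memory questions above concrete, here is a minimal Python sketch of parallel hashing with a fixed-size buffer per worker. It says nothing about how rotsniff actually reads files; the names `hash_file` and `hash_tree`, the 6 workers, and the 4 MiB buffer are assumptions chosen for illustration. Each worker reuses one buffer, so buffer memory stays at roughly workers × buf_size regardless of file sizes, and `buf_size` can be varied to see how read size affects throughput on a given disk.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, buf_size):
    """Hash one file with BLAKE2b, reusing a single buf_size-byte buffer."""
    h = hashlib.blake2b()
    buf = bytearray(buf_size)
    view = memoryview(buf)
    # buffering=0 gives a raw file object, so each readinto issues one read
    # of up to buf_size bytes instead of going through a second buffer.
    with open(path, "rb", buffering=0) as f:
        while n := f.readinto(buf):
            h.update(view[:n])
    return path, h.hexdigest().upper()


def hash_tree(root, workers=6, buf_size=4 << 20):
    """Hash every file under root; returns {path: uppercase hex digest}."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda p: hash_file(p, buf_size), paths))
```

Timing `hash_tree(".", buf_size=...)` with a few different sizes is one way to check whether small reads alone would explain a throughput cap like the one described.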
