Checksum issues about zpaqfranz (15 comments, closed)

sergeevabc commented on June 12, 2024
Checksum issues

from zpaqfranz.

Comments (15)

fcorbelli commented on June 12, 2024

> Windows 7 x64, Zpaqfranz 59.4
>
> $ zpaqfranz sum *.wav -pakka -sha256
> No multithread: Found (2.85 MB) => 2.988.280 bytes (2.85 MB) / 2 files in 0.015000
> 60f2791266401076a68c3311d2fa089657cfb1116048a614cc4b841e63ffb187 original02.wav
> 7cb7ca1fadbe690d153e8b7e5598c6a43dc8ed6e8c68854458e3f70fe2172dbe original01.wav
> 0.265 seconds (000:00:00) (all OK)
>
> a) Checksums are output only after all files have been processed, which is a problem when files are large. I would like to be able to see the results as soon as they are ready.
>
> b) Results are displayed without taking into account alphabetical sorting.

Thanks for the report

  1. True.
  2. You can use the -nosort switch. By default the sort is made by hash, to quickly find duplicate files:
|XXH3: 1CE69346A95DAB99DAD33D49FBFB6431 [                130]     |release/zpaqfranz_old/12/fai.bat
|XXH3: 1CE69346A95DAB99DAD33D49FBFB6431 [                130] === |release/zpaqfranz_old/13/fai.bat
|XXH3: 1CE69346A95DAB99DAD33D49FBFB6431 [                130] === |release/zpaqfranz_old/15/fai.bat
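Why a hash-first ordering surfaces duplicates can be sketched in Python: once entries are sorted by digest, identical files sit next to each other, so flagging every entry whose digest equals the previous one is enough to produce the `===` marks. This is an illustration, not zpaqfranz's code; `format_by_hash` and the `(digest, size, path)` tuple layout are hypothetical, and sizes use comma separators instead of zpaqfranz's dots.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (hex digest, size in bytes, path).
Entry = Tuple[str, int, str]


def format_by_hash(entries: List[Entry]) -> List[str]:
    """Sort entries by digest and flag each repeat of the previous digest with '==='."""
    lines: List[str] = []
    previous: Optional[str] = None
    for digest, size, path in sorted(entries):  # digest first, then size, then path
        marker = "===" if digest == previous else "   "
        lines.append(f"{digest} [{size:>15,}] {marker} |{path}")
        previous = digest
    return lines
```

Each run of duplicates starts on an unflagged line, which is the same grouping visible in the XXH3 listing.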

fcorbelli commented on June 12, 2024

Please try the attached pre-release, with the -nosort switch

59_5a.zip
This should show each computed hash immediately (hopefully 😄)

PS: do not forget -ssd if you have solid-state drives
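The behavior requested in a) can be sketched in Python: hash each file in turn and write (and flush) its line as soon as the digest is ready, instead of collecting everything and printing at the end. A minimal sketch assuming SHA-256 via `hashlib` and the `<hex>  <name>` layout of sha256sum-style tools; `hash_file` and `stream_hashes` are hypothetical names, not zpaqfranz functions.

```python
import hashlib
import sys
from typing import Iterable, List, TextIO


def hash_file(path: str, algo: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hex digest of one file, read in fixed-size chunks so large files need little memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stream_hashes(paths: Iterable[str], algo: str = "sha256",
                  out: TextIO = sys.stdout) -> List[str]:
    """Write one '<digest>  <path>' line per file as soon as that file is hashed."""
    lines: List[str] = []
    for path in paths:
        line = f"{hash_file(path, algo)}  {path}"
        out.write(line + "\n")
        out.flush()  # show the result now rather than after the last file
        lines.append(line)
    return lines
```

For example, `stream_hashes(sorted(glob.glob("*.wav")))` (with `import glob`) would also cover request b), since the files are hashed and printed in alphabetical order.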

fcorbelli commented on June 12, 2024

> b) Results are displayed without taking into account alphabetical sorting.

59_5b.zip
Please try this one
You can use -orderby and even -desc, for example:

zpaqfranz sum k:\vm -xxh3 -ssd -orderby size -only *.vmdk
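Ordering by a chosen column, optionally descending, amounts to a keyed sort. A Python sketch, assuming hypothetical records with `name`, `size` and `hash` fields; `order_records` is illustrative and does not reproduce zpaqfranz's -orderby/-desc implementation.

```python
from typing import Dict, List, Union

# Hypothetical record: {"name": str, "size": int, "hash": str}
Record = Dict[str, Union[str, int]]


def order_records(records: List[Record], orderby: str = "name",
                  desc: bool = False) -> List[Record]:
    """Return records sorted by the chosen field, ascending unless desc is True."""
    if orderby not in ("name", "size", "hash"):
        raise ValueError(f"unsupported sort key: {orderby!r}")
    # The name is a secondary key so that ties come out in a predictable order.
    return sorted(records, key=lambda r: (r[orderby], r["name"]), reverse=desc)
```

Sorting by `name` gives the alphabetical listing asked for in b); `size` with `desc=True` puts the largest files first.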

sergeevabc commented on June 12, 2024

@fcorbelli, thanks for the quick edits, but I think you're trying to make it more comfortable to pedal the bike backwards when people want to pedal forward.

> By default the sort is made by hash, to quickly find duplicated files

Why is this the default behavior? Calculating a checksum and using a checksum to identify duplicates are two different tasks. The former task is regular, serving to detect accidental or intentional file changes. And the latter task is a derivative and less in demand. Not to mention, comparing several hashes by eye is a questionable practice, instead it seems more appropriate to give a hint from the app, but I'm not sure that a compression app should be extended in this way (there are jdupes-like apps for that).

For example, before I send photos to the remote facility, I calculate the checksums of the monthly archives then save them as checksums.sha256. This is what the output of checksum calculators looks like by default. And this is exactly the look I expected to get from Zpaqfranz without having to remember a thousand and one flags.

$ rhash --sha256 img202405*.jpg
833410eb6106a8865d21efbaa88250a7f20361b79d2a35ae541d4726a36c128e  img202405_6894.jpg
b5045fff05433b10248d3b138bcd83a2e1322b4807710794ebc92ed45476a9f0  img202405_6895.jpg
5dd52b378bb927d83b3e0a4755a89ccd2e425bd076f60df3428fab96ad4a6300  img202405_6896.jpg

$ b3sum img202405*.jpg
96ac12a2716c7c25b00380017aebc56b78602e484a2ecb2b80c1961bcfdc0598  img202405_6894.jpg
c4b824e925f627d3afe34c3a04ed769b0ae73ab0ac08c8312f161c2eda38cbcb  img202405_6895.jpg
3614a0d0afa46fb6eeb921a7eded0d602a433e898baf710c09740869ed870633  img202405_6896.jpg

$ xxhsum img202405*.jpg
46cb55b63ff71972  img202405_6894.jpg
24fa50966a7b913c  img202405_6895.jpg
8380dc1a51b5972a  img202405_6896.jpg
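The `checksums.sha256` workflow described above relies on the plain `<hex digest>  <file name>` layout printed by the tools shown, which can be re-checked later. A minimal Python sketch that writes such a list and verifies it, assuming SHA-256 and file names relative to the list's folder; `write_checksum_file` and `verify_checksum_file` are hypothetical helpers, not part of rhash, b3sum, xxhsum or zpaqfranz.

```python
import hashlib
import os
from typing import Dict, Iterable


def file_sha256(path: str) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksum_file(names: Iterable[str], list_path: str) -> None:
    """Write '<hex>  <name>' lines, sorted by name, for files next to list_path."""
    base = os.path.dirname(os.path.abspath(list_path))
    with open(list_path, "w", encoding="utf-8", newline="\n") as out:
        for name in sorted(names):
            out.write(f"{file_sha256(os.path.join(base, name))}  {name}\n")


def verify_checksum_file(list_path: str) -> Dict[str, bool]:
    """Map each listed name to True if the file exists and its digest still matches."""
    base = os.path.dirname(os.path.abspath(list_path))
    results: Dict[str, bool] = {}
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            digest, name = line.split(None, 1)
            name = name.lstrip("*")  # a leading '*' marks binary mode in sha256sum output
            path = os.path.join(base, name)
            results[name] = os.path.isfile(path) and file_sha256(path) == digest.lower()
    return results
```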

fcorbelli commented on June 12, 2024

> @fcorbelli, thanks for the quick edits, but I think you're trying to make it more comfortable to pedal the bike backwards when people want to pedal forward.

I'm actually more interested in how I pedal.

> > By default the sort is made by hash, to quickly find duplicated files
> Why is this the default behavior?
> Calculating a checksum and using a checksum to identify duplicates are two different tasks.

Because, as I explained, having a quick indication of duplicate files is what I do every day

> The former task is regular, serving to detect accidental or intentional file changes. And the latter task is a derivative and less in demand.

It is just the opposite. zpaqfranz is my tool for making backups, not for calculating hashes; there are already many such tools, maybe even better ones

> Not to mention, comparing several hashes by eye is a questionable practice, instead it seems more appropriate to give a hint from the app,

You don't have to do it "by eye": duplicates are marked with three equal signs (===), so they are immediate to identify. At least for me

> but I'm not sure that a compression app should be extended in this way (there are jdupes-like apps for that).

I don't know that program.
zpaqfranz has the d (duplication) and 1on1 commands

> For example, before I send photos to the remote facility, I calculate the checksums of the monthly archives then save them as checksums.sha256. This is what the output of checksum calculators looks like. And this is exactly the look I expected to get from Zpaqfranz without having to remember a thousand and one flags.

With zpaqfranz that is wasted time.
The hashes are (or can be) stored within the archive, along with their CRC-32 (global) and SHA-1 (block).
You can store SHA-256 like this:

zpaqfranz a z:\1.zpaq *.cpp -sha256

That's all.
To read them back:

zpaqfranz l z:\1.zpaq -checksum

BTW, you gave me an idea: I will make a parser that shows checksums even when a specific algorithm is requested (i.e. if you type l -sha256 it will show you the list, even if the stored hashes are BLAKE3)

If you have a level of "paranoia" similar to mine, you can use hashdeep to create a list of hashes, add it to the archive, and then use zpaqfranz to compare the hashes of the extracted files.
I do this when storing zfs streams, just to have a check by external software (hashdeep), especially on whether the file names can be restored

Short version: the purpose of the sum command is not to create a list of hashes, like md5 or hashdeep
Of course you can do that. But here we are talking about (deduplicated) backups

|XXH3: 0000F21922075EC1E0BEBE3D781A4FDB [              5.702]     |c:/zpaqfranz/pakka/__astcache/@c@@zpaqfranz@pakka/c@@program files (x86)@embarcadero@…@include@windows@…
|XXH3: 00027C61BBC801068CD2D7B5F82D8228 [              6.393]     |c:/zpaqfranz/pakka/__astcache/@c@@zpaqfranz@pakka/c@@program files (x86)@embarcadero@…@include@windows@…
|XXH3: 002134602BBAEC53BCDB1D8936B3D3A7 [          2.457.654]     |c:/zpaqfranz/pakka/spaz/button-24822_1280.bmp
|XXH3: 003E1B7B293C53D820C81607A378079E [              9.396]     |c:/zpaqfranz/PDCursesMod-master/psffonts/mappings/CP862.TXT
|XXH3: 003FF08403A2138C14E97655634C316D [              3.616]     |c:/zpaqfranz/PDCursesMod-master/psffonts/fntcol16/bigsf-14.psf
|XXH3: 00486627A24AB2CD3849174B9E6B55D7 [                812]     |c:/zpaqfranz/PDCursesMod-master/demos/README.md
|XXH3: 00486627A24AB2CD3849174B9E6B55D7 [                812] === |c:/zpaqfranz/demos/README.md
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679]     |c:/zpaqfranz/715/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/715/mono/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/715/ok/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/715/zpaq715/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/715d/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/717/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/717/ok/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/718/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/bsd/spaz/zpaq/zpaq/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/dataman/src/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/release/test/715/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/release/zpaqfranz_old/11/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/release/zpaqfranz_old/12/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/release/zpaqfranz_old/13/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/release/zpaqfranz_old/15/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/release/zpaqfranz_old/16/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/release/zpaqfranz_old/16beta/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/release/zpaqfranz_old/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/zpaqd/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/zpipe/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [             61.679] === |c:/zpaqfranz/zpipe/ok/libzpaq.h

I think this example is quite clear
Incidentally, having an alphabetical order doesn't really add any particular information

You can also use a "smart" version (especially on *nix), namely the dir command (or a copy of the zpaqfranz executable, or a symbolic link to it, named dir)
In that case you can do

zpaqfranz dir -checksum -xxh3
zpaqfranz dir /os -checksum

I'll put an "autochecksum" at this point as well, to save a switch

fcorbelli commented on June 12, 2024

Short version: I don't think it makes a lot of sense to have a program that, by default, does exactly what a thousand other pieces of software do.
I don't want to remember a thousand switches, pipes and concatenations to find duplicate files using other software
I don't want to "pedal backwards," I want to "pedal to my goal"

fcorbelli commented on June 12, 2024

BTW, if you are wondering why the dir command exists, the answer is trivial for a storage manager: it finds the largest files, recursively, within a folder and checks whether they are duplicates

How do you do this? With a thousand switches and many pipes

With zpaqfranz?

zpaqfranz dir /s /os -blake3

Just like standard Windows: /s (recursive), /os (order by size)

And you'll get something like

11/05/2024  15:52           3.588.212 release/59_4/zpaqfranz.cpp
=================           3.588.212 va.cpp

02/09/2022  10:23           5.812.273 zpipe/ok/pippo.zpaq
=================           5.812.273 zpipe/ok2/pippo.zpaq
=================           5.812.273 zpipe/pippo.zpaq

02/09/2022  10:23           6.044.170 zpipe/ok/pippero.zpaq
=================           6.044.170 zpipe/ok2/pippero.zpaq
=================           6.044.170 zpipe/pippero.zpaq

24/07/2022  12:03           7.494.122 windows-via-c-c_5th-edition.pdf
=================           7.494.122 zpipe/decomp.pdf
=================           7.494.122 zpipe/ok/decomp.pdf
=================           7.494.122 zpipe/ok2/decomp.pdf

07/09/2023  11:00          19.012.145 zpaqlist/2.txt
=================          19.012.145 zpaqlist/uno/2.txt

Which, again, I think is really easy to interpret.
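The dir /s /os -blake3 idea can be approximated in Python: walk the tree, bucket files by size, and hash only buckets with at least two members, since a file with a unique size cannot have a duplicate. A sketch only: BLAKE3 is not in Python's standard library, so `hashlib.blake2b` stands in for it, and `duplicate_groups`/`print_groups` are hypothetical names. The output loosely imitates the listing above (date of the first copy, a row of `=` for the others).

```python
import hashlib
import os
import time
from collections import defaultdict
from typing import Dict, List, Tuple


def digest(path: str) -> str:
    # Stand-in for BLAKE3, which Python's standard library does not provide.
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(root: str) -> List[Tuple[int, List[str]]]:
    """Return (size, relative paths) for each set of identical files, smallest first."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for folder, _dirs, files in os.walk(root):  # recursive, like /s
        for name in files:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    groups: List[Tuple[int, List[str]]] = []
    for size in sorted(by_size):  # ascending size, like /os
        candidates = by_size[size]
        if len(candidates) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing it
        by_hash: Dict[str, List[str]] = defaultdict(list)
        for path in candidates:
            by_hash[digest(path)].append(path)
        for paths in by_hash.values():
            if len(paths) > 1:
                groups.append((size, sorted(os.path.relpath(p, root) for p in paths)))
    return groups


def print_groups(root: str, groups: List[Tuple[int, List[str]]]) -> None:
    for size, paths in groups:
        mtime = os.path.getmtime(os.path.join(root, paths[0]))
        stamp = time.strftime("%d/%m/%Y  %H:%M", time.localtime(mtime))
        print(f"{stamp} {size:>19,} {paths[0]}")
        for path in paths[1:]:
            print(f"{'=' * 17} {size:>19,} {path}")
        print()
```

The size pre-filter is the design choice that matters: most files are never hashed at all.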

fcorbelli commented on June 12, 2024

59_5e.zip

In the attached pre-release there is a brand-new command, hash, that mimics the behavior of other software.

Useful switches:
-stdout
-ssd

Not so useful:
-noeta
-verbose

sergeevabc commented on June 12, 2024

$ zpaqfranz.exe hash *.zip

zpaqfranz v59.5d-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-13)
franz:hash                                      9 - command  < a lot of spaces
franz:-noconsole
HASHA                                                        < HASHA?
Hashing SHA-1 ignoring .zfs and :$DATA

No multithread: Found (40.54 MB) => 42.513.725 bytes (40.54 MB) / 7 files in 0.015000
6eb23ff770ea1d45788bbaad89f4d66f3af303cc sample01.zip
26b545b16ddb7514501bef110abbec9944fb57c8 sample02.zip
7a9871038cb8eb954b4f723c9f86b248f914fc59 sample03.zip
a027992dabb0df0eb20cf3ed08fe6371512d7bbc sample04.zip
8be6297f4132bc3a8936f5199bf9b48928f51d78 sample05.zip
cc873c4bf3875f5bf0884d6d7119b66a216e0f1b sample06.zip
5eb0b36798d4ddc487b7ba69e84addb4ba594fd8 sample07.zip217.270/SeC
2.512 seconds (000:00:02) (all OK)                   ^ lack of break looks odd, and what is SeC?
               ^ extra zero looks odd
$ zpaqfranz.exe hash *.zip

zpaqfranz v59.5e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-13)
franz:hash                                      9 - command
franz:-noconsole
Hashing SHA-1 ignoring .zfs and :$DATA

6eb23ff770ea1d45788bbaad89f4d66f3af303cc sample01.zip
26b545b16ddb7514501bef110abbec9944fb57c8 sample02.zip
7a9871038cb8eb954b4f723c9f86b248f914fc59 sample03.zip
a027992dabb0df0eb20cf3ed08fe6371512d7bbc sample04.zip
8be6297f4132bc3a8936f5199bf9b48928f51d78 sample05.zip
cc873c4bf3875f5bf0884d6d7119b66a216e0f1b sample06.zip
5eb0b36798d4ddc487b7ba69e84addb4ba594fd8 sample07.zip
0.328 seconds (000:00:00) (all OK)

fcorbelli commented on June 12, 2024

> franz:hash 9 - command < a lot of spaces
Just debug info

> HASHA < HASHA?
Just debug info

> ^ lack of break looks odd, and what is SeC?
Bytes per SeCond

> ^ extra zero looks odd
I like it 😄
Sometimes backups can take a veeeeery long time

sergeevabc commented on June 12, 2024

> > ^ extra zero looks odd
> I like it 😄

Well, this looks odd not because different representations of time are possible, but because it breaks the consistency of interface elements since you use two zeroes in other places (see below). It's like going outside, buttoning the cuff on one sleeve, but rolling it up to the elbow on the other.

$ zpaqfranz hash -sha256 Ennio.2021.mkv
does not work so far

$ zpaqfranz hash Ennio.2021.mkv
zpaqfranz v59.5e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-13)
franz:hash                                      9 - command
franz:-noconsole
Hashing SHA-1 ignoring .zfs and :$DATA

029% 00:02:46 ( 872.00 MB) of (   2.94 GB)           13.446.445/SeC
     ^ two zeroes                                    ^ looks very odd, consider 13.44 MB/s which is clearer and more familiar

54760: CONTROL-C detected, try some housekeeping...
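The suggested human-readable rate is a short conversion. A Python sketch; `human_rate` is a hypothetical helper, not zpaqfranz code. Note that 13.446.445 bytes/s rounds to 13.45 MB/s in decimal (SI) units and to 12.82 MiB/s in binary units, so the 13.44 suggested above is a truncated value.

```python
def human_rate(bytes_per_second: float, binary: bool = False) -> str:
    """Format a byte rate with two decimals, e.g. '13.45 MB/s' or '12.82 MiB/s'."""
    step = 1024.0 if binary else 1000.0
    units = ["B", "KiB", "MiB", "GiB", "TiB"] if binary else ["B", "KB", "MB", "GB", "TB"]
    value = float(bytes_per_second)
    index = 0
    # Divide until the value drops below one step or we run out of units.
    while value >= step and index < len(units) - 1:
        value /= step
        index += 1
    return f"{value:.2f} {units[index]}/s"
```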

fcorbelli commented on June 12, 2024

> Well, this looks odd not because different representations of time are possible, ...

The total running time can exceed 99 hours, though hardly 999
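The width trade-off can be shown with a small formatter that pads hours to a minimum width and lets them widen only when needed, so short runs print `00:00:02` while a 100-hour run still prints `100:01:01`. A Python sketch; `format_duration` is hypothetical and is not zpaqfranz's timetohuman.

```python
def format_duration(seconds: float, min_hour_digits: int = 2) -> str:
    """Render whole seconds as H:MM:SS, padding hours to at least min_hour_digits."""
    total = int(seconds)  # drop fractions of a second
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    # Zero-padding is only a minimum width: 100+ hours simply take more digits.
    return f"{hours:0{min_hour_digits}d}:{minutes:02d}:{secs:02d}"
```

With `min_hour_digits=3` the same function reproduces the fixed `000:00:02` style, so both layouts come from one code path.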

> but because it breaks the consistency of interface elements since you use two zeroes in other places (see below). It's like going outside, buttoning the cuff on one sleeve, but rolling it up to the elbow on the other.

The sleeves get different lengths 😄

> ^ two zeroes                                    ^ looks very odd, consider 13.44 MB/s which is clearer and more familiar

It is MUCH harder to find the exact ETA line in the source code when the strings are identical; with different casing it is MUCH easier.
There are about four different ETAs: for small files, for big ones, without an expected size, updated every second, etc.

/SeC comes from myavanzamentoby1sec(), so the expected behavior is one update per second.
/sec comes from print_progress(), which changes its output as the ETA changes, so it is NOT the same as the previous one, although it looks the same.

Because sometimes the output tells more than expected

Well yes, (almost) everything in zpaqfranz has a reason for being there

fcorbelli commented on June 12, 2024

59_5f.zip

new timetohuman

fcorbelli commented on June 12, 2024

The newer pre-release 59.5g is ready; update with

zpaqfranz update -force

With /s instead of /sec 😄
(much harder to debug, but I'll come up with something else)

BTW, there is a new -home switch for the s command

C:\zpaqfranz>zpaqfranz s c:\users -home -ssd -ignore
zpaqfranz v59.5g-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-14)
franz:-home -hw -ignore -ssd
homesize
Scanning 5 subfolders...
Creating 5 scan threads


Parallel scan ended in 0.922000 s
----------------------------------------------------------------------------------------------------
        2.461.494.221 00005697 c:/users/All Users/
                    0 00000000 c:/users/Default User/
            1.568.227 00000103 c:/users/Default/
          929.689.685 00008663 c:/users/Public/
       35.141.251.003 00073727 c:/users/utente/
1.047 seconds (00:00:01) (all OK)
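The -home output (one line per subfolder with total bytes and file count, computed by parallel scan threads) can be sketched in Python with a thread pool, one worker per immediate subfolder. An illustration under assumptions, not zpaqfranz's code: `folder_totals` and `home_sizes` are hypothetical, symlinked subfolders are skipped, and unreadable files are silently left out. Threads help here because CPython releases the GIL while filesystem calls wait on I/O.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def folder_totals(folder: str) -> Tuple[int, int]:
    """Return (total bytes, number of files) for everything below folder."""
    size = count = 0
    for current, _dirs, files in os.walk(folder):  # os.walk skips unreadable dirs by default
        for name in files:
            try:
                size += os.path.getsize(os.path.join(current, name))
                count += 1
            except OSError:
                pass  # file vanished or cannot be stat'ed: leave it out of the totals
    return size, count


def home_sizes(root: str) -> List[Tuple[str, int, int]]:
    """Scan each immediate subfolder of root in its own worker thread."""
    with os.scandir(root) as entries:
        subfolders = sorted(e.path for e in entries if e.is_dir(follow_symlinks=False))
    with ThreadPoolExecutor(max_workers=max(1, len(subfolders))) as pool:
        totals = list(pool.map(folder_totals, subfolders))
    return [(path, size, count) for path, (size, count) in zip(subfolders, totals)]
```

Printing each tuple with `f"{size:>21,} {count:08d} {path}"` gives a layout close to the one above, with commas instead of dots.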

fcorbelli commented on June 12, 2024

No more updates; I'm closing this for now.
Thank you
