rfjakob / cshatag
Detect silent data corruption under Linux using sha256 stored in extended attributes
License: MIT License
I noticed that if the /usr/local/bin directory doesn't already exist and you install cshatag with make install, then the directory won't be created and the executable will be copied in its place. So instead of having /usr/local/bin/cshatag, you end up with the executable at /usr/local/bin itself (a file, not a directory).
I couldn't get shatag working after about an hour of trying, between the extended attributes and nonsensical verbose output like <missing>.
Anyway, this worked out of the box, and I can see the attributes via getfattr.
How do I compile for ARM using the makefile? I ran this and it worked fine (I am already using it; I'm just wondering about the makefile):
env GOOS=linux GOARCH=arm GOARM=7 go build .
But I read the makefile and have no idea what's going on in there.
Also, another question:
https://ostechnix.com/how-to-edit-a-file-without-changing-its-timestamps-in-linux/
Doing this makes cshatag think a file is corrupt. There are other things that sometimes change files without changing the mtime. Are there any methods of avoiding this, or not really?
Background: I'm a first-time user of cshatag, so please let me know if this is just user error. I am on an M1 Mac and have an external hard drive that's exFAT formatted.
Description of Error:
I run cshatag -recursive, and I can see that the user.shatag.ts and user.shatag.sha256 attributes are written to files when I check using xattr -l.
When I then run cshatag -recursive -qq, I see lots of errors for files that start with ._, because it says "operation not permitted". I tried to remove these dot files to see if cshatag can still check for errors, but that also removes the extended attributes.
cshatag -qq runs fine if I check just one file.
Question:
Thanks for helping/explaining how I should be using cshatag.
While working on a macOS port, I just found a weird bug when updating tags on a Samba share. I am wondering whether this bug is introduced by some macOS + SMB interaction, or whether it is preexisting. Here is the behavior I am seeing:
When updating an outdated tag on the Mac's main filesystem, the update works, and the next run of cshatag reports <ok>:
yemartin at iMac in ~/src/cshatag-master
$ cshatag test.bin
<outdated> test.bin
stored: d8c2284963814db0cc2e9d49d96af3139bbc1fee0c43f6f45004863c8e10bdc5 1560059961.626701323
actual: c2bd3e9a8195e2fe3ed99864752c5edb449b8c43beb7a51dcd3b6e28258b955b 1560060120.445768395
yemartin at iMac in ~/src/cshatag-master
$ cshatag test.bin
<ok> test.bin
When doing the same thing on an SMB-mounted Samba share, the update does not work: instead, the xattrs are removed. The next run of cshatag reports a missing tag ("stored: 00000..."), and this time it sets the xattrs successfully. A third run of cshatag reports <ok>:
yemartin at iMac in /Volumes/Organizer
$ cshatag test.bin
<outdated> test.bin
stored: d8c2284963814db0cc2e9d49d96af3139bbc1fee0c43f6f45004863c8e10bdc5 1560059979.000000000
actual: c2bd3e9a8195e2fe3ed99864752c5edb449b8c43beb7a51dcd3b6e28258b955b 1560060125.000000000
yemartin at iMac in /Volumes/Organizer
$ cshatag test.bin
<outdated> test.bin
stored: 0000000000000000000000000000000000000000000000000000000000000000 0000000000.000000000
actual: c2bd3e9a8195e2fe3ed99864752c5edb449b8c43beb7a51dcd3b6e28258b955b 1560060125.000000000
yemartin at iMac in /Volumes/Organizer
$ cshatag test.bin
<ok> test.bin
Is this a known issue with SMB or Samba? If not, can someone confirm whether this can be reproduced with Linux + Samba share?
(In case it matters: the protocol negotiated between the Mac and the Samba server was SMB_3.02.)
Thank you.
Hello! cshatag is a really useful program. One thing that would make it more useful for automated monitoring is to have the reporting differentiate between new files and outdated ones.
Right now, the event 'outdated' covers two situations: when the checksum in the attribute changes, and when there is no checksum at all (since it assumes a zeroSha256 by default).
I propose a new event 'new' that gets returned when the attributes don't exist and cshatag is calculating them for the first time.
I'm not really a developer, but I could try to throw together an implementation and submit a pull request if you'd like.
To my understanding, if either the mtime or the checksum of a file (or both) have changed, "the status of the file is printed to stdout and the stored checksum is updated" (this last part is taken directly from the readme). This means that if something happened to your file that you weren't aware of (for example, if you edited it by mistake), you'll only find out the first time you run cshatag. The second time, the checksum will already have been updated and everything looks normal. I can foresee many situations where this might be a problem, for example if your computer loses power before you have a chance to inspect the output, or if you forget that you've run the program and run it again.
In rsync, you can dry-run the program using the -n flag. This means that you get to see all error messages, warnings, et cetera, without actually executing the core function of the program, namely copying files. Would it be desirable to have something similar implemented in cshatag?
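One way to sketch such a dry-run mode in Go: the status line is always printed, but the tag write is skipped. The -n flag, the checker type, and the injected writeTag function are hypothetical; injecting the writer keeps the sketch runnable without extended-attribute support:

```go
package main

import (
	"flag"
	"fmt"
)

// checker carries the settings used by this sketch.
type checker struct {
	dryRun   bool                         // hypothetical rsync-style -n
	writeTag func(path, sha string) error // persists a tag; injected for testing
}

// report prints the status line and, unless this is a dry run or the file is
// already ok, updates the stored tag.
func (c *checker) report(path, status, newSha string) error {
	fmt.Printf("<%s> %s\n", status, path)
	if status == "ok" || c.dryRun {
		return nil
	}
	return c.writeTag(path, newSha)
}

func main() {
	dryRun := flag.Bool("n", false, "report only; do not update stored tags")
	flag.Parse()

	c := &checker{
		dryRun: *dryRun,
		writeTag: func(path, sha string) error {
			fmt.Printf("  updating tag of %s to %s\n", path, sha)
			return nil
		},
	}
	sha := "8f1d878efe7586c55c8f0d7578ac59efda6831778eb5fba5f68b2f21a3519609"
	if err := c.report("test", "outdated", sha); err != nil {
		fmt.Println(err)
	}
}
```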
Hi,
Firstly, excellent tool. I have used a few of these/similar tools, the last one saving to a file per directory, but I wanted something that moved with the files. Anyway, I know this is me not understanding, so here is what I have done:
echo test_1 > test
touch -t 202301300000 test
cshatag test
I get the result as I would expect:
<new> test
stored: 0000000000000000000000000000000000000000000000000000000000000000 0000000000.000000000
actual: 5a18f75b3ce3ed6550c33f23bb21f833bd63a159cb592a272fd1c61f98de5111 1675036800.000000000
echo test_2 > test
touch -t 202301300001 test
cshatag test
And as expected get the output:
<outdated> test
stored: 5a18f75b3ce3ed6550c33f23bb21f833bd63a159cb592a272fd1c61f98de5111 1675036800.000000000
actual: 8f1d878efe7586c55c8f0d7578ac59efda6831778eb5fba5f68b2f21a3519609 1675036860.000000000
echo test_3 > test
touch -t 202301300001 test
cshatag test
And as expected it is detected.
Error: corrupt file "test"
<corrupt> test
stored: 8f1d878efe7586c55c8f0d7578ac59efda6831778eb5fba5f68b2f21a3519609 1675036860.000000000
actual: 8f89c43b0cd072e7127bcf26635d4e2febdacbb737bdb44f797e4e96b2408d73 1675036860.000000000
<ok> test
I know this is the expected result according to the 'run_tests.sh' script you have. However, I am failing to see why. If a file is corrupt, then surely the attribute should not get updated; wouldn't you want it to keep showing as corrupt?
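The behavior asked about here amounts to a policy choice about which statuses rewrite the stored tag. A sketch of the alternative policy, in which a corrupt file keeps its old tag and is therefore reported again on every run (shouldUpdate is a hypothetical helper; this is not necessarily what cshatag does):

```go
package main

import "fmt"

// shouldUpdate decides whether the stored tag is rewritten after a check.
// Under this hypothetical policy, a corrupt file keeps its old tag so that
// later runs keep reporting it until the user intervenes.
func shouldUpdate(status string) bool {
	switch status {
	case "new", "outdated", "timechange":
		return true
	default: // "ok" needs no write; "corrupt" keeps the evidence
		return false
	}
}

func main() {
	for _, s := range []string{"new", "outdated", "timechange", "ok", "corrupt"} {
		fmt.Printf("%-10s update=%v\n", s, shouldUpdate(s))
	}
}
```

The trade-off is that the user then needs an explicit way to accept new content for a file flagged as corrupt, for example by removing its tags and tagging it again.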
Currently, one has to read the man page to know how to undo/untag files previously tagged with cshatag (and the command example given feels like it leaks cshatag implementation details). Perhaps adding a --untag flag (or --undo?) would help?
At the same time, we might want to add a --help flag too, to make the new flag easier to find.
This is a follow-up to #12, which was closed with the introduction of <timechange>. Unfortunately, <timechange> does not solve the original problem: cshatag still cannot detect bit corruption that happens during move or copy operations between two filesystems with different timestamp precisions.
With the same example as in #12, with one added command to simulate corruption during transfer, and with:
/tmp on my root filesystem (APFS)
/Volumes/Organizer from my NAS, mounted through SMB (SMB_3.1.1)
$ rm /Volumes/Organizer/test.bin \
; touch /tmp/test.bin \
&& cshatag -qq /tmp/test.bin \
&& mv /tmp/test.bin /Volumes/Organizer/ \
&& echo 'CORRUPTION' >> /Volumes/Organizer/test.bin \
&& cshatag /Volumes/Organizer/test.bin
<outdated> /Volumes/Organizer/test.bin
stored: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1641197558.029917810
actual: 4ef8ee0f9aaecb1597f22dfd7667af4a9b537e11e3aba08729647a882f9aff6e 1641197558.000000000
<corrupt> /Volumes/Organizer/test.bin
stored: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1641197558.029917810
actual: 4ef8ee0f9aaecb1597f22dfd7667af4a9b537e11e3aba08729647a882f9aff6e 1641197558.000000000
<timechange> was a nice introduction for when the data has not changed. But when the data did change, we still need to ignore small time differences below a certain threshold, to differentiate between a legitimate <outdated> and a <corrupt> file.
As per the discussion in #12, I suggest:
*1: FAT has a 2-second precision on last-modified time.
*2: With this new behavior as the default, users may get a harmless false positive, but the file content is still good. If the behavior is opt-in, users would get false negatives, meaning corruption would go undetected.
Note: to get the false positive, the user would need to make a legitimate edit within 2 seconds of running cshatag against a given file, which is quite unlikely. And if it does happen, the file content is good anyway, so no harm done.
To keep things simple, I suggest we just do, when data has changed:
if time_delta <= threshold: corrupt
else (i.e. time_delta > threshold): outdated
But we can also consider introducing a new status, something like:
if time_delta == 0: corrupt
else if time_delta <= threshold: suspicious
else (i.e. time_delta > threshold): outdated
What do you think?
I noticed the return values from malloc are not checked in case the system runs out of memory (which really can happen).
And the allocated memory isn't freed. Now, this is not a big problem, since the program only processes one file at a time, but it is not clean.
Also, when the argument is a file that already has a checksum, valgrind reports "Conditional jump or move depends on uninitialised value".
If this project isn't actively maintained, I could fix those problems, provided I find the time to do that.
cshatag always reports corruption when I create an empty foo file for test purposes:
$ touch foo
fturco@desktop ~ 21:27:22 0 $ cshatag foo
<outdated> foo
stored: 0000000000000000000000000000000000000000000000000000000000000000 0000000000.000000000
actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1550262442.004366404
fturco@desktop ~ 21:27:25 0 $ cshatag foo
Error: corrupt file "foo"
<corrupt> foo
stored: 0000000000000000000000000000000000000000000000000000000000000000 1550262442.004366404
actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1550262442.004366404
Some more details:
$ getfattr foo
# file: foo
user.shatag.sha256
user.shatag.ts
I'm using the latest cshatag version from this git repository on a Gentoo Linux system.
The really nice thing about cshatag, compared to other tag-file solutions like chkbit, is that the tag follows the file when the file is moved or copied, as long as the destination filesystem supports extended attributes.
But this unfortunately breaks when the timestamp resolution of the target filesystem is coarser than that of the original filesystem. This would prevent detecting bit corruption that happened during move or copy operations.
For example, using the Go rewrite, and with:
/tmp on my root filesystem (APFS)
/Volumes/Organizer from my NAS, mounted through SMB (SMB_3.02)
$ rm /Volumes/Organizer/test.bin \
; touch /tmp/test.bin \
&& cshatag /tmp/test.bin \
&& mv /tmp/test.bin /Volumes/Organizer/ \
&& cshatag /Volumes/Organizer/test.bin
remove /Volumes/Organizer/test.bin? y
<outdated> /tmp/test.bin
stored: 0000000000000000000000000000000000000000000000000000000000000000 0000000000.000000000
actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.563117837
<outdated> /Volumes/Organizer/test.bin
stored: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.563117837
actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.000000000
The second cshatag call, on the SMB share, considers the tag outdated. If corruption had happened during the move operation, cshatag would have missed it.
Suggestion: if I remember correctly, FAT is probably the lowest common denominator, with 2-second timestamp resolution. So to ensure maximum compatibility, cshatag should consider the file unchanged if the file timestamp is within +/- 2 seconds of the tag timestamp.
So, to sum it up bug-report style:
<outdated> /Volumes/Organizer/test.bin
stored: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.563117837
actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.000000000
<ok> /Volumes/Organizer/test.bin
Do you think this makes sense, and this is possible to add to the Go rewrite?
Hello.
Thanks for your work, works great.
We run Linux LX zones in illumos VMs (OmniOS), and the base filesystem there is ZFS. ZFS on Linux (ZoL) already has a fix for the use of xattrs, but this fails in the LX VMs because xattrs are not accessible there, due to the way ZFS stores file attributes. I'd like to propose a workaround: kindly allow storing the attributes in a hidden/immutable file in the same directory as the file, as an option.
The attached screenshot is from the OmniOS gitter discussion group.
Thanks.
The readme says
COPYRIGHT
Copyright 2012 Jakob Unterwurzacher. License GPLv2+.
but the LICENSE file contains the MIT license. Under which license is this project now?