
CRC-32 checks in yauzl (open issue)

thejoshwolfe commented on July 18, 2024
CRC-32 checks

from yauzl.

Comments (5)

thejoshwolfe commented on July 18, 2024

I do not currently plan to do any CRC32 checks in yauzl. I know that is a feature of other zip file readers, and I'm not sure how valuable it would be for yauzl to support it.

In my opinion, hash-based error checking should really never be done inside a file format. A file format does not correspond to any situation where data corruption could occur. Rather, error checking should happen where errors can happen, such as during network transmission. If you're trying to check for errors caused by your storage hardware degrading over time, then you can keep checksums of files beside the files and do your error checking whenever you like, regardless of the file format. Additionally, if error checking is outside the file format, then you're not limited to a fixed set of hash algorithms like CRC32.
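A minimal Node.js sketch of the alternative described above: keep a checksum beside each file and re-verify it whenever you like, independent of the file's format. The helper names and the `<file>.sha256` sidecar convention are illustrative assumptions, not anything yauzl provides; SHA-256 stands in for "any hash you like", since this approach is not tied to CRC32. For brevity it reads each file fully into memory.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// SHA-256 of a file's contents as lowercase hex. Any algorithm accepted by
// crypto.createHash could be swapped in; nothing here is tied to CRC32.
function sha256File(filePath) {
  return crypto.createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}

// Hypothetical convention: store the checksum beside the file as
// `<file>.sha256`, one line of "<hex>  <basename>" (the layout sha256sum prints).
function writeSidecar(filePath) {
  const line = `${sha256File(filePath)}  ${path.basename(filePath)}\n`;
  fs.writeFileSync(`${filePath}.sha256`, line);
}

// Re-check the file against its sidecar at any later time.
// Returns true if the contents still match the recorded checksum.
function verifySidecar(filePath) {
  const expected = fs.readFileSync(`${filePath}.sha256`, "utf8").split(/\s+/)[0];
  return sha256File(filePath) === expected;
}
```

Because the sidecar line matches `sha256sum`'s output format, the same file can also be checked with standard command-line tools.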

That being said, it's very popular to include redundancy information in file formats. Even the brand new FLIF image format includes optional CRC32 checksums, despite everything else about the format seeming very progressive. Perhaps checksums in files are not as useless as I think. I'm not sure.

If anyone makes an argument that convinces me it's valuable to do the CRC32 checks in yauzl, I will gladly add optional support for it. Keep in mind that CRC32 computation is not free, so it will slow down unzipping slightly. Currently, for some zip files, yauzl can unzip faster than Info-ZIP's unzip command-line program (which is written in C), probably because yauzl skips the CRC32 check.

If yauzl added support for CRC32 checking, a mismatch would be reported as an error emitted from the read stream obtained from openReadStream(), after the file contents have been piped through but before the pipeline ends.


overlookmotel commented on July 18, 2024

I would like to (belatedly) speak up in favour of CRC32 checking.

I agree with you in principle that hash-based error checking is better done outside of a file format. And there are more appropriate hashing algorithms than CRC32 for many uses.

However, in my use case, I do not control the source of ZIP files I work with, nor the transmission medium by which I receive them. The best indication I have of whether files are corrupted is the CRC32 values in the ZIP file.

I imagine this is not an uncommon situation (I notice some other issues on this repo where @thejoshwolfe has asked how a problematic ZIP file was created, and the answer was "no idea, someone sent it to me").

Would you be willing to reopen this issue?


thejoshwolfe commented on July 18, 2024

I believe I can simply add documentation to the README explaining how to do the CRC32 checks outside of yauzl. I think it's as simple as piping the readStream from openReadStream through a CRC32 checker and comparing the result to entry.crc32.

I'm reopening the issue to look into it.
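A minimal Node.js sketch of the approach described in this comment: a pass-through Transform stream computes the standard CRC-32 used by ZIP (reflected polynomial 0xEDB88320) over everything piped through it and, once the input ends, emits an error if the result differs from the expected value. That is also where the error would surface if yauzl did the check itself: after the data has passed through, just before the stream ends. The names `crc32Update`, `createCrc32Checker`, and `extractEntryChecked` are hypothetical helpers, not part of yauzl; the only yauzl API assumed is `zipfile.openReadStream(entry, callback)` together with `entry.crc32`, using the default (decompressing) read stream, since the ZIP CRC-32 covers the uncompressed data.

```javascript
const { Transform, pipeline } = require("stream");

// Lookup table for the standard CRC-32 used by ZIP
// (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c; // Uint32Array stores the value as unsigned
}

// Continue a CRC-32 over `buf`. Pass 0 for the first chunk, then the previous
// return value for each following chunk (zlib-style chaining).
function crc32Update(crc, buf) {
  let c = ~crc >>> 0;
  for (let i = 0; i < buf.length; i++) {
    c = CRC_TABLE[(c ^ buf[i]) & 0xff] ^ (c >>> 8);
  }
  return ~c >>> 0;
}

// Hypothetical helper: forwards data unchanged and fails at end-of-stream
// if the CRC-32 of everything that flowed through != expectedCrc32.
function createCrc32Checker(expectedCrc32) {
  let crc = 0;
  return new Transform({
    transform(chunk, encoding, callback) {
      crc = crc32Update(crc, chunk);
      callback(null, chunk);
    },
    flush(callback) {
      const expected = expectedCrc32 >>> 0;
      if (crc !== expected) {
        callback(new Error(
          `CRC-32 mismatch: expected 0x${expected.toString(16)}, got 0x${crc.toString(16)}`));
      } else {
        callback();
      }
    },
  });
}

// Hypothetical wiring: `zipfile` and `entry` come from yauzl's "entry" event,
// `destination` is any writable stream, and `done(err)` receives an Error if
// opening, reading, checking, or writing failed.
function extractEntryChecked(zipfile, entry, destination, done) {
  zipfile.openReadStream(entry, (err, readStream) => {
    if (err) return done(err);
    pipeline(readStream, createCrc32Checker(entry.crc32), destination, done);
  });
}
```

One design consequence: a mismatch is only detectable after the last byte, so the destination has already received the (possibly corrupt) data when the error fires. A caller writing to disk would need to delete or quarantine the output in the error path.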


overlookmotel commented on July 18, 2024

I have just published a module on npm, yauzl-crc, that adds CRC32 checking.

@thejoshwolfe If you have time, would you mind taking a look to see if I've missed anything? Streams are not my strong suit.


sirisian commented on July 18, 2024

For what it's worth, I use this library to unzip archives, and on some hardware I have seen it produce a file corrupted with null bytes. This is fairly rare and probably hardware-specific (it might be overheating, but that's just a guess). I looked into how this could happen and found this issue. On large systems the probability of any single failure is near zero, but because of the scale it still happens, and it causes some very subtle bugs.
