Comments (5)
I do not currently plan to do any CRC32 checks in yauzl. I know that is a feature of other zip file readers, and I'm not sure how valuable it would be for yauzl to support it.
In my opinion, hash-based error checking should really never be done inside a file format. A file format does not correspond to any situation where data corruption could occur. Rather, error checking should happen where errors can happen, such as during network transmission. If you're trying to check for errors caused by your storage hardware degrading over time, then you can keep checksums of files beside the files and do your error checking whenever you like, regardless of the file format. Additionally, if error checking is outside the file format, then you're not limited to a fixed set of hash algorithms, like CRC32.
That being said, it's very popular to include redundancy information in file formats. Even the brand new FLIF image format includes optional CRC32 checksums, despite everything else about the format seeming very progressive. Perhaps checksums in files are not as useless as I think. I'm not sure.
If anyone makes an argument to convince me that it's valuable to do the CRC32 checks in yauzl, I will gladly add optional support for it. Keep in mind that CRC32 computation is not free, so it will slow down unzipping slightly. Currently, yauzl can unzip faster than Info-ZIP's unzip command line program, which is written in C, for some zip files, probably because yauzl is skipping the CRC32 checking.
If yauzl added support for CRC32 checking, then a mismatch would surface as an error emitted from the read stream obtained from openReadStream(), after the file contents have been piped through but before the stream ends.
from yauzl.
I would like to (belatedly) speak up in favour of CRC32 checking.
I agree with you in principle that hash-based error checking is better done outside of a file format. And there are more appropriate hashing algorithms than CRC32 for many uses.
However, in my use case, I do not control the source of ZIP files I work with, nor the transmission medium by which I receive them. The best indication I have of whether files are corrupted is the CRC32 values in the ZIP file.
I imagine this is not an uncommon situation (I notice some other issues on this repo where @thejoshwolfe has asked how a problematic ZIP file was created, and the answer was "no idea, someone sent it to me").
Would you be willing to reopen this issue?
I believe I can simply add documentation to the README explaining how to do the CRC32 checks outside of yauzl. I think it's as simple as piping the readStream from openReadStream through a CRC32 checker, and comparing the result to entry.crc32.
I'm reopening the issue to look into it.
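For illustration, the "pipe through a CRC32 checker" idea could look roughly like the sketch below. This is not code from yauzl or from its README; `Crc32Verifier`, `updateCrc32`, and `crc32` are hypothetical names made up for this example, and the CRC-32 routine is the standard table-driven variant used by the ZIP format, written out by hand so the block has no dependencies.

```javascript
const { Transform } = require("stream");

// Standard CRC-32 lookup table (reflected polynomial 0xEDB88320, as used by ZIP).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c >>> 0;
}

// Fold one chunk (Buffer or Uint8Array) into a running CRC state.
function updateCrc32(state, chunk) {
  let c = state;
  for (let i = 0; i < chunk.length; i++) {
    c = CRC_TABLE[(c ^ chunk[i]) & 0xff] ^ (c >>> 8);
  }
  return c;
}

// One-shot CRC-32 of a whole buffer, returned as an unsigned 32-bit number.
function crc32(buffer) {
  return (updateCrc32(0xffffffff, buffer) ^ 0xffffffff) >>> 0;
}

// Hypothetical helper (not part of yauzl): passes data through unchanged
// and reports an error at end-of-stream if the checksum does not match.
class Crc32Verifier extends Transform {
  constructor(expectedCrc32) {
    super();
    this.expected = expectedCrc32 >>> 0;
    this.state = 0xffffffff;
  }

  _transform(chunk, encoding, callback) {
    this.state = updateCrc32(this.state, chunk);
    callback(null, chunk);
  }

  _flush(callback) {
    const actual = (this.state ^ 0xffffffff) >>> 0;
    if (actual !== this.expected) {
      callback(new Error(
        "CRC32 mismatch: expected 0x" + this.expected.toString(16) +
        ", got 0x" + actual.toString(16)
      ));
    } else {
      callback();
    }
  }
}
```

With yauzl, the wiring would presumably be something like `readStream.pipe(new Crc32Verifier(entry.crc32)).pipe(destination)` inside the openReadStream callback. Note that the mismatch is only known after all the data has already flowed through, so the consumer has to listen for `error` on the verifier (or use `stream.pipeline`) and discard the output if it fires.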
I have just published a module on npm, yauzl-crc, that adds CRC32 checking.
@thejoshwolfe If you have time, would you mind taking a look to see if I've missed anything? Streams are not my strong suit.
For what it's worth, I am using this library to unzip archives and saw that on some hardware it corrupts a file with null bytes. This is somewhat rare and is probably hardware specific. (It might be overheating, but that's just a guess.) I decided to dive into how this is possible and found this issue. For any single system the probability is near zero, but at scale it does happen, and it creates some very subtle bugs.