
zip-old's Introduction

zip has moved

This repository was formerly the source of the zip Rust crate for compressing and decompressing ZIP files, but that has moved to https://github.com/Pr0methean/zip. Please submit all issues and pull requests there, and close any existing copies here. Once the existing ones are closed, this repository will be archived.

zip-old's People

Contributors

a1phyr, alexbool, anti-social, aquacash5, aweinstock314, biluohc, camchenry, contextualist, davide-romanini, dependabot-preview[bot], dr-emann, fujiapple852, jonpas, kauhat, lireer, mbr, mvdnes, n3vu0r, nickbabcock, nobodyxu, pkgw, plecra, pr0methean, pyguy2, rylev, srijs, striezel, xmclark, zacps, zamazan4ik


zip-old's Issues

Make an additional optional feature to use `deflate` and `inflate` instead of `flate2`

I think it makes sense to provide some additional feature-guarded backends / "deflate engines" to the end user. Currently flate2 is used, but the problem with it is that it relies on some C libraries, and even the rust_backend pulls in too many C-related things and would require refactoring. When I tried to port zip-rs to WebAssembly (Rust now has a target which allows you to compile a library as a WebAssembly module), I struggled with flate2 and its dependencies (notably its rust_backend dependency) and had to replace the usage of flate2 with the combination of deflate and inflate (that was the easiest way to go); both are pure Rust libraries, for compression and decompression respectively. The API of inflate is very similar to the one provided by flate2, so the changes were trivial; deflate, however, has a different API and has to be adapted a bit to work with zip-rs.

IMHO it also makes sense to enable/disable compression and decompression separately (some people may need only compression or only decompression, which is especially important for targets like WebAssembly where you don't want to blow up the size of the compiled module).
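
For illustration, a minimal sketch of what such feature-guarded backends could look like. The feature names here are hypothetical; the flate2 and inflate calls are their real APIs:

// Hypothetical feature names, for illustration only.
#[cfg(feature = "flate2-backend")]
fn inflate_raw(data: &[u8]) -> std::io::Result<Vec<u8>> {
    use std::io::Read;
    let mut out = Vec::new();
    // flate2's DeflateDecoder wraps any Read and decodes a raw DEFLATE stream.
    flate2::read::DeflateDecoder::new(data).read_to_end(&mut out)?;
    Ok(out)
}

#[cfg(feature = "inflate-backend")]
fn inflate_raw(data: &[u8]) -> std::io::Result<Vec<u8>> {
    // inflate is pure Rust; inflate_bytes decodes a whole raw DEFLATE stream.
    inflate::inflate_bytes(data)
        .map_err(|msg| std::io::Error::new(std::io::ErrorKind::InvalidData, msg))
}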

concurrent reads

Hello Maintainer,

Firstly, thank you for creating and maintaining this library!

Secondly, are you interested in adapting this library to support concurrent reads?

For my application, I need to read entries (at random) from a zip file over an extended period of time. The reads might be concurrent, and might be interleaved (read a few bytes from zip entry A, then a few bytes from zip entry B, then A again, etc).

The current API shape makes this use-case impossible without creating multiple ZipArchives.
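
For reference, a minimal sketch of that multiple-ZipArchive workaround: each caller opens its own file handle, so every archive owns an independent reader and cursor and concurrent reads never interfere:

use std::fs::File;
use std::io::Read;

// Each call opens a fresh handle, so concurrent callers never share a cursor.
fn read_entry(path: &str, name: &str) -> zip::result::ZipResult<Vec<u8>> {
    let mut archive = zip::ZipArchive::new(File::open(path)?)?;
    let mut entry = archive.by_name(name)?;
    let mut buf = Vec::new();
    entry.read_to_end(&mut buf)?;
    Ok(buf)
}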

I've hacked the code to accomplish my goals, and thought I'd share it here in case you or any library users are interested in doing something similar.

I'd rather not maintain a separate fork of this library just to suit my use-case if it can be avoided, so if there is interest in incorporating any kind of support for concurrent reads, I'm happy to help out.

If there is no interest, please don't hesitate to close this issue.

Issues unzipping file

I'm using this crate to decompress some ZIP files, and I'm having trouble decompressing this file (the extension is APK, but it's a ZIP file): https://apkpure.com/aliexpress-shopping-app/com.alibaba.aliexpresshd

I've dug into the problem a little and it seems that it crashes when it tries to parse extra fields. On the 34th file, the extra_field_length field has a value of 3, which leaves extra_field as a 3-byte buffer (specifically, [0, 0, 0]).

I've checked the specification and it seems the extra field should contain pairs of HeaderId (u16) and data size (u16), but the given buffer is not a multiple of 4, which causes the parse_extra_field function to return an Err, which is propagated up to the zip::ZipArchive::new method.

It seems like this ZIP is not well formed, but other ZIP utilities on my computer are able to decompress it. If I comment out the calls to parse_extra_field, the file decompresses properly. Do you know if this can be related to some of the ZIP extensions that the README states are not yet supported?
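
For what it's worth, a sketch of a more tolerant parsing loop (using byteorder here purely for illustration) that treats trailing bytes too short to hold another block header as padding, which appears to be how other tools cope with this file:

use byteorder::{LittleEndian, ReadBytesExt};
use std::io::{self, Cursor, Seek, SeekFrom};

fn parse_extra_field_tolerant(data: &[u8]) -> io::Result<()> {
    let mut reader = Cursor::new(data);
    // Stop as soon as fewer than 4 bytes remain: a (HeaderId, size) pair no
    // longer fits, so the remainder is treated as padding rather than an error.
    while (reader.position() as usize) + 4 <= data.len() {
        let _header_id = reader.read_u16::<LittleEndian>()?;
        let size = reader.read_u16::<LittleEndian>()?;
        reader.seek(SeekFrom::Current(size as i64))?;
    }
    Ok(())
}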

Is there anything that I can do to help solve this issue?

Thanks

Add extract_all to ZipArchive

Another convenience-function request: it'd be nice if there were an extract_all method on ZipArchive to which you could pass a path to extract files to, basically like the extract.rs example. That makes more sense than every user of the library who wants to extract a zip file having to replicate that code themselves.
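
A sketch of what such a helper could look like, distilled from the extract.rs example and assuming the crate's sanitized_name path-cleaning helper:

use std::io::{self, Read, Seek};
use std::{fs, path::Path};

fn extract_all<R: Read + Seek>(
    archive: &mut zip::ZipArchive<R>,
    dest: &Path,
) -> zip::result::ZipResult<()> {
    for i in 0..archive.len() {
        let mut file = archive.by_index(i)?;
        let out_path = dest.join(file.sanitized_name());
        if file.name().ends_with('/') {
            // Directory entry: just create it.
            fs::create_dir_all(&out_path)?;
        } else {
            if let Some(parent) = out_path.parent() {
                fs::create_dir_all(parent)?;
            }
            io::copy(&mut file, &mut fs::File::create(&out_path)?)?;
        }
    }
    Ok(())
}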

set comment/unix_mode/last_modified fields when creating a zipfile

Hi,

It's not an issue, just a question.
I've tried to set the unix_mode and last_modified fields when creating a file with ZipWriter, with no success.

After a short review, setting these two fields doesn't seem too complicated (probably in write_central_directory_header and write_local_file_header); maybe it is not implemented because no one has needed it yet? :-)
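
For anyone landing here later: a sketch assuming the setters that FileOptions (and ZipWriter, for the comment) eventually gained; they did not exist when this question was asked:

use std::io::{Cursor, Write};
use zip::write::FileOptions;

fn write_with_metadata() -> zip::result::ZipResult<()> {
    let mut zw = zip::ZipWriter::new(Cursor::new(Vec::new()));
    zw.set_comment("archive-level comment");
    let options = FileOptions::default()
        .unix_permissions(0o755)
        // DateTime::from_date_and_time validates its fields against the
        // MS-DOS timestamp range used by the ZIP format.
        .last_modified_time(zip::DateTime::from_date_and_time(2020, 1, 1, 12, 0, 0).unwrap());
    zw.start_file("script.sh", options)?;
    zw.write_all(b"#!/bin/sh\n")?;
    zw.finish()?;
    Ok(())
}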

Cannot read certain ZIP files on Windows

Certain ZIP files cause an error when read on Windows. This does not happen with Windows Explorer's built-in ZIP reader, 7-Zip, etc., although 7-Zip gives a warning saying "Headers Error".

The error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Os { code: 87, kind: Other, message: "The parameter is incorrect." })', libcore\result.rs:945:5
error: process didn't exit successfully: `target\debug\zip-rs-error.exe` (exit code: 101)

The version of Windows used is Windows 7 Ultimate. The error does not occur on Linux.

The code:

extern crate zip;

use std::fs::File;
use zip::read::ZipArchive;
use std::io::{Read, Cursor};

fn main() {
    let mut buf = Vec::new();
    File::open("archive.zip").unwrap().read_to_end(&mut buf).unwrap();
    let _archive = ZipArchive::new(Cursor::new(buf)).unwrap();
}

The unreadable archive
Test case (also contains the unreadable archive)

ZipWriter doc sample fails to write

Running the ZipWriter sample exits with an error:

ZipWriter drop failed: Io(Error { repr: Custom(Custom { kind: WriteZero, error: StringError("failed to write whole buffer") }) })
Result: Err(Io(Error { repr: Custom(Custom { kind: WriteZero, error: StringError("failed to write whole buffer") }) }))

Integrity check

It would be useful to have the option to check the integrity of a zip file, like unzip -t.
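
A sketch of how this can be approximated with the existing API: fully reading every entry forces the CRC32 validation the reader performs, so corruption surfaces as an error:

use std::io::{self, Read, Seek};

fn test_archive<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> zip::result::ZipResult<()> {
    for i in 0..archive.len() {
        let mut entry = archive.by_index(i)?;
        // Decompress to a sink; a CRC mismatch is reported as a read error.
        io::copy(&mut entry, &mut io::sink())?;
    }
    Ok(())
}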

Zip gets created with inner-zip without file extension

Hi. Thanks for this repo so far.

I've been struggling with this for some hours now: if I create a zip, insert some data/files into it, and finish it, I get a zip which contains another zip file (without a file extension). This inner zip file then contains the data/files.

Here's my code. Maybe I'm doing something weird, but I can't find where the mess is happening.

/// Just the bytes of `file_exchange`. Infos are located at the
/// "main table" `file_exchange`.
#[derive(Insertable, RustcDecodable, RustcEncodable)]
#[table_name="file_exchange_bytes"]
pub struct FileExchangeBytesInsert {
    /// Id of file_exchange entry.
    pub file_exchange_id: i32,
    /// Buffer.
    pub bytes: Vec<u8>,
}

impl FileExchangeBytesInsert {
    pub fn zip(mut self) -> Self {
        info!(mod_logger("FileExchangeBytesInsert zip fn"), "ZIPing file exchange: {}", self.file_exchange_id);

        let buf = vec![];
        let w = Cursor::new(buf);

        let mut zip = zip::ZipWriter::new(w);
        let options = zip::write::FileOptions::default()
            .compression_method(zip::CompressionMethod::Bzip2);

        let mut file_info = FileExchangeQuery::get_by_id(self.file_exchange_id);

        let inner_file_name = file_info.file_name
            .replace(".bz2", ".txt");

        zip.start_file(inner_file_name, options)
            .expect("Could not start creating zip file");

        zip.write_all(self.bytes.as_slice())
            .expect("Could not write buffer to zip file");

        self.bytes = zip.finish()
            .expect("Could not finish zip file")
            .into_inner()
            .to_vec();

        file_info.file_size = self.bytes.len() as i32;
        let _rslt = file_info.update();

        self
    }
}

Random access support

It would be great to be able to access ZipFiles by their names (so without iteration), something like:

let zip_file = zip_reader.get_file("hard_to_find.rs");
let reader = zip_reader.read_file(zip_file);
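
A sketch using the by_name accessor the crate provides, which covers this without caller-side iteration:

use std::fs::File;
use std::io::Read;

fn read_named_entry(path: &str, name: &str) -> zip::result::ZipResult<String> {
    let mut archive = zip::ZipArchive::new(File::open(path)?)?;
    // Lookup goes through the central directory; no iteration needed.
    let mut file = archive.by_name(name)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}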

Thanks!

Parallel decompression of a single entry

It seems the requirement in #14 is a bit different from mine. I am currently stress-testing zip-rs and I found that it uses 100% of a single logical CPU for quite a long time on a beefy workstation just to decompress a 4 GiB file consisting of zeroes. Would it be possible to distribute decompression across logical CPUs? I couldn't quickly make Rayon's parallel iterators work.

CRC reported by ZipFile::crc32() seems to be the actual CRC of the compressed file, instead of the one saved in the ZIP file structure

Hi,

First of all, thanks for creating this library; using it is a joy 😄

I've stumbled upon a problem and I am not sure whether I am doing something wrong or there is a problem in the library. Part of my application's functionality is to check whether a given ZIP file is damaged; I wanted to do this by reading the CRC32 from the file structure and comparing it with a CRC32 I calculate from the file after it is uncompressed to a temporary directory. To check that this works, I modified an existing, small ZIP file in a hex editor and changed the CRC32 value to an incorrect one by hand. You can find this file here. As expected, the zip tool sees it as an invalid zip file:

$ zip -T wrongcrc.zip
hello.txt               bad CRC 4a92ba89  (should be 13371337)
test of wrongcrc.zip FAILED

zip error: Zip file invalid, could not spawn unzip, or wrong unzip (original files unmodified)

However, the crc32() method of the ZipFile object gives me the correct CRC32, instead of the wrong one that is present in the file. A minimal example showing this issue can be found in the following repository:
https://github.com/michalrud/zip-crc-example

Am I using the library wrong? Maybe I misunderstood the documentation? Thanks in advance for any help :)
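
For context, a sketch of the check the report is attempting, with crc32fast as an assumed helper crate for the recomputation:

use std::io::Read;

fn entry_is_intact(entry: &mut zip::read::ZipFile<'_>) -> std::io::Result<bool> {
    // crc32() is expected to return the CRC stored in the archive metadata.
    let stored = entry.crc32();
    let mut hasher = crc32fast::Hasher::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = entry.read(&mut buf)?;
        if n == 0 { break; }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize() == stored)
}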

Feature: benchmark comparison with other popular libraries

Hello,

It would be great if there were comparison benchmarks available somewhere for zip-rs. This would make it easy to see what libraries perform best against which inputs, and help the library selection process.

Thanks for considering!

Suggest to yank 0.2.9

I've been thinking that 0.2.9 should have been 0.3.0, as the deflate feature is a breaking change. Users of <0.2.9 can't necessarily use 0.2.9, and vice versa: users of 0.2.9 can't necessarily use <0.2.9. It's just a suggestion though; if you don't agree, feel free to close this issue 😄

(I created this issue to avoid hijacking #23 any more than I already have...)

performance isn't great

Hi, I tried to use this project to extract a billing CSV file.

I tried a 780 MB file (compressed size); the uncompressed size was 13 GB.

When I tried to walk over the lines in the file using a buffer, I got rather bad throughput overall, so I tried to benchmark just the unzip process.

zcat {filename} > /dev/null finished in ~52 seconds.
Uncompressing and copying with std::io::copy (to /dev/null as well) took more than 40 minutes, until I got annoyed with the CPU fan noise and shut it down.

Is there any configuration or version you think will make a difference?

Why podio?

Why create another BE/LE library when there's byteorder (repo), which is already established, works with both old and new io, and is more widespread and tested?
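
For comparison, a minimal sketch of the byteorder equivalent of what podio is used for here:

use byteorder::{LittleEndian, ReadBytesExt, WriteBytesExt};
use std::io;

fn demo() -> io::Result<()> {
    let mut buf = Vec::new();
    buf.write_u32::<LittleEndian>(0x0403_4b50)?; // local file header signature
    let sig = io::Cursor::new(&buf).read_u32::<LittleEndian>()?;
    assert_eq!(sig, 0x0403_4b50);
    Ok(())
}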

Cannot read zip from byte stream

This is more of a question than it is a bug, I'm sure. Given the following code:

fn index(req: &HttpRequest) -> FutureResponse<HttpResponse> {
  // Get the zip file in the form of bytes
  return req
    .body()
    .limit(512000)
    .from_err()
    .and_then(move |b: Bytes| {
      println!("{:?}", b);

      let mut archive = zip::read::read_zipfile_from_stream(&mut b).unwrap(); // <- error here.

      Ok(HttpResponse::NoContent().into())
    })
    .responder()
}

Obviously I do not have an actual stream, but I do have the bytes. How can I create an archive purely from bytes? I've looked into stdin, but that doesn't quite fit the bill. I'm trying to read an archive that's being sent in the form of a curl request. Is there some way I can achieve this with my current code as the framework?
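
For what it's worth, a sketch of the Cursor-based approach: Cursor<Vec<u8>> implements both Read and Seek, so the full ZipArchive API works over an in-memory buffer:

use std::io::Cursor;

fn archive_from_bytes(
    bytes: Vec<u8>,
) -> zip::result::ZipResult<zip::ZipArchive<Cursor<Vec<u8>>>> {
    // No file or stream needed; the byte buffer itself is the backing store.
    zip::ZipArchive::new(Cursor::new(bytes))
}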

Additional tests

I was thinking about implementing additional tests and I wanted to get some feedback before implementing anything.

First one I was thinking about was a circular zip test.

  1. Zip some arbitrary data with zip-rs
  2. Unzip that data with zip-rs
  3. Assert that start data == end data

This seems like something that should always be true, no matter what. It also seems like a usability issue if zip-rs was somehow able to create a zip file that it couldn't read. This would also provide a good test for fuzzing in the near future.
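
A sketch of what that circular test could look like, assuming the current FileOptions API:

use std::io::{Cursor, Read, Write};

#[test]
fn roundtrip() {
    let input = b"arbitrary data".to_vec();

    // 1. Zip the data in memory.
    let mut writer = zip::ZipWriter::new(Cursor::new(Vec::new()));
    writer.start_file("data.bin", zip::write::FileOptions::default()).unwrap();
    writer.write_all(&input).unwrap();
    let cursor = writer.finish().unwrap();

    // 2. Unzip it again.
    let mut archive = zip::ZipArchive::new(cursor).unwrap();
    let mut output = Vec::new();
    archive.by_name("data.bin").unwrap().read_to_end(&mut output).unwrap();

    // 3. Assert that start data == end data.
    assert_eq!(input, output);
}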

Additional test ideas are welcome. Thanks.

Can ZipFile implement Seek?

I feel like this might have been asked before; if so, I apologize.

Is it possible for the ZipFile type to implement the Seek trait? I'm honestly not sure, but it would be useful to have the option, even if it involves iterating through the whole file decompressing it on the fly.

Thanks.

CRC Error on binary files

I'm using zip-rs to compress multiple binary files into one zip.
Zipping text files works fine, but when I do this with binary files, the CRC is wrong.
Using 7-Zip I can still extract the data, and if I test it with an MP3 file, VLC does not emit any errors on the extracted file, so the data itself is correct.

My zipping code:

use std::fs::{metadata, read_dir, File};
use std::io::copy;
use std::path::PathBuf;

// (Signature reconstructed; the original snippet was truncated mid-function.)
fn zip_folder(folder: &str, zip_path: &str) -> zip::result::ZipResult<()> {
    let mut dir = PathBuf::from(&CONFIG.general.temp_dir);
    dir.push(folder);

    if try!(metadata(dir.as_path())).is_dir() {
        let output_file = try!(File::create(zip_path));
        let mut writer = zip::ZipWriter::new(output_file);

        for entry in try!(read_dir(dir)) {
            let entry = try!(entry);
            if try!(entry.metadata()).is_file() {
                try!(writer.start_file(entry.file_name().to_string_lossy().into_owned(),
                                       zip::CompressionMethod::Deflated));
                let mut reader = try!(File::open(entry.path()));
                let _ = reader.sync_data();
                try!(copy(&mut reader, &mut writer));
            }
        }
        try!(writer.finish());
        trace!("finished zipping");
        Ok(())
    } else {
        // (The else branch was cut off in the original report.)
        Err(zip::result::ZipError::FileNotFound)
    }
}

'CRC failed in test.bin File is broken.'

If I use CompressionMethod::Stored (no compression) instead, everything's fine.

Support zip file unpacking

It would be great for ZipArchive to be able to automatically unpack a zip file when given an output path.

For example, unpacking a zip file in the current directory:

let fname = std::path::Path::new("archive.zip");
let file = fs::File::open(&fname)?;
let mut archive = zip::ZipArchive::new(file)?;
let output_path = ".";
archive.unpack(output_path);

Comparatively poor performance (both speed & size)

Compared to the zip command-line tool, or to the standard OpenJDK implementation (which is equivalent), zip-rs has poor performance in both senses: it is slow (~3x slower than the standard implementations at equivalent compression), and it achieves only approximately the worst deflate compression available with zip.

Since compression can be a rate-limiting operation (in my case it is), this is a rather significant drawback. I realize that a significant part of this is due to deficiencies in libflate, but it is still a substantial negative for this crate. If a native implementation that performs acceptably is too tricky to create, a feature wrapping libzip would render the crate usable.

If you find some file of a dozen MB or so and name it speed.log, the following will test the performance:

#[derive(Debug, Clone, Copy)]
pub struct CompressionResult {
    pub dt: f64,
    pub MBps: f64,
    pub shrink: f64
}
impl CompressionResult {
    pub fn new(dt: f64, MBps: f64, shrink: f64) -> CompressionResult {
        CompressionResult { dt, MBps, shrink }
    } 
    pub fn from(dt: f64, old_size: u64, new_size: u64) -> CompressionResult {
        let MBps = ((old_size as f64)/(1024.0*1024.0))/dt;
        let shrink = (new_size as f64)/(old_size as f64);
        CompressionResult {
            dt: (dt*1000.).round()/1000.,
            MBps: (MBps*100.).round()/100.,
            shrink: (shrink*1000.).round()/1000.
        }
    }
}

pub fn compress_the_target(cs: &str, ct: &str) -> CompressionResult {
    use std::*;
    use std::io::Write;
    let source_p = path::Path::new(cs);
    let target_p = path::Path::new(ct);
    if target_p.exists() { fs::remove_file(target_p).unwrap(); }
    let data = fs::read(source_p).unwrap();
    let t0 = time::Instant::now();
    {
        let target = fs::File::create(target_p).unwrap();
        let buffer = io::BufWriter::with_capacity(65536, target);
        let mut zw = zip::ZipWriter::new(buffer);
        zw.start_file(
            source_p.file_name().unwrap().to_str().unwrap(),
            zip::write::FileOptions::default().compression_method(
                zip::CompressionMethod::Deflated
            )
        ).unwrap();
        zw.write_all(data.as_ref()).unwrap();
        zw.finish().unwrap();
    }
    let elapsed = {
        let d = t0.elapsed();
        (d.as_secs() as f64) + (d.subsec_nanos() as f64)/1e9
    };
    let old_size = fs::metadata(source_p).unwrap().len();
    let new_size = fs::metadata(target_p).unwrap().len();

    CompressionResult::from(elapsed, old_size, new_size)
}

fn main() {
    let compression_source = "speed.log";
    let compression_target = "speed-rust.zip";
    println!("{:?}", compress_the_target(compression_source, compression_target));
}

You can get the same report for the command-line version by calling it as an external process, e.g. here in Python:

import time
import subprocess
import os

cmdline_compression_source = 'speed.log'
cmdline_compression_target = 'speed-cmdline.zip'

def compress_the_target():
    if os.path.isfile(cmdline_compression_target):
        os.remove(cmdline_compression_target)
    t0 = time.time()
    subprocess.run(['zip', '-7', cmdline_compression_target, cmdline_compression_source])
    elapsed = time.time() - t0
    log_size = os.stat(cmdline_compression_source).st_size
    zip_size = os.stat(cmdline_compression_target).st_size
    rate = (log_size/(1024*1024.0))/elapsed
    factor = zip_size/log_size
    os.remove(cmdline_compression_target)
    return (round(elapsed*1000)/1000, round(rate*100)/100, round(factor*1000)/1000)

compress_the_target()

Panic: attempt to subtract with overflow

This fuzzer produced this panic (triggered by this line):

thread '<unnamed>' panicked at 'attempt to subtract with overflow', /Users/pascal/.cargo/git/checkouts/zip-rs-62c959c79813fe27/5c12e51/src/read.rs:88
stack backtrace:
   0:        0x10edb0f63 - std::sys::imp::backtrace::tracing::imp::unwind_backtrace::hfebda55a0148f0d8
   1:        0x10edb1cff - std::panicking::default_hook::{{closure}}::h42d7518fb451881e
   2:        0x10edb197b - std::panicking::default_hook::h187f0cce2cdf6403
   3:        0x10edb3e6a - std::panicking::rust_panic_with_hook::hcc9e45ce1503358a
   4:        0x10edb3d04 - std::panicking::begin_panic::h64dd529720e55854
   5:        0x10edb3c82 - std::panicking::begin_panic_fmt::h3c31ecee09d7435b
   6:        0x10edb3be7 - rust_begin_unwind
   7:        0x10edb5f50 - core::panicking::panic_fmt::hd9dc6c4915cbf1ab
   8:        0x10edb5e54 - core::panicking::panic::h7b823a67daa03480
   9:        0x10ecc4ca7 - <zip::read::ZipArchive<R>>::new::h0f01a25881b5e655
  10:        0x10ecf9f48 - rust_fuzzer_test_input
  11:        0x10ed2ab6c - std::panicking::try::do_call::hc13bd523a440d526

using this input (1.2kB of Rust binary string, did not try to shrink this):

b"PK\x03\x04\n\x00\x00\x00\x00\x00\xe9p\xdaJ\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x1c\x00zip/UT\t\x00\x03\xf5\xf8PY\"\xf9PYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x03\x04\x14\x00\x00\x00\x08\x00\xe9p\xdaJ\xf6\xe3\xf0\xa6\xd6\x00\x00\x00\x04\x18\x00\x00\r\x00\x1c\x00zip/.DS_StoreUT\t\x00\x03\xf5\xf8PY\xf5\xf8PYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00\xed\x981\n\xc2@\x10E\xff\xc4 \x01\x9b--\xf7\n\xde`\tz\x02/ (\xd8\x88\x91H\xacSy.\x8ff\xc2~Q4\x81X)\xfa\x1f\x0c\xaf\xc8\xcc&ivv\x16\x80\xe5\xd5f\x068\x00\x19\xa2qF\'\x19\xe3\x85\x84\xb6\x18\xcd\x1a\x15\x8e\xa7e\xb9\xdf\x15\xfbm\xf7Z/\xb4\xb5c\xacQ\xa0\xc4\xe1\xa9\xde\x06\xae!\x84\x10B\x88\xe1\xb0\xbff\x93\xcf~\x86\x10\xe2\x0bi\xf7\x07O\x07\xba\x8e6>O\xe8\xf4\xa1\xc6\xd1\x9e\x0et\x1dm\xccK\xe8\x94\xcehG{:\xd0u47-\xe3\xf0a|\xb3qB1G{:\xbc\xf5\xcbB\xfc\r\xa3(\xd7\xf6\xff\x05z\xe7\x7f!\xc4\x0fc\xe9|9\xcf\xd1\x7f\xe1\xd6\xf6Z\xdf\xc4\x8a9\x97[a\xcfA \x89\x17\x86S\xdc\xf3<\x1d\xe8:Z\x87\x01!>\xc1\x15PK\x03\x04\x14\x00\x00\x00\x08\x00\\g\xdaJ\x16\xd3\x86m\x9d\x00\x00\x00\xf7\x00\x00\x00\x0e\x00\x1c\x00zip/Cargo.tomlUT\t\x00\x03\xf0\xe8PYl\xf8PYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00\x85\x8f1\x0e\xc3 \x0cEw\x9f\x02\xb1\x07:W\xeaIP\x06\x08N@M\x08\xc2\x10\xa9\xa9z\xf7\x9a\xa1\xaa:U\x9e\xfe\xb7\x9f\xbfm\xb2\x9d\xeev\xc1\x11\x92\xddP\xdc\x84<c\x1e\xaa-\x0bV\x92p`\xa1\xb8\xa7\xee_\x14\x97\x84\xdc\xdc\x1a)\xb03\xdb\x95\x10\xc0x\xcc\x98<\xa6)\"\x8d\xc08\xf7\x9eb\x89\xb5S\xa1\xd6LW\xadY\x86\xe6\xd4\xb4oz;|B\xd2=\xa7\x90\x14/X\xa3\x9b\xdbyb\x19\xe8A\x7f\xe0\xd2\xa8\x0e}Z\xffP\x8aG\xfa*0\xc6\xc54~\xbf)h=\x1fmk\xf8(\xc5\xa1\xf0\x06PK\x03\x04\n\x00\x00\x00\x00\x00zh\xdaJ\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x1c\x00zip/corpus/UT\t\x00\x03\x00\x00\x04\x18\x00\x00\r\x00\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa4\x81>\x00\x00\x00zip/.DS_StoreUT\x05\x00\x03\xf5\xf8PYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x01\x02\x1e\x03\x14\x00\x00\x00\x08\x00\\g\xdaJ\x16\xd3\x86m\x9d\x00\x00\x00\xf7\x00\x00\x00\x0e\x00\x18\x00\x00\x00\x00\x00\x01\x00\x00\x00\xa4\x81[\x01\x00\x00zip/Cargo.tomlUT\x05\x00\x03\xf0\xe8PYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x01\x02\x1e\x03\n\x00\x00\x00\x00\x00zh\xdaJ\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\x00\xedA@\x02\x00\x00zip/corpus/UT\x05\x00\x03\x18\xeaPYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x01\x02\x1e\x03\x14\x00\x00\x00\x08\x00zh\xdaJ\xcbL\xc0{\x07\x00\x00\x00K\x00\x00\x003\x00\x18\x00\x00\x00\x00\x00\x01\x00\x00\x00\xa4\x81\x85\x02\x00\x00zip/corpus/9fdad349dac578687a62907dd7ba4295801fa566UT\x05\x00\x03\x18\xeaPYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x01\x02\x1e\x03\x14\x00\x00\x00\x08\x00\x98h\xdaJ\xae2\xf2\xfa\xfd\x00\x00\x00\x97\x01\x00\x00\x0b\x00\x18\x00\x00\x00\x00\x00\x01\x00\x00\x00\xa4\x81\xf9\x02\x00\x00zip/read.rsUT\x05\x00\x03O\xeaPYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x01\x02\x1e\x03\n\x00\x00\x00\x00\x00\x01q\xdaJ\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\x00\xed\xbf\xc4\xfb\xff\xff\x85\x96\x85/seeds/UT\x05\x00\x03\"\xf9PYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x01\x02\x1e\x03\x14\x00\x00\x00\x08\x00\xedp\xdaJj\x00\x88m\xb2\x00\x00\x00\x04\x18\x00\x00\x13\x00\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa4\x81\x7f\x04\x00\x00zip/seeds/.DS_StoreUT\x05\x00\x03\xfd\xf8PYux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x05\x06\x00\x00\x00\x00\
x08\x00\x08\x00\xb5\x02\x00\x00~\x05\x00\x00\x00\x00"

I'm using the latest master version of zip-rs (5c12e51) and rustc 1.20.0-nightly (c9bb93576 2017-06-24). The fuzz runner script enables optimizations as well as debug assertions.

Single-pass ZIP generation

ZipWriter currently expects the underlying stream to be seekable. The seekability is used to update the local file header after the file is complete (and also to determine some offsets).

The ZIP file format specification supports writing an archive in one pass without seeking. This mode may be useful in some cases (for example, streaming a ZIP archive of several large files without generating the archive ahead of time).

Would you be interested in implementing this mode and lifting the seekability requirement?

Implement Debug for ZipArchive

It would be handy to be able to view the internal values for debugging purposes. Currently, I am trying to figure out why by_name() can't find my file, which I verified is indeed in the zip via by_index().

Support of files more than 4Gb

Hi,
I ran into this issue while working on an fb2 library tool.
E.g. fb2-545000-549999.zip is 5,190,592,358 bytes long and won't be processed by zip-rs.
I have attached a torrent file as a target for reproducing.

Can ZipArchive::by_index/by_name not be mut?

It just feels like a weird abstraction break to have a fundamentally read-only operation require the ZipArchive to be mutable, when all that ever changes is the internal reader object, which won't be shared by anything else anyway. It seems like wrapping it in a RefCell would let you pretend everything is immutable, since the API is entirely immutable anyway.

I'm not sure this is a good or feasible idea, it's just a suggestion. I might try implementing it sometime and see how it works out.
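
A sketch of the RefCell idea, hiding the seek-state mutation behind an immutable-looking wrapper (the wrapper type and method names are hypothetical):

use std::cell::RefCell;
use std::io::{Read, Seek};

struct SharedArchive<R: Read + Seek> {
    inner: RefCell<zip::ZipArchive<R>>,
}

impl<R: Read + Seek> SharedArchive<R> {
    // Takes &self, not &mut self: the mutation is confined to the RefCell.
    fn read_entry(&self, name: &str) -> zip::result::ZipResult<Vec<u8>> {
        let mut archive = self.inner.borrow_mut();
        let mut entry = archive.by_name(name)?;
        let mut buf = Vec::new();
        entry.read_to_end(&mut buf)?;
        Ok(buf)
    }
}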

file truncated when decompressing DEFLATE

I'm using version 0.2.5: "checksum zip 0.2.5 (registry+https://github.com/rust-lang/crates.io-index)" = "12143b8b0e8d215391a2c6362d54b482589cb469b0fa4d047c42067cd4bcb311"

Reproducing

Get test.zip, then run these scripts:
https://gist.github.com/Gjum/f7e95799183b5710a6c9341463e167a0

zip-rs stops mid-file and leaves the remaining bytes unchanged,
while Python's deflate module successfully extracts the whole file.

The first omitted byte is 0xA0 in this example. Many other similarly created files I tested this with get truncated too, at different byte offsets and different first omitted byte values.

test.zip was created using Python's zipfile module with its DEFLATE method. When using STORE, everything works well.

Expected result

output from reproduce.py, see test-py.out for the full output

7A 00 01 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 7C A0 06 F0 23 <- difference starts here
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7C 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7E 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7E 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7E 00 02 F0 00 00 00 00 7F 00 8B F1 00 00 00 00 23 
7E 00 02 F1 00 00 00 00 00 00 00 00 00 00 00 00 23 

Actual result

output from reproduce.rs, see test-rs.out for the full output

7A 00 01 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 00 00 00 00 23 
7B 00 02 F0 00 00 00 00 00 00 00 00 7C 00 00 00 00 <- difference starts here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... all zeroes from here

Zip file does not work for LÖVE2D

I program games in Lua using the LÖVE game framework. I am developing a build tool which will automatically create .love files (just renamed .zip files) and do a lot of zipping in general. I know that LÖVE uses PhysicsFS for working with zip files.

For some reason, the zip file created by zip-rs will not load with LÖVE, but other zip files created by Info-ZIP or 7-Zip will work with no issue.

I have attached a zip file created by Info-ZIP and one created by zip-rs. The file being zipped is exactly the same. This is a diff of the two files, the one on the left is zip-rs (red), the one on the right is Info-ZIP (green).

I have been able to work around this by changing the default system byte to Unix (camchenry@63a3ec3); with that byte set to Unix, the file loads just fine. Even more strangely, changing that same byte in zip files generated by other programs seems to have no effect.

Stack Overflow for very simple code

This code runs but then panics with thread 'main' has overflowed its stack:

extern crate zip;

use zip::*;
use zip::write::*;
use std::io::Write;

fn main() {
    let mut out = ZipWriter::new(std::fs::File::create("out.zip").expect("create file"));
    out.start_file("test.txt", FileOptions::default());
    write!(out, "Hello, World!");
}

If you change it to use CompressionMethod::Stored instead, then it works just fine. Older versions before the feature-gated deflate compression also seem to work, as do bzip2 and running in release mode. Perhaps the deflate code recurses too deeply when unoptimized, with the new feature-gating tricking the compiler somehow? Perhaps that makes it a flate2 issue? I'm also on Windows.

Copy/Clone of FileOptions

It would be really nice if you would implement the Copy/Clone traits for FileOptions, as I have to create this struct again for every file, leading to massive overhead.

move occurs because `f_options` has type `lib::zip::write::FileOptions`, which does not implement the `Copy` trait

Implement Iterator trait for ZipArchive

It would be fairly useful to be able to loop over the files in a ZipArchive rather than having to use by_index (if you're willing to ignore errors, that is).

Currently it is impossible to implement Iterator for ZipArchive<R> because of how the struct is created and the fact that R is only Read + Seek. I'm not sure if it actually would be doable without modifying the struct implementation in a way that would harm the utility of the whole thing, but it would be helpful if it could be done.
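
The usual workaround is an index loop, since each ZipFile mutably borrows the archive and therefore blocks a true Iterator; a sketch:

use std::io::{Read, Seek};

fn list_entries<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> zip::result::ZipResult<()> {
    for i in 0..archive.len() {
        let entry = archive.by_index(i)?;
        println!("{} ({} bytes)", entry.name(), entry.size());
    }
    Ok(())
}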

Implement core::convert::From so by_name can be used in try!()

Getting the following:

Compiling gtfs v0.0.1 (file:///C:/users/nolan/Projects/Gtfs-rs)
:6:1: 6:32 error: the trait core::convert::From<zip::result::ZipError> is not implemented for the type std::io::error::Error [E0277]
:6 $ crate:: convert:: From:: from ( err ) ) } } )
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:1:1: 6:48 note: in expansion of try!
src\feed.rs:19:16: 19:58 note: expansion site

Attempting the following:

let file = try!(self.archive.by_name("agencies.txt"));

New to Rust so perhaps I'm missing something. I'd like to punt all zip errors up the callstack.
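
A sketch of the usual fix: a caller-side error enum with From impls for both error types, so try! (or later ?) can convert either one up the callstack:

use std::io;
use zip::result::ZipError;

#[derive(Debug)]
enum FeedError {
    Io(io::Error),
    Zip(ZipError),
}

// try! calls From::from on the error, so these impls make both convertible.
impl From<io::Error> for FeedError {
    fn from(e: io::Error) -> FeedError { FeedError::Io(e) }
}

impl From<ZipError> for FeedError {
    fn from(e: ZipError) -> FeedError { FeedError::Zip(e) }
}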

Thanks.

Does not support ZIP64

I just ran into a problem because I have a Zip file with ~80000 files in it, and zip-rs can't handle it, since ZIP64 support is needed for archives with more than 65535 files.

This is pretty urgent for me so I am going to see what I can do to get my use case working. It seems like it would be good to have an issue open to track the general task, though, so here we are.

Zip-RS extract.rs example absolute extraction paths

The examples/extract.rs script does not protect against files being extracted to arbitrary directories outside the current directory. Whilst it appears some effort is made to prevent path traversal from zip entry filenames containing ../ (in sanitize_filename), this does not protect against entry filenames with absolute paths.

For example, a zip entry with a filename of /tmp/outfile will be extracted to /tmp/outfile instead of ./tmp/outfile. See zip_3_absolute_directory.zip for an example of this.

Whilst this is not an issue with the zip-rs library itself, having an insecure example which other developers may copy is less than ideal.
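
One way to neutralize both ../ components and absolute paths is to keep only the normal components of the entry name; a sketch:

use std::path::{Component, Path, PathBuf};

fn sanitize_entry_name(name: &str) -> PathBuf {
    // Dropping RootDir/Prefix handles absolute paths and drive letters;
    // dropping ParentDir handles ../ traversal.
    Path::new(name)
        .components()
        .filter(|c| matches!(c, Component::Normal(_)))
        .collect()
}

With this, an entry named /tmp/outfile extracts to ./tmp/outfile as expected.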

thread 'main' panicked at 'capacity overflow' with a 714-byte file

Hi, I was fuzzing zip-rs when it found this panic:

thread 'main' panicked at 'capacity overflow', libcore/option.rs:916:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
   2: std::panicking::default_hook::{{closure}}
             at libstd/panicking.rs:206
   3: std::panicking::default_hook
             at libstd/panicking.rs:222
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:400
   5: std::panicking::begin_panic_fmt
             at libstd/panicking.rs:347
   6: rust_begin_unwind
             at libstd/panicking.rs:323
   7: core::panicking::panic_fmt
             at libcore/panicking.rs:71
   8: core::option::expect_failed
             at libcore/option.rs:916
   9: <core::option::Option<T>>::expect
             at /checkout/src/libcore/option.rs:302
  10: <alloc::raw_vec::RawVec<T, A>>::allocate_in
             at /checkout/src/liballoc/raw_vec.rs:89
  11: <alloc::raw_vec::RawVec<T>>::with_capacity
             at /checkout/src/liballoc/raw_vec.rs:144
  12: <alloc::vec::Vec<T>>::with_capacity
             at /checkout/src/liballoc/vec.rs:363
  13: <zip::read::ZipArchive<R>>::new
             at /home/paulg/.cargo/git/checkouts/zip-rs-62c959c79813fe27/806147a/src/read.rs:180
  14: zip_read::main::{{closure}}
             at /home/paulg/Projects/fuzz-targets/common/src/lib.rs:758
             at src/bin/zip_read.rs:8
  15: honggfuzz::fuzz
             at /home/paulg/.cargo/registry/src/github.com-1ecc6299db9ec823/honggfuzz-0.5.18/src/lib.rs:277
  16: zip_read::main
             at src/bin/zip_read.rs:7
  17: std::rt::lang_start::{{closure}}
             at /checkout/src/libstd/rt.rs:74
  18: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:305
  19: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:101
  20: std::rt::lang_start_internal
             at libstd/panicking.rs:284
             at libstd/panic.rs:361
             at libstd/rt.rs:58
  21: std::rt::lang_start
             at /checkout/src/libstd/rt.rs:74
  22: main
  23: __libc_start_main
  24: _start

here is the code:

    let reader = std::io::Cursor::new(data);
    let mut archive = if let Ok(x) = zip::ZipArchive::new(reader) { x } else { return; };

    for i in 0..archive.len() {
        use std::io::prelude::*;

        let file = archive.by_index(i).unwrap();
        let _size = file.bytes().count();
    }

here is the file
https://drive.google.com/file/d/1WPRI4ed8JDMwQW9BDz-9z_tkqbl93407/view?usp=sharing

Handle symlinks

Hello,

I have a ZIP archive created on Linux using the --symlinks option.
So my archive contains symlinks.

Yet, when I extract the corresponding files, they end up being text files with the name of the symlink target as their content.

How can I detect/handle symlinks?
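
A sketch of detecting them, assuming a version of the crate where ZipFile exposes unix_mode(): the Unix file-type bits mark symlinks with S_IFLNK (0o120000), and the entry's data is then the link target path:

fn is_symlink(entry: &zip::read::ZipFile<'_>) -> bool {
    entry
        .unix_mode()
        .map_or(false, |mode| mode & 0o170000 == 0o120000)
}

An extractor could then read the entry's contents and pass them to std::os::unix::fs::symlink instead of writing a regular file.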

Regards,

Sanitization of Windows file names

Hi,

Recently Snyk found a vulnerability where malicious filenames in archives are used to extract files to unwanted paths, which I gather from the HN comments is an old issue. I tried the vulnerable file available at the repo and found that the filenames are sanitized, and there is a test case for the same. It works fine on Linux systems: for ../../../../../../../tmp/evil.txt it writes to tmp/evil.txt in the current folder.

But I later found that Python does some additional validation on Windows filenames. I couldn't try this on Windows since I don't have access to a Windows machine.

Python doc reference : https://docs.python.org/3/library/zipfile.html?highlight=zipfile#zipfile.ZipFile.extract
Python implementation : https://github.com/python/cpython/blob/b8c0845fee9277b1106ceecbf7592f8806c73ec8/Lib/zipfile.py#L1597

If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On Windows illegal characters (:, <, >, |, ", ?, and *) replaced by underscore (_).
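
A sketch of the corresponding extra step on the Rust side, replacing the characters Python lists as illegal on Windows:

fn sanitize_windows_name(name: &str) -> String {
    name.chars()
        .map(|c| match c {
            ':' | '<' | '>' | '|' | '"' | '?' | '*' => '_',
            other => other,
        })
        .collect()
}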

Feel free to close this if it's irrelevant or I am missing something.

Thanks for the library.

ZIP64 support is broken

I have a zip file which zip fails to open. Unfortunately I can't share it; however, I've tracked the issue down to the zip64 extra field parsing.

Here's how zip currently handles it:

fn parse_extra_field(file: &mut ZipFileData, data: &[u8]) -> ZipResult<()>
{   
    let mut reader = io::Cursor::new(data);
    
    while (reader.position() as usize) < data.len()
    {
        let kind = reader.read_u16::<LittleEndian>()?;
        let len = reader.read_u16::<LittleEndian>()?;
        match kind
        {
            // Zip64 extended information extra field
            0x0001 => {
                file.uncompressed_size = reader.read_u64::<LittleEndian>()?;
                file.compressed_size = reader.read_u64::<LittleEndian>()?;
                reader.read_u64::<LittleEndian>()?;  // relative header offset
                reader.read_u32::<LittleEndian>()?;  // disk start number
            },
            _ => { reader.seek(io::SeekFrom::Current(len as i64))?; },
        };
    }
    Ok(())
}

And here's how unzip (the command line tool which properly unpacks my zip file) does it:

int getZip64Data(__G__ ef_buf, ef_len)
    __GDEF
    ZCONST uch *ef_buf; /* buffer containing extra field */
    unsigned ef_len;    /* total length of extra field */
{
    unsigned eb_id;
    unsigned eb_len;
    
/*---------------------------------------------------------------------------
    This function scans the extra field for zip64 information, ie 8-byte
    versions of compressed file size, uncompressed file size, relative offset
    and a 4-byte version of disk start number.
    Sets both local header and central header fields.  Not terribly clever,
    but it means that this procedure is only called in one place.
  ---------------------------------------------------------------------------*/

    if (ef_len == 0 || ef_buf == NULL)
        return PK_COOL;

    Trace((stderr,"\ngetZip64Data: scanning extra field of length %u\n",
      ef_len));

    while (ef_len >= EB_HEADSIZE) {
        eb_id = makeword(EB_ID + ef_buf);
        eb_len = makeword(EB_LEN + ef_buf);

        if (eb_len > (ef_len - EB_HEADSIZE)) {
            /* discovered some extra field inconsistency! */
            Trace((stderr,
              "getZip64Data: block length %u > rest ef_size %u\n", eb_len,
              ef_len - EB_HEADSIZE));
            break;
        }
        if (eb_id == EF_PKSZ64) {
        
          int offset = EB_HEADSIZE;

          if (G.crec.ucsize == 0xffffffff || G.lrec.ucsize == 0xffffffff){
            G.lrec.ucsize = G.crec.ucsize = makeint64(offset + ef_buf);
            offset += sizeof(G.crec.ucsize);
          }
          if (G.crec.csize == 0xffffffff || G.lrec.csize == 0xffffffff){
            G.csize = G.lrec.csize = G.crec.csize = makeint64(offset + ef_buf);
            offset += sizeof(G.crec.csize);
          }
          if (G.crec.relative_offset_local_header == 0xffffffff){
            G.crec.relative_offset_local_header = makeint64(offset + ef_buf);
            offset += sizeof(G.crec.relative_offset_local_header);
          }
          if (G.crec.disk_number_start == 0xffff){
            G.crec.disk_number_start = (zuvl_t)makelong(offset + ef_buf);
            offset += sizeof(G.crec.disk_number_start);
          }
        }
          
        /* Skip this extra field block */
        ef_buf += (eb_len + EB_HEADSIZE);
        ef_len -= (eb_len + EB_HEADSIZE);
    }

    return PK_COOL;
} /* end function getZip64Data() */

unzip's logic here is obviously very different: it validates each block length against the remaining extra-field size, and it reads the 8-byte ZIP64 values only for fields whose 32-bit counterparts are set to the 0xffffffff (or 0xffff) sentinel, whereas zip unconditionally reads all four values for every 0x0001 block.
