
lzxd's People

Contributors

drchat, ikrivosheev, lonami, valmyzk


lzxd's Issues

Large tests in Git LFS

Follow-up to the discussion in #8. Another email from GitHub:

[GitHub] Git LFS disabled for Lonami

Git LFS has been disabled on your personal account Lonami because you’ve exceeded your data plan by at least 150%. Please purchase additional data packs to cover your bandwidth and storage usage:

https://github.com/account/billing/data/upgrade

Current usage as of 12 Dec 2023 05:16PM UTC:

Bandwidth: 1.59 GB / 1 GB (159%)
Storage: 0.13 GB / 1 GB (13%)

Quite annoying. Might have to rethink what to do with the tests or if it's worth keeping them around.

It seems CI is also affected, as the build failed after hitting the quota:

Fetching LFS objects
  /usr/bin/git lfs fetch origin refs/remotes/origin/master
  fetch: Fetching reference refs/remotes/origin/master
  batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
  Error: error: failed to fetch some objects from 'https://github.com/Lonami/lzxd.git/info/lfs'
  The process '/usr/bin/git' failed with exit code 2

Clarify LZX vs. LZXD

Looking through this library, it isn't apparent whether it actually implements the LZX-DELTA algorithm.

For example, the header block is parsed according to the LZX specification as opposed to the LZX-DELTA specification, and this library is able to successfully decode an LZX bitstream.
In addition, a similar C library provides the ability to set reference data for the algorithm to process, whereas this library does not.

It might be better to rename this library to lzx and limit the functionality to that offered by the LZX algorithm (such as limiting the window sizes).
Alternatively, the mspack library could be used as a reference for extending this library to support both LZX and LZX-DELTA.
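
Purely as a hypothetical sketch of what such a reference-data hook could look like on the Rust side, loosely modeled on the reference-data setter libmspack exposes for LZX DELTA. None of the delta-related names below exist in lzxd today; only Lzxd::new and decompress_next are real.

use lzxd::{Lzxd, WindowSize};

// Hypothetical trait: `set_reference_data` does NOT exist in this crate. The
// idea, as in libmspack, is to pre-load the reference file into the sliding
// window so match offsets in the delta stream can reach back into it.
trait LzxDelta {
    fn set_reference_data(&mut self, reference: &[u8]);
}

// Sketch of how a delta stream might then be decoded, chunk by chunk.
fn apply_delta(reference: &[u8], chunks: &[(Vec<u8>, usize)]) -> Vec<u8> {
    let mut lzxd = Lzxd::new(WindowSize::MB2);
    // lzxd.set_reference_data(reference); // hypothetical call
    let _ = reference;

    let mut output = Vec::new();
    for (chunk, uncompressed_len) in chunks {
        // `decompress_next` is the existing lzxd API.
        output.extend_from_slice(&lzxd.decompress_next(chunk, *uncompressed_len).unwrap());
    }
    output
}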

Extracting file fails with errors

Hello! Thank you for the great library!
I tried to open a cab archive using https://github.com/mdsteele/rust-cab and got many errors:

w9xpopen.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
README.txt unpacker error=Custom { kind: Other, error: ChunkTooLong }
NEWS.txt unpacker error=Custom { kind: Other, error: OverreadBlock }
LICENSE.txt unpacker error=Custom { kind: Other, error: InvalidBlock(0) }
python.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
pythonw.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
python27.dll unpacker error=Custom { kind: Other, error: OverreadBlock }
...

File: python.zip

Related issue: mdsteele/rust-cab#14
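
For context, the extraction loop that produced the errors above was roughly of the following shape. This is a reconstruction, not the exact code from the report: it assumes rust-cab's Cabinet::new / folder_entries / read_file API was used directly, and the cabinet file name is a placeholder.

use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    // Open the cabinet with rust-cab; LZX folders are decoded through lzxd.
    let mut cabinet = cab::Cabinet::new(File::open("python.cab")?)?;

    // Collect the file names first, because reading borrows the cabinet mutably.
    let mut names = Vec::new();
    for folder in cabinet.folder_entries() {
        for file in folder.file_entries() {
            names.push(file.name().to_string());
        }
    }

    for name in names {
        let mut data = Vec::new();
        match cabinet
            .read_file(&name)
            .and_then(|mut reader| reader.read_to_end(&mut data))
        {
            Ok(_) => println!("{} ok, {} bytes", name, data.len()),
            // ChunkTooLong / OverreadBlock / InvalidBlock surface here as io errors.
            Err(e) => println!("{} unpacker error={:?}", name, e),
        }
    }
    Ok(())
}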

Corrupted output if decompressed block is not padded to 32KB

I have a cabinet file here with a single file called blob that is compressed using LZX.
Using rust-cab, I can just about decompress this file and access it.

However, at the very end of the file, it is evident that decompression is beginning to break down and fail:

/CategoreType="Pr摯P捵ot" Prshabst Su换Categories="true"\Ex汣<摵le䉤T Default="falss" xmlns:cat="http://schemas.microsoft.com/msus/2002/12/UpdateHandlers/Category" />⼼<upd:Handle卲uiecffi䑣dat/>⼼<upd:Updat/

From some debugging output I added to lzxd, it seems that this corruption corresponds to the second decoded block. Looking at the output from the sliding window, perhaps it's a simple off-by-one error that is being compounded by the sliding window?

These are the bytes output at the end of decompression, corresponding (inexactly) to the text above:

572: [00, 2F, 00, 3E, 00, 3C, 2F]
579: [3C, 00, 75, 00, 70, 00, 64, 00, 3A, 00, 48, 00, 61, 00, 6E, 00, 64, 00, 6C, 00, 65, 00, 72, 53, 75, 00, 69, 00, 65, 00, 63, 00, 66, 00, 66, 00, 69, 00, 63, 44, 64, 00, 61, 00, 74, 00]
5A7: [2F, 00, 3E, 00, 3C, 2F, 3C, 00, 75, 00, 70, 00, 64, 00, 3A]
5B6: [00, 55, 00, 70, 00, 64, 00, 61, 00, 74, 00]
5C1: [2F, 00, 3E]

The format is UTF-16 LE, and the text lies within the ASCII character set, so every other byte should be 0x00.

At the beginning of the second block, you can see it start outputting invalid UTF-16:

101: [41, 00, 43, 00, 48, 00, 49, 00, 4E, 00, 45, 00, 22, 00, 20, 00, 53, 00, 75, 00, 62, 00, 6B, 00, 65, 00, 79, 00, 3D, 00, 22, 00, 53, 00, 4F, 00, 46, 00, 54, 00, 57, 00, 41, 00, 52, 00, 45, 00, 5C, 00, 4D, 00, 69, 00, 63, 00, 72, 00, 6F, 00, 73, 00, 6F, 00, 66, 00, 74, 00, 5C, 00, 4F, 00, 66, 00, 66, 00, 69, 00, 63, 00, 65, 00, 5C, 00, 31]
156: [30] <--------------- should be 00!
157: [31, 00, 2E, 00, 30, 00, 5C, 00, 50, 00, 6F, 00, 77, 00, 65, 00, 72, 00, 50, 00, 6F, 00, 69, 00, 6E, 00, 74, 00, 5C, 00, 49, 00, 6E, 00, 73, 00, 74, 00, 61, 00, 6C, 00, 6C, 00, 52, 00, 6F, 00, 6F, 00, 74, 00, 22, 00, 20, 00, 52, 00, 65, 00, 67, 00, 54, 00, 79, 00, 70, 00, 65, 00, 33, 00, 32, 00, 3D, 00, 22, 00, 74, 00, 72, 00, 75, 00, 65, 00, 22, 00, 20, 00, 78, 00, 6D, 00, 6C, 00, 6E, 00, 73, 00, 3A, 00, 62, 00, 61, 00, 72, 00, 3D, 00, 22, 00, 68, 00, 74, 00, 74, 00, 70, 00, 3A, 00, 2F, 00, 2F, 00, 73, 00, 63, 00, 68, 00, 65, 00, 6D, 00, 61, 00, 73, 00, 2E, 00, 6D, 00, 69, 00, 63, 00, 72, 00, 6F, 00, 73, 00, 6F, 00, 66, 00, 74, 00, 2E, 00, 63, 00, 6F, 00, 6D, 00, 2F, 00, 6D, 00, 73, 00, 75, 00, 73, 00, 2F, 00, 32, 00, 30, 00, 30, 00, 32, 00, 2F, 00, 31, 00, 32, 00, 2F, 00, 42, 00, 61, 00, 73, 00, 65, 00, 41, 00, 70, 00, 70, 00, 6C, 00, 69, 00, 63, 00, 61, 00, 62, 00, 69, 00, 6C, 00, 69, 00, 74, 00, 79, 00, 52, 00, 75, 00, 6C, 00, 65, 00, 73, 00, 22, 00, 20, 00, 2F, 00, 3E, 00, 3C]
250: [2F] <--------------- should be 00!
251: [3C, 00, 6C, 00, 61, 00, 72, 00, 3A, 00, 4F, 00, 72]
25E: [00, 3E, 00, 3C, 2F]

I'll document more as I figure it out in this issue.
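
Since the payload here is ASCII stored as UTF-16 LE (as noted above, every high byte should be 0x00), one cheap way to localize the damage is to scan the decompressed output for the first code unit whose high byte is non-zero. A minimal sketch of that check; the offsets above came from ad-hoc debug prints, not from this exact code:

/// Report the byte offset of the first UTF-16 LE code unit whose high byte is
/// not zero. Only valid for this sample because the text is pure ASCII
/// encoded as UTF-16 LE.
fn first_corrupt_offset(decompressed: &[u8]) -> Option<usize> {
    decompressed
        .chunks_exact(2)
        .position(|unit| unit[1] != 0)
        .map(|i| i * 2)
}

fn main() {
    // Stand-in data: "A" followed by a corrupted code unit with a non-zero high byte.
    let data = [0x41, 0x00, 0x30, 0x31];
    assert_eq!(first_corrupt_offset(&data), Some(2));
}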

Chunk length must be divisible by 2

Hello! Thank you for the library.
I hit the error raised at https://github.com/Lonami/lzxd/blob/master/src/lib.rs#L305

Sample: python.gz.

The file is named python275.chm. The compressed size of one of the blocks (block number 1307) is 32785.

Here are the LZX blocks: python275.chm.tar.gz
The WindowSize is MB2.

Here is the code to reproduce:

use lzxd::{Lzxd, WindowSize};

// Read the dumped LZX blocks; the file names encode
// <name>.<index>.<compressed_size>.<uncompressed_size>.
let mut paths = std::fs::read_dir("/tmp/python275.chm").unwrap().map(|e| {
  let path = e.unwrap().path();
  let file_name = path.file_name().unwrap().to_str().unwrap().to_string();
  let parts = file_name.split('.').collect::<Vec<_>>();
  let idx = parts[3].parse::<usize>().unwrap();
  let compressed_size = parts[4].parse::<usize>().unwrap();
  let uncompressed_size = parts[5].parse::<usize>().unwrap();
  (path, idx, compressed_size, uncompressed_size)
}).collect::<Vec<_>>();

// Blocks must be fed to the decompressor in order.
paths.sort_by_key(|(_, idx, _, _)| *idx);

let mut lzxd = Lzxd::new(WindowSize::MB2);

for (path, idx, _, uncompressed_size) in paths {
  let data = std::fs::read(path).unwrap();
  let res = lzxd.decompress_next(&data, uncompressed_size).unwrap();
  println!("{} - {}", idx, res.len());
}

Bug in post-decompression E8 fixups

Hello! The sample is the same as in:

  1. #21
  2. #25

The bug is at 29917176..29917180. The bytes should be [79, 248, 255], but they are [106, 175, 0]:

BEFORE POSTPROCESS
VIEW: [79, 248, 255, 141, 84, 36, 88, 85]
AFTER POSTPROCESS
VIEW: [106, 175, 0, 141, 84, 36, 88, 85]
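
For anyone following along, the E8 fixup in question is the decoder-side call-translation reversal described in the LZX documentation (and implemented in mspack's lzxd.c): every 0xE8 byte in a frame, except near the frame's tail, is followed by a 32-bit absolute target that must be converted back into a relative displacement. A rough sketch of that reversal, for illustration only; this is not lzxd's actual code:

/// Reverse the E8 call-translation on one decompressed frame, following the
/// approach described in the LZX documentation and mspack's lzxd.c.
/// `cur_pos` is the frame's position in the output stream, `file_size` the
/// translation size signalled in the header.
fn undo_e8(frame: &mut [u8], mut cur_pos: i32, file_size: i32) {
    if frame.len() <= 10 {
        return; // frames of 10 bytes or fewer are never translated
    }
    let end = frame.len() - 10;
    let mut i = 0;
    while i < end {
        if frame[i] != 0xE8 {
            i += 1;
            cur_pos += 1;
            continue;
        }
        // Read the 32-bit little-endian absolute target after the E8 opcode.
        let abs = i32::from_le_bytes([frame[i + 1], frame[i + 2], frame[i + 3], frame[i + 4]]);
        if abs >= -cur_pos && abs < file_size {
            // Convert the absolute target back into a relative displacement.
            let rel = if abs >= 0 { abs - cur_pos } else { abs + file_size };
            frame[i + 1..i + 5].copy_from_slice(&rel.to_le_bytes());
        }
        i += 5;
        cur_pos += 5;
    }
}

Comparing the abs/rel pair the real decoder computes around offset 29917176 against this formula should help narrow down whether the fixup itself or the decompressed data underneath it is wrong.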

Extracting this renderdoc.pdb.cab fails with InvalidBlock(4)

Here's a cab file which cabextract can uncompress successfully, but the cab crate fails on:

% curl -o renderdoc.pdb.cab "https://renderdoc.org/symbols/renderdoc.pdb/6D1DFFC4DC524537962CCABC000820641/renderdoc.pd_"
% cargo install --examples cab
% cabtool cat renderdoc.pdb.cab renderdoc.pdb > renderdoc.pdb
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: InvalidBlock(4) }', /Users/mstange/.cargo/registry/src/github.com-1ecc6299db9ec823/cab-0.3.0/examples/cabtool.rs:64:63
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Corrupted output

I have a cab archive: python.zip

I extracted this archive and have logs from both tools:

gcab: gcab.txt

rust-cab: ptesc.txt

Here is the diff between them:

907c907
< 9eecdeb542613c96ef9d822c754677fad20cdc6b01f998438f9143981c42d6b1  _ssl.pyd
---
> c93007f787f06f4a3c187b12a03bccc9e8e27b1e5cc71b4f44ddc2ef045870c8  _ssl.pyd
2106c2106
< 5c4f7eb850cb4ebd35c039be7319e2ed05439418884d414001e015c4637585fc  python27.dll
---
> 3fdca19920531643ca7cbfb01df73b6b4245da4024b264c3737cb38d3c439571  python27.dll
2302c2302
< d92c119edcb239fc52cdb1b59eddc19f251ade3a55b519d144c494b3581fc607  tcl85.dll
---
> be3703458dbb3f4308f4cf1fcf6d3c89e6cc77d2439a16d52c11ca26fc55364f  tcl85.dll
3045c3045
< 751941b4e09898c31791efeb5f90fc7367c89831d4a98637ed505e40763e287b  wininst_6.0.exe
---
> 854f0c6807c74bbf3249be772a2ab04a3934b71466d5868e2a0ee5c18b3911e4  wininst_6.0.exe
3048c3048
< 52def964142be6891054d2f95256a3b05d66887964fcd66b34abfe32477e8965  wininst_9.0.exe
---
> e64f29cb9e193c14e6904516e2a8829d3674928c22a321ed813ca6060a596492  wininst_9.0.exe

How can bitstream::peek_bits work for bits > 16?

In the case where bits > 16, it calls:

        let lo = self.peek_bits_oneword(16) as u32;
        let hi = self.peek_bits_oneword(bits - 16) as u32;

But peek_bits_oneword doesn't change the bitstream, so aren't you really peeking from the same point in the bitstream twice?
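
Not an answer about lzxd's internals, but for comparison: a common way a peek wider than the 16-bit refill word is implemented is to keep at least 32 bits buffered, so a single shift yields the top n bits without consuming anything, and no second peek from the same point is needed. A generic sketch of that scheme, illustrative only and not lzxd's code:

/// A toy MSB-first bit reader with a 32-bit look-ahead buffer, refilled from
/// 16-bit little-endian words (the LZX bitstream convention).
struct BitReader<'a> {
    input: &'a [u8],
    buffer: u32,    // most-significant bits are the next bits in the stream
    remaining: u32, // number of valid bits currently in `buffer`
}

impl<'a> BitReader<'a> {
    fn new(input: &'a [u8]) -> Self {
        let mut reader = Self { input, buffer: 0, remaining: 0 };
        reader.refill();
        reader
    }

    /// Pull 16-bit words until the buffer holds more than 16 bits (or input ends).
    fn refill(&mut self) {
        while self.remaining <= 16 && self.input.len() >= 2 {
            let word = u16::from_le_bytes([self.input[0], self.input[1]]) as u32;
            self.input = &self.input[2..];
            self.buffer |= word << (16 - self.remaining);
            self.remaining += 16;
        }
    }

    /// Return the next `n` bits (1..=32) without consuming them. Because up
    /// to 32 bits are already buffered, widths above 16 need no second peek.
    fn peek(&self, n: u32) -> u32 {
        debug_assert!(n >= 1 && n <= 32);
        self.buffer >> (32 - n)
    }

    /// Consume `n` bits previously returned by `peek`.
    fn consume(&mut self, n: u32) {
        debug_assert!(n <= self.remaining);
        self.buffer = self.buffer.checked_shl(n).unwrap_or(0);
        self.remaining -= n;
        self.refill();
    }
}

Whether lzxd's two peek_bits_oneword calls end up composing their halves from non-overlapping parts of the underlying buffer is exactly what this issue is asking.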
