
lzxd's People

Contributors

drchat, ikrivosheev, lonami, valmyzk


lzxd's Issues

Large tests in Git LFS

Follow-up to the discussion in #8. Another email from GitHub:

[GitHub] Git LFS disabled for Lonami

Git LFS has been disabled on your personal account Lonami because you’ve exceeded your data plan by at least 150%. Please purchase additional data packs to cover your bandwidth and storage usage:

https://github.com/account/billing/data/upgrade

Current usage as of 12 Dec 2023 05:16PM UTC:

Bandwidth: 1.59 GB / 1 GB (159%)
Storage: 0.13 GB / 1 GB (13%)

Quite annoying. Might have to rethink what to do with the tests or if it's worth keeping them around.

It seems CI is also affected, as the build failed after hitting the quota:

Fetching LFS objects
  /usr/bin/git lfs fetch origin refs/remotes/origin/master
  fetch: Fetching reference refs/remotes/origin/master
  batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
  Error: error: failed to fetch some objects from 'https://github.com/Lonami/lzxd.git/info/lfs'
  The process '/usr/bin/git' failed with exit code 2

Clarify LZX vs. LZXD

Looking through this library, it isn't apparent whether it actually implements the LZX-DELTA algorithm.

For example, the header block is parsed according to the LZX specification as opposed to the LZX-DELTA specification, and this library is able to successfully decode an LZX bitstream.
In addition, a similar C library provides the ability to set reference data for the algorithm to process, whereas this library does not.

It might be better to rename this library to lzx and limit the functionality to that offered by the LZX algorithm (such as limiting the window sizes).
Alternatively, the mspack library could be used as a reference for extending this library to support both LZX and LZX-DELTA.
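
Purely as a hypothetical sketch of what such a reference-data hook could look like on the Rust side, loosely modeled on the reference-data setter libmspack exposes for LZX DELTA. None of the delta-related names below exist in lzxd today; only Lzxd::new and decompress_next are real.

use lzxd::{Lzxd, WindowSize};

// Hypothetical trait: `set_reference_data` does NOT exist in this crate. The
// idea, as in libmspack, is to pre-load the reference file into the sliding
// window so match offsets in the delta stream can reach back into it.
trait LzxDelta {
    fn set_reference_data(&mut self, reference: &[u8]);
}

// Sketch of how a delta stream might then be decoded, chunk by chunk.
fn apply_delta(reference: &[u8], chunks: &[(Vec<u8>, usize)]) -> Vec<u8> {
    let mut lzxd = Lzxd::new(WindowSize::MB2);
    // lzxd.set_reference_data(reference); // hypothetical call
    let _ = reference;

    let mut output = Vec::new();
    for (chunk, uncompressed_len) in chunks {
        // `decompress_next` is the existing lzxd API.
        output.extend_from_slice(&lzxd.decompress_next(chunk, *uncompressed_len).unwrap());
    }
    output
}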

Extracting file fails with errors

Hello! Thank you for the great library!
I tried to open a cab archive using https://github.com/mdsteele/rust-cab and got many errors:

w9xpopen.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
README.txt unpacker error=Custom { kind: Other, error: ChunkTooLong }
NEWS.txt unpacker error=Custom { kind: Other, error: OverreadBlock }
LICENSE.txt unpacker error=Custom { kind: Other, error: InvalidBlock(0) }
python.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
pythonw.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
python27.dll unpacker error=Custom { kind: Other, error: OverreadBlock }
...

File: python.zip

Related issue: mdsteele/rust-cab#14
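
For context, the extraction loop that produced the errors above was roughly of the following shape. This is a reconstruction, not the exact code from the report: it assumes rust-cab's Cabinet::new / folder_entries / read_file API was used directly, and the cabinet file name is a placeholder.

use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    // Open the cabinet with rust-cab; LZX folders are decoded through lzxd.
    let mut cabinet = cab::Cabinet::new(File::open("python.cab")?)?;

    // Collect the file names first, because reading borrows the cabinet mutably.
    let mut names = Vec::new();
    for folder in cabinet.folder_entries() {
        for file in folder.file_entries() {
            names.push(file.name().to_string());
        }
    }

    for name in names {
        let mut data = Vec::new();
        match cabinet
            .read_file(&name)
            .and_then(|mut reader| reader.read_to_end(&mut data))
        {
            Ok(_) => println!("{} ok, {} bytes", name, data.len()),
            // ChunkTooLong / OverreadBlock / InvalidBlock surface here as io errors.
            Err(e) => println!("{} unpacker error={:?}", name, e),
        }
    }
    Ok(())
}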

Corrupted output if decompressed block is not padded to 32KB

I have a cabinet file here with a single file called blob that is compressed using LZX.
Using rust-cab, I can just about decompress this file and access it.

However, at the very end of the file, it is evident that decompression is beginning to break down and fail:

/CategoreType="Pr摯P捵ot" Prshabst Su换Categories="true"\Ex汣<摵le䉤T Default="falss" xmlns:cat="http://schemas.microsoft.com/msus/2002/12/UpdateHandlers/Category" />⼼<upd:Handle卲uiecffi䑣dat/>⼼<upd:Updat/

From some debugging output I added to lzxd, it seems that this corruption corresponds to the second decoded block. Looking at the output from the sliding window, perhaps it's a simple off-by-one error that is being compounded by the sliding window?

These are the bytes output at the end of decompression, corresponding (inexactly) to the text above:

572: [00, 2F, 00, 3E, 00, 3C, 2F]
579: [3C, 00, 75, 00, 70, 00, 64, 00, 3A, 00, 48, 00, 61, 00, 6E, 00, 64, 00, 6C, 00, 65, 00, 72, 53, 75, 00, 69, 00, 65, 00, 63, 00, 66, 00, 66, 00, 69, 00, 63, 44, 64, 00, 61, 00, 74, 00]
5A7: [2F, 00, 3E, 00, 3C, 2F, 3C, 00, 75, 00, 70, 00, 64, 00, 3A]
5B6: [00, 55, 00, 70, 00, 64, 00, 61, 00, 74, 00]
5C1: [2F, 00, 3E]

The format is UTF-16 LE, and the text lies within the ASCII character set, so every other byte should be 0x00.

At the beginning of the second block, you can see it start outputting invalid UTF-16:

101: [41, 00, 43, 00, 48, 00, 49, 00, 4E, 00, 45, 00, 22, 00, 20, 00, 53, 00, 75, 00, 62, 00, 6B, 00, 65, 00, 79, 00, 3D, 00, 22, 00, 53, 00, 4F, 00, 46, 00, 54, 00, 57, 00, 41, 00, 52, 00, 45, 00, 5C, 00, 4D, 00, 69, 00, 63, 00, 72, 00, 6F, 00, 73, 00, 6F, 00, 66, 00, 74, 00, 5C, 00, 4F, 00, 66, 00, 66, 00, 69, 00, 63, 00, 65, 00, 5C, 00, 31]
156: [30] <--------------- should be 00!
157: [31, 00, 2E, 00, 30, 00, 5C, 00, 50, 00, 6F, 00, 77, 00, 65, 00, 72, 00, 50, 00, 6F, 00, 69, 00, 6E, 00, 74, 00, 5C, 00, 49, 00, 6E, 00, 73, 00, 74, 00, 61, 00, 6C, 00, 6C, 00, 52, 00, 6F, 00, 6F, 00, 74, 00, 22, 00, 20, 00, 52, 00, 65, 00, 67, 00, 54, 00, 79, 00, 70, 00, 65, 00, 33, 00, 32, 00, 3D, 00, 22, 00, 74, 00, 72, 00, 75, 00, 65, 00, 22, 00, 20, 00, 78, 00, 6D, 00, 6C, 00, 6E, 00, 73, 00, 3A, 00, 62, 00, 61, 00, 72, 00, 3D, 00, 22, 00, 68, 00, 74, 00, 74, 00, 70, 00, 3A, 00, 2F, 00, 2F, 00, 73, 00, 63, 00, 68, 00, 65, 00, 6D, 00, 61, 00, 73, 00, 2E, 00, 6D, 00, 69, 00, 63, 00, 72, 00, 6F, 00, 73, 00, 6F, 00, 66, 00, 74, 00, 2E, 00, 63, 00, 6F, 00, 6D, 00, 2F, 00, 6D, 00, 73, 00, 75, 00, 73, 00, 2F, 00, 32, 00, 30, 00, 30, 00, 32, 00, 2F, 00, 31, 00, 32, 00, 2F, 00, 42, 00, 61, 00, 73, 00, 65, 00, 41, 00, 70, 00, 70, 00, 6C, 00, 69, 00, 63, 00, 61, 00, 62, 00, 69, 00, 6C, 00, 69, 00, 74, 00, 79, 00, 52, 00, 75, 00, 6C, 00, 65, 00, 73, 00, 22, 00, 20, 00, 2F, 00, 3E, 00, 3C]
250: [2F] <--------------- should be 00!
251: [3C, 00, 6C, 00, 61, 00, 72, 00, 3A, 00, 4F, 00, 72]
25E: [00, 3E, 00, 3C, 2F]

I'll document more as I figure it out in this issue.
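
Since the payload here is ASCII stored as UTF-16 LE (as noted above, every high byte should be 0x00), one cheap way to localize the damage is to scan the decompressed output for the first code unit whose high byte is non-zero. A minimal sketch of that check; the offsets above came from ad-hoc debug prints, not from this exact code:

/// Report the byte offset of the first UTF-16 LE code unit whose high byte is
/// not zero. Only valid for this sample because the text is pure ASCII
/// encoded as UTF-16 LE.
fn first_corrupt_offset(decompressed: &[u8]) -> Option<usize> {
    decompressed
        .chunks_exact(2)
        .position(|unit| unit[1] != 0)
        .map(|i| i * 2)
}

fn main() {
    // Stand-in data: "A" followed by a corrupted code unit with a non-zero high byte.
    let data = [0x41, 0x00, 0x30, 0x31];
    assert_eq!(first_corrupt_offset(&data), Some(2));
}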

Chunk length must be divisible by 2

Hello! Thank you for the library.
I hit the error raised at https://github.com/Lonami/lzxd/blob/master/src/lib.rs#L305

Sample: python.gz.

The file is named python275.chm. The compressed size of one of the blocks (block number 1307) is 32785.

Here are the LZX blocks: python275.chm.tar.gz
The WindowSize is MB2.

Here is the code to reproduce:

use lzxd::{Lzxd, WindowSize};

// Read the dumped LZX blocks; the file names encode
// <name>.<index>.<compressed_size>.<uncompressed_size>.
let mut paths = std::fs::read_dir("/tmp/python275.chm").unwrap().map(|e| {
  let path = e.unwrap().path();
  let file_name = path.file_name().unwrap().to_str().unwrap().to_string();
  let parts = file_name.split('.').collect::<Vec<_>>();
  let idx = parts[3].parse::<usize>().unwrap();
  let compressed_size = parts[4].parse::<usize>().unwrap();
  let uncompressed_size = parts[5].parse::<usize>().unwrap();
  (path, idx, compressed_size, uncompressed_size)
}).collect::<Vec<_>>();

// Blocks must be fed to the decompressor in order.
paths.sort_by_key(|(_, idx, _, _)| *idx);

let mut lzxd = Lzxd::new(WindowSize::MB2);

for (path, idx, _, uncompressed_size) in paths {
  let data = std::fs::read(path).unwrap();
  let res = lzxd.decompress_next(&data, uncompressed_size).unwrap();
  println!("{} - {}", idx, res.len());
}

Bug in post-decompression E8 fixups

Hello! The sample is the same as in:

  1. #21
  2. #25

The bug is at 29917176..29917180. The bytes should be [79, 248, 255], but they are [106, 175, 0]:

BEFORE POSTPROCESS
VIEW: [79, 248, 255, 141, 84, 36, 88, 85]
AFTER POSTPROCESS
VIEW: [106, 175, 0, 141, 84, 36, 88, 85]
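
For anyone following along, the E8 fixup in question is the decoder-side call-translation reversal described in the LZX documentation (and implemented in mspack's lzxd.c): every 0xE8 byte in a frame, except near the frame's tail, is followed by a 32-bit absolute target that must be converted back into a relative displacement. A rough sketch of that reversal, for illustration only; this is not lzxd's actual code:

/// Reverse the E8 call-translation on one decompressed frame, following the
/// approach described in the LZX documentation and mspack's lzxd.c.
/// `cur_pos` is the frame's position in the output stream, `file_size` the
/// translation size signalled in the header.
fn undo_e8(frame: &mut [u8], mut cur_pos: i32, file_size: i32) {
    if frame.len() <= 10 {
        return; // frames of 10 bytes or fewer are never translated
    }
    let end = frame.len() - 10;
    let mut i = 0;
    while i < end {
        if frame[i] != 0xE8 {
            i += 1;
            cur_pos += 1;
            continue;
        }
        // Read the 32-bit little-endian absolute target after the E8 opcode.
        let abs = i32::from_le_bytes([frame[i + 1], frame[i + 2], frame[i + 3], frame[i + 4]]);
        if abs >= -cur_pos && abs < file_size {
            // Convert the absolute target back into a relative displacement.
            let rel = if abs >= 0 { abs - cur_pos } else { abs + file_size };
            frame[i + 1..i + 5].copy_from_slice(&rel.to_le_bytes());
        }
        i += 5;
        cur_pos += 5;
    }
}

Comparing the abs/rel pair the real decoder computes around offset 29917176 against this formula should help narrow down whether the fixup itself or the decompressed data underneath it is wrong.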

Extracting this renderdoc.pdb.cab fails with InvalidBlock(4)

Here's a cab file which cabextract can uncompress successfully, but the cab crate fails on:

% curl -o renderdoc.pdb.cab "https://renderdoc.org/symbols/renderdoc.pdb/6D1DFFC4DC524537962CCABC000820641/renderdoc.pd_"
% cargo install --examples cab
% cabtool cat renderdoc.pdb.cab renderdoc.pdb > renderdoc.pdb
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: InvalidBlock(4) }', /Users/mstange/.cargo/registry/src/github.com-1ecc6299db9ec823/cab-0.3.0/examples/cabtool.rs:64:63
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Corrupted output

I have a cab archive: python.zip

I extracted this archive and have logs from both tools:

gcab: gcab.txt

rust-cab: ptesc.txt

Here is the diff between them:

907c907
< 9eecdeb542613c96ef9d822c754677fad20cdc6b01f998438f9143981c42d6b1  _ssl.pyd
---
> c93007f787f06f4a3c187b12a03bccc9e8e27b1e5cc71b4f44ddc2ef045870c8  _ssl.pyd
2106c2106
< 5c4f7eb850cb4ebd35c039be7319e2ed05439418884d414001e015c4637585fc  python27.dll
---
> 3fdca19920531643ca7cbfb01df73b6b4245da4024b264c3737cb38d3c439571  python27.dll
2302c2302
< d92c119edcb239fc52cdb1b59eddc19f251ade3a55b519d144c494b3581fc607  tcl85.dll
---
> be3703458dbb3f4308f4cf1fcf6d3c89e6cc77d2439a16d52c11ca26fc55364f  tcl85.dll
3045c3045
< 751941b4e09898c31791efeb5f90fc7367c89831d4a98637ed505e40763e287b  wininst_6.0.exe
---
> 854f0c6807c74bbf3249be772a2ab04a3934b71466d5868e2a0ee5c18b3911e4  wininst_6.0.exe
3048c3048
< 52def964142be6891054d2f95256a3b05d66887964fcd66b34abfe32477e8965  wininst_9.0.exe
---
> e64f29cb9e193c14e6904516e2a8829d3674928c22a321ed813ca6060a596492  wininst_9.0.exe

How can bitstream::peek_bits work for bits > 16?

In the case where bits > 16, it calls:

        let lo = self.peek_bits_oneword(16) as u32;
        let hi = self.peek_bits_oneword(bits - 16) as u32;

But peek_bits_oneword doesn't change the bitstream, so aren't you really peeking from the same point in the bitstream twice?
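
Not an answer about lzxd's internals, but for comparison: a common way a peek wider than the 16-bit refill word is implemented is to keep at least 32 bits buffered, so a single shift yields the top n bits without consuming anything, and no second peek from the same point is needed. A generic sketch of that scheme, illustrative only and not lzxd's code:

/// A toy MSB-first bit reader with a 32-bit look-ahead buffer, refilled from
/// 16-bit little-endian words (the LZX bitstream convention).
struct BitReader<'a> {
    input: &'a [u8],
    buffer: u32,    // most-significant bits are the next bits in the stream
    remaining: u32, // number of valid bits currently in `buffer`
}

impl<'a> BitReader<'a> {
    fn new(input: &'a [u8]) -> Self {
        let mut reader = Self { input, buffer: 0, remaining: 0 };
        reader.refill();
        reader
    }

    /// Pull 16-bit words until the buffer holds more than 16 bits (or input ends).
    fn refill(&mut self) {
        while self.remaining <= 16 && self.input.len() >= 2 {
            let word = u16::from_le_bytes([self.input[0], self.input[1]]) as u32;
            self.input = &self.input[2..];
            self.buffer |= word << (16 - self.remaining);
            self.remaining += 16;
        }
    }

    /// Return the next `n` bits (1..=32) without consuming them. Because up
    /// to 32 bits are already buffered, widths above 16 need no second peek.
    fn peek(&self, n: u32) -> u32 {
        debug_assert!(n >= 1 && n <= 32);
        self.buffer >> (32 - n)
    }

    /// Consume `n` bits previously returned by `peek`.
    fn consume(&mut self, n: u32) {
        debug_assert!(n <= self.remaining);
        self.buffer = self.buffer.checked_shl(n).unwrap_or(0);
        self.remaining -= n;
        self.refill();
    }
}

Whether lzxd's two peek_bits_oneword calls end up composing their halves from non-overlapping parts of the underlying buffer is exactly what this issue is asking.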
