lonami / lzxd
Home Page: https://crates.io/crates/lzxd
License: Apache License 2.0
Follow-up to the discussion in #8. Another email from GitHub:
[GitHub] Git LFS disabled for Lonami
Git LFS has been disabled on your personal account Lonami because you’ve exceeded your data plan by at least 150%. Please purchase additional data packs to cover your bandwidth and storage usage:
https://github.com/account/billing/data/upgrade
Current usage as of 12 Dec 2023 05:16PM UTC:
Bandwidth: 1.59 GB / 1 GB (159%)
Storage: 0.13 GB / 1 GB (13%)
Quite annoying. Might have to rethink what to do with the tests or if it's worth keeping them around.
It seems CI is affected as well, since the build failed after reaching the quota:
Fetching LFS objects
/usr/bin/git lfs fetch origin refs/remotes/origin/master
fetch: Fetching reference refs/remotes/origin/master
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Error: error: failed to fetch some objects from 'https://github.com/Lonami/lzxd.git/info/lfs'
The process '/usr/bin/git' failed with exit code 2
As I'm looking through this library, it isn't too apparent whether this library actually implements the LZX-DELTA algorithm.
For example, the header block is parsed according to the LZX specification as opposed to the LZX-DELTA specification, and this library is able to successfully decode an LZX bitstream.
In addition, a similar C library provides the ability to set reference data for the algorithm to process, whereas this library does not.
It might be better to rename this library to lzx and limit the functionality to that offered by the LZX algorithm (such as limiting the window sizes).
Or, alternatively, the mspack library could be used as reference for extending this library to support both LZX and LZX-DELTA.
Hello! Thank you for the great library!
I am trying to open a cab archive using this library through https://github.com/mdsteele/rust-cab and I get many errors:
w9xpopen.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
README.txt unpacker error=Custom { kind: Other, error: ChunkTooLong }
NEWS.txt unpacker error=Custom { kind: Other, error: OverreadBlock }
LICENSE.txt unpacker error=Custom { kind: Other, error: InvalidBlock(0) }
python.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
pythonw.exe unpacker error=Custom { kind: Other, error: ChunkTooLong }
python27.dll unpacker error=Custom { kind: Other, error: OverreadBlock }
...
File: python.zip
Related issue: mdsteele/rust-cab#14
I have a cabinet file here with a single file called blob that is compressed using LZX.
Using rust-cab, I can just about successfully decompress this file and access it.
However, at the very end of the file, it is evident that decompression is beginning to break down and fail:
/CategoreType="Pr摯P捵ot" Prshabst Su换Categories="true"\Ex汣<摵le䉤T Default="falss" xmlns:cat="http://schemas.microsoft.com/msus/2002/12/UpdateHandlers/Category" />⼼<upd:Handle卲uiecffi䑣dat/>⼼<upd:Updat/
From some debugging output I put in lzxd, it seems that this corruption corresponds to the second decoded block. Looking at the output from the sliding window, perhaps it's a simple off-by-one error that is being compounded by the sliding window?
These are the bytes output at the end of decompression, (inexactly) corresponding with the above text:
572: [00, 2F, 00, 3E, 00, 3C, 2F]
579: [3C, 00, 75, 00, 70, 00, 64, 00, 3A, 00, 48, 00, 61, 00, 6E, 00, 64, 00, 6C, 00, 65, 00, 72, 53, 75, 00, 69, 00, 65, 00, 63, 00, 66, 00, 66, 00, 69, 00, 63, 44, 64, 00, 61, 00, 74, 00]
5A7: [2F, 00, 3E, 00, 3C, 2F, 3C, 00, 75, 00, 70, 00, 64, 00, 3A]
5B6: [00, 55, 00, 70, 00, 64, 00, 61, 00, 74, 00]
5C1: [2F, 00, 3E]
The format is UTF-16 LE, and the text lies within the ASCII character set, so every other byte should be 0x00.
At the beginning of the second block, you can see it start outputting invalid UTF-16:
101: [41, 00, 43, 00, 48, 00, 49, 00, 4E, 00, 45, 00, 22, 00, 20, 00, 53, 00, 75, 00, 62, 00, 6B, 00, 65, 00, 79, 00, 3D, 00, 22, 00, 53, 00, 4F, 00, 46, 00, 54, 00, 57, 00, 41, 00, 52, 00, 45, 00, 5C, 00, 4D, 00, 69, 00, 63, 00, 72, 00, 6F, 00, 73, 00, 6F, 00, 66, 00, 74, 00, 5C, 00, 4F, 00, 66, 00, 66, 00, 69, 00, 63, 00, 65, 00, 5C, 00, 31]
156: [30] <--------------- should be 00!
157: [31, 00, 2E, 00, 30, 00, 5C, 00, 50, 00, 6F, 00, 77, 00, 65, 00, 72, 00, 50, 00, 6F, 00, 69, 00, 6E, 00, 74, 00, 5C, 00, 49, 00, 6E, 00, 73, 00, 74, 00, 61, 00, 6C, 00, 6C, 00, 52, 00, 6F, 00, 6F, 00, 74, 00, 22, 00, 20, 00, 52, 00, 65, 00, 67, 00, 54, 00, 79, 00, 70, 00, 65, 00, 33, 00, 32, 00, 3D, 00, 22, 00, 74, 00, 72, 00, 75, 00, 65, 00, 22, 00, 20, 00, 78, 00, 6D, 00, 6C, 00, 6E, 00, 73, 00, 3A, 00, 62, 00, 61, 00, 72, 00, 3D, 00, 22, 00, 68, 00, 74, 00, 74, 00, 70, 00, 3A, 00, 2F, 00, 2F, 00, 73, 00, 63, 00, 68, 00, 65, 00, 6D, 00, 61, 00, 73, 00, 2E, 00, 6D, 00, 69, 00, 63, 00, 72, 00, 6F, 00, 73, 00, 6F, 00, 66, 00, 74, 00, 2E, 00, 63, 00, 6F, 00, 6D, 00, 2F, 00, 6D, 00, 73, 00, 75, 00, 73, 00, 2F, 00, 32, 00, 30, 00, 30, 00, 32, 00, 2F, 00, 31, 00, 32, 00, 2F, 00, 42, 00, 61, 00, 73, 00, 65, 00, 41, 00, 70, 00, 70, 00, 6C, 00, 69, 00, 63, 00, 61, 00, 62, 00, 69, 00, 6C, 00, 69, 00, 74, 00, 79, 00, 52, 00, 75, 00, 6C, 00, 65, 00, 73, 00, 22, 00, 20, 00, 2F, 00, 3E, 00, 3C]
250: [2F] <--------------- should be 00!
251: [3C, 00, 6C, 00, 61, 00, 72, 00, 3A, 00, 4F, 00, 72]
25E: [00, 3E, 00, 3C, 2F]
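The "every other byte should be 0x00" check can be automated instead of being read off the dump by eye. The following is a minimal sketch, assuming the whole decompressed output is available as one byte slice that starts on a UTF-16 code unit boundary and contains only ASCII-range text; the function name first_bad_high_byte is hypothetical and not part of lzxd or rust-cab.

```rust
/// Returns the offset of the first high byte that is not 0x00.
/// Assumes `buf` is ASCII-only UTF-16 LE text starting on a code unit boundary,
/// so every byte at an odd offset is expected to be zero.
fn first_bad_high_byte(buf: &[u8]) -> Option<usize> {
    buf.iter()
        .enumerate()
        .skip(1) // start at the first high byte
        .step_by(2) // then visit only odd offsets
        .find(|&(_, &b)| b != 0)
        .map(|(offset, _)| offset)
}

fn main() {
    // "\1" encoded correctly, then the same text with a corrupted high byte (0x30).
    let good = [0x5C, 0x00, 0x31, 0x00];
    let bad = [0x5C, 0x00, 0x31, 0x30, 0x31, 0x00];
    assert_eq!(first_bad_high_byte(&good), None);
    assert_eq!(first_bad_high_byte(&bad), Some(3));
}
```

Running such a check on the full output would give the first corrupted offset directly, which can then be compared against the block boundaries.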
I'll document more as I figure it out in this issue.
Hello! Thank you for the library.
I got an error at: https://github.com/Lonami/lzxd/blob/master/src/lib.rs#L305
Sample: python.gz.
The file is named python275.chm. The compressed size of one of the blocks (block number 1307) is 32785.
Here are the LZX blocks: python275.chm.tar.gz
The WindowSize is MB2.
Here is the code to reproduce:
use lzxd::{Lzxd, WindowSize};

fn main() {
    // Collect (path, index, compressed size, uncompressed size) from the dot-separated file names.
    let mut paths = std::fs::read_dir("/tmp/python275.chm").unwrap().map(|e| {
        let path = e.unwrap().path();
        let file_name = path.file_name().unwrap().to_str().unwrap().to_string();
        let parts = file_name.split('.').collect::<Vec<_>>();
        let idx = parts[3].parse::<usize>().unwrap();
        let compressed_size = parts[4].parse::<usize>().unwrap();
        let uncompressed_size = parts[5].parse::<usize>().unwrap();
        (path, idx, compressed_size, uncompressed_size)
    }).collect::<Vec<_>>();
    // Feed the blocks to the decoder in index order.
    paths.sort_by_key(|(_, idx, _, _)| *idx);
    let mut lzxd = Lzxd::new(WindowSize::MB2);
    for (path, idx, _, uncompressed_size) in paths {
        let data = std::fs::read(path).unwrap();
        let res = lzxd.decompress_next(&data, uncompressed_size).unwrap();
        println!("{} - {}", idx, res.len());
    }
}
Here's a cab file which cabextract can uncompress successfully, but which the cab crate fails on:
% curl -o renderdoc.pdb.cab "https://renderdoc.org/symbols/renderdoc.pdb/6D1DFFC4DC524537962CCABC000820641/renderdoc.pd_"
% cargo install --examples cab
% cabtool cat renderdoc.pdb.cab renderdoc.pdb > renderdoc.pdb
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: InvalidBlock(4) }', /Users/mstange/.cargo/registry/src/github.com-1ecc6299db9ec823/cab-0.3.0/examples/cabtool.rs:64:63
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
https://docs.rs/lzxd/0.1.0/lzxd/enum.WindowSize.html refers to the reference data and the subject data, but I'm having trouble figuring out what that means, and how to determine the window size ahead of time. The context here is that I'm trying to integrate lzxd into this code where I have the size of the compressed data and the decompressed size.
This crate provides a single decompress_next method which accepts a slice. It's not clear to me how to obtain the chunks; the naive choice of compressed.chunks(65_536) doesn't work, but looking at https://github.com/LeonBlade/xnbcli/blob/ef55c383561addbd0bcfc11bfead6fc607502536/app/Presser/index.js#L21-L72 shows a different interface than lzxd provides.
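One way to obtain the chunks for an XNB-style payload is to read the small size header that precedes each block rather than cutting at a fixed length. The following is a sketch under assumptions: the framing (an optional 0xFF flag, big-endian 16-bit frame and block sizes, and a default frame size of 0x8000) is my reading of the linked xnbcli code and is not documented by lzxd; Frame and split_frames are hypothetical names.

```rust
/// One framed chunk: the compressed bytes plus the size they should decompress to.
#[derive(Debug, PartialEq)]
struct Frame<'a> {
    frame_size: usize, // expected decompressed size of this chunk
    data: &'a [u8],    // compressed bytes to hand to the decoder
}

/// Splits a payload into frames. Assumed layout (not confirmed by lzxd's docs):
///   0xFF, frame_size (u16 BE), block_size (u16 BE), then block_size bytes; or
///   block_size (u16 BE), then block_size bytes, with frame_size = 0x8000.
/// Returns None if the payload is truncated.
fn split_frames(mut buf: &[u8]) -> Option<Vec<Frame<'_>>> {
    let mut frames = Vec::new();
    while !buf.is_empty() {
        let (frame_size, block_size, header_len) = if buf[0] == 0xFF {
            if buf.len() < 5 {
                return None;
            }
            let frame = u16::from_be_bytes([buf[1], buf[2]]) as usize;
            let block = u16::from_be_bytes([buf[3], buf[4]]) as usize;
            (frame, block, 5)
        } else {
            if buf.len() < 2 {
                return None;
            }
            (0x8000, u16::from_be_bytes([buf[0], buf[1]]) as usize, 2)
        };
        // A zero size marks the end of the framed data.
        if block_size == 0 || frame_size == 0 {
            break;
        }
        let end = header_len + block_size;
        if buf.len() < end {
            return None;
        }
        frames.push(Frame { frame_size, data: &buf[header_len..end] });
        buf = &buf[end..];
    }
    Some(frames)
}

fn main() {
    // A 2-byte block with the default frame size, then a flagged 1-byte block.
    let payload = [0x00, 0x02, 0xAA, 0xBB, 0xFF, 0x00, 0x10, 0x00, 0x01, 0xCC];
    let frames = split_frames(&payload).unwrap();
    assert_eq!(frames[0], Frame { frame_size: 0x8000, data: &[0xAA, 0xBB] });
    assert_eq!(frames[1], Frame { frame_size: 0x10, data: &[0xCC] });
}
```

Each frame's data would then be the slice passed to decompress_next, one call per frame in order. Cabinet files frame their data differently, so this sketch would not apply to them.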
I have a cab archive: python.zip
I extracted this archive with both tools and have the logs:
gcab: gcab.txt
rust-cab: ptesc.txt
Here are the differences:
907c907
< 9eecdeb542613c96ef9d822c754677fad20cdc6b01f998438f9143981c42d6b1 _ssl.pyd
---
> c93007f787f06f4a3c187b12a03bccc9e8e27b1e5cc71b4f44ddc2ef045870c8 _ssl.pyd
2106c2106
< 5c4f7eb850cb4ebd35c039be7319e2ed05439418884d414001e015c4637585fc python27.dll
---
> 3fdca19920531643ca7cbfb01df73b6b4245da4024b264c3737cb38d3c439571 python27.dll
2302c2302
< d92c119edcb239fc52cdb1b59eddc19f251ade3a55b519d144c494b3581fc607 tcl85.dll
---
> be3703458dbb3f4308f4cf1fcf6d3c89e6cc77d2439a16d52c11ca26fc55364f tcl85.dll
3045c3045
< 751941b4e09898c31791efeb5f90fc7367c89831d4a98637ed505e40763e287b wininst_6.0.exe
---
> 854f0c6807c74bbf3249be772a2ab04a3934b71466d5868e2a0ee5c18b3911e4 wininst_6.0.exe
3048c3048
< 52def964142be6891054d2f95256a3b05d66887964fcd66b34abfe32477e8965 wininst_9.0.exe
---
> e64f29cb9e193c14e6904516e2a8829d3674928c22a321ed813ca6060a596492 wininst_9.0.exe
In the case where bits > 16, it calls:
let lo = self.peek_bits_oneword(16) as u32;
let hi = self.peek_bits_oneword(bits - 16) as u32;
But peek_bits_oneword doesn't advance the bitstream, so aren't you really peeking from the same point in the bitstream twice?
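The concern can be illustrated with a toy model. This is only a sketch, not the crate's actual Bitstream: ToyBits, peek and peek_at are hypothetical names, bits are taken most-significant first, and the way the two halves are combined is an assumption made for the illustration.

```rust
/// Toy bit source: the next bit to read is the most significant bit of the u64.
/// Peeking never consumes anything, mirroring the behaviour described above.
struct ToyBits(u64);

impl ToyBits {
    /// Returns the next `n` bits (1..=32) without consuming them.
    fn peek(&self, n: u32) -> u32 {
        (self.0 >> (64 - n)) as u32
    }

    /// Returns `n` bits (1..=32) starting `skip` bits ahead, without consuming.
    fn peek_at(&self, skip: u32, n: u32) -> u32 {
        ((self.0 << skip) >> (64 - n)) as u32
    }
}

fn main() {
    let s = ToyBits(0xABCD_E000_0000_0000);
    let want = s.peek(20); // the 20 bits we actually need: 0xABCDE

    // Two peeks from the same position: the second one re-reads the first 4 bits.
    let lo = s.peek(16); // 0xABCD
    let hi = s.peek(4); // 0xA, not the bits that follow `lo`
    assert_ne!((lo << 4) | hi, want);

    // Peeking the remaining bits from 16 bits further along gives the right value.
    let hi_ok = s.peek_at(16, 4); // 0xE
    assert_eq!((lo << 4) | hi_ok, want);
}
```

In this model, a second peek is only correct if it starts after the first 16 bits, either by consuming them first or by peeking at an offset.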