ulikunitz / xz
Pure golang package for reading and writing xz-compressed files
License: Other
The following program:
package main

import (
	"bytes"
	"io"
	"log"
	"os"

	"github.com/ulikunitz/xz"
)

func main() {
	const text = ""
	var buf bytes.Buffer
	// compress text
	w, err := xz.NewWriter(&buf)
	if err != nil {
		log.Fatalf("xz.NewWriter error %s", err)
	}
	if _, err := io.WriteString(w, text); err != nil {
		log.Fatalf("WriteString error %s", err)
	}
	if err := w.Close(); err != nil {
		log.Fatalf("w.Close error %s", err)
	}
	// decompress buffer and write output to stdout
	r, err := xz.NewReader(&buf)
	if err != nil {
		log.Fatalf("NewReader error %s", err)
	}
	if _, err = io.Copy(os.Stdout, r); err != nil {
		log.Fatalf("io.Copy error %s", err)
	}
}
fails when executed:
$ go run xztest.go
2017/02/15 10:55:38 io.Copy error xz: invalid header magic bytes
exit status 1
I would expect it to handle an empty string correctly.
I want to compress data into the xz format, but I don't know how to write it to a file.
Could you answer this question?
And how do I decompress from an xz file?
Thank you so much~^^
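For reference, a minimal sketch of writing to and reading from an .xz file with this package; the file name example.xz is hypothetical:

package main

import (
	"io"
	"log"
	"os"

	"github.com/ulikunitz/xz"
)

func main() {
	// compress: wrap the destination file in an xz writer
	f, err := os.Create("example.xz")
	if err != nil {
		log.Fatal(err)
	}
	w, err := xz.NewWriter(f)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := io.WriteString(w, "hello, xz\n"); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil { // Close finishes the xz stream
		log.Fatal(err)
	}
	if err := f.Close(); err != nil {
		log.Fatal(err)
	}

	// decompress: wrap the source file in an xz reader
	g, err := os.Open("example.xz")
	if err != nil {
		log.Fatal(err)
	}
	defer g.Close()
	r, err := xz.NewReader(g)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := io.Copy(os.Stdout, r); err != nil {
		log.Fatal(err)
	}
}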
Cool project!
An obvious enhancement for the future would be to parallelize some of the work, making use of all CPU cores.
Is https://github.com/ulikunitz/xz/tree/rewrite production ready? When do you anticipate this being promoted to main?
Thanks
@pmezard reported a panic in the master tree that he found using go-fuzz. Many thanks for that. I have asked for the go-fuzz code and the crasher sequence to check what caused the bug and to fix it in the dev tree.
# github.com/ulikunitz/xz/lzma
..\github.com\ulikunitz\xz\lzma\reader.go:35: constant 4294967295 overflows int
..\github.com\ulikunitz\xz\lzma\reader2.go:31: constant 4294967295 overflows int
..\github.com\ulikunitz\xz\lzma\writer.go:78: constant 4294967295 overflows int
..\github.com\ulikunitz\xz\lzma\writer2.go:55: constant 4294967295 overflows int
go 1.6.3
windows xp
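Presumably this is a 32-bit build (Go on Windows XP is typically GOARCH=386), where int is 32 bits and the untyped constant 4294967295 no longer fits. A minimal illustration of the compiler behavior, not the package's code:

package main

// On 32-bit platforms int is 32 bits, so assigning this untyped
// constant to an int fails with "constant 4294967295 overflows int".
// The same line compiles fine on 64-bit platforms.
const maxUint32 = 4294967295

var n int = maxUint32

func main() { _ = n }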
The new version of XZ Utils supports simultaneous multi-CPU compression, which greatly improves efficiency. How can this be implemented here? Thanks!
I'm trying to decompress iOS OTA files and they are failing your flags check. They appear to have a "None" checksum.
https://tukaani.org/xz/xz-file-format-1.0.4.txt
2.1.1.2. Stream Flags
The first byte of Stream Flags is always a null byte. In the
future, this byte may be used to indicate a new Stream version
or other Stream properties.
The second byte of Stream Flags is a bit field:
Bit(s) Mask Description
0-3 0x0F Type of Check (see Section 3.4):
ID Size Check name
0x00 0 bytes None
0x01 4 bytes CRC32
0x02 4 bytes (Reserved)
0x03 4 bytes (Reserved)
0x04 8 bytes CRC64
0x05 8 bytes (Reserved)
0x06 8 bytes (Reserved)
0x07 16 bytes (Reserved)
0x08 16 bytes (Reserved)
0x09 16 bytes (Reserved)
0x0A 32 bytes SHA-256
0x0B 32 bytes (Reserved)
0x0C 32 bytes (Reserved)
0x0D 64 bytes (Reserved)
0x0E 64 bytes (Reserved)
0x0F 64 bytes (Reserved)
4-7 0xF0 Reserved for future use; MUST be zero for now.
Thank you for a great pkg!
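For reference, a hedged sketch of how the check type could be read from the quoted Stream Flags definition; this illustrates the spec, not the package's actual code:

package main

import (
	"errors"
	"fmt"
)

// checkType extracts the Type of Check ID (bits 0-3) from the second
// Stream Flags byte and rejects non-zero reserved bits (bits 4-7).
func checkType(flagByte2 byte) (byte, error) {
	if flagByte2&0xF0 != 0 {
		return 0, errors.New("xz: reserved stream flag bits must be zero")
	}
	return flagByte2 & 0x0F, nil
}

func main() {
	id, err := checkType(0x00) // iOS OTA files: check type 0x00 = None
	if err != nil {
		panic(err)
	}
	fmt.Printf("check ID %#04x\n", id) // prints "check ID 0x00"
}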
Hi,
I wondered if I can use your package to extract files from an NSIS installer .exe? It looks like it uses LZMA compression with an offset of 4 EF BE AD DE N u l l s o f t I n s t, according to the command 7z i. Thanks for any help with this.
The implementation of readUvarint at https://github.com/ulikunitz/xz/blob/master/bits.go#L56 is very similar to the vulnerable code in the Go encoding/binary library and seems to suffer from the same vulnerability described in golang/go#40618.
See the fix at https://go-review.googlesource.com/c/go/+/247120/2/src/encoding/binary/varint.go
Note: I couldn't find any information on how to disclose this issue to the maintainers. I would also suggest setting up a Security Policy for the project within GitHub.
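For context, a sketch of the guard the upstream Go fix adds: iterations are capped at MaxVarintLen64 so malicious input cannot drive the shift past 64 bits. This mirrors encoding/binary after CL 247120 and is not this package's current code:

package bits

import (
	"encoding/binary"
	"errors"
	"io"
)

var errOverflow = errors.New("readUvarint: varint overflows a 64-bit integer")

// readUvarint decodes an unsigned varint, rejecting encodings longer
// than 64 bits instead of silently overflowing.
func readUvarint(r io.ByteReader) (uint64, error) {
	var x uint64
	var s uint
	for i := 0; i < binary.MaxVarintLen64; i++ {
		b, err := r.ReadByte()
		if err != nil {
			return x, err
		}
		if b < 0x80 {
			if i == binary.MaxVarintLen64-1 && b > 1 {
				return x, errOverflow // continuation spills into a 65th bit
			}
			return x | uint64(b)<<s, nil
		}
		x |= uint64(b&0x7f) << s
		s += 7
	}
	return x, errOverflow
}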
I have a deb package from my local /var/cache/apt/archives. I'm using Debian stretch and the package is accountsservice_0.6.43-1_amd64.deb.
If I execute the following commands, everything works (no warnings):
$ ar x accountsservice_0.6.43-1_amd64.deb data.tar.xz
$ xz -d data.tar.xz
$ ls
accountsservice_0.6.43-1_amd64.deb data.tar
$ tar tf data.tar
./
./etc/
./etc/dbus-1/
./etc/dbus-1/system.d/
./etc/dbus-1/system.d/org.freedesktop.Accounts.conf
...
But if I run the following code (where t/data.tar.xz is the file extracted from the deb cited above), I get an error:
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"

	"github.com/ulikunitz/xz"
)

func main() {
	f, err := os.Open("t/data.tar.xz")
	if err != nil {
		panic(err)
	}
	r, err := xz.NewReader(f)
	if err != nil {
		panic(err)
	}
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err) // the 3rd panic: fails with "unexpected padding size"
		}
		log.Println(hdr)
	}
}
The execution output is (I'm using Gogland IDE, but the result is the same if I run from terminal):
GOROOT=/usr/lib/go-1.8
GOPATH=/home/langbeck/git/golang/unpackers/external:/home/langbeck/git/golang/unpackers:/home/langbeck/go
/usr/lib/go-1.8/bin/go build -i -o /tmp/maingo main
/tmp/maingo
panic: xz: unexpected padding size 5
goroutine 1 [running]:
main.main()
/home/langbeck/git/golang/unpackers/src/main/main.go:127 +0x16c
Note 1: line 127 is the line of the 3rd panic().
Note 2: xz --test data.tar.xz && echo OK runs fine, and the same deb file is installed on my system, so it's a valid deb file.
I'm attaching the deb file in question: accountsservice_0.6.43-1_amd64.deb.zip
The GNU xz command has a boolean --x86 flag which allegedly gives 0-15% extra compression when applied to files containing x86 machine code. The Linux kernel uses this flag when compressing its bzImage. It is also popular among UEFI implementations when compressing firmware.
The explanation from the xz man page is:
A BCJ filter converts relative addresses in the machine code to
their absolute counterparts. This doesn't change the size of
the data, but it increases redundancy, which can help LZMA2 to
produce 0-15 % smaller .xz file. The BCJ filters are always
reversible, so using a BCJ filter for wrong type of data doesn't
cause any data loss, although it may make the compression ratio
slightly worse.
We have already adapted this feature to Go. Would it be suitable to upstream it with an --x86 flag?
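To illustrate the idea in the man page quote, here is a toy sketch of the encode direction for CALL instructions only; the real x86 BCJ filter also handles 0xE9 jumps, validity masks, and a stream position offset, so treat this purely as an illustration:

package main

import (
	"encoding/binary"
	"fmt"
)

// bcjX86Encode rewrites the rel32 displacement of each 0xE8 (CALL)
// opcode into an absolute address. Same size, more redundancy, and
// the transform is reversed by subtracting instead of adding.
func bcjX86Encode(data []byte) {
	for i := 0; i+5 <= len(data); i++ {
		if data[i] == 0xE8 {
			rel := binary.LittleEndian.Uint32(data[i+1:])
			abs := rel + uint32(i) + 5 // target = rel + address of next instruction
			binary.LittleEndian.PutUint32(data[i+1:], abs)
			i += 4 // skip the displacement we just rewrote
		}
	}
}

func main() {
	code := []byte{0xE8, 0x10, 0x00, 0x00, 0x00, 0x90} // call +0x10; nop
	bcjX86Encode(code)
	fmt.Printf("% x\n", code) // displacement now holds the absolute target 0x15
}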
It would be helpful (for my use case) to export a function that performs the same behavior as the gxz command-line utility so that it can be called programmatically. This would de-duplicate the file read/write logic that has already been written for the command-line utility, allowing for more programmatic usage of this library.
Would this be a desirable addition?
This was attempted, incorrectly, in #48.
The 7-Zip SDK has something it calls "fast bytes", which seems to be a mechanism to limit how much time is spent looking for the best sequence to add to the dictionary. I don't see an equivalent here; how do you get around that?
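For context, "fast bytes" (often called nice length elsewhere) is usually an early-exit threshold in the match finder. A hypothetical sketch of the idea, not this package's code:

package matcher // hypothetical

// match is a hypothetical (distance, length) candidate from a match finder.
type match struct {
	distance int
	length   int
}

// bestMatch scans candidates but stops as soon as a match of at least
// niceLen bytes is found, trading a little compression ratio for speed.
func bestMatch(candidates []match, niceLen int) (best match) {
	for _, m := range candidates {
		if m.length > best.length {
			best = m
		}
		if best.length >= niceLen {
			break // "fast bytes" cutoff: good enough, stop searching
		}
	}
	return best
}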
Found in a fuzz test.
How to reproduce:
package xz_test

import (
	"bytes"
	"io/ioutil"
	"testing"

	"github.com/ulikunitz/xz"
)

func TestPanic(t *testing.T) {
	data := []byte([]uint8{253, 55, 122, 88, 90, 0, 0, 0, 255, 18, 217, 65, 0, 189, 191, 239, 189, 191, 239, 48})
	t.Log(string(data))
	r, err := xz.NewReader(bytes.NewReader(data))
	if err != nil {
		t.Skip("OK")
	}
	b, err := ioutil.ReadAll(r)
	if err != nil {
		t.Skip("OK")
	}
	t.Log(b)
}
$ go test -run "TestPanic" -v
=== RUN TestPanic
panic_test.go:13: 7zXZAソ0
--- FAIL: TestPanic (0.00s)
panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range
goroutine 6 [running]:
testing.tRunner.func1.1(0x54ef00, 0x5b1c00)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/testing/testing.go:1072 +0x30d
testing.tRunner.func1(0xc000001380)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/testing/testing.go:1075 +0x41a
panic(0x54ef00, 0x5b1c00)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/runtime/panic.go:969 +0x1b9
io/ioutil.readAll.func1(0xc000095f28)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/io/ioutil/ioutil.go:30 +0x106
panic(0x54ef00, 0x5b1c00)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/runtime/panic.go:969 +0x1b9
github.com/ulikunitz/xz.readIndexBody(0x5b40a0, 0xc00008c3c0, 0x100, 0xc000095bc0, 0x40df58, 0x20, 0x557560, 0x1)
/home/heijo/ghq/github.com/ulikunitz/xz/format.go:684 +0x1d4
github.com/ulikunitz/xz.(*streamReader).readTail(0xc00008a1e0, 0xc000074490, 0xc000074490)
/home/heijo/ghq/github.com/ulikunitz/xz/reader.go:163 +0x50
github.com/ulikunitz/xz.(*streamReader).Read(0xc00008a1e0, 0xc000244000, 0x200, 0x200, 0xc000095dd0, 0x40b125, 0xc000095dd8)
/home/heijo/ghq/github.com/ulikunitz/xz/reader.go:209 +0x4f9
github.com/ulikunitz/xz.(*Reader).Read(0xc00008c3f0, 0xc000244000, 0x200, 0x200, 0xc000244000, 0x0, 0x0)
/home/heijo/ghq/github.com/ulikunitz/xz/reader.go:112 +0xe5
bytes.(*Buffer).ReadFrom(0xc00006feb0, 0x5b4120, 0xc00008c3f0, 0x0, 0xc00008c300, 0x5b40a0)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/bytes/buffer.go:204 +0xb1
io/ioutil.readAll(0x5b4120, 0xc00008c3f0, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/io/ioutil/ioutil.go:36 +0xe5
io/ioutil.ReadAll(...)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/io/ioutil/ioutil.go:45
github.com/ulikunitz/xz_test.TestPanic(0xc000001380)
/home/heijo/ghq/github.com/ulikunitz/xz/panic_test.go:18 +0x185
testing.tRunner(0xc000001380, 0x58fab0)
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/testing/testing.go:1123 +0xef
created by testing.(*T).Run
/home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/testing/testing.go:1168 +0x2b3
exit status 2
FAIL github.com/ulikunitz/xz 0.005s
While researching xz as a storage medium I came across this article:
https://www.nongnu.org/lzip/xz_inadequate.html
Is what they've outlined on this page a valid concern, meaning we shouldn't be using xz?
I have a processing scenario where I read LZMA objects and need to decompress them.
While using pprof I could see that the lzma reader allocates buffers for every message:
0 0% 0.0046% 433804685.70MB 96.50% github.com/ulikunitz/xz/lzma.NewReader (inline)
4122.21MB 0.00092% 0.0056% 433804685.70MB 96.50% github.com/ulikunitz/xz/lzma.ReaderConfig.NewReader
2414.61MB 0.00054% 0.0061% 432805222.15MB 96.28% github.com/ulikunitz/xz/lzma.newDecoderDict (inline)
432802807.54MB 96.28% 96.28% 432802807.54MB 96.28% github.com/ulikunitz/xz/lzma.newBuffer (inline)
Can we add an option that allows pooling that buffer, or some other way to reuse a reader?
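To make the ask concrete, a hypothetical sketch of what buffer reuse could look like with sync.Pool; the package has no such hook today, and newBuffer is internal:

package lzmapool // hypothetical illustration, not part of this repository

import "sync"

// dictPool recycles dictionary buffers between decoder instances.
// Wiring it in would need a hook inside lzma.ReaderConfig.NewReader
// (or newDecoderDict), which does not exist today.
var dictPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, 0, 8<<20) // sized for an 8 MiB dictionary, for example
	},
}

func getDictBuf() []byte { return dictPool.Get().([]byte) }

func putDictBuf(b []byte) { dictPool.Put(b[:0]) } // keep capacity, reset length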
Is it possible to work with lzip files, which are also known to use the LZMA compression algorithm?
As mentioned in #1, this project was declared as "not even alpha" in 2015. What is the current maturity of this project?
As mentioned in #23, this project was considered to be not near the speed of XZ back in 2018. Has the speed improved since then? Were you referring to general code optimization, or to parts of your LZMA implementation that needed to be reimplemented according to the specification in order to meet the standard speed expectations?
As mentioned in #26, this project implements a different LZMA algorithm than XZ. Would you be able to provide names/URLs for your algorithm and for theirs?
Some xz archives fail part way during decompression. Quite a few of the Linux kernel releases fall into this category.
You can reproduce it via:
[djm@demiurge ~]$ wget -q https://www.kernel.org/pub/linux/kernel/v3.0/linux-3.13.tar.xz
[djm@demiurge ~]$ xzcat linux-3.13.tar.xz | wc -c
549816320
[djm@demiurge ~]$ gxz -dc linux-3.13.tar.xz | wc -c
201330688
[djm@demiurge ~]$ echo $?
Note that the truncation is silent - no error is written to stderr and the exit status is 0. The problem isn't in cmd/xz - I noticed it first using the library directly.
@ulikunitz - we are sporadically getting a "limit reached" error and are wondering what the possible reasons might be, or if there is a way to increase the initial setting of the limit N. Maybe something we can do with the props or WriterConfig to increase this limit? The other weird thing is that this only seems to happen on *.tar.xz files.
Thanks to Dórian C. Langbeck I realized that I confused the padding for the block header with the padding of the block in the discussion of issue #15. Currently we don't check the block padding size. Whether we should check it is an open question.
3.3. Block Padding
Block Padding MUST contain 0-3 null bytes to make the size of
the Block a multiple of four bytes. This can be needed when
the size of Compressed Data is not a multiple of four. If any
of the bytes in Block Padding are not null bytes, the decoder
MUST indicate an error.
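A minimal sketch of the check the quoted section mandates; a hypothetical helper, not current package code:

package main

import (
	"errors"
	"fmt"
)

// checkBlockPadding enforces section 3.3: at most three padding bytes,
// all of them null, so the block size is a multiple of four.
func checkBlockPadding(pad []byte) error {
	if len(pad) > 3 {
		return fmt.Errorf("xz: unexpected padding size %d", len(pad))
	}
	for _, b := range pad {
		if b != 0 {
			return errors.New("xz: non-null byte in block padding")
		}
	}
	return nil
}

func main() {
	fmt.Println(checkBlockPadding([]byte{0, 0, 0, 0, 0})) // 5 bytes: the error from issue #15
}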
Greetings everyone,
I'm a little fresh with Go, so I hope I am looking at the right place.
Going through the API, it seems like Reader.Close and Writer.Flush are missing.
These are fairly common, as you can see in the Go standard library's zlib and gzip packages.
Is it possible to have these in a future release?
Thanks in advance
This had me scratching my head today for a solution.
I am decompressing many files, but a few wouldn't work because they are large and I would get out-of-memory panics. I looked at my code and saw I was reading the full file into a byte slice, which is expensive memory-wise. I rewrote the code to use io.Readers of sorts. Even with that, I still get out-of-memory panics. Looking at the source code, the issue lies with ReaderConfig.NewReader calling newDecoderDict, which calls newBuffer, which makes a byte slice of the buffer size, which is exactly what I had fixed in my own code. Is it possible to remedy this, and if so, could we?
Thanks,
I'm trying to figure out how to compress a folder but haven't found a solution for that yet.
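For reference, a sketch of compressing a directory by combining archive/tar with this package's xz writer; the directory and output names are hypothetical, and symlinks are not handled:

package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
	"path/filepath"

	"github.com/ulikunitz/xz"
)

// compressDir tars the directory and pipes the archive through an xz writer.
func compressDir(dir, out string) error {
	f, err := os.Create(out)
	if err != nil {
		return err
	}
	defer f.Close()
	xw, err := xz.NewWriter(f)
	if err != nil {
		return err
	}
	defer xw.Close()
	tw := tar.NewWriter(xw)
	defer tw.Close() // defers run LIFO: tar, then xz, then the file
	return filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		hdr.Name = filepath.ToSlash(path) // store a portable path
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		src, err := os.Open(path)
		if err != nil {
			return err
		}
		defer src.Close()
		_, err = io.Copy(tw, src)
		return err
	})
}

func main() {
	if err := compressDir("myfolder", "myfolder.tar.xz"); err != nil {
		log.Fatal(err)
	}
}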
There is a memory leak. When I decompressed all the kernel modules for analysis, the VmRSS occupancy increased to 150 MB.
gxz.exe doesn't detect the terminal in Git Bash on Windows x86-64. Terminal detection works in PowerShell. This is probably a wontfix, but I want to record it.
I am having trouble decoding a file.
I reproduced the issue with gxz after encountering it with my test case.
The error is:
lzma: Reader2 doesn't get data
Line 149 in 25c16dc
I have attached the file that produces the issue.
Running gxz -d on the file fails with:
gxz: lzma: Reader2 doesn't get data
xz -d appears to decode it properly, so this would seem to point to a problem in reader2.
Will follow up with more info or a fix.
The offending file:
https://www.dropbox.com/s/gmoyva5lx5k96vs/utils.LegacyBloomFilter.bin?dl=0
The documentation of WriterConfig is somewhat sparse. I would like to emulate (in spirit; I understand the algorithm is not identical) the result of xz --lzma2=preset=9,dict=128MiB.
Could you please point me to a starting point?
Thanks!
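A hedged sketch of a starting point; DictCap is a WriterConfig field in current master, but the mapping to xz preset 9 is only approximate:

package xzconfig // hypothetical

import (
	"io"

	"github.com/ulikunitz/xz"
)

// newBigDictWriter raises the dictionary capacity to 128 MiB, the most
// influential parameter behind the high xz presets; other knobs are
// left at their defaults here.
func newBigDictWriter(dst io.Writer) (*xz.Writer, error) {
	cfg := xz.WriterConfig{DictCap: 128 << 20} // 128 MiB dictionary
	return cfg.NewWriter(dst)
}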
I'm working on a library to do random access in xz files with multiple blocks. I would love to use your library to do the heavy lifting instead of reinventing the wheel. I need to use some of the internal pieces though, including the blockReader
Line 261 in 067145b
Would you be open to a PR which refactors things so your original package keeps the same public interface but uses a new ulikunitz/xz/lib/xzinternals package which makes public some of these currently private structs?
I'm trying to build my program that uses your library and getting this error:
vendor/github.com/ulikunitz/xz/reader.go:17:2: use of internal package github.com/ulikunitz/xz/internal/xlog not allowed
The dependency:
"github.com/ulikunitz/xz"
I added it to vendor/ using govendor add.
Trying to discover how to extract tar.xz files. Is this possible?
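Yes, in combination with archive/tar. A minimal sketch; the archive name is hypothetical, and entries are only listed rather than written to disk to keep it short:

package main

import (
	"archive/tar"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/ulikunitz/xz"
)

func main() {
	f, err := os.Open("archive.tar.xz") // hypothetical file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r, err := xz.NewReader(f) // undo the xz layer
	if err != nil {
		log.Fatal(err)
	}
	tr := tar.NewReader(r) // then walk the tar layer
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// For a real extraction, sanitize hdr.Name and copy tr into a file.
		fmt.Println(hdr.Name)
	}
}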
First, many thanks for this library; I'm using it in a backup application. Works great!
But I'm not sure whether the following behavior is a bug:
I've developed a backup application which runs as a daemon and compresses files on a daily schedule. While the application is idle, it consumes ~6 MB of RAM. When the daemon executes tasks other than xz compression, memory consumption grows for as long as the task runs; when such a task completes, memory usage returns to ~6 MB.
But when the file-compression task starts, the daemon consumes ~110 MB of memory, which is absolutely fine, except that this memory is not released after the task has completed. Even after multiple executions of the compression task, the process still holds that amount of memory. Is this normal behavior or am I doing something wrong?
Here is the code snippet which is responsible for the compression:
destinationFile, err := os.OpenFile(destinationFilePath, os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
	return err
}
defer destinationFile.Close()
xzWriter, err := xz.NewWriter(destinationFile)
if err != nil {
	return err
}
defer xzWriter.Close()
if _, err := io.Copy(xzWriter, sourceFile); err != nil {
	return err
}
return destinationFile.Sync()
What do you think?
The -f option allows overwriting of the target file. Currently the target file is removed before the compression has been completed. So interrupting the process with CTRL-C removes the target file, which is unexpected.
If I am using the following code, how do I set the options for maximum compression (the -9 preset level)?
// target is of type io.Writer
xzw, _ := xz.NewWriter(target)
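For reference, a hedged sketch using WriterConfig instead; as far as I can tell the package exposes individual parameters rather than numbered presets, DictCap assumes the field in current master, and 64 MiB matches the dictionary size xz uses for -9:

// target is of type io.Writer
cfg := xz.WriterConfig{DictCap: 64 << 20} // 64 MiB, like the -9 preset's dictionary
xzw, err := cfg.NewWriter(target)
if err != nil {
	log.Fatal(err) // handle the error instead of discarding it
}
defer xzw.Close()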
IMO it's clear that this library is not affected by CVE-2024-3094, but that CVE is bouncing all over the internet, and it would be great to make a statement in the README about why it's not affected.
When I tried to unpack a big file (about 3 GiB in xz, about 18 GiB unpacked), the process was too slow: only 3 of the 18 GiB were unpacked in about 40 minutes on my machine. The same file unpacked in about 5 minutes using the 7-Zip tool.
$ ll
-rw-r--r-- 1 jpillora wheel 550K 30 Jan 21:18 a.log
$ cp a.log b.log
$ cp a.log c.log
$ xz a.log
$ gxz b.log
$ gzip c.log
$ ll
-rw-r--r-- 1 jpillora wheel 6.0K 30 Jan 15:43 a.log.xz
-rw-r--r-- 1 jpillora wheel 207K 30 Jan 21:16 b.log.xz
-rw-r--r-- 1 jpillora wheel 10K 30 Jan 21:16 c.log.gz
Any idea why this is?