
xz's People

Contributors

creachadair, itsmattl, jubalh, truefurby, ulikunitz


xz's Issues

Compressing and decompressing empty string fails

The following program:

package main

import (
	"bytes"
	"io"
	"log"
	"os"

	"github.com/ulikunitz/xz"
)

func main() {
	const text = ""
	var buf bytes.Buffer
	// compress text
	w, err := xz.NewWriter(&buf)
	if err != nil {
		log.Fatalf("xz.NewWriter error %s", err)
	}
	if _, err := io.WriteString(w, text); err != nil {
		log.Fatalf("WriteString error %s", err)
	}
	if err := w.Close(); err != nil {
		log.Fatalf("w.Close error %s", err)
	}
	// decompress buffer and write output to stdout
	r, err := xz.NewReader(&buf)
	if err != nil {
		log.Fatalf("NewReader error %s", err)
	}
	if _, err = io.Copy(os.Stdout, r); err != nil {
		log.Fatalf("io.Copy error %s", err)
	}
}

fails when executed:

$ go run xztest.go
2017/02/15 10:55:38 io.Copy error xz: invalid header magic bytes
exit status 1

I would expect it to handle the empty string correctly.

multicore

Cool project!

An obvious enhancement for the future would be to parallelize some of the work, making use of all CPU cores.

Panic in lzma.writeRep

@pmezard reported a panic in the master tree that he found using go-fuzz. Many thanks for that. I have asked for the go-fuzz code and the crasher sequence so I can determine what caused the bug and fix it in the dev tree.

reader.go:35: constant 4294967295 overflows int

# github.com/ulikunitz/xz/lzma
..\github.com\ulikunitz\xz\lzma\reader.go:35: constant 4294967295 overflows int
..\github.com\ulikunitz\xz\lzma\reader2.go:31: constant 4294967295 overflows int
..\github.com\ulikunitz\xz\lzma\writer.go:78: constant 4294967295 overflows int
..\github.com\ulikunitz\xz\lzma\writer2.go:55: constant 4294967295 overflows int

Go 1.6.3, Windows XP
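For context (an inference from the Windows XP report above, which implies a 32-bit build): on GOARCH=386 Go's int is 32 bits, so the untyped constant 4294967295 does not fit. A small illustration of the failure mode and the usual fix of storing the constant in an explicitly sized type:

```go
package main

import "fmt"

// On 32-bit platforms (for example GOARCH=386, the only option on
// Windows XP), Go's int is 32 bits, so assigning the untyped constant
// 4294967295 to an int fails to compile. An explicitly sized integer
// type compiles on every platform.
const maxUint32 = 1<<32 - 1 // untyped constant, not yet sized

func main() {
	// var bad int = maxUint32 // on 386: "constant 4294967295 overflows int"
	var ok int64 = maxUint32 // fine everywhere
	fmt.Println(ok)
}
```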

Checksum None is valid

I'm trying to decompress iOS OTA files and they are failing your flags check. They appear to use the "None" checksum.

https://tukaani.org/xz/xz-file-format-1.0.4.txt

2.1.1.2. Stream Flags

        The first byte of Stream Flags is always a null byte. In the
        future, this byte may be used to indicate a new Stream version
        or other Stream properties.

        The second byte of Stream Flags is a bit field:

            Bit(s)  Mask  Description
             0-3    0x0F  Type of Check (see Section 3.4):
                              ID    Size      Check name
                              0x00   0 bytes  None
                              0x01   4 bytes  CRC32
                              0x02   4 bytes  (Reserved)
                              0x03   4 bytes  (Reserved)
                              0x04   8 bytes  CRC64
                              0x05   8 bytes  (Reserved)
                              0x06   8 bytes  (Reserved)
                              0x07  16 bytes  (Reserved)
                              0x08  16 bytes  (Reserved)
                              0x09  16 bytes  (Reserved)
                              0x0A  32 bytes  SHA-256
                              0x0B  32 bytes  (Reserved)
                              0x0C  32 bytes  (Reserved)
                              0x0D  64 bytes  (Reserved)
                              0x0E  64 bytes  (Reserved)
                              0x0F  64 bytes  (Reserved)
             4-7    0xF0  Reserved for future use; MUST be zero for now.

Thank you for a great pkg!
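A minimal sketch of a stream-flags check that follows the quoted spec text (my own illustration, not this package's actual code), treating check type 0x00 (None) as valid:

```go
package main

import (
	"errors"
	"fmt"
)

// parseStreamFlags validates the two stream-flags bytes per section
// 2.1.1.2 of the xz format spec and returns the check type. Check ID
// 0x00 (None) is explicitly legal, so a decoder should accept it.
func parseStreamFlags(flags [2]byte) (checkID byte, err error) {
	if flags[0] != 0 {
		return 0, errors.New("xz: first stream-flags byte must be null")
	}
	if flags[1]&0xF0 != 0 {
		return 0, errors.New("xz: reserved stream-flags bits must be zero")
	}
	return flags[1] & 0x0F, nil // 0x00 None, 0x01 CRC32, 0x04 CRC64, 0x0A SHA-256
}

func main() {
	id, err := parseStreamFlags([2]byte{0x00, 0x00})
	fmt.Println(id, err) // check type None parses without error
}
```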

Extract files from NSIS installer?

Hi,

I wonder if I can use your package to extract files from an NSIS installer .exe. According to the command 7z i, it uses LZMA compression at an offset, after the bytes 4 EF BE AD DE N u l l s o f t I n s t. Thanks for any help with this.
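Locating the payload is plain byte scanning, independent of this package. A sketch (the signature bytes are taken from the 7z i output above, so treat them as an assumption; whether lzma.NewReader then accepts the following data is exactly my question):

```go
package main

import (
	"bytes"
	"fmt"
)

// nsisSignature is the marker 7z reports: EF BE AD DE (0xDEADBEEF in
// little-endian) followed by "NullsoftInst". The compressed payload
// starts somewhere after the NSIS first header that carries it.
var nsisSignature = append([]byte{0xEF, 0xBE, 0xAD, 0xDE}, []byte("NullsoftInst")...)

// findNSISPayload returns the offset just past the signature, where one
// could start probing for the LZMA stream.
func findNSISPayload(exe []byte) (int, bool) {
	i := bytes.Index(exe, nsisSignature)
	if i < 0 {
		return 0, false
	}
	return i + len(nsisSignature), true
}

func main() {
	// Synthetic stand-in for an installer: PE stub, signature, payload.
	fake := append([]byte("MZ...stub..."), nsisSignature...)
	fake = append(fake, []byte("compressed data")...)
	off, ok := findNSISPayload(fake)
	fmt.Println(off, ok)
}
```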

[SECURITY] Implementation of readUvarint vulnerable to CVE-2020-16845

The implementation of readUvarint at https://github.com/ulikunitz/xz/blob/master/bits.go#L56 is very similar to the vulnerable code in the Go encoding/binary package and seems to suffer from the same vulnerability described in golang/go#40618.

See the fix at https://go-review.googlesource.com/c/go/+/247120/2/src/encoding/binary/varint.go

Note: I couldn't find any information on how to disclose this issue to the maintainers. I would also suggest setting up a Security Policy for the project within GitHub.

5-byte padding

I have a .deb package from my local /var/cache/apt/archives. I'm using Debian stretch and the package is accountsservice_0.6.43-1_amd64.deb.

If I execute the following commands, everything works (no warnings):

$ ar x accountsservice_0.6.43-1_amd64.deb data.tar.xz
$ xz -d data.tar.xz
$ ls
accountsservice_0.6.43-1_amd64.deb  data.tar
$ tar tf data.tar
./
./etc/
./etc/dbus-1/
./etc/dbus-1/system.d/
./etc/dbus-1/system.d/org.freedesktop.Accounts.conf
...

But if I run the following code (where t/data.tar.xz is the file extracted from the .deb cited above), I get an error:

package main

import (
	"archive/tar"
	"io"
	"log"
	"os"

	"github.com/ulikunitz/xz"
)

func main() {
	f, err := os.Open("t/data.tar.xz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r, err := xz.NewReader(f)
	if err != nil {
		panic(err)
	}

	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}

		log.Println(hdr)
	}
}

The execution output is (I'm using Gogland IDE, but the result is the same if I run from terminal):

GOROOT=/usr/lib/go-1.8
GOPATH=/home/langbeck/git/golang/unpackers/external:/home/langbeck/git/golang/unpackers:/home/langbeck/go
/usr/lib/go-1.8/bin/go build -i -o /tmp/maingo main
/tmp/maingo
panic: xz: unexpected padding size 5

goroutine 1 [running]:
main.main()
	/home/langbeck/git/golang/unpackers/src/main/main.go:127 +0x16c

Note 1: line 127 is the line of the third panic().
Note 2: xz --test data.tar.xz && echo OK runs fine, and the same .deb file is installed on my system, so it is a valid .deb file.

I'm attaching the deb file in question: accountsservice_0.6.43-1_amd64.deb.zip

Add an --x86 flag

The GNU xz command has a boolean --x86 flag which allegedly gives 0-15% extra compression when applied to files containing x86 machine code. The Linux kernel uses this flag when compressing its bzImage. It is also popular among UEFI implementations when compressing firmware.

The explanation from the xz man page is:

          A  BCJ filter converts relative addresses in the machine code to
          their absolute counterparts.  This doesn't change  the  size  of
          the  data,  but it increases redundancy, which can help LZMA2 to
          produce 0-15 % smaller .xz file.  The  BCJ  filters  are  always
          reversible, so using a BCJ filter for wrong type of data doesn't
          cause any data loss, although it may make the compression  ratio
          slightly worse.

We have already adapted this feature to Go. Would it be suitable to upstream it with an --x86 flag?

Expose `processFile` function

It would be helpful (for my use case) to export a function that performs the same behavior as the gxz command-line utility so that it can be called programmatically. This would de-duplicate the file read/write logic that has already been written for the command-line utility, allowing more programmatic usage of this library.

Would this be a desirable addition?

This was attempted, incorrectly, in #48.

Equivalent of FastBytes?

The 7-Zip SDK has something called "FastBytes", which seems to be a mechanism to limit how much time is spent looking for the best sequence to add to the dictionary. I don't see an equivalent here; how do you get around that?
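As I understand it (this toy sketch is my own, not this package's match finder), FastBytes, called niceLen in liblzma, is a cutoff: the match finder stops searching as soon as it holds a match at least that long, trading a little compression ratio for a lot of speed.

```go
package main

import "fmt"

// bestMatch looks for the longest occurrence of lookahead inside window,
// but accepts the first match of at least fastBytes bytes instead of
// exhaustively searching for the optimum.
func bestMatch(window, lookahead []byte, fastBytes int) (offset, length int) {
	for start := 0; start < len(window); start++ {
		l := 0
		for l < len(lookahead) && start+l < len(window) && window[start+l] == lookahead[l] {
			l++
		}
		if l > length {
			offset, length = start, l
			if length >= fastBytes {
				return // "good enough": skip the rest of the search
			}
		}
	}
	return
}

func main() {
	window, lookahead := []byte("abcabcd"), []byte("abcd")
	fmt.Println(bestMatch(window, lookahead, 3))   // settles for the early 3-byte match
	fmt.Println(bestMatch(window, lookahead, 100)) // keeps searching, finds the 4-byte match
}
```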

Panic with invalid input

Found in fuzz test.

how to reproduce

package xz_test

import (
	"bytes"
	"io/ioutil"
	"testing"

	"github.com/ulikunitz/xz"
)

func TestPanic(t *testing.T) {
	data := []byte([]uint8{253, 55, 122, 88, 90, 0, 0, 0, 255, 18, 217, 65, 0, 189, 191, 239, 189, 191, 239, 48})
	t.Log(string(data))
	r, err := xz.NewReader(bytes.NewReader(data))
	if err != nil {
		t.Skip("OK")
	}
	b, err := ioutil.ReadAll(r)
	if err != nil {
		t.Skip("OK")
	}
	t.Log(b)
}
$ go test -run "TestPanic" -v
=== RUN   TestPanic
    panic_test.go:13: 7zXZAソ0
--- FAIL: TestPanic (0.00s)
panic: runtime error: makeslice: len out of range [recovered]
        panic: runtime error: makeslice: len out of range [recovered]
        panic: runtime error: makeslice: len out of range

goroutine 6 [running]:
testing.tRunner.func1.1(0x54ef00, 0x5b1c00)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/testing/testing.go:1072 +0x30d
testing.tRunner.func1(0xc000001380)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/testing/testing.go:1075 +0x41a
panic(0x54ef00, 0x5b1c00)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/runtime/panic.go:969 +0x1b9
io/ioutil.readAll.func1(0xc000095f28)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/io/ioutil/ioutil.go:30 +0x106
panic(0x54ef00, 0x5b1c00)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/runtime/panic.go:969 +0x1b9
github.com/ulikunitz/xz.readIndexBody(0x5b40a0, 0xc00008c3c0, 0x100, 0xc000095bc0, 0x40df58, 0x20, 0x557560, 0x1)
        /home/heijo/ghq/github.com/ulikunitz/xz/format.go:684 +0x1d4
github.com/ulikunitz/xz.(*streamReader).readTail(0xc00008a1e0, 0xc000074490, 0xc000074490)
        /home/heijo/ghq/github.com/ulikunitz/xz/reader.go:163 +0x50
github.com/ulikunitz/xz.(*streamReader).Read(0xc00008a1e0, 0xc000244000, 0x200, 0x200, 0xc000095dd0, 0x40b125, 0xc000095dd8)
        /home/heijo/ghq/github.com/ulikunitz/xz/reader.go:209 +0x4f9
github.com/ulikunitz/xz.(*Reader).Read(0xc00008c3f0, 0xc000244000, 0x200, 0x200, 0xc000244000, 0x0, 0x0)
        /home/heijo/ghq/github.com/ulikunitz/xz/reader.go:112 +0xe5
bytes.(*Buffer).ReadFrom(0xc00006feb0, 0x5b4120, 0xc00008c3f0, 0x0, 0xc00008c300, 0x5b40a0)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/bytes/buffer.go:204 +0xb1
io/ioutil.readAll(0x5b4120, 0xc00008c3f0, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/io/ioutil/ioutil.go:36 +0xe5
io/ioutil.ReadAll(...)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/io/ioutil/ioutil.go:45
github.com/ulikunitz/xz_test.TestPanic(0xc000001380)
        /home/heijo/ghq/github.com/ulikunitz/xz/panic_test.go:18 +0x185
testing.tRunner(0xc000001380, 0x58fab0)
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/testing/testing.go:1123 +0xef
created by testing.(*T).Run
        /home/linuxbrew/.linuxbrew/Cellar/go/1.15.7/libexec/src/testing/testing.go:1168 +0x2b3
exit status 2
FAIL    github.com/ulikunitz/xz 0.005s

high allocation ratio

I have a processing scenario where I read LZMA objects and need to decompress them.
Using pprof, I can see that the LZMA reader allocates buffers for every message:
0 0% 0.0046% 433804685.70MB 96.50% github.com/ulikunitz/xz/lzma.NewReader (inline)
4122.21MB 0.00092% 0.0056% 433804685.70MB 96.50% github.com/ulikunitz/xz/lzma.ReaderConfig.NewReader
2414.61MB 0.00054% 0.0061% 432805222.15MB 96.28% github.com/ulikunitz/xz/lzma.newDecoderDict (inline)
432802807.54MB 96.28% 96.28% 432802807.54MB 96.28% github.com/ulikunitz/xz/lzma.newBuffer (inline)

Could we add an option that allows pooling that buffer, or some other way to reuse a reader?
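One shape such an option could take (a sketch against a hypothetical hook, not the package's current API): keep dictionary-sized buffers in a sync.Pool, so repeated reader construction reuses them instead of reallocating.

```go
package main

import (
	"fmt"
	"sync"
)

const dictCap = 8 << 20 // whatever DictCap the readers are configured with

// bufPool hands out dictionary-sized buffers. A reader implementation
// could accept such a pool (or a Get/Put pair) in its config and return
// the buffer on Close instead of dropping it for the GC.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, dictCap) },
}

func main() {
	buf := bufPool.Get().([]byte) // reused if available, else freshly allocated
	// ... use buf as the decoder dictionary ...
	bufPool.Put(buf) // return it for the next reader
	fmt.Println(len(bufPool.Get().([]byte)))
}
```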

Current maturity of project (and other semantics)

As mentioned in #1, this project was declared as "not even alpha" in 2015. What is the current maturity of this project?

As mentioned in #23, this project was considered to be nowhere near the speed of XZ back in 2018. Has the speed improved since then? Were you referring to general code optimization, or to parts of your LZMA implementation that need to be reimplemented according to the specification in order to meet the standard speed expectations?

As mentioned in #26, this project implements a different LZMA algorithm than XZ. Would you be able to provide names or URLs for your algorithm and the one XZ uses?

Cannot decompress some archives

Some xz archives fail partway through decompression. Quite a few of the Linux kernel releases fall into this category.

You can reproduce it via:

[djm@demiurge ~]$ wget -q https://www.kernel.org/pub/linux/kernel/v3.0/linux-3.13.tar.xz                                                                    
[djm@demiurge ~]$ xzcat linux-3.13.tar.xz | wc -c
549816320
[djm@demiurge ~]$ gxz -dc linux-3.13.tar.xz | wc -c
201330688
[djm@demiurge ~]$ echo $?

Note that the truncation is silent - no error is written to stderr and the exit status is 0. The problem isn't in cmd/xz - I noticed it first using the library directly.

limit reached error

@ulikunitz - we are sporadically getting a limit reached error and wondering what the possible reasons for this might be, or whether there is a way to increase the initial setting of the N:? Maybe something we can do with the props or WriterConfig to increase this limit? The other odd thing is that this only seems to happen on *.tar.xz files.

Should the block padding size be validated?

Thanks to Dórian C. Langbeck I realized that, in the discussion of issue #15, I confused the padding of the block header with the padding of the block. Currently we don't check the block padding size; whether we should check it is an open question.

3.3. Block Padding

        Block Padding MUST contain 0-3 null bytes to make the size of
        the Block a multiple of four bytes. This can be needed when
        the size of Compressed Data is not a multiple of four. If any
        of the bytes in Block Padding are not null bytes, the decoder
        MUST indicate an error.
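A check following the quoted wording could look like this (my own sketch, assuming pad holds the bytes read between Compressed Data and the check field):

```go
package main

import (
	"errors"
	"fmt"
)

// checkBlockPadding enforces section 3.3: at most 3 padding bytes, all
// null, bringing the block up to a multiple of four bytes. unpaddedLen
// is the block size before padding.
func checkBlockPadding(unpaddedLen int64, pad []byte) error {
	if len(pad) > 3 {
		return fmt.Errorf("xz: unexpected padding size %d", len(pad))
	}
	if (unpaddedLen+int64(len(pad)))%4 != 0 {
		return errors.New("xz: block padding does not align block to 4 bytes")
	}
	for _, b := range pad {
		if b != 0 {
			return errors.New("xz: non-null byte in block padding")
		}
	}
	return nil
}

func main() {
	fmt.Println(checkBlockPadding(6, []byte{0, 0}))          // valid: 8 is a multiple of 4
	fmt.Println(checkBlockPadding(3, []byte{0, 0, 0, 0, 0})) // the 5-byte case from issue #15
}
```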

Missing common APIs like Reader.Close() and Writer.Flush()

Greetings everyone,

I'm a little new to Go, so I hope I am looking in the right place.

Going through the API, it seems that Reader.Close and Writer.Flush are missing.
These are fairly common, as you can see in the Go standard library's zlib and gzip packages.

Is it possible to have these in a future release?
Thanks in advance

Out of Memory bug when using a large reader

This had me scratching my head today.

I am decompressing many files, but a few wouldn't work because they are large and I would get out-of-memory panics. I looked at my code and saw that I was reading the full file into a byte slice, which is expensive memory-wise, so I rewrote the code to use io.Readers. Even with that, I still get out-of-memory panics. Looking at the source code, the issue lies with ReaderConfig.NewReader calling newDecoderDict, which calls newBuffer, which allocates a byte slice of the full buffer size: exactly what I had just fixed in my own code. Is it possible to remedy this, and if so, how?

Thanks,

memory leak

There seems to be a memory leak. When I decompressed all the kernel modules for analysis, the VmRSS occupancy grew to 150 MB.

LZMA2 reader issue

I am having trouble decoding a file.
I reproduced the issue with gxz after encountering it with my test case.

The error is:
lzma: Reader2 doesn't get data

r.err = errors.New("lzma: Reader2 doesn't get data")

I have attached the file that produces the issue.
$ gxz -d
gxz lzma: Reader2 doesn't get data

xz -d decodes the file properly, so this would seem to point to a problem in reader2.
I will follow up with more info or a fix.

The offending file:
https://www.dropbox.com/s/gmoyva5lx5k96vs/utils.LegacyBloomFilter.bin?dl=0

Don't worry about this issue

[screenshot]
After solving the problem of the Go compiler downloading dependencies, a new problem has appeared: compilation seems to fail with an error, and arozos does not appear in the src directory either.
I don't know much about the Go toolchain, so I have no way to troubleshoot the error myself; this may look like a very silly question to you.
But in the spirit of asking about what I don't understand, I have opened a new issue and hope it can be resolved quickly.

expose blockreader

I'm working on a library to do random access in xz files with multiple blocks. I would love to use your library to do the heavy lifting instead of reinventing the wheel. I need to use some of the internal pieces though, including the blockReader

xz/reader.go, line 261 (commit 067145b):

type blockReader struct {

and related parts.

Would you be open to a PR which refactors things so your original package keeps the same public interface but uses a new ulikunitz/xz/lib/xzinternals which makes public some of these currently private structs?

Using this in archiver utility

Hi, this isn't an issue (so you can close this) -- I just wanted to thank you for this package. I've added .tar.xz support to archiver after a request for it.

I know you've said in another issue that this is "pre-alpha" work, and that's fine, but just so you know it's being used now. 👍

Memory not released?

First, many thanks for that library, I'm using it in a backup application. Works great!

But I'm not sure whether the following behavior is a bug:
I've developed a backup application which runs as a daemon and compresses files on a daily schedule. While the application is idle, it consumes ~6 MB of RAM. When the daemon executes tasks other than xz compression, memory consumption grows for the duration of the task; when such a task has completed, memory usage returns to ~6 MB.

But when the task which compresses files starts, the daemon consumes ~110 MB of memory, which is absolutely fine; however, this memory is not released after the task has completed. The daemon process still holds that amount of memory, and the usage stays the same even after multiple executions of the compression task. Is this normal behavior, or am I doing something wrong?

Here is the code snippet which is responsible for the compression:

func compress(sourceFile io.Reader, destinationFilePath string) error {
	destinationFile, err := os.OpenFile(destinationFilePath, os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}
	defer destinationFile.Close()

	xzWriter, err := xz.NewWriter(destinationFile)
	if err != nil {
		return err
	}
	defer xzWriter.Close()

	if _, err := io.Copy(xzWriter, sourceFile); err != nil {
		return err
	}

	return destinationFile.Sync()
}

What do you think?

Achieving maximum xz compression

If I am using the following code, how do I set the options for maximum compression (the -9 preset level)?

// target is of type io.Writer
xzw, _ := xz.NewWriter(target)

Unzipping is too slow

When I tried to decompress a big file (about 3 GiB compressed, about 18 GiB unpacked), the process was very slow: only 3 of the 18 GiB were unpacked after about 40 minutes on my machine. The same file unpacked in about 5 minutes using the 7-Zip tool.

low compression ratio

$ ll
-rw-r--r--  1 jpillora  wheel   550K 30 Jan 21:18 a.log
$ cp a.log b.log
$ cp a.log c.log
$ xz a.log
$ gxz b.log
$ gzip c.log
$ ll
-rw-r--r--  1 jpillora  wheel   6.0K 30 Jan 15:43 a.log.xz
-rw-r--r--  1 jpillora  wheel   207K 30 Jan 21:16 b.log.xz
-rw-r--r--  1 jpillora  wheel    10K 30 Jan 21:16 c.log.gz

Any idea why this is?
