bodgit / sevenzip

Golang library for dealing with 7-zip archives

Home Page: https://godoc.org/github.com/bodgit/sevenzip

License: BSD 3-Clause "New" or "Revised" License

Go 100.00%
golang golang-library archive archiving compression decompression decompressor 7zip 7z lzma

sevenzip's Introduction

sevenzip

A reader for 7-zip archives inspired by archive/zip.

Current status:

  • Pure Go, no external libraries or binaries needed.
  • Handles uncompressed headers, (7za a -mhc=off test.7z ...).
  • Handles compressed headers, (7za a -mhc=on test.7z ...).
  • Handles password-protected versions of both of the above (7za a -mhc=on|off -mhe=on -ppassword test.7z ...).
  • Handles archives split into multiple volumes, (7za a -v100m test.7z ...).
  • Handles self-extracting archives, (7za a -sfx archive.exe ...).
  • Validates CRC values as it parses the file.
  • Supports ARM, BCJ, BCJ2, Brotli, Bzip2, Copy, Deflate, Delta, LZ4, LZMA, LZMA2, PPC, SPARC and Zstandard methods.
  • Implements the fs.FS interface so you can treat an opened 7-zip archive like a filesystem.

More examples of 7-zip archives are needed to test all of the different combinations/algorithms possible.

Frequently Asked Questions

Why is my code running so slow?

Someone might write the following simple code:

func extractArchive(archive string) error {
        r, err := sevenzip.OpenReader(archive)
        if err != nil {
                return err
        }
        defer r.Close()

        for _, f := range r.File {
                rc, err := f.Open()
                if err != nil {
                        return err
                }
                defer rc.Close()

                // Extract the file
        }

        return nil
}

Unlike a zip archive where every file is individually compressed, 7-zip archives can have all of the files compressed together in one long compressed stream, supposedly to achieve a better compression ratio. In a naive random access implementation, to read the first file you start at the beginning of the compressed stream and read out that file's worth of bytes. To read the second file you have to start at the beginning of the compressed stream again, read and discard the first file's worth of bytes to get to the correct offset in the stream, then read out the second file's worth of bytes. You can see that for an archive that contains hundreds of files, extraction gets progressively slower as you have to read and discard more and more data just to get to the right offset in the stream.
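
To make that cost concrete, here is a minimal sketch (not part of the package) that counts how many decompressed bytes are read when every file in a single stream is extracted, first by restarting from the beginning for each file and then by reading the stream once in order. The function names and the file sizes are assumptions chosen for illustration.

```go
package main

import "fmt"

// naiveBytesRead returns the number of decompressed bytes read when every
// file is extracted by restarting the stream, discarding everything before
// the file, and then reading the file itself.
func naiveBytesRead(sizes []int64) int64 {
	var total, offset int64
	for _, size := range sizes {
		total += offset + size // discard the prefix, then read the file
		offset += size
	}
	return total
}

// sequentialBytesRead returns the number of bytes read when the stream is
// read once, in order, without ever restarting.
func sequentialBytesRead(sizes []int64) int64 {
	var total int64
	for _, size := range sizes {
		total += size
	}
	return total
}

func main() {
	// 600 hypothetical files of 1 KiB each in one stream.
	sizes := make([]int64, 600)
	for i := range sizes {
		sizes[i] = 1024
	}
	fmt.Println("naive:     ", naiveBytesRead(sizes))
	fmt.Println("sequential:", sequentialBytesRead(sizes))
}
```

For n files of equal size the naive total grows with n*(n+1)/2, which is why the slowdown becomes noticeable only once an archive holds many files.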

This package contains an optimisation that caches and reuses the underlying compressed stream reader so you don't have to keep starting from the beginning for each file, but it does require you to call rc.Close() before extracting the next file. So write your code similar to this:

func extractFile(file *sevenzip.File) error {
        rc, err := file.Open()
        if err != nil {
                return err
        }
        defer rc.Close()

        // Extract the file

        return nil
}

func extractArchive(archive string) error {
        r, err := sevenzip.OpenReader(archive)
        if err != nil {
                return err
        }
        defer r.Close()

        for _, f := range r.File {
                if err = extractFile(f); err != nil {
                        return err
                }
        }

        return nil
}

You can see the main difference is to not defer all of the Close() calls until the end of extractArchive().
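
The effect of where the defer sits can be seen without any archive at all. The following sketch uses only the standard library and records "open" and "close" events for made-up file names; it illustrates Go's defer semantics and is not code from the package.

```go
package main

import "fmt"

// deferInLoop mimics the slow version: every close is deferred until the
// enclosing function returns, so all "files" stay open while the loop runs.
// The inner function exists so the deferred calls run before events is
// returned.
func deferInLoop(names []string) []string {
	var events []string
	func() {
		for _, n := range names {
			events = append(events, "open "+n)
			defer func(n string) { events = append(events, "close "+n) }(n)
		}
	}()
	return events
}

// deferInHelper mimics the fast version: the helper returns after each file,
// so its deferred close runs before the next file is opened.
func deferInHelper(names []string) []string {
	var events []string
	process := func(n string) {
		events = append(events, "open "+n)
		defer func() { events = append(events, "close "+n) }()
	}
	for _, n := range names {
		process(n)
	}
	return events
}

func main() {
	fmt.Println(deferInLoop([]string{"a", "b"}))
	fmt.Println(deferInHelper([]string{"a", "b"}))
}
```

In the first function no close happens until every file has been opened; in the second each close precedes the next open, which is the pattern the stream reuse depends on.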

There is a set of benchmarks in this package that demonstrates the performance boost that the optimisation provides, amongst other techniques:

$ go test -v -run='^$' -bench='Reader$' -benchtime=60s
goos: darwin
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkNaiveReader
BenchmarkNaiveReader-12                  	       2	31077542628 ns/op
BenchmarkOptimisedReader
BenchmarkOptimisedReader-12              	     434	 164854747 ns/op
BenchmarkNaiveParallelReader
BenchmarkNaiveParallelReader-12          	     240	 361869339 ns/op
BenchmarkNaiveSingleParallelReader
BenchmarkNaiveSingleParallelReader-12    	     412	 171027895 ns/op
BenchmarkParallelReader
BenchmarkParallelReader-12               	     636	 112551812 ns/op
PASS
ok  	github.com/bodgit/sevenzip	472.251s

The archive used here is just the reference LZMA SDK archive, which is only 1 MiB in size but does contain 630+ files split across three compression streams. The only difference between BenchmarkNaiveReader and the rest is the lack of a call to rc.Close() between files so the stream reuse optimisation doesn't take effect.

Don't try to blindly throw goroutines at the problem either, as this can also undo the optimisation. A naive implementation that uses a pool of multiple goroutines to extract each file ends up being nearly 50% slower, and even a pool of one goroutine can end up being less efficient. The optimal way to employ goroutines is to make use of the sevenzip.FileHeader.Stream field: extract files with the same value using the same goroutine. This achieves a 50% speed improvement with the LZMA SDK archive, but it very much depends on how many streams there are in the archive.
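
A minimal sketch of that grouping, using only the standard library: the entry type below is a stand-in invented for this example, and with the real package the stream number would come from the sevenzip.FileHeader.Stream field mentioned above. The extract callback is likewise hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a stand-in for one file in the archive's file list.
type entry struct {
	Name   string
	Stream int
}

// groupByStream buckets entries by stream while keeping archive order
// within each bucket.
func groupByStream(entries []entry) map[int][]entry {
	groups := make(map[int][]entry)
	for _, e := range entries {
		groups[e.Stream] = append(groups[e.Stream], e)
	}
	return groups
}

// extractAll runs one goroutine per stream. Each goroutine handles its files
// sequentially, so a stream is only ever read forwards. The first error
// seen is returned.
func extractAll(entries []entry, extract func(entry) error) error {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		first error
	)
	for _, group := range groupByStream(entries) {
		wg.Add(1)
		go func(group []entry) {
			defer wg.Done()
			for _, e := range group {
				if err := extract(e); err != nil {
					mu.Lock()
					if first == nil {
						first = err
					}
					mu.Unlock()
					return
				}
			}
		}(group)
	}
	wg.Wait()
	return first
}

func main() {
	entries := []entry{{"a", 0}, {"b", 0}, {"c", 1}, {"d", 2}}
	err := extractAll(entries, func(e entry) error {
		fmt.Println("extracting", e.Name, "from stream", e.Stream)
		return nil
	})
	fmt.Println("error:", err)
}
```

The number of goroutines equals the number of streams, so an archive with a single stream gains nothing from this arrangement.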

In general, don't try and extract the files in a different order compared to the natural order within the archive as that will also undo the optimisation. The worst scenario would likely be to extract the archive in reverse order.

sevenzip's People

Contributors

bodgit, dependabot[bot], orisano

sevenzip's Issues

Support BCJ method

Decompression method: 3.3.1.3

The GameCube homebrew "Swiss" is distributed in a 7z that uses BCJ for some of the binaries.

Directories should have trailing /

Matching what archive/zip does, sevenzip.FileHeader Name fields should have a trailing / if they're a directory. Zip files seem to have this as part of the archive but 7z files don't so we should add it.
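
As a rough illustration of the requested behaviour (the helper name is hypothetical and this is not the package's code), the normalisation could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// dirName appends a trailing slash to a directory name that lacks one and
// returns every other name unchanged.
func dirName(name string, isDir bool) string {
	if isDir && !strings.HasSuffix(name, "/") {
		return name + "/"
	}
	return name
}

func main() {
	fmt.Println(dirName("docs", true))
	fmt.Println(dirName("docs/", true))
	fmt.Println(dirName("docs/readme.txt", false))
}
```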

Add support for reading self-extracting archives

A self-extracting archive is an executable with the archive appended to it. As such the 6-byte signature isn't at the beginning of the file.

It should be a case of searching through the file to find the signature.
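
A minimal sketch of such a search over an in-memory buffer, using only the standard library. The signature bytes are the ones shown elsewhere on this page (377abcaf271c); the function name is made up for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// signature is the 6-byte marker that starts a 7-zip archive.
var signature = []byte{'7', 'z', 0xbc, 0xaf, 0x27, 0x1c}

// findSignature returns the offset of the first occurrence of the signature
// in data, or -1 if it is not present.
func findSignature(data []byte) int {
	return bytes.Index(data, signature)
}

func main() {
	// A fake "executable" stub followed by the start of an archive.
	stub := []byte("MZ...pretend this is an executable...")
	file := append(append([]byte{}, stub...), signature...)
	fmt.Println(findSignature(file) == len(stub))
}
```

The executable part may itself contain these six bytes, so a real implementation would probably need to validate each candidate (for example by checking the header CRC that follows) and keep searching on failure; that part is left out here.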

Low performance of .7z files with password

I am extracting a password-protected archive. My code looks like this:

func extract7zArchive(archive string, path string, password string) error {
	reader, err := sevenz.OpenReaderWithPassword(archive, password)
	if err != nil {
		return err
	}
	defer reader.Close()

	for _, f := range reader.File {
		target := PathAppend(path, f.Name)
		// seems no flag to check if f is directory, but it will end with / if is directory
		if IsStrEndWith(f.Name, "/", true) {
			if err := os.MkdirAll(target, f.Mode()); err != nil {
				return err
			}
		} else {
			_f, err := os.OpenFile(target, os.O_CREATE, f.Mode())
			if err != nil {
				return err
			}
			defer _f.Close()

			pre := time.Now().UnixMilli()
			fReader, err := f.Open()
			LogInfo("%d ms cost open %s", time.Now().UnixMilli()-pre, f.Name)
			if err != nil {
				return err
			}
			defer fReader.Close()

			if _, err = io.Copy(_f, fReader); err != nil {
				return err
			}
		}
	}

	return nil
}

It is very slow in f.Open. I traced through the code and found the performance problem in internal/aes7z/key.go:

func calculateKey(password string, cycles int, salt []byte) []byte {
	b := bytes.NewBuffer(salt)

	// Convert password to UTF-16LE
	utf16le := unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM)
	t := transform.NewWriter(b, utf16le.NewEncoder())
	_, _ = t.Write([]byte(password))

	key := make([]byte, sha256.Size)
	if cycles == 0x3f {
		copy(key, b.Bytes())
	} else {
		pre := time.Now().UnixMilli()
		h := sha256.New()
		count := 0
		for i := uint64(0); i < 1<<cycles; i++ {
			count++
			// These will never error
			_, _ = h.Write(b.Bytes())
			_ = binary.Write(h, binary.LittleEndian, i)
		}
		println(time.Now().UnixMilli()-pre, "ms cost, loop count", count)
		copy(key, h.Sum(nil))
	}

	return key
}

The for loop runs more than 500k iterations and takes about 250 ms on my PC (CPU: i7 9700).
Output:
250 ms cost, loop count 524288
11-30 11:37:46,info,archivehlpr.go.155:251 ms cost open apps.json

Handle wrong passwords

I'm using github.com/bodgit/sevenzip v1.3.0.

When I try to extract a password protected 7z archive with a wrong password, I get newRangeDecoder: first byte not zero error.

Here's a test to replicate it (needs stretchr/testify):

package main

import (
	"encoding/base64"
	"io"
	"io/ioutil"
	"testing"

	"github.com/bodgit/sevenzip"
	"github.com/stretchr/testify/assert"
)

func TestWrongPassword(t *testing.T) {
	arcWithPass := "N3q8ryccAARCn6bIEAAAAAAAAABiAAAAAAAAAG8Zm0GDjJRj+0NwGgTAtoiHYDl5AQQGAAEJEAAHCwEAAiQG8QcBElMPMHa6n3dxaC3cnHuv+q8n7yEhAQABAAwNCQAICgEK18HKAAAFARkBABELAHQAZQB4AHQAAAAZABQKAQCWO8QQJK/YARUGAQAggKSBAAA="
	arcWOPass := "N3q8ryccAASaR9RFDQAAAAAAAABSAAAAAAAAADFkgJoBAAhzZXZlbnppcAoAAQQGAAEJDQAHCwEAASEhAQAMCQAICgEK18HKAAAFARkMAAAAAAAAAAAAAAAAEQsAdABlAHgAdAAAABkAFAoBAJY7xBAkr9gBFQYBACCApIEAAA=="
	tests := []struct {
		name          string
		archiveBase64 string
		password      string
	}{
		{
			name:          "password protected, correct password",
			archiveBase64: arcWithPass,
			password:      "password",
		},
		{
			name:          "password protected, wrong password",
			archiveBase64: arcWithPass,
			password:      "password1", // fails `newRangeDecoder: first byte not zero`
		},
		{
			name:          "password protected, wrong password",
			archiveBase64: arcWithPass,
			password:      "not 'password'", // fails `lzma: unexpected chunk type`
		},
		{
			name:          "not protected, with password",
			archiveBase64: arcWOPass,
			password:      "password",
		},
		{
			name:          "not protected, without password",
			archiveBase64: arcWOPass,
			password:      "",
		},
		{
			name:          "password protected, wrong password, another error",
			archiveBase64: "N3q8ryccAATmiIjWEAAAAAAAAABqAAAAAAAAAPj97y5WCxqWKbLj7vDaXAe/cLh2AQQGAAEJEAAHCwEAAiQG8QcBElMPRIjTr6dLKHHK30sEtwLboyEhAQABAAwMCAAICgH8UFzbAAAFARkBABETAHQAZQB4AHQALgB0AHgAdAAAABkAFAoBADbNXd0pr9gBFQYBACCApIEAAA==",
			password:      "notpassword", // fails `lzma: unsupported chunk header byte`
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			archive, size := readBase64(tt.archiveBase64)
			r, err := sevenzip.NewReaderWithPassword(archive, size, tt.password)
			assert.NoError(t, err)

			for _, af := range r.File {
				f, err := af.Open()
				assert.NoError(t, err)
				_, err = io.Copy(ioutil.Discard, f)
				assert.NoError(t, err)
			}
		})
	}
}

func readBase64(encoded string) (io.ReaderAt, int64) {
	// decode as base64
	decoded, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		panic(err)
	}
	return bytesAt{decoded}, int64(len(decoded))
}

type bytesAt struct {
	b []byte
}

func (b bytesAt) ReadAt(p []byte, off int64) (n int, err error) {
	if off >= int64(len(b.b)) {
		return 0, io.EOF
	}
	n = copy(p, b.b[off:])
	return
}

Here the second case password protected, wrong password fails with an error. It seems that the error comes deep within io package when I try to read the archived file io.Copy(ioutil.Discard, f).

It would be great if it detected wrong passwords when I call NewReaderWithPassword and returned an ErrWrongPassword or something. This would let me catch it with errors.Is(err, ErrWrongPassword) and try another password.

I can submit a fix if you can give me some pointers on where to look.
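
For illustration, this is roughly how a caller could use such an error. ErrWrongPassword and openWith are hypothetical names invented for this sketch; nothing here is claimed to exist in the sevenzip API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrWrongPassword is a hypothetical sentinel error for this sketch.
var ErrWrongPassword = errors.New("wrong password")

// openWith stands in for opening an archive with a password. It wraps the
// sentinel with %w so callers can still match it with errors.Is.
func openWith(password string) error {
	if password != "password" {
		return fmt.Errorf("open archive: %w", ErrWrongPassword)
	}
	return nil
}

// tryPasswords returns the first candidate that opens the archive. Any error
// other than ErrWrongPassword aborts the search immediately.
func tryPasswords(candidates []string) (string, error) {
	for _, p := range candidates {
		err := openWith(p)
		if err == nil {
			return p, nil
		}
		if !errors.Is(err, ErrWrongPassword) {
			return "", err
		}
	}
	return "", ErrWrongPassword
}

func main() {
	p, err := tryPasswords([]string{"password1", "not 'password'", "password"})
	fmt.Println(p, err)
}
```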

Return list of volume archive files.

Hello,

I'd like to be able to action (read: delete) the archives that have been decompressed. What do you think about adding a method like the following to reader.go?

// Volumes returns the list of compressed files that have been identified as part of the current archive.
func (rc *ReadCloser) Volumes() []string {
	volumes := make([]string, len(rc.f))
	for idx, f := range rc.f {
		volumes[idx] = f.Name()
	}

	return volumes
}

Probably a better way to do this, but I think this change would give me what I'm looking for. A list of archives.

EDIT: Now that I think about it, it would be best if the Volumes() method provided point-in-time data, so when we get an error, we could look at the last slice item returned to identify the corrupted archive. In other words, the method should only return files that have been processed, and the one currently processing (if it's not done). I'm taking this idea from here: https://github.com/nwaples/rardecode/blob/e2fa07408d4b19ae0500efbcc6983863c95f821e/reader.go#L359-L364

low performance for aes7z

I have an encrypted 7z file of about 60 MB containing 100+ files. Extracting it takes over 20 seconds on my computer. Tracing the code, I found that the key is recalculated every time a file is opened in sevenz.File.Open, which is very slow. I tried optimizing aes7z.calculateKey by caching the key (I guess all files in a 7z archive share the same key, but I can't confirm it), and it seems to work. My optimized aes7z/key.go is here:

package aes7z

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"sync"

	"golang.org/x/text/encoding/unicode"
	"golang.org/x/text/transform"
)

type keyCacheItem struct {
	password string
	cycles   int
	salt     []byte
	key      []byte
}

func (c *keyCacheItem) hittest(password string, cycles int, salt []byte) bool {
	return c.password == password && c.cycles == cycles && bytes.Equal(salt, c.salt)
}

var keyCache []*keyCacheItem = []*keyCacheItem{}
var keyCacheLock sync.RWMutex

func findKeyCached(password string, cycles int, salt []byte) []byte {
	keyCacheLock.RLock()
	defer keyCacheLock.RUnlock()
	for _, kci := range keyCache {
		if kci.hittest(password, cycles, salt) {
			return kci.key
		}
	}

	return nil
}

func recordKeyCached(password string, cycles int, salt []byte, key []byte) {
	keyCacheLock.Lock()
	defer keyCacheLock.Unlock()
	keyCache = append(keyCache, &keyCacheItem{password: password, cycles: cycles, salt: salt, key: key})
}

func calculateKey(password string, cycles int, salt []byte) []byte {
	k := findKeyCached(password, cycles, salt)
	if len(k) > 0 {
		// key found in cache
		return k
	}
	b := bytes.NewBuffer(salt)

	// Convert password to UTF-16LE
	utf16le := unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM)
	t := transform.NewWriter(b, utf16le.NewEncoder())
	_, _ = t.Write([]byte(password))

	key := make([]byte, sha256.Size)
	if cycles == 0x3f {
		copy(key, b.Bytes())
	} else {
		h := sha256.New()
		for i := uint64(0); i < 1<<cycles; i++ {
			// These will never error
			_, _ = h.Write(b.Bytes())
			_ = binary.Write(h, binary.LittleEndian, i)
		}
		copy(key, h.Sum(nil))
	}

	recordKeyCached(password, cycles, salt, key)
	return key
}

My test code is here:

package main

import (
	"io"
	"os"
	"time"

	sevenz "github.com/bodgit/sevenzip"
)

func open7zArchive(archive, password string) (*sevenz.ReadCloser, error) {
	if password != "" {
		return sevenz.OpenReaderWithPassword(archive, password)
	}
	return sevenz.OpenReader(archive)
}

func extract7zItem(file *sevenz.File, target string) error {
	if file.FileInfo().IsDir() {
		return os.MkdirAll(target, file.Mode())
	}

	f, err := os.OpenFile(target, os.O_CREATE|os.O_WRONLY, file.Mode())
	if err != nil {
		return err
	}
	defer f.Close()

	// open on each file cost too many times
	fReader, err := file.Open()
	if err != nil {
		return err
	}
	defer fReader.Close()

	_, err = io.Copy(f, fReader)

	return err
}

func extract7zWithCallback(archive string, password string, handler func(*sevenz.File) error) error {
	reader, err := open7zArchive(archive, password)
	if err != nil {
		return err
	}
	defer reader.Close()

	for _, f := range reader.File {
		if err := handler(f); err != nil {
			return err
		}
	}

	return nil
}

func extract7zArchive(archive, password string, path string) error {
	if path[len(path)-1] != '/' && path[len(path)-1] != '\\' {
		path += "/"
	}
	return extract7zWithCallback(archive, password, func(f *sevenz.File) error {
		return extract7zItem(f, path+f.Name)
	})
}

func main() {
	t := time.Now().UnixMilli()
	println(extract7zArchive("F:\\log4job\\日常工作\\2024\\03-29\\client_v2403.csp", "test", "F:\\log4job\\日常工作\\2024\\03-29\\client_v2403"))
	println(time.Now().UnixMilli()-t, " ms cost")
}

Extracting my 7z file takes 20 seconds without the cache and 5 seconds with the key cached.
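
The same idea can be sketched with a map instead of a slice scan. This is a standalone illustration, not the package's implementation: it uses unicode/utf16 from the standard library for the UTF-16LE conversion (which should match the quoted code for valid UTF-8 passwords), and every name in it is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sync"
	"unicode/utf16"
)

// cacheKey identifies one derivation. The salt is stored as a string so the
// struct is comparable and can be used as a map key.
type cacheKey struct {
	password string
	cycles   int
	salt     string
}

var (
	mu      sync.Mutex
	keys    = map[cacheKey][]byte{}
	derived int // number of real derivations, kept only for the demo
)

// derive follows the loop quoted above: hash salt+password together with a
// little-endian counter 1<<cycles times.
func derive(password string, cycles int, salt []byte) []byte {
	buf := append([]byte{}, salt...)
	for _, u := range utf16.Encode([]rune(password)) {
		buf = append(buf, byte(u), byte(u>>8)) // UTF-16LE
	}
	key := make([]byte, sha256.Size)
	if cycles == 0x3f {
		copy(key, buf)
		return key
	}
	h := sha256.New()
	var counter [8]byte
	for i := uint64(0); i < 1<<cycles; i++ {
		h.Write(buf)
		binary.LittleEndian.PutUint64(counter[:], i)
		h.Write(counter[:])
	}
	copy(key, h.Sum(nil))
	return key
}

// cachedKey returns the key for the given inputs, deriving it only once.
func cachedKey(password string, cycles int, salt []byte) []byte {
	id := cacheKey{password, cycles, string(salt)}
	mu.Lock()
	defer mu.Unlock()
	if k, ok := keys[id]; ok {
		return k
	}
	k := derive(password, cycles, salt)
	derived++
	keys[id] = k
	return k
}

func main() {
	salt := []byte{1, 2, 3}
	a := cachedKey("test", 10, salt)
	b := cachedKey("test", 10, salt)
	fmt.Println(len(a), string(a) == string(b), derived)
}
```

Holding the mutex during the derivation keeps the sketch short but serialises concurrent callers, and the map grows without bound while keeping passwords in memory; a real cache would want to address both.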

seek over file in archive

I'm not sure if that's possible, but it will be really nice to have ReadSeekCloser instead of ReadCloser which should allow us to read desired bytes from files inside the archive.
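
Until something like that exists, a forward-only approximation is possible with any io.Reader by discarding bytes up to the wanted offset. The sketch below uses only the standard library; readRange and substring are names made up for this example, and the string reader stands in for a file opened from an archive.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readRange skips offset bytes of r by discarding them, then reads up to
// length bytes. It can only move forwards through the reader.
func readRange(r io.Reader, offset, length int64) ([]byte, error) {
	if _, err := io.CopyN(io.Discard, r, offset); err != nil {
		return nil, err
	}
	return io.ReadAll(io.LimitReader(r, length))
}

// substring is a small convenience wrapper used by the demo below.
func substring(s string, offset, length int64) (string, error) {
	b, err := readRange(strings.NewReader(s), offset, length)
	return string(b), err
}

func main() {
	s, err := substring("hello, 7-zip", 7, 5)
	fmt.Printf("%q %v\n", s, err)
}
```

The skipped bytes still have to be decompressed, so this saves code rather than time; a true Seek would need support inside the decompressor.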

docs/examples

Are there any documents or examples describing how to use anything this package provides?

sevenzip as a guide

@bodgit This isn't about sevenzip directly but could really use your insight if you have a moment.

Just for kicks I thought I would try to manually decode a 7zip archive just so I can understand the format better. I've been using the sevenzip code as a guide.

   Signature: 377abcaf271c
     Version: 0004

StartHeaderCRC: 886524f2
NextHeaderOffset: 0000000000ad6d77 (11365751)
NextHeaderSize: 0000000000000025 (37)
NextHeaderCRC: 0fc78a39
XXX kEncodedHeader 0x17
&main.streamsInfo{packInfo:(*main.packInfo)(0x140000a8180), unpackInfo:(*main.unpackInfo)(0x140000c0000), subStreamsInfo:(*main.subStreamsInfo)(nil)}
&main.packInfo{position:0xad6cd6, streams:0x1, size:[]uint64{0xa1}, digest:[]uint32(nil), defined:[]bool(nil)}
&main.unpackInfo{folder:[]*main.folder{(*main.folder)(0x140000c2000)}, digest:[]uint32{0x62bad49f}, defined:[]bool{true}}
&main.folder{in:0x1, out:0x1, packedStreams:0x1, coder:[]*main.coder{(*main.coder)(0x140000aa040)}, bindPair:[]*main.bindPair{}, size:[]uint64{0x14e}, packed:[]uint64{0x0}}
&main.coder{id:[]uint8{0x3, 0x1, 0x1}, in:0x1, out:0x1, properties:[]uint8{0x5d, 0x0, 0x10, 0x0, 0x0}}

I'm trying to decompress the LZMA-encoded headers. I believe I've located the position of the compressed headers relative to the signature header at 11365590+32. The compressed headers appear to be 161 bytes, and the uncompressed size appears to be 334 bytes. I can see that they are compressed with LZMA (0x3, 0x1, 0x1).
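
For readers following along, here is a sketch of how the fields in the dump can be read with encoding/binary. It assumes the signature header is 32 bytes, little-endian, laid out as 6+1+1+4+8+8+4, and that stored offsets are relative to the end of that header (which matches the "+32" above). The type and function names are invented for this example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// signatureHeader mirrors the fields printed in the dump above.
type signatureHeader struct {
	Signature        [6]byte
	Major, Minor     byte
	StartHeaderCRC   uint32
	NextHeaderOffset uint64
	NextHeaderSize   uint64
	NextHeaderCRC    uint32
}

// headerSize is the assumed size of the signature header in bytes.
const headerSize = 32

// parseSignatureHeader decodes the first 32 bytes of an archive.
func parseSignatureHeader(b []byte) (*signatureHeader, error) {
	var h signatureHeader
	if err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &h); err != nil {
		return nil, err
	}
	if !bytes.Equal(h.Signature[:], []byte{'7', 'z', 0xbc, 0xaf, 0x27, 0x1c}) {
		return nil, errors.New("not a 7-zip signature")
	}
	return &h, nil
}

// absolute converts an offset stored relative to the end of the signature
// header into an offset from the start of the file.
func absolute(relative uint64) uint64 { return headerSize + relative }

func main() {
	// Rebuild a header from the values in the dump, then parse it back.
	var buf bytes.Buffer
	_ = binary.Write(&buf, binary.LittleEndian, signatureHeader{
		Signature:        [6]byte{'7', 'z', 0xbc, 0xaf, 0x27, 0x1c},
		Minor:            4,
		StartHeaderCRC:   0x886524f2,
		NextHeaderOffset: 0xad6d77,
		NextHeaderSize:   0x25,
		NextHeaderCRC:    0x0fc78a39,
	})
	h, err := parseSignatureHeader(buf.Bytes())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(absolute(h.NextHeaderOffset), h.NextHeaderSize, absolute(0xad6cd6))
}
```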

I should just be able to seek to 11365590+32, copy 161 bytes to a []byte, call bytes.NewReader on the []byte, and pass the bytes.Reader to lzma.NewReader(). However, lzma.NewReader gives me this error: newRangeDecoder: first byte not zero

If I dump the 161 bytes to disk I can see the first two bytes are 0x00 so I think the error message from the lzma module (github.com/ulikunitz/xz/lzma) might be misleading.

00000000: 0000 8133 07ae 0fd9

So it seems I'm missing something someplace. I'm hoping you might see what it is since you've done this before.
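
One possible explanation, offered as an assumption rather than a confirmed diagnosis: lzma.NewReader in github.com/ulikunitz/xz/lzma reads the classic .lzma file format, which begins with a 13-byte header (the 5 property bytes followed by the uncompressed size as a little-endian 64-bit integer), whereas the archive stores only the raw stream and keeps the 5 property bytes in the coder record. If that is the case, the reader would consume the first 13 bytes of the raw data as a header and then complain about the byte that follows. The sketch below builds such a header from the values in the dump; classicHeader is a name made up for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// classicHeader returns the 13-byte header of the classic .lzma format:
// the 5 property bytes followed by the uncompressed size as a little-endian
// 64-bit integer.
func classicHeader(props []byte, unpackedSize uint64) ([]byte, error) {
	if len(props) != 5 {
		return nil, errors.New("expected 5 LZMA property bytes")
	}
	h := make([]byte, 13)
	copy(h, props)
	binary.LittleEndian.PutUint64(h[5:], unpackedSize)
	return h, nil
}

func main() {
	// Values taken from the coder and folder records in the dump above.
	h, err := classicHeader([]byte{0x5d, 0x00, 0x10, 0x00, 0x00}, 0x14e)
	fmt.Printf("% x %v\n", h, err)
}
```

The header could then be placed in front of the 161 compressed bytes with io.MultiReader before handing the combined reader to lzma.NewReader.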

Support PPC method

Decompression method: 3.3.2.5

The GameCube homebrew "Swiss" is distributed in a 7z that uses PPC compression for some of the binaries, (GameCube is a PPC-based machine, so not surprising).

Errors are not accessible since they are private with a lower case

Shouldn't the errors defined in sevenzip be accessible for comparison? Currently, errors are defined with a leading lower-case letter, making them private to external modules. archive/zip uses "ErrFormat", but this module uses "errFormat".

Can we fix this please?

Empty File processing issue

For 7-zip archive data, version 0.4, the code below works:

types.go, lines 697-710:

		case idEmptyFile:
			empty, err := readBool(r, emptyStreams)
			if err != nil {
				return nil, err
			}

			j := 0

			for i := range f.file {
				if f.file[i].isEmptyStream {
					f.file[i].isEmptyFile = empty[j]
				}
				j++
			}

But for 7-zip archive data, version 0.3, indexing the empty slice may go out of range.
I found that in version 0.3 the empty-stream entries do not appear only at the front of f.file, so the index into empty eventually exceeds its length.

Should j++ be moved inside the isEmptyStream condition, like below?

		case idEmptyFile:
			empty, err := readBool(r, emptyStreams)
			if err != nil {
				return nil, err
			}

			j := 0

			for i := range f.file {
				if f.file[i].isEmptyStream {
					f.file[i].isEmptyFile = empty[j]
					j++
				}
			}
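
The difference between the two loops can be reproduced outside the package. In this standalone sketch the two slices stand in for the per-file isEmptyStream flags and the empty vector returned by readBool; the function name and the added bounds check are assumptions for illustration.

```go
package main

import "fmt"

// markEmptyFiles returns, for each file, whether it is an empty file.
// isEmptyStream has one entry per file; empty has one entry per file whose
// isEmptyStream flag is set, so its index only advances on those files.
func markEmptyFiles(isEmptyStream, empty []bool) ([]bool, error) {
	out := make([]bool, len(isEmptyStream))
	j := 0
	for i, es := range isEmptyStream {
		if !es {
			continue
		}
		if j >= len(empty) {
			return nil, fmt.Errorf("empty-file vector too short: need index %d, have %d entries", j, len(empty))
		}
		out[i] = empty[j]
		j++
	}
	return out, nil
}

func main() {
	// Empty-stream entries are interleaved with regular files.
	isEmptyStream := []bool{false, true, false, false, true}
	empty := []bool{true, false}
	out, err := markEmptyFiles(isEmptyStream, empty)
	fmt.Println(out, err)
}
```

With the original loop, j would reach 4 for the last file here even though empty has only two entries, which is the out-of-range access described above.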

Extract files from a self-extracting exe

Should it be as simple as replacing an existing function which starts with r, err := zip.OpenReader(archive) with r, err := sevenzip.OpenReader(archive) and then being able to use existing f.Open() and io.Copy() lines? I am successful with a test.7z I made but I don't seem to be able to get it to work with this self-extracting exe file: https://www.reaper.fm/files/6.x/reaper683_x64-install.exe. The PE file is definitely valid:

reaper683_x64-install.exe: PE32 executable (GUI) Intel 80386, for MS Windows, Nullsoft Installer self-extracting archive, 5 sections

Plus, I'm able to open this using 7-zip and Gnome archive manager. What am I missing?

Unable to decrypt 7-ZIP file with password (err: breader.ReadByte: no data!)

What version of the package or command are you using?

master (4e0de6e)

What are you trying to do?

Decompress 7-ZIP file with password

What steps did you take?

func TestOpenReaderWithPassword(t *testing.T) {
	t.Parallel()

	tables := []struct {
		name, file, password string
	}{
		{
			name:     "xxxx header compression",
			file:     "7zcracker.7z",
			password: "876",
		},
	}
....

see: reader_test.go
the file url: download
the file is created by "7z2201-x64.exe" (from www.7-zip.org) with default options.

What did you expect to happen, and what actually happened instead?

Err: breader.ReadByte: no data !

Running tool: D:\__SYNC1\Softwares\Go\bin\go.exe test -timeout 30s -run ^TestOpenReaderWithPassword$ github.com/bodgit/sevenzip

--- FAIL: TestOpenReaderWithPassword (0.00s)
    --- FAIL: TestOpenReaderWithPassword/xxxx_header_compression (1.59s)
        d:\__SYNC2\go-path\src\zollty.com\test\sevenzip\reader_test.go:183: breader.ReadByte: no data
FAIL
FAIL	github.com/bodgit/sevenzip	3.581s
FAIL

After my testing:

All 7-zip encrypted files created by the 7-Zip software (official Windows x64 version, default options) cannot be decrypted.

But files encrypted by another program, Bandizip, can be decrypted normally.

OpenReaderWithPassword() returns nil ("ok,alright") with wrong password (when -mhe=off)

When I create a compressed file with a password (or without one, when the password is "") and without header encryption (-mhe=off), I get no error when opening the 7z archive with the wrong password. I get the files (with wrong data) from the sevenzip fs as if the supplied password were correct:

r, err := sevenzip.OpenReaderWithPassword(PathFile, "somewrongpass")
if err != nil { // err is nil here, so the program continues
	fmt.Println(err)
	os.Exit(-1)
}
defer r.Close()

(but the program continues as if there were no error)

Maybe I don't understand something? Is this normal behavior?
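
One way a caller might notice the problem after the fact is to compare a checksum of the extracted bytes with the CRC recorded for the file. The sketch below only shows the comparison itself with hash/crc32 from the standard library; how the expected value is obtained from the archive, and whether the package already reports a mismatch on read, are not covered here. The function names are invented for the example.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"io"
	"strings"
)

// verifyCRC reads r to the end and reports an error if the CRC-32 (IEEE)
// of the data differs from want.
func verifyCRC(r io.Reader, want uint32) error {
	h := crc32.NewIEEE()
	if _, err := io.Copy(h, r); err != nil {
		return err
	}
	if got := h.Sum32(); got != want {
		return fmt.Errorf("checksum mismatch: got %08x, want %08x", got, want)
	}
	return nil
}

// checkString is a small convenience wrapper used by the demo below.
func checkString(s string, want uint32) error {
	return verifyCRC(strings.NewReader(s), want)
}

func main() {
	// 0xcbf43926 is the well-known CRC-32 check value for "123456789".
	fmt.Println(checkString("123456789", 0xcbf43926))
	fmt.Println(checkString("123456789", 0xdeadbeef))
}
```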

Improve performance

Currently, every time a file within the archive is opened, a new copy of the decompressed stream is created and then read until the beginning of the file in question is reached. This means there is a performance hit that gets worse as you descend into the archive. To read the 10th file, you have to read and discard the first 9 files in the archive, to read the 100th file, you have to read past 99 files in the archive, etc.

A performance improvement would be to keep the decompressed stream reader around for any future files that are further along in the archive, (you can't go backwards).
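
The idea can be sketched with a small forward-only cache around an io.Reader. This is an illustration under simplifying assumptions (a single stream, one caller at a time), not the package's implementation; streamCache and its methods are names made up for the example.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// streamCache keeps one decompressed stream open and remembers how far it
// has been read, so later files can continue from there.
type streamCache struct {
	open    func() io.Reader // starts a fresh stream at offset 0
	r       io.Reader
	pos     int64
	reopens int // how many times the stream was started, for the demo
}

// readAt returns length bytes starting at offset. The stream is reused when
// offset is at or beyond the current position; going backwards forces a
// restart from the beginning.
func (c *streamCache) readAt(offset, length int64) ([]byte, error) {
	if c.r == nil || offset < c.pos {
		c.r, c.pos = c.open(), 0
		c.reopens++
	}
	n, err := io.CopyN(io.Discard, c.r, offset-c.pos)
	c.pos += n
	if err != nil {
		return nil, err
	}
	buf := make([]byte, length)
	m, err := io.ReadFull(c.r, buf)
	c.pos += int64(m)
	return buf[:m], err
}

// newStringCache builds a cache over an in-memory string for the demo.
func newStringCache(s string) *streamCache {
	return &streamCache{open: func() io.Reader { return strings.NewReader(s) }}
}

func main() {
	c := newStringCache("aaabbbcc") // three "files": aaa, bbb, cc
	for _, f := range [][2]int64{{0, 3}, {3, 3}, {6, 2}} {
		b, err := c.readAt(f[0], f[1])
		fmt.Printf("%q %v\n", b, err)
	}
	fmt.Println("stream started", c.reopens, "time(s)")
}
```

Reading the same three ranges in reverse order would restart the stream for every file, which matches the "you can't go backwards" remark above.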

Empty files

Related to #4, add support for empty files. The test case is actually quite easy to reproduce: just create an archive with some zero-length files. The files are marked as empty rather than trying to compress nothing, which is possibly more efficient.

panic: runtime error: index out of range

panic: runtime error: index out of range [104] with length 5

goroutine 74 [running]:
panic({0x17e2ce0, 0xcaa3260})
c:/go/src/runtime/panic.go:987 +0x364 fp=0xdec1790 sp=0xdec1734 pc=0xcea404
runtime.goPanicIndex(0x68, 0x5)
c:/go/src/runtime/panic.go:113 +0xa3 fp=0xdec17b0 sp=0xdec1790 pc=0xce8373
github.com/bodgit/sevenzip.readFilesInfo({0x2b3c1c40, 0xc91ded8})
C:/Users/Administrator/go/pkg/mod/github.com/bodgit/[email protected]/types.go:707 +0xa66 fp=0xdec1870 sp=0xdec17b0 pc=0x111ed86
github.com/bodgit/sevenzip.readHeader({0x2b3c1c40, 0xc91ded8})
C:/Users/Administrator/go/pkg/mod/github.com/bodgit/[email protected]/types.go:811 +0x108 fp=0xdec18ec sp=0xdec1870 pc=0x111eea8
github.com/bodgit/sevenzip.readEncodedHeader({0x2b3c1c40, 0xc91ded8})
C:/Users/Administrator/go/pkg/mod/github.com/bodgit/[email protected]/types.go:849 +0xe8 fp=0xdec1914 sp=0xdec18ec pc=0x111f498
github.com/bodgit/sevenzip.(*Reader).init(0xc0fe2a0, {0x19e1abc, 0xc717e90}, 0x38a5b)
C:/Users/Administrator/go/pkg/mod/github.com/bodgit/[email protected]/reader.go:356 +0xa42 fp=0xdec1ad0 sp=0xdec1914 pc=0x11153b2
github.com/bodgit/sevenzip.NewReaderWithPassword({0x19e1abc, 0xc717e90}, 0x38a5b, {0x0, 0x0})
C:/Users/Administrator/go/pkg/mod/github.com/bodgit/[email protected]/reader.go:242 +0x8d fp=0xdec1af4 sp=0xdec1ad0 pc=0x111478d
github.com/mholt/archiver/v4.SevenZip.Extract({0x0, {0x0, 0x0}}, {0x19e5ba8, 0xac341c0}, {0x19e1aa8, 0xc717e90}, {0x0, 0x0, 0x0}, ...)
C:/Users/Administrator/go/pkg/mod/github.com/mholt/archiver/[email protected]/7z.go:76 +0x181 fp=0xdec1c44 sp=0xdec1af4 pc=0x1186111
github.com/mholt/archiver/v4.(*SevenZip).Extract(0xac08240, {0x19e5ba8, 0xac341c0}, {0x19e1aa8, 0xc717e90}, {0x0, 0x0, 0x0}, 0xc2ebaf0)
:1 +0x85 fp=0xdec1c7c sp=0xdec1c44 pc=0x1190985
GServices/framework/system/fileapi.(*ArchiveReader).extractFiles(0x20f560c, {0x2b3c1aa8, 0xac08240}, 0xc717e90)
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/framework/system/fileapi/archiver.go:131 +0xfa fp=0xdec1ccc sp=0xdec1c7c pc=0x11cff7a
GServices/framework/system/fileapi.(*ArchiveReader).readArchiveFiles(0x20f560c, {0xca37a40, 0x64})
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/framework/system/fileapi/archiver.go:91 +0x26a fp=0xdec1d68 sp=0xdec1ccc pc=0x11cfb2a
GServices/framework/system/fileapi.(*ArchiveReader).ReadArchiveFiles(0x20f560c, {0xca37a40, 0x64})
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/framework/system/fileapi/archiver.go:67 +0x109 fp=0xdec1d8c sp=0xdec1d68 pc=0x11cf879
GServices/Plugin/Edr/cache.(*ArchiveAssocCache).OnCreate(0x20f5600, {0xca37a40, 0x64}, {0xb7c6180, 0x20})
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/Plugin/Edr/cache/archiveassoc.go:74 +0x177 fp=0xdec1e94 sp=0xdec1d8c pc=0x1306dd7
GServices/Plugin/Edr/event/policy/crossplatform.(*ArchivePolicy).createArchive(0xae45280, {0x19e9e54, 0xd8db750})
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/Plugin/Edr/event/policy/crossplatform/archivepolicy.go:43 +0x173 fp=0xdec1ef4 sp=0xdec1e94 pc=0x155d0d3
GServices/Plugin/Edr/event/policy/crossplatform.(*ArchivePolicy).EventHandler(0xae45280, {0x19e9e54, 0xd8db750})
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/Plugin/Edr/event/policy/crossplatform/archivepolicy.go:23 +0x31 fp=0xdec1f04 sp=0xdec1ef4 pc=0x155cf11
GServices/Plugin/Edr/event.(*fillEvent).eventPolicyChain(0xaf48060, {0x19e9e54, 0xd8db750}, {0x1847ba7, 0x9})
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/Plugin/Edr/event/fill.go:158 +0x1a9 fp=0xdec1f7c sp=0xdec1f04 pc=0x157d5a9
GServices/Plugin/Edr/event.(*fillEvent).eventHandler(0xaf48060, {0x1847ba7, 0x9})
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/Plugin/Edr/event/fill.go:131 +0x204 fp=0xdec1fe0 sp=0xdec1f7c pc=0x157d264
GServices/Plugin/Edr/event.(*fillEvent).Start.func2()
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/Plugin/Edr/event/fill.go:41 +0x39 fp=0xdec1ff0 sp=0xdec1fe0 pc=0x157c3a9
runtime.goexit()
c:/go/src/runtime/asm_386.s:1326 +0x1 fp=0xdec1ff4 sp=0xdec1ff0 pc=0xd1da61
created by GServices/Plugin/Edr/event.(*fillEvent).Start
E:/data/landun/workspace/TxiOAClient/client-cross-platform/master/GServices/Plugin/Edr/event/fill.go:41 +0xa3

runtime error: index out of range [0] with length 0

While processing a large amount of files, I encountered the following panic.

Panic occurred when reading archive     {"error": "runtime error: index out of range [0] with length 0"}

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x21335e2]

goroutine 94212267 [running]:
github.com/bodgit/sevenzip.readHeader({0x3bff3d8, 0xc1c15f9bc0})
        /home/user/go/pkg/mod/github.com/bodgit/[email protected]/types.go:832 +0x2c2
github.com/bodgit/sevenzip.(*Reader).init(0xc1af3e8d80, {0x7f9e8c1b5090?, 0xc1c53ea900}, 0x6d)
        /home/user/go/pkg/mod/github.com/bodgit/[email protected]/reader.go:379 +0x7d3
github.com/bodgit/sevenzip.NewReaderWithPassword({0x7f9e8c1b5090, 0xc1c53ea900}, 0x6d, {0x0, 0x0})
        /home/user/go/pkg/mod/github.com/bodgit/[email protected]/reader.go:242 +0x93
github.com/mholt/archiver/v4.SevenZip.Extract({0x0?, {0x0?, 0x0?}}, {0x3c1a238, 0xc1c53eae10}, {0x3be9340?, 0xc1c53ea900?}, {0x0, 0x0, 0x0}, ...)
        /home/user/go/pkg/mod/github.com/mholt/archiver/[email protected]/7z.go:76 +0x12b
github.com/trufflesecurity/trufflehog/v3/pkg/handlers.(*Archive).openArchive(0xc1c8bc3040, {0x3c1a2e0, 0xc1c433da40}, 0x0, {0x3be9340, 0xc1c53ea900}, 0xc1c15f9b60)
        /home/user/dev/github.com/trufflesecurity/trufflehog/pkg/handlers/archive.go:114 +0x2b3
github.com/trufflesecurity/trufflehog/v3/pkg/handlers.(*Archive).FromFile.func1()
        /home/user/dev/github.com/trufflesecurity/trufflehog/pkg/handlers/archive.go:76 +0x1af
created by github.com/trufflesecurity/trufflehog/v3/pkg/handlers.(*Archive).FromFile in goroutine 59328246
        /home/user/dev/github.com/trufflesecurity/trufflehog/pkg/handlers/archive.go:71 +0xe5

The issue appears to be caused by accessing the first element (index 0) of h.streamsInfo.subStreamsInfo.digest when that slice is empty. I haven't yet identified the specific culprit, but I'm sharing this now in case the cause or the fix is obvious.

j := 0

sevenzip/types.go, lines 832 to 833 in 8185d4f:

h.filesInfo.file[i].CRC32 = h.streamsInfo.subStreamsInfo.digest[j]
_, h.filesInfo.file[i].UncompressedSize = h.streamsInfo.FileFolderAndSize(j)
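
One defensive shape for this kind of loop is to check the index against the digest slice before using it and to return an error instead of panicking. The following is only a minimal sketch: `fileEntry` and `assignCRCs` are hypothetical stand-ins invented for illustration and are not the library's actual types or code, and whether a missing digest should be an error or simply skipped is an assumption the maintainer would need to confirm.

```go
package main

import "fmt"

// fileEntry is a hypothetical stand-in for an archive file record.
type fileEntry struct {
	Name  string
	CRC32 uint32
}

// assignCRCs copies digests onto files in order. It returns an error instead
// of panicking when there are fewer digests than files.
func assignCRCs(files []fileEntry, digests []uint32) error {
	for i := range files {
		if i >= len(digests) {
			return fmt.Errorf("file %d (%q): no digest available (have %d)",
				i, files[i].Name, len(digests))
		}
		files[i].CRC32 = digests[i]
	}
	return nil
}

func main() {
	files := []fileEntry{{Name: "a.txt"}}
	// An empty digest slice is the case that triggered the panic above.
	if err := assignCRCs(files, nil); err != nil {
		fmt.Println("error:", err)
	}
}
```

With a check like this, a caller such as an archive scanner would see an ordinary error for the malformed archive rather than a crashed goroutine.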

"invalid memory address or nil pointer dereference" when opening 7z file

When attempting to open this file, a panic is encountered. (That repository has some great test cases in general.)

# Program based on https://pkg.go.dev/github.com/bodgit/sevenzip?utm_source=godoc#example-OpenReader
$ go run main.go

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x59b4c8]

goroutine 1 [running]:
github.com/bodgit/sevenzip.readStreamsInfo({0x650d88, 0xc0001a41e0})
        /home/user/go/pkg/mod/github.com/bodgit/[email protected]/types.go:513 +0x1e8
github.com/bodgit/sevenzip.readHeader({0x650d88, 0xc0001a41e0})
        /home/user/go/pkg/mod/github.com/bodgit/[email protected]/types.go:800 +0x73
github.com/bodgit/sevenzip.(*Reader).init(0xc0001d02d8, {0x650700?, 0xc0001a20b8}, 0x27)
        /home/user/go/pkg/mod/github.com/bodgit/[email protected]/reader.go:379 +0x825
github.com/bodgit/sevenzip.OpenReaderWithPassword({0x5fe82e, 0x1d}, {0x0, 0x0})
        /home/user/go/pkg/mod/github.com/bodgit/[email protected]/reader.go:211 +0x36b
github.com/bodgit/sevenzip.OpenReader(...)
        /home/user/go/pkg/mod/github.com/bodgit/[email protected]/reader.go:228
main.main()
        /tmp/sevenzip/main.go:14 +0x3f
exit status 2

The issue appears to be with s.unpackInfo.folder here. Based on the commit associated with that archive, it's possible for an archive to contain streams but no subStreamsInfo.

if s.subStreamsInfo, err = readSubStreamsInfo(r, s.unpackInfo.folder); err != nil {

Fail to read .exe in .7z

When extracting an archive that contains an .exe file, different ways of reading from the reader give different results, and one of them panics.

Using io.Copy works fine.

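	reader, err := sevenzip.OpenReader(fp)
	if err != nil {
		panic(err)
	}
	defer reader.Close() // close the archive once, not once per loop iteration
	for _, fileIn7z := range reader.File {
		r, err := fileIn7z.Open()
		if err != nil {
			panic(err)
		}
		var buffer bytes.Buffer
		_, err = io.Copy(&buffer, r)
		r.Close()
		if err != nil {
			panic(err)
		}
		data := buffer.Bytes()

		fmt.Println("Data size:", len(data))
		// use data to do something...
	}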

Using ioutil.ReadAll works in most cases.
But if the archived file is an .exe, it panics with: panic: runtime error: slice bounds out of range [:5] with capacity 3

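	reader, err := sevenzip.OpenReader(fp)
	if err != nil {
		panic(err)
	}
	defer reader.Close()
	for _, fileIn7z := range reader.File {
		r, err := fileIn7z.Open()
		if err != nil {
			panic(err)
		}
		data, _ := ioutil.ReadAll(r) // panics here for the .exe case
		r.Close()

		fmt.Println("Data size:", len(data))
		// use data to do something...
	}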

Here is the complete test code.

package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"path"

	"github.com/bodgit/sevenzip"
)

func checkErr(err error) {
	if err != nil {
		panic(err)
	}
}

func readByIoutil(fp string) {
	fmt.Println("[readByIoutil] start")

	reader, err := sevenzip.OpenReader(fp)
	checkErr(err)
	defer reader.Close()

	for _, fileIn7z := range reader.File {
		filename := path.Base(fileIn7z.Name)
		fmt.Println("Processing", filename)
		// Use a distinct name so the archive reader is not shadowed.
		r, err := fileIn7z.Open()
		checkErr(err)
		data, err := ioutil.ReadAll(r)
		r.Close()
		checkErr(err)

		fmt.Println("Data size:", len(data))
		// use data to do something...
	}
	fmt.Println("[readByIoutil] finished")
	fmt.Println()
}

func readByCopy(fp string) {
	fmt.Println("[readByCopy] start")
	reader, err := sevenzip.OpenReader(fp)
	checkErr(err)
	defer reader.Close() // close the archive once, not once per loop iteration

	for _, fileIn7z := range reader.File {
		filename := path.Base(fileIn7z.Name)
		fmt.Println("Processing", filename)
		r, err := fileIn7z.Open()
		checkErr(err)
		var buffer bytes.Buffer
		_, err = io.Copy(&buffer, r)
		r.Close()
		checkErr(err)
		data := buffer.Bytes()

		fmt.Println("Data size:", len(data))
		// use data to do something...
	}
	fmt.Println("[readByCopy] finished")
	fmt.Println()
}

func main() {
	fp := "putty.7z"

	readByCopy(fp)
	readByIoutil(fp)
}

Running it produces output like the following:

[readByCopy] start
Processing putty.exe
Data size: 1647912
[readByCopy] finished

[readByIoutil] start
Processing putty.exe
panic: runtime error: slice bounds out of range [:5] with capacity 3

goroutine 1 [running]:

As a test file, I compressed putty.exe into putty.7z, which is attached below.
(PuTTY is a common SSH client for Windows.)

Source: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
64-bit x86: putty.exe

File information:

$ file putty.exe 
putty.exe: PE32+ executable (GUI) x86-64, for MS Windows
$ 7z a putty.7z putty.exe 

7-Zip [64] 17.05 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.05 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,8 CPUs LE)

Open archive: putty.7z
--
Path = putty.7z
Type = 7z
Physical Size = 816140
Headers Size = 122
Method = LZMA2:21 BCJ
Solid = -
Blocks = 1

Scanning the drive:
1 file, 1647912 bytes (1610 KiB)

Updating archive: putty.7z

Items to compress: 1

     
Files read from disk: 1
Archive size: 816140 bytes (798 KiB)
$ file putty.7z
putty.7z: 7-zip archive data, version 0.4

Attached File:
putty.7z.zip
