mholt / archiver

Easily create & extract archives, and compress & decompress files of various formats

Home Page: https://pkg.go.dev/github.com/mholt/archiver/v4

License: MIT License


archiver's Introduction

archiver

Introducing Archiver 4.0 - a cross-platform, multi-format archive utility and Go library. A powerful and flexible library meets an elegant CLI in this generic replacement for several platform-specific or format-specific archive utilities.

⚠️ v4 is in ALPHA. The core library APIs work pretty well but the command has not been implemented yet, nor have most automated tests. If you need the arc command, stick with v3 for now.

Features

  • Stream-oriented APIs
  • Automatically identify archive and compression formats:
    • By file name
    • By header
  • Traverse directories, archive files, and any other file uniformly as io/fs file systems
  • Compress and decompress files
  • Create and extract archive files
  • Walk or traverse into archive files
  • Extract only specific files from archives
  • Insert (append) into .tar and .zip archives
  • Read from password-protected 7-Zip files
  • Numerous archive and compression formats supported
  • Extensible (add more formats just by registering them)
  • Cross-platform, static binary
  • Pure Go (no cgo)
  • Multithreaded Gzip
  • Adjust compression levels
  • Automatically add compressed files to zip archives without re-compressing
  • Open password-protected RAR archives

Supported compression formats

  • brotli (.br)
  • bzip2 (.bz2)
  • flate (.zip)
  • gzip (.gz)
  • lz4 (.lz4)
  • lzip (.lz)
  • snappy (.sz)
  • xz (.xz)
  • zlib (.zz)
  • zstandard (.zst)

Supported archive formats

  • .zip
  • .tar (including any compressed variants like .tar.gz)
  • .rar (read-only)
  • .7z (read-only)

Tar files can optionally be compressed using any compression format.

Command use

Coming soon for v4. See the last v3 docs.

Library use

$ go get github.com/mholt/archiver/v4

Create archive

Creating archives can be done entirely without needing a real disk or storage device since all you need is a list of File structs to pass in.

However, creating archives from files on disk is very common, so you can use the FilesFromDisk() function to help you map filenames on disk to their paths in the archive. Then create and customize the format type.

In this example, we add 4 files and a directory (which includes its contents recursively) to a .tar.gz file:

// map files on disk to their paths in the archive
files, err := archiver.FilesFromDisk(nil, map[string]string{
	"/path/on/disk/file1.txt": "file1.txt",
	"/path/on/disk/file2.txt": "subfolder/file2.txt",
	"/path/on/disk/file3.txt": "",              // put in root of archive as file3.txt
	"/path/on/disk/file4.txt": "subfolder/",    // put in subfolder as file4.txt
	"/path/on/disk/folder":    "Custom Folder", // contents added recursively
})
if err != nil {
	return err
}

// create the output file we'll write to
out, err := os.Create("example.tar.gz")
if err != nil {
	return err
}
defer out.Close()

// we can use the CompressedArchive type to gzip a tarball
// (compression is not required; you could use Tar directly)
format := archiver.CompressedArchive{
	Compression: archiver.Gz{},
	Archival:    archiver.Tar{},
}

// create the archive
err = format.Archive(context.Background(), out, files)
if err != nil {
	return err
}

The first parameter to FilesFromDisk() is an optional options struct, allowing you to customize how files are added.

Extract archive

Extracting an archive, extracting from an archive, and walking an archive are all the same function.

Simply use your format type (e.g. Zip) to call Extract(). You'll pass in a context (for cancellation), the input stream, the list of files you want out of the archive, and a callback function to handle each file.

If you want all the files, pass in a nil list of file paths.

// the type that will be used to read the input stream
format := archiver.Zip{}

// the list of files we want out of the archive; any
// directories will include all their contents unless
// we return fs.SkipDir from our handler
// (leave this nil to walk ALL files from the archive)
fileList := []string{"file1.txt", "subfolder"}

handler := func(ctx context.Context, f archiver.File) error {
	// do something with the file
	return nil
}

err := format.Extract(ctx, input, fileList, handler)
if err != nil {
	return err
}

Identifying formats

Have an input stream with unknown contents? No problem, archiver can identify it for you. It will try matching based on filename and/or the header (which peeks at the stream):

format, input, err := archiver.Identify("filename.tar.zst", input)
if err != nil {
	return err
}
// you can now type-assert format to whatever you need;
// be sure to use returned stream to re-read consumed bytes during Identify()

// want to extract something?
if ex, ok := format.(archiver.Extractor); ok {
	// ... proceed to extract
}

// or maybe it's compressed and you want to decompress it?
if decom, ok := format.(archiver.Decompressor); ok {
	rc, err := decom.OpenReader(input)
	if err != nil {
		return err
	}
	defer rc.Close()

	// read from rc to get decompressed data
}

Identify() works by reading an arbitrary number of bytes from the beginning of the stream (just enough to check for file headers). It buffers them and returns a new reader that lets you re-read them anew.

Virtual file systems

This is my favorite feature.

Let's say you have a file. It could be a real directory on disk, an archive, a compressed archive, or any other regular file. You don't really care; you just want to use it uniformly no matter what it is.

Use archiver to simply create a file system:

// filename could be:
// - a folder ("/home/you/Desktop")
// - an archive ("example.zip")
// - a compressed archive ("example.tar.gz")
// - a regular file ("example.txt")
// - a compressed regular file ("example.txt.gz")
fsys, err := archiver.FileSystem(filename)
if err != nil {
	return err
}

This is a fully-featured fs.FS, so you can open files and read directories, no matter what kind of file the input was.

For example, to open a specific file:

f, err := fsys.Open("file")
if err != nil {
	return err
}
defer f.Close()

If you opened a regular file, you can read from it. If it's a compressed file, reads are automatically decompressed.

If you opened a directory, you can list its contents:

if dir, ok := f.(fs.ReadDirFile); ok {
	// 0 gets all entries, but you can pass > 0 to paginate
	entries, err := dir.ReadDir(0)
	if err != nil {
		return err
	}
	for _, e := range entries {
		fmt.Println(e.Name())
	}
}

Or get a directory listing this way:

entries, err := fsys.ReadDir("Playlists")
if err != nil {
	return err
}
for _, e := range entries {
	fmt.Println(e.Name())
}

Or maybe you want to walk all or part of the file system, but skip a folder named .git:

err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
	if err != nil {
		return err
	}
	if path == ".git" {
		return fs.SkipDir
	}
	fmt.Println("Walking:", path, "Dir?", d.IsDir())
	return nil
})
if err != nil {
	return err
}

Use with http.FileServer

The file system can be used with http.FileServer to browse archives and directories in a browser. However, due to how http.FileServer works, don't use it directly with compressed files; instead, wrap it like the following:

fileServer := http.FileServer(http.FS(archiveFS))
http.HandleFunc("/", func(writer http.ResponseWriter, request *http.Request) {
	// disable range request
	writer.Header().Set("Accept-Ranges", "none")
	request.Header.Del("Range")
	
	// disable content-type sniffing
	ctype := mime.TypeByExtension(filepath.Ext(request.URL.Path))
	writer.Header()["Content-Type"] = nil
	if ctype != "" {
		writer.Header().Set("Content-Type", ctype)
	}
	fileServer.ServeHTTP(writer, request)
})

http.FileServer will try to sniff the Content-Type by default if it can't be inferred from the file name. To do this, the http package tries to read from the file and then Seek back to the start of the file, which the library currently can't do. The same goes for Range requests: seeking in archives is not currently supported by archiver due to limitations in its dependencies.

If the Content-Type header matters to you, derive it from the file extension yourself, as the handler above does.

Compress data

Compression formats let you open writers to compress data:

// wrap underlying writer w
compressor, err := archiver.Zstd{}.OpenWriter(w)
if err != nil {
	return err
}
defer compressor.Close()

// writes to compressor will be compressed

Decompress data

Similarly, compression formats let you open readers to decompress data:

// wrap underlying reader r
decompressor, err := archiver.Brotli{}.OpenReader(r)
if err != nil {
	return err
}
defer decompressor.Close()

// reads from decompressor will be decompressed

Append to tarball and zip archives

Tar and Zip archives can be appended to without creating a whole new archive by calling Insert() on a tar or zip stream. However, for tarballs, this requires that the tarball is not compressed (due to complexities with modifying compression dictionaries).

Here is an example that appends a file to a tarball on disk:

tarball, err := os.OpenFile("example.tar", os.O_RDWR, 0644)
if err != nil {
	return err
}
defer tarball.Close()

// prepare a text file for the root of the archive
files, err := archiver.FilesFromDisk(nil, map[string]string{
	"/home/you/lastminute.txt": "",
})
if err != nil {
	return err
}

err = archiver.Tar{}.Insert(context.Background(), tarball, files)
if err != nil {
	return err
}

The code is similar for inserting into a Zip archive, except you'll call Insert() on the Zip type instead.

archiver's People

Contributors

fcharlie, giuliocomi, halfcrazy, ibraimgm, illiliti, iotanbo, jabgibson, jacalz, jandubois, jhwz, johnarok, jservice-rvbd, klauspost, kross9924, mholt, ncw, nmiyake, petemoore, railsmechanic, rathann, sheltonzhu, shubham1172, songmu, sorairolake, szymongib, tatsushid, tw4452852, vansante, weidideng, weingart


archiver's Issues

Add possibility to extract only some files of an archive

Hi,

I need to extract only one file (no more than 10 KB, and I made sure it is first in the archive) from a very big .tar archive (more than 1 GB).

Could you add a way to provide a list of files that we want extracted, without extracting every file?

Thanks.

Directory traversal through symlink

Archiver is not careful enough when unpacking tar archives that contain symlinks. It will happily write over a symlink it previously created. This could cause directory traversal.

Proof of concept:

$ wget -q https://github.com/jwilk/path-traversal-samples/releases/download/0/symlink.tar -O traversal.tar

$ tar -tvvf traversal.tar 
lrwxrwxrwx root/root         0 2018-06-05 16:55 moo -> /tmp/moo
-rw-r--r-- root/root         4 2018-06-05 16:55 moo

$ pwd
/home/jwilk

$ ls /tmp/moo
ls: cannot access '/tmp/moo': No such file or directory

$ archiver open traversal.tar 

$ ls /tmp/moo
/tmp/moo

Tested with git master (e4ef56d).

Support for Multi-part Archives

I'm attempting to extract rar files that are multi-part (.rar, .r01, .r02, etc.). Using the unrar tools available out there, I just pass them a single file name (the .rar file) and they extract all the subsequent archives. I'm using archiver as a library and calling archiver.Rar.Open(file, path), then receive this error: writing file: rardecode: archive continues in next volume

Am I missing something, or is this support missing? Thanks!

Support for avoiding overwrite upon opening archive

Hi @mholt, this is a great tool/library!

I've found it useful to avoid overwriting files if they already exist when opening an archive to a folder. I'm wondering if you'd like to incorporate this as an option to your library? Here's my change: schollz@8b912ca. Of course, it would need to be amended so it can be toggled as optional.

Error when attempting to extract from git generated gzipped tarball

Hi there.

I have a gzipped tarball generated from git (specifically using the github API).
When I attempt to Open it using archiver.TarGz.Open I get this error: "pax_global_header: unkown type flag: g"

I searched for "pax_global_header" and I learned it seems to be an extended header that holds the git commit ID. I'm running OSX 10.12.

How should I go about investigating further? Is this known? Has anyone else had an issue?

Thanks for the nice utility.
Cheers!

Extract Zip archives on windows that contains /

I am trying to extract a zip file on Windows that seems to be created on Linux.
I think it does not work because the archive uses / internally, and in sanitizeExtractPath, filepath.Join converts it to \, which causes a mismatch in the check with strings.HasPrefix.

I have created a simple piece of code that can reproduce the issue (only on windows):

package main

import (
	"github.com/mholt/archiver"
	"io"
	"log"
	"net/http"
	"os"
)

var url = "https://dl.google.com/go/go1.10.4.windows-386.zip"

func getTestZip(fileName string) {
	log.Print("Start download")
	out, err := os.Create(fileName)
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	_, err = io.Copy(out, resp.Body)
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	tmpZip := "test.zip"

	if _, err := os.Stat(tmpZip); os.IsNotExist(err) {
		getTestZip(tmpZip)
	}

	log.Print("Start extract")
	err := archiver.Zip.Open(tmpZip, "tmp/")
	if err != nil {
		log.Fatal(err)
	}
	log.Print("Finished")
}

which will output:

2018/09/02 15:04:05 Start download
2018/09/02 15:04:19 Start extract
2018/09/02 15:04:19 go/: illegal file path

I used go 1.11 on windows 10.

Bug: long directory paths cause errors in gzip archives

When packaging into a gzip archive, an overly long directory path causes errors. Archives packaged on Windows have problems when extracted on Linux.

Zip output path set to non-existing directory causes failure

This may be a non-issue, but I was expecting that when I passed a directory location that did not yet exist, the directory would be created and the zip would still be output to that path.

err := archiver.Zip.Make(outputPath, files)

Instead, it returns an error:

error creating [outputPath]: open [outputPath]: The system cannot find the path specified.

I created the output directory structure and the previous code worked fine.

When zipping using a desktop application, you can provide a nonexistent output path and the directories will be created for you. So, who should be the owner of creating the directories?

I'm on Win10 x64

Problems installing on macOS

  • go version go1.11.2 darwin/amd64
  • macOS 10.14.1 Mojave

Hello, I have just tried to run go get -u github.com/mholt/archiver/cmd/archiver and it comes back with...

package github.com/mholt/archiver/cmd/archiver: cannot find package "github.com/mholt/archiver/cmd/archiver" in any of:
	/usr/local/go/src/github.com/mholt/archiver/cmd/archiver (from $GOROOT)
	/Users/carlca/go/src/github.com/mholt/archiver/cmd/archiver (from $GOPATH)

I have Modules set to GO111MODULE=auto.

Any idea what the problem is?

Proposal: regex matching and parallel Open for multiple files support

Hello!
I've actually become a frequent user of archiver since it's totally general-purpose and efficient. Opening huge archives, or several archives at once, is a bit painful, which led me to think about these improvements.

Regex matching:
Archiving or reading archive files using simple regex match

archiver open -r *.zip mydestination

Multiple targets (for regex expressions and simple file names)
for files:

archiver open file1.zip file2.tar mydestination

for regex

archiver open -r *.zip *.tar mydestination

distinct destinations

# simple
archiver open file1.zip -d mydest1 file2.zip -d mydest2 
# regex
archiver open -r *.zip -d mydest1 *.tar -d mydest2

parallel opening

archiver --parallel --cores=2 open -r *.zip *.tar mydestination

All the examples above are for opening multiple files. For making archives, parallel support is not too useful, but regex matching might be interesting.

archiver make -r myarchive.zip *.png

Specifications
Currently archiver is built on a simple command line parser; implementing all these features might require Cobra or another command line library.

I'd be glad to hear what you think, and to help implement these features!

File descriptor limit

First of all big thanks for this great little helper. It makes writing replacements for shell scripts a real joy :)

One thing I ran into, though, is that the current implementation of, for instance, the Zip extractor seems to leak some file descriptors. Especially for larger archives this can lead to too many file descriptors being open at the same time. In that situation I received an error like this:

creating new file: open path/to/a/file: too many open files in system

I haven't found the dangling descriptors yet :(

Add tags more often

Hi. Can you please add tags more often? It's for dependency management. The last one was over a year ago and if I want to use latest features I have to use commit hashes in my Gopkg.toml.

Cannot find package 'github.com/pierrec/lz4'

Hey @mholt

There seem to be some issues with one dependency of this package: github.com/pierrec/lz4.

github.com/pierrec/lz4 (download)
package github.com/pierrec/lz4/v2/internal/xxh32: cannot find package "github.com/pierrec/lz4/v2/internal/xxh32" in any of:
	/usr/lib/go-1.10/src/github.com/pierrec/lz4/v2/internal/xxh32 (from $GOROOT)
	/home/user001/go;/home/user001/git/filebrowser.github/src/github.com/pierrec/lz4/v2/internal/xxh32 (from $GOPATH)

cmd/server.go:14:4: error: could not import github.com/filebrowser/filebrowser/lib/http
	(type-checking package "github.com/filebrowser/filebrowser/lib/http" failed
	(/go/src/github.com/filebrowser/filebrowser/lib/http/download.go:12:2: could not import github.com/mholt/archiver
	(type-checking package "github.com/mholt/archiver" failed
	(/go/src/github.com/mholt/archiver/tarlz4.go:9:2: could not import github.com/pierrec/lz4
	(type-checking package "github.com/pierrec/lz4" failed
	(/go/src/github.com/pierrec/lz4/reader.go:9:2: could not import github.com/pierrec/lz4/v2/internal/xxh32
	(cannot find package "github.com/pierrec/lz4/v2/internal/xxh32" in any of: (gotype)

The directory v2 of their package seems to be missing.

Error when trying to zip directory

Windows 10 issue

zip.Make's comment states that it handles filePaths being a directory. However, in use, os.Stat (line 57 of zip.go) fails for directory paths.

PR: Reader and Writer support

Hi,
Please consider providing exported functions that accept a Reader/Writer as an argument instead of file names (where possible). It would make the library more flexible by enabling streaming, buffering, and in-memory use cases. Thanks.

Error when trying to unrar archive with subfolders

Here is an example of the file:

test.zip (and test.rar)
└── testfolder/
    ├── file1.jpg
    └── file2.jpg

When using archive.Unzip, it behaves as expected -- creates the subfolders as they are in the zip archive:
err := archiver.Unzip("c:\\tmp\\test.zip", "c:\\tmp\\test\\")

However when running the same command to unrar, it appears as though it's not able to Mkdir the subfolders.
err := archiver.Unrar("c:\\tmp\\test.rar", "c:\\tmp\\test\\")

Error:

c:\tmp\test\testfolder\file1.jpg: creating new file: open c:\tmp\test\testfolder\file1.jpg: The system cannot find the path specified.

If it helps, line 41 of rar.go checks header.IsDir, but it seems to evaluate as false, since technically the path includes both the folder and the filename (I'm assuming).

Potential Incompatibility with CodeDeploy

Having used this library to zip files for use in AWS CodeDeploy, I noticed a strange bug:
All .png files extracted by CodeDeploy have size 0 bytes. I suspect this is the same for all compressed formats. I am using the latest version of both tools.

I fixed it by removing all extensions from the compressedFormats map in zip.go.

There could be some sort of incompatibility between this archiver and the one used by CodeDeploy. I couldn't investigate this any further due to lack of time, so I apologize for the brief explanation.

Hope this helps.

Trying to untar a file on Mac causes pointer issues

./main_darwin_amd64.go:80:24: invalid method expression archiver.TarGz.Open (needs pointer receiver: (*archiver.TarGz).Open)

the following snippet was used on darwin (macos):

archiver.TarGz.Open(input, installDirPath())

the zip equivalent works on windows

archiver.Zip.Open(input, installDirPath())

go modules support

could you please add a tag v2.0.0 so that it is semantic version compatible?

Error unpacking large archive writing file: rardecode: ...

I have problems unpacking a large rar archive; the archive itself is about 6 GB and unpacks to 25 GB. The archive contains XML files with a database of Russian addresses.
You can download it from the official site http://fias.nalog.ru or from the direct link http://fias.nalog.ru/Public/Downloads/Actual/fias_xml.rar
The full error text
ERROR: 2017/10/12 00:37:10 unpack.go:12: Error unpack archive: data\AS_HOUSE_20171008_bed24a8e-4646-448d-acb8-8de765818389.XML: writing file: rardecode: decoder expected more data than is in packed file
The function I'm using is

package unpack
import (
	"github.com/mholt/archiver"
	"os"
	"../loger"
)
func Unpack(dest string, archive string) {
	err := archiver.Rar.Open(archive, dest)
	if err != nil {
		loger.Error.Printf("Error unpack archive: %v\n", err)
		os.Exit(1)
	}
}

I'm new to Go; can you tell me what the problem is?

ISO/Squashfile

As a Docker drop-in:

ISO_URL=

A filter to discard the ISO, get the squashfs, and unpack it at / would be nice to have. So far I have a slower Python script nearly done, but a library that could do this would make re-baking live ISOs in Docker faster; properly compiled Go tends to be much faster.

The ability to dump an ISO and a squashfs file would also be nice to have.

New interface

With your last change, the interface changed, and I am trying to move from archiver.Tar.Make(...) to the new interface, but I'm not having any luck. A bit of a Go novice, so please bear with me.

"github.com/mholt/archiver"
...
err := archiver.Archiver.Archive([]string{
		"src",
		"test",
		"help",
	}, "/tmp/some.tar")

gives me:

./main.go:32:34: not enough arguments in call to method expression archiver.Archiver.Archive
	have ([]string, string)
	want (archiver.Archiver, []string, string)

Allow relative symlinks

At the moment my relative symlinks are converted to absolute symlinks when using TarGz, this breaks the code I'm archiving.

Symlink target file error for tar

tar.go, line 131:

header, err := tar.FileInfoHeader(info, path)

should first resolve the link target:

link, err := os.Readlink(path)
if err == nil {
	path = link
}
header, err := tar.FileInfoHeader(info, path)

include tar.xz (de)compression

Hi, I think it could be possible using https://godoc.org/xi2.org/x/xz:

package main

import (
    "archive/tar"
    "fmt"
    "io"
    "log"
    "os"

     "xi2.org/x/xz"
 )

func main() {
    // Open a file
    f, err := os.Open("myfile.tar.xz")
    if err != nil {
        log.Fatal(err)
    }
    // Create an xz Reader
    r, err := xz.NewReader(f, 0)
    if err != nil {
        log.Fatal(err)
    }
    // Create a tar Reader
    tr := tar.NewReader(r)
    // Iterate through the files in the archive.
    for {
        hdr, err := tr.Next()
        if err == io.EOF {
            // end of tar archive
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        switch hdr.Typeflag {
        case tar.TypeDir:
            // create a directory
            fmt.Println("creating:   " + hdr.Name)
            err = os.MkdirAll(hdr.Name, 0777)
            if err != nil {
                log.Fatal(err)
            }
        case tar.TypeReg, tar.TypeRegA:
            // write a file
            fmt.Println("extracting: " + hdr.Name)
            w, err := os.Create(hdr.Name)
            if err != nil {
                log.Fatal(err)
            }
            _, err = io.Copy(w, tr)
            if err != nil {
                log.Fatal(err)
            }
            w.Close()
        }
    }
    f.Close()
}

Can you check this? Thanks!

targz Create method gets stuck in an infinite call.

in targz.go

// Create opens txz for writing a compressed
// tar archive to out.
func (tgz *TarGz) Create(out io.Writer) error {
	tgz.wrapWriter()
	return tgz.Create(out)
}

This is an unconditionally recursive call, which won't work (it will overflow the stack).
It should be:

// Create opens txz for writing a compressed
// tar archive to out.
func (tgz *TarGz) Create(out io.Writer) error {
	tgz.wrapWriter()
	return tgz.Tar.Create(out)
}

Output file directory same as compressed directory

I want to compress all the files in the current directory, with output to the current directory. The result is that when I untar the tar file, it contains a tar file with the same filename: the archive included itself.

Wrong directory structure

  1. compress two directories(/usr/local/bin","/usr/local/etc") to example.tar.gz
    $ tar -ztf example.tar.gz
    bin/
    bin/aria2c
    bin/awk
    ...
    bin/zipdetails
    etc/
    etc/GeoIP.conf
    ...
    etc/nginx/
    etc/nginx/fastcgi.conf

  2. extract example.tar.gz to /tmp/

Current directory structure.
.
├── bin
└── etc

Expected directory structure.
.
└── usr
    └── local
        ├── bin
        └── etc

Can't Unzip Folder Entries From Windows Built Zip Archives

This issue seems related to: #61 however I decided to create a new issue in case my hunch was incorrect.

Relevant Info

  • OS - win 10 1803
  • Zipping tools tested: Powershell 5.1 Compress-Archive and the integrated system.io.compression.filesystem Zipfile class (that powershell also uses under the hood).
  • Archiver versions tested - release 2.0 and current master branch

Observed High Level Issue

  • When attempting to unzip a file (using archiver that has been embedded in another utility) I get errors like C:\Users\lg\temp\testfoldera\0\1\a: making directory for file: mkdir C:\Users\lg\temp\testfoldera\0: The system cannot find the path specified. And can see that in the directory structure that should have been created there are "blank files" instead of some directories (aka files with no extension that in the actual zip are also directories).
  • The issue can be recreated using just the Archiver utility as well.

Current Theory

  • Line 195 of zip.go needs to be updated to support Windows styled path endings: https://github.com/mholt/archiver/blob/master/zip.go#L195
  • It seems like what's happening is that any time you have a directory entry in the zip this line will check using the suffix heuristic and create the dir as a dir rather than a file. However, because on windows directories are stored with \ endings (depending on util, i did a very quick test with 7zip which seemed to force everything to unix style), this heuristic fails.
  • The consequences here seem to usually be pretty benign b/c the only "normal" time I could find to have folder entries in the zip was if they were going to be a leaf node in the file-system. Otherwise the path was always implicit in the remaining files.
    • HOWEVER, the new PowerShell Compress-Archive makes some really weird zip structures that do have folder entries strewn about, making the issue much more prevalent. Not 100% sure, but it seems to happen whenever you have a folder that contains only folders.

Minimum Test Case

  • make a folder and in it create another folder along with a small file. The point here is really you just need some bits on disk, zip utilities seem to try and get cute if there's not actually any data you're trying to zip.
  • Zip the folders. Then try to unzip with archiver open; notice the inner folder is now a file.

Thoughts

  • In general it feels like the root issue here is the heuristic of looking for a char suffix to indicate dir type isn't as robust as desired.
  • Unfortunately, I couldn't get the IsDir() function listed here to work: https://golang.org/pkg/os/#FileMode.IsDir. It would be the obvious solution if it did.
    • Although, this perhaps could be because the utilities that create the zips don't honor the isDir bit in the zip file format.
  • If there isn't a working method like isDir perhaps something like os.PathSeparator could get halfway there.
  • updating the line in question to if strings.HasSuffix(zf.Name, "/") || strings.HasSuffix(zf.Name, "\\") { did solve my use-case.

Latest release version tag not updated

Go packaging tools that work with versions (e.g. dep) are picking up the release version v2.0.0 from 2016. The latest code needs to be packaged in a release version that is above 2.0.0 so that it can work with the tools. Is this possible to do ASAP?

support excluding directories

I'd like to make a PR to exclude certain directories when archiving.

So we just need to pass down a slice of excluded paths to filepath.Walk, then return filepath.SkipDir if the path matches any of the excluded path?

What do you think?

Compressing tarball on Windows

When compressing a entire folder with
archiver.TarGz.Make("myfolder.tar.gz", []string{"./myfolder"})

If i open that arhive in Windows it will look great i can go through the directory structure... However when i extract this on linux it wil create filenames like this:
"picons\picons\france\1_0_1_EB2_AAFF_7EB6_0_0_0_0.png"
and empty folders named this:
"picons\picons\france"

That is not paths that is the actual name of the file....

Feature request: (de)compression only (single-file archives)

I've been using archiver for a while for a tool that downloads releases from github (and other places) and automatically unpacks them to make their commands available for running. Archiver has worked out very well so far.

Today, I wanted to add restic to my tool, and it's downloads are only bzip2 compressed, as opposed to being tar'd first. This is not supported by archiver.

It looks like it would be possible to add support for archives of one file that are only compressed (gz or bzip2). I am wondering, would this support be in line with the goals of the project and likely be accepted? I want to know before investing more effort.

Thanks for making such a useful library.

Proposal: Rewrite the entire package. Design doc discussion

UPDATE: Done in pull request #99

Having been using this package for a few years now, I've encountered a number of issues that lead me to want to redesign this package entirely: burn it down and start over, copying only the fundamental parts of the code, and not worrying about backwards compatibility.

Some specific issues I've experienced:

  • Too much magic. Recently I spent a day debugging a problem where a .zip file couldn't reliably be extracted with archiver: sometimes it would work, sometimes it wouldn't. I eventually discovered that this is because archiver determines which extractor to use based on the extension and the file header while iterating through a map of formats (which is not ordered). If the Zip format came first, it matched by extension but failed to extract; if the TarGz format came first, it matched by file header (because the file was actually a tar.gz) and extraction succeeded.

  • Weak API. Apparently I was able to accidentally create a .tar.gz file with the Zip archiver, because the name I built for the file was not attached to which archiver format I was using. I can do archiver.TarGz.Make("file.zip") without errors, which is bad. Here's the code that led to my bug in the first place (notice the missing . in "zip"):

var a archiver.Archiver = archiver.TarGz
if filepath.Ext(outputFile) == "zip" {
	a = archiver.Zip
}

^ Bad package design.

  • Not enough customizability. Namely: compression level; whether to include a top-level named folder vs. just its contents (similar to how rsync works based on the presence of a trailing slash); and whether to overwrite existing files when outputting.

  • Lack of native streaming capabilities. From a library perspective, I should be able to stream in a zip file and spit out individual files, or stream in individual files (or a list of filenames?) and spit out a single zip file.

  • There is no true cross-platform native solution to zip-slip (yet). I had to disable the "security feature" that prevented me extracting a perfectly safe archive. Even "SecureJoin" solutions don't cut it (read the linked thread, and its linked threads). For now, these "mitigations" only get in the way.

  • Not enough power to inspect archives or cherry-pick files. It would be helpful to be able to work with archives' contents without performing an extraction, such as getting listings, or filtering which files are extracted, etc.

General solutions:

  • When possible (almost always), match only by file header and ignore file extension. If the file contents are not (yet) available, then use extension but only after a warning or explicit opt-in. Or, (maybe "Also,") require that the file extension, where present, matches the format when creating an archive.

  • Be verbose in the error messages; if doing any magic, report it or make the magic explicitly opt-in either with a configuration parameter or a special high-level API that is documented as being magical, which wraps the underlying, concrete, explicit functions.

  • Couple the file extension to the archiver. For example: don't allow the Zip archiver to make a .tar.gz file. For example, the buggy code above could have been avoided with something more like this: archiver.Make(outputFile, files...) which uses the extension of outputFile to force a format that matches.

  • Expand the API so that an archiver is created for a specific format before being used, rather than having hard-coded globals like archiver.Zip like we do now. This will allow more customization too. Imagine zipArch := archiver.Zip{CompressionLevel: 10} or something similar.

  • Be explicit about our threat model, which is being adjusted, to state that the files are expected to be trusted, i.e. don't download files you don't trust. Maybe it is possible to inspect a file before extracting it to know whether it could be malicious (e.g. look for zip-slip patterns in file names), but I am not sure about that yet.

  • Moar interfaces. We have one, Archiver, but we might need more, to accommodate an expanded design with more features. Small interfaces are the best.

  • Rename the package to archive. (Decided to keep it the same)

This issue is to track the discussion about the new design; work will hopefully begin soon, as I can find the time.

Travis build failing: archiver not backwards compatible

Travis Build log(shortened to show relevant information):
go version go1.7.1 linux/amd64
...
gostuff/util.go:26: cannot call non-function archiver.Zip (type archiver.zipFormat)
gostuff/util.go:36: undefined: archiver.Unzip

Line 26 in util.go:
err := archiver.Zip(destination, source)
Line 36 in util.go:
err := archiver.Unzip(source, destination)

The travis build was successful last night, just pushed a commit tonight and noticed the travis build failing. I tested on my development and production environments and I do not encounter this error. I did not modify any code recently in my util.go file so I can only guess that this archiver package made an update within the past 24 hours that was not backward compatible.

The link to the travis build can be seen here https://travis-ci.org/jonpchin/GoChess/builds/166621466
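Beyond pinning the dependency to a tagged release, one way to insulate a codebase from this kind of upstream API churn is to hide the third-party archiver behind a small local interface, so a breaking release only touches one adapter file. A sketch with stubbed bodies (in real code they would call whichever archiver version you have pinned; all names here are hypothetical):

```go
package main

import "fmt"

// Zipper is the only surface the rest of the codebase sees.
type Zipper interface {
	Zip(dst string, srcs []string) error
	Unzip(src, dst string) error
}

// stubZipper stands in for an adapter over the real library; swap
// its method bodies when the upstream API changes.
type stubZipper struct{}

func (stubZipper) Zip(dst string, srcs []string) error {
	fmt.Printf("zip %v -> %s\n", srcs, dst)
	return nil
}

func (stubZipper) Unzip(src, dst string) error {
	fmt.Printf("unzip %s -> %s\n", src, dst)
	return nil
}

func main() {
	var z Zipper = stubZipper{}
	z.Zip("a.zip", []string{"f1", "f2"})
	z.Unzip("a.zip", "out/")
}
```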
