martinellimarco / t2sz

Compress a file into a seekable zstd with special handling for .tar archives

License: GNU General Public License v3.0

CMake 6.64% C 85.76% Shell 7.60%
compression zstd seekable tools

t2sz's Introduction

t2sz

t2sz compresses a file into a seekable zstd archive by splitting it into multiple frames.

If the file is a tar archive, it compresses each file in the archive into an independent frame, hence the name: tar 2 seekable zstd.
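
In tar mode the unit that goes into a frame is a whole archive member. A ustar member is a 512-byte header followed by the file data, zero-padded to a multiple of 512 bytes; the data size is stored in the header as an octal ASCII number at offset 124 (12 bytes). The sketch below shows, under those assumptions, how a member's extent can be computed. It is not taken from t2sz's sources, the function names are hypothetical, and the GNU base-256 size encoding used for very large files is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCK 512u

/* Hypothetical helper: parse the octal size field (offset 124, 12 bytes)
 * of a ustar header. Returns -1 on malformed input. GNU base-256 sizes
 * (first byte has the high bit set) are not handled in this sketch. */
static int64_t tar_header_size(const unsigned char header[TAR_BLOCK])
{
    const unsigned char *field = header + 124;
    int64_t size = 0;
    size_t i = 0;

    if (field[0] & 0x80)
        return -1; /* base-256 encoding: out of scope here */

    while (i < 12 && field[i] == ' ') /* skip leading spaces */
        i++;
    for (; i < 12; i++) {
        unsigned char c = field[i];
        if (c == '\0' || c == ' ')
            break; /* field terminator */
        if (c < '0' || c > '7')
            return -1;
        size = size * 8 + (c - '0');
    }
    return size;
}

/* Hypothetical helper: bytes occupied by one member, i.e. the header
 * plus the data rounded up to the next multiple of 512. */
static uint64_t tar_member_span(uint64_t data_size)
{
    return TAR_BLOCK + (data_size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}
```

For a 1000-byte file the header holds `00000001750` and the member spans 512 + 1024 = 1536 bytes of the archive.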

It operates in two modes: tar archive mode and raw mode.

By default it runs in tar archive mode for files ending with .tar, unless -r is specified.

For all other files it runs in raw mode.

In tar archive mode it compresses the archive keeping each file in a separate frame, unless -s or -S is used.

This allows fast seeking and extraction of a single file without decompressing the whole archive.
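
Because every frame can be decompressed on its own, a reader that knows each frame's decompressed size can jump straight to the frame holding a given offset. A minimal sketch, assuming those sizes have already been read from the seek table: prefix sums give each frame's starting offset and a binary search picks the one frame to decompress. `build_starts` and `find_frame` are hypothetical helpers, not functions from t2sz or libzstd.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: starts[i] = decompressed offset where frame i
 * begins; starts[n] = total decompressed size (n + 1 entries). */
static void build_starts(const uint64_t *sizes, size_t n, uint64_t *starts)
{
    starts[0] = 0;
    for (size_t i = 0; i < n; i++)
        starts[i + 1] = starts[i] + sizes[i];
}

/* Hypothetical helper: index of the frame containing decompressed byte
 * `offset`, or -1 if the offset is past the end of the data.
 * Invariant: starts[lo] <= offset < starts[hi]. */
static long find_frame(const uint64_t *starts, size_t n_frames, uint64_t offset)
{
    size_t lo = 0, hi = n_frames;

    if (n_frames == 0 || offset >= starts[n_frames])
        return -1;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (starts[mid] <= offset)
            lo = mid;
        else
            hi = mid;
    }
    return (long)lo;
}
```

With frames of 100, 50 and 200 bytes, offset 120 falls in frame 1, so only that frame has to be decompressed.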

When -s SIZE is used in tar mode, files are packed into the same block until the sum of their sizes is at least SIZE: if the size of the file being compressed into a block is less than SIZE, the next file is added to the same block, and so on. A file is never split, as SIZE is just a minimum value.
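
The packing rule can be sketched as a single pass over the member sizes. This is a hedged illustration of the rule as described, not t2sz's code: `group_members` is a hypothetical name, and it treats each member as a plain byte count without modelling how headers and padding are accounted for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assign each tar member to a block. Members are
 * appended to the current block until its total reaches at least
 * min_size, then a new block starts. Members are never split.
 * block_of[i] receives the block index of member i; returns the number
 * of blocks. min_size == 0 gives one member per block. */
static size_t group_members(const uint64_t *sizes, size_t n,
                            uint64_t min_size, size_t *block_of)
{
    size_t block = 0;     /* index of the block being filled */
    size_t in_block = 0;  /* members already placed in it */
    uint64_t filled = 0;  /* bytes already placed in it */

    for (size_t i = 0; i < n; i++) {
        block_of[i] = block;
        filled += sizes[i];
        in_block++;
        if (filled >= min_size) { /* big enough: close this block */
            block++;
            filled = 0;
            in_block = 0;
        }
    }
    /* a trailing block that never reached min_size still counts */
    return in_block > 0 ? block + 1 : block;
}
```

With sizes 4, 3, 5, 1, 10 and a minimum of 8, the first three members share block 0 (12 bytes) and the last two share block 1.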

When -s SIZE is used in raw mode, it defines the exact input block size, and larger inputs are split into blocks of that size. If there isn't enough input data, the last block will be smaller.
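
The raw-mode arithmetic is a ceiling division: with -s 10M, a 25 MiB input yields blocks of 10, 10 and 5 MiB. A minimal sketch with hypothetical helper names, assuming a non-zero block size; whether t2sz emits a frame at all for an empty input is not covered here.

```c
#include <stdint.h>

/* Hypothetical helper: number of input blocks in raw mode when every
 * block is exactly block_size bytes except possibly the last one.
 * Written as quotient + remainder test to avoid overflow near UINT64_MAX. */
static uint64_t raw_block_count(uint64_t total_size, uint64_t block_size)
{
    return total_size / block_size + (total_size % block_size != 0);
}

/* Hypothetical helper: size of the last block (0 for an empty input). */
static uint64_t raw_last_block(uint64_t total_size, uint64_t block_size)
{
    if (total_size == 0)
        return 0;
    uint64_t rem = total_size % block_size;
    return rem ? rem : block_size;
}
```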

When -S SIZE is used, files bigger than SIZE are split into blocks of SIZE bytes. This option is available only in tar mode and is ignored in raw mode.
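
Splitting with -S amounts to cutting a large member into SIZE-byte chunks, with only the final chunk allowed to be shorter. A hedged sketch: `split_max` is a hypothetical name, and it treats the member as a plain byte count; how t2sz accounts for tar headers and padding, or combines this with -s, is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: cut file_size bytes into chunks of at most
 * max_size bytes (max_size > 0). Writes up to cap chunk sizes into out
 * and returns the total number of chunks needed. */
static size_t split_max(uint64_t file_size, uint64_t max_size,
                        uint64_t *out, size_t cap)
{
    size_t n = 0;

    while (file_size > 0) {
        uint64_t chunk = file_size < max_size ? file_size : max_size;
        if (n < cap)
            out[n] = chunk;
        n++;
        file_size -= chunk;
    }
    return n;
}
```

Each chunk becomes its own frame, which is why a small -S value makes the archive larger.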

The compressed archive can be decompressed with any Zstandard tool, including zstd.
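
Ordinary decoders can read the output because, assuming t2sz follows the Zstandard seekable format (contrib/seekable_format in the zstd repository), the seek table lives in a skippable frame (magic 0x184D2A5E) at the end of the file, which standard zstd tools ignore. That frame ends with a 9-byte footer: the number of frames (4 bytes, little-endian), a descriptor byte whose bit 7 says whether each 8-byte entry carries an extra 4-byte checksum, and the magic number 0x8F92EAB1. The sketch below locates and validates that table in an in-memory buffer; `parse_seek_footer` is a hypothetical name and the reserved-bit checks are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define SEEKABLE_MAGIC  0x8F92EAB1u
#define SKIPPABLE_MAGIC 0x184D2A5Eu
#define FOOTER_SIZE     9u

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: inspect the tail of a .zst file held in
 * buf[0..len). On success returns 0 and stores the frame count, whether
 * per-frame checksums are present, and the offset where the seek table's
 * skippable frame starts. Returns -1 if no valid seek table is found. */
static int parse_seek_footer(const unsigned char *buf, size_t len,
                             uint32_t *n_frames, int *has_checksum,
                             size_t *table_start)
{
    if (len < 8 + FOOTER_SIZE)
        return -1;
    const unsigned char *footer = buf + len - FOOTER_SIZE;
    if (read_le32(footer + 5) != SEEKABLE_MAGIC)
        return -1; /* no seek table (e.g. written with -j) */

    uint32_t n = read_le32(footer);
    int checksum = (footer[4] & 0x80) != 0; /* Checksum_Flag, bit 7 */
    uint64_t entry = checksum ? 12 : 8;
    uint64_t table = 8 + (uint64_t)n * entry + FOOTER_SIZE;
    if (table > len)
        return -1;

    size_t start = len - (size_t)table;
    if (read_le32(buf + start) != SKIPPABLE_MAGIC)
        return -1;
    /* Frame_Size excludes the skippable frame's own 8-byte header. */
    if (read_le32(buf + start + 4) != table - 8)
        return -1;

    *n_frames = n;
    *has_checksum = checksum;
    *table_start = start;
    return 0;
}
```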

To take advantage of seeking see the following projects:

Build

You'll need libzstd-dev

sudo apt install libzstd-dev
git clone https://github.com/martinellimarco/t2sz
mkdir t2sz/build
cd t2sz/build
cmake .. -DCMAKE_BUILD_TYPE="Release"
make

Install with

sudo make install

Or, if you want a Debian package, you can run

cpack

then install it with

sudo dpkg -i t2sz*.deb

Usage

Usage: t2sz [OPTIONS...] [TAR ARCHIVE]

Examples:
        t2sz any.file -s 10M                        Compress any.file to any.file.zst, each input block will be of 10M
        t2sz archive.tar                            Compress archive.tar to archive.tar.zst
        t2sz archive.tar -o output.tar.zst          Compress archive.tar to output.tar.zst
        t2sz archive.tar -o /dev/stdout             Compress archive.tar to standard output

Options:
        -l [1..22]         Set compression level, from 1 (lowest) to 22 (highest). Default is 3.
        -o FILENAME        Output file name.
        -s SIZE            In raw mode: the exact size of each input block, except the last one.
                           In tar mode: the minimum size of an input block, in bytes.
                                        A block is composed of one or more whole files.
                                        A file is never truncated unless -S is used.
                                        If not specified, one block will contain exactly one file, no matter the file size.
                                        Each block is compressed to a zstd frame, but if the archive has a lot of small files,
                                        having one file per block doesn't compress very well. With this option you can set a trade-off.
                           The greater SIZE is, the smaller the archive will be, at the expense of seek speed.
                           SIZE may be followed by the following multiplicative suffixes:
                               k/K/KiB = 1024
                               M/MiB = 1024^2
                               G/GiB = 1024^3
                               kB/KB = 1000
                               MB = 1000^2
                               GB = 1000^3
        -S SIZE            In raw mode: it is ignored.
                           In tar mode: the maximum size of an input block, in bytes.
                           Unlike -s, this option may split big files into smaller chunks.
                           Remember that each block is compressed independently, and a small value here will result in a bigger archive.
                           -S can be used together with -s but MUST be greater than or equal to its value.
                           If -S and -s are equal, the input block will be of exactly that size, if there is enough input data.
                           Like -s, SIZE may be followed by one of the multiplicative suffixes described above.
        -T [1..N]          Number of threads to spawn. It improves compression speed but costs more memory. Default is single thread.
                           It requires libzstd >= 1.5.0 or an older version compiled with ZSTD_MULTITHREAD.
                           If `-s` or `-S` are too small, fewer threads may be used.
        -r                 Raw mode or non-tar mode. Treat tar archives as regular files, without any special handling.
        -j                 Do not generate a seek table.
        -v                 Verbose. List the elements in the tar archive and their size.
        -f                 Overwrite output without prompting.
        -h                 Print this help.
        -V                 Print the version.
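
The SIZE suffixes listed under -s map directly to multipliers. A minimal sketch of such a parser, assuming exact, case-sensitive matching against the listed suffixes and treating a bare number as bytes; `parse_size` and `check_limits` are hypothetical names, not t2sz's actual functions, and using 0 for "-S not given" is a convention of this sketch only.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a SIZE argument such as "10M", "512KiB" or
 * "2GB" using the suffix table from the help text. Returns 0 on success,
 * -1 on a malformed number, unknown suffix or overflow. */
static int parse_size(const char *arg, uint64_t *out)
{
    static const struct { const char *suffix; uint64_t mul; } table[] = {
        {"", 1},
        {"k", 1024ull}, {"K", 1024ull}, {"KiB", 1024ull},
        {"M", 1024ull * 1024}, {"MiB", 1024ull * 1024},
        {"G", 1024ull * 1024 * 1024}, {"GiB", 1024ull * 1024 * 1024},
        {"kB", 1000ull}, {"KB", 1000ull},
        {"MB", 1000ull * 1000},
        {"GB", 1000ull * 1000 * 1000},
    };
    char *end;

    if (arg[0] < '0' || arg[0] > '9')
        return -1; /* rejects empty input, signs and whitespace */
    errno = 0;
    unsigned long long value = strtoull(arg, &end, 10);
    if (errno == ERANGE)
        return -1;

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(end, table[i].suffix) == 0) {
            if (value > UINT64_MAX / table[i].mul)
                return -1; /* result would overflow */
            *out = (uint64_t)value * table[i].mul;
            return 0;
        }
    }
    return -1; /* unknown suffix */
}

/* Hypothetical helper for the rule "-S MUST be greater than or equal to
 * -s". In this sketch max_size == 0 means -S was not given. */
static int check_limits(uint64_t min_size, uint64_t max_size)
{
    return (max_size == 0 || max_size >= min_size) ? 0 : -1;
}
```

Note how `10M` (10 485 760 bytes) and `10MB` (10 000 000 bytes) differ: the binary and decimal suffixes are not interchangeable.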

License

See LICENSE

Release

Debian-based

Download the latest stable source code or .deb from the release page. This is the recommended version.

Arch-based

Check out t2sz on the AUR, or build it with pamac build t2sz

Snap

For your convenience, you can install the latest release from the Snap Store, but beware that it is distributed in strict mode and can access only your home directory by default.

You can grant access to removable devices, such as those mounted under /media, with sudo snap connect t2sz:removable-media.

If you want to give it access to every file, you can install it with --devmode.

t2sz's People

Contributors

martinellimarco

t2sz's Issues

Usability for non-tar archives

It might be nice to be able to run t2sz against non-tar-format files, e.g. t2sz image.ext4 -o image.ext4.sz. It'd be useful for mounting disk images with ratarmount, for example. My tool pixz allows this sort of thing for writing indexed xz files, and I'd love to see the same functionality for zstd.

Memory corruption

git clone --depth=1 --single-branch https://github.com/stedolan/jq
tar -cf jq.tar -- jq
t2sz -vfo jq.tar.zst -- jq.tar

+ <null>
# END OF BLOCK (512)

+ <null>
# END OF BLOCK (512)

munmap_chunk(): invalid pointer
[1]    164384 abort (core dumped)  t2sz -vfo jq.tar.zst -- jq.tar

# OR:(sometimes): free(): invalid pointer
──────
(gdb) bt
#0  0x00007ffff7d2eef5 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7d18862 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7d70f38 in __libc_message () from /usr/lib/libc.so.6
#3  0x00007ffff7d78bea in malloc_printerr () from /usr/lib/libc.so.6
#4  0x00007ffff7d7901c in munmap_chunk () from /usr/lib/libc.so.6
#5  0x00007ffff7d7dcdb in free () from /usr/lib/libc.so.6
#6  0x00005555555552bc in main (argc=<optimized out>, argv=<optimized out>) at /home/user/.cache/aurutils/airy/t2sz-git/src/t2sz/src/t2sz.c:372
──────
(gdb) f 6
#6  0x00005555555552bc in main (argc=<optimized out>, argv=<optimized out>) at /home/user/.cache/aurutils/airy/t2sz-git/src/t2sz/src/t2sz.c:372
372         free(ctx->outFilename);
──────
(gdb) l
367             }
368         }
369
370         compressFile(ctx);
371
372         free(ctx->outFilename);
373         free(ctx);
374
375         return 0;
376     }
──────
(gdb)
==167002== Invalid free() / delete / delete[] / realloc()
==167002==    at 0x483F9AB: free (vg_replace_malloc.c:538)
==167002==    by 0x1092BB: main (t2sz.c:372)
==167002==  Address 0x1fff000109 is on thread 1's stack
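
Valgrind reports that the pointer passed to free() at t2sz.c:372 lives on the stack, which on Linux is where the strings in argv are stored. One plausible cause, not confirmed by the report, is that ctx->outFilename sometimes aliases an argv string (for example the -o argument) instead of a heap copy. A minimal sketch of the usual remedy, with a hypothetical helper `make_out_filename`: always hand back a heap-allocated name so the final free() is valid on every path.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Portable copy; strdup() is POSIX rather than C99. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Hypothetical helper: return a heap-allocated output name that the
 * caller may always free(). If -o was given (opt_o != NULL) it is copied
 * rather than aliased; otherwise "<input>.zst" is derived.
 * Returns NULL on allocation failure. */
static char *make_out_filename(const char *opt_o, const char *input)
{
    if (opt_o != NULL)
        return dup_string(opt_o); /* never store argv/optarg directly */

    size_t len = strlen(input) + sizeof ".zst"; /* sizeof counts the NUL */
    char *name = malloc(len);
    if (name != NULL)
        snprintf(name, len, "%s.zst", input);
    return name;
}
```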

Threading does not appear to work

Using the latest git commit, the -T option simply doesn't appear to work. According to htop, t2sz uses only a single thread no matter how high I set it. This happens with -s and -S set to either high or low values, as well as without them.

I am on Solus using zstd 1.5.2.
