Comments (7)

Piezoid commented on September 7, 2024

First, thank you for libdeflate.

We used your code as a base for experimenting with parallel decompression and found a way to achieve just that: https://github.com/Piezoid/pugz/

It's not yet production ready and we removed lots of features (compression, multiarch: only Linux/x86 with SSE3.1 is currently supported). This is a rather contrived implementation; I think it should be kept as a specialized library. Notably, only ASCII files are currently supported.

The asynchronous API is not yet stabilized. Any input on usage patterns would be appreciated.

from libdeflate.

ebiggers commented on September 7, 2024

DEFLATE (and zlib and gzip) streams aren't suitable for parallel decompression.

However, if you aren't locked into a data format that uses a single stream, you can easily parallelize at the application layer by dividing the data into chunks before compression, then compressing and/or decompressing the chunks in parallel. libdeflate already works fine for this; just make sure to allocate a separate libdeflate_compressor or libdeflate_decompressor for each concurrent thread.
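As a rough illustration of the chunking approach described above, here is a minimal sketch. It uses Python's standard `zlib` module as a stand-in for libdeflate, so none of the names below are libdeflate's API. `CHUNK_SIZE`, `split`, `compress_chunks` and `decompress_chunks` are hypothetical names chosen for the example. Each `zlib.compress` call builds its own internal state, which plays the role of the per-thread `libdeflate_compressor` or `libdeflate_decompressor`.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # hypothetical chunk size (1 MiB); tune for your data


def split(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the input into fixed-size pieces before compression."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_chunks(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress every chunk as its own independent zlib stream."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, split(data)))


def decompress_chunks(chunks: list[bytes], workers: int = 4) -> bytes:
    """Decompress the independent streams in parallel and rejoin them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


payload = b"libdeflate chunked example\n" * 200_000
restored = decompress_chunks(compress_chunks(payload))
```

The trade-off is that matches cannot cross chunk boundaries, so very small chunks will hurt the compression ratio, and the container format has to record where each compressed chunk starts.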

kanryu commented on September 7, 2024

@Piezoid It is an interesting product. In the case of gzip, my understanding is that it performs parallel processing by exploiting the fact that one gz file contains multiple zlib chunks. Is that actually how it works?

What I was wondering about is whether decoding can be accelerated by running Huffman decoding and LZ decoding in parallel within a single zlib chunk, but parallelizing gzip in that way is worthwhile in itself.

Piezoid commented on September 7, 2024

What you describe, if I'm not mistaken, is similar to the bgzip file format. It uses the fact that a gzip file can contain multiple gzip "parts" concatenated. This breaks the LZ77 dependency between two successive segments, allowing random access and parallelization. It is quite ubiquitous for compressing bioinformatics text file formats. It is retro-compatible with gzip tools but requires recompression.
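To make the concatenated-parts idea concrete, here is a small sketch using Python's standard `gzip` module. It is not the bgzip implementation: the real format also records each block's size in the gzip extra field so that readers can find the boundaries, whereas this example simply keeps the member lengths around by hand. The sample records are made up.

```python
import gzip

parts = [b"read_1\tACGT\n" * 1000, b"read_2\tTTGA\n" * 1000, b"read_3\tGGCC\n" * 1000]

# Each part becomes a complete gzip member with its own header and trailer.
members = [gzip.compress(p) for p in parts]
archive = b"".join(members)

# Ordinary gzip tooling sees the concatenation as a single file.
whole = gzip.decompress(archive)

# No back-reference crosses a member boundary, so any member can be decoded
# without the others once its byte offset in the archive is known.
offset = len(members[0])
second = gzip.decompress(archive[offset:offset + len(members[1])])
```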

Pugz aims at decompressing vanilla gzip files, with a single header/part/footer. In a gzip stream, there are multiple deflate blocks, but they only reset the Huffman tables. The LZ77 sliding window is not reset, and dependencies (what we call back-references) are carried from one block to the next.
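The dependency between blocks can be observed with Python's standard `zlib` module. This is only an illustrative sketch and has nothing to do with pugz's code. It relies on zlib's flush modes: `Z_SYNC_FLUSH` ends the current block on a byte boundary but keeps the sliding window, while `Z_FULL_FLUSH` also discards the history. The helper names and the test data are assumptions made for the example.

```python
import random
import zlib

rng = random.Random(0)
first = bytes(rng.getrandbits(8) for _ in range(2000))  # incompressible on its own
second = first  # identical, so the compressor will want to copy from `first`


def compress_two_pieces(flush_mode: int) -> tuple[bytes, bytes]:
    """Raw-DEFLATE both pieces in one stream, flushing between them."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # wbits=-15: raw DEFLATE, no header
    head = comp.compress(first) + comp.flush(flush_mode)
    tail = comp.compress(second) + comp.flush(zlib.Z_FINISH)
    return head, tail


def decode_alone(tail: bytes) -> bytes:
    """Try to decode the tail starting from an empty sliding window."""
    return zlib.decompressobj(-15).decompress(tail)


# Z_SYNC_FLUSH starts a new block but keeps the window, so the tail is
# expected to contain back-references into `first` and fail on its own.
_, sync_tail = compress_two_pieces(zlib.Z_SYNC_FLUSH)
try:
    decode_alone(sync_tail)
    sync_tail_is_independent = True
except zlib.error:
    sync_tail_is_independent = False

# Z_FULL_FLUSH also forgets the history, so the tail stands on its own.
_, full_tail = compress_two_pieces(zlib.Z_FULL_FLUSH)
full_tail_decoded = decode_alone(full_tail)
```

A block boundary alone is therefore not a safe place to start decoding; the decoder also needs the bytes that precede it, up to 32 KiB of them in DEFLATE.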

Pugz solves this problem by doing a first pass that records the origins of back-references in the initially unknown sliding window. Then, after thread synchronization, the back-references are "translated" back to the correct characters using the end of the decompressed chunk coming from another thread.
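The two passes can be pictured with a toy model. This is a sketch of the idea only, not pugz's actual code or data layout: it works on an already-parsed list of LZ77 tokens instead of a real DEFLATE bitstream, and it assumes (hypothetically) that an unknown window byte is represented by a negative integer `-k`, meaning "the byte `k` positions before the start of this chunk". The function names are invented for the example.

```python
def decode_with_unknown_window(tokens):
    """First pass: expand LZ77 tokens without knowing the preceding window.

    A token is either an int (literal byte) or a (distance, length) pair.
    Bytes that would come from before the chunk are kept as negative
    placeholders recording where in the previous output they originate.
    """
    out = []
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
            continue
        dist, length = tok
        for _ in range(length):
            pos = len(out) - dist
            # pos < 0 points into the unknown window: store it as a placeholder.
            # pos >= 0 copies a symbol, which may itself be a placeholder.
            out.append(out[pos] if pos >= 0 else pos)
    return out


def resolve(symbols, previous_tail: bytes) -> bytes:
    """Second pass: translate placeholders using the previous chunk's tail."""
    return bytes(s if s >= 0 else previous_tail[s] for s in symbols)


# The second chunk copies 5 bytes from 12 positions back, adds a literal "!",
# then copies 3 more bytes from 6 positions back.
chunk_tokens = [(12, 5), ord("!"), (6, 3)]
symbols = decode_with_unknown_window(chunk_tokens)

# Once the thread handling the previous chunk has finished, its output is known.
previous_tail = b"hello world "
resolved = resolve(symbols, previous_tail)  # b"hello!hel"
```

In this model the first pass of every chunk can run concurrently, and only the comparatively cheap `resolve` step has to wait for the neighbouring chunk.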

kanryu commented on September 7, 2024

@Piezoid Is that applicable to a deflate block, such as a PNG image?

Piezoid commented on September 7, 2024

Yes, but we don't support binary data atm. It could be done in theory, but at a higher overhead (memory bandwidth).
Unless you have a few very large PNGs, I'm not sure this would bring performance gains.
You are welcome to open an issue on the pugz repository if you want to discuss the matter further.

ebiggers commented on September 7, 2024

Closing since support for parallel processing is currently out of scope for libdeflate itself.
