
Deflate Compressor Length about libdeflate (CLOSED)

Bit00009 commented on September 7, 2024

from libdeflate.

Comments (7)

ebiggers commented on September 7, 2024

What do you think about it? How is it handling an unknown size?

Your C# code snippet streams the data to a MemoryStream, which implements an automatically resizing array. So it could end up copying the data many times as it incrementally reallocates the array, and end up with a buffer up to 2x larger than required.
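
A rough simulation of that growth pattern, as a sketch only. It assumes a buffer that doubles its capacity whenever a write doesn't fit, similar in spirit to MemoryStream; the exact .NET policy is not modeled, and the names are hypothetical.

```c
#include <stddef.h>

/* Counters for a simulated growable buffer. */
struct growth_stats {
    size_t reallocations;   /* times the backing array was replaced */
    size_t bytes_copied;    /* bytes moved from old arrays to new ones */
    size_t final_capacity;  /* capacity after the last write */
};

/* Appends `total` bytes in `chunk`-sized writes to a buffer that starts at
 * `initial_cap` bytes and doubles whenever a write would not fit.
 * Doubling is an assumed policy, chosen to resemble MemoryStream's. */
struct growth_stats simulate_doubling(size_t total, size_t chunk,
                                      size_t initial_cap)
{
    struct growth_stats s = {0, 0, initial_cap};
    size_t len = 0;

    if (chunk == 0)
        return s;                    /* nothing would ever be written */
    while (len < total) {
        size_t n = (total - len < chunk) ? total - len : chunk;
        if (len + n > s.final_capacity) {
            size_t cap = s.final_capacity ? s.final_capacity : 1;
            while (cap < len + n)
                cap *= 2;
            s.bytes_copied += len;   /* old contents copied to new array */
            s.final_capacity = cap;
            s.reallocations++;
        }
        len += n;
    }
    return s;
}
```

For example, `simulate_doubling(5u * 1024 * 1024, 8192, 256)` reports 11 reallocations, 8,380,416 bytes copied, and a final capacity of 8,388,608 bytes for 5,242,880 bytes of data.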

You can improve the performance of both the C# version and the libdeflate version, and also make it much easier to use libdeflate, by storing the uncompressed size along with the compressed data.
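
A minimal sketch of storing the size alongside the data. The layout (an 8-byte little-endian size followed by raw DEFLATE data) and the function names are hypothetical, not a libdeflate format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container layout:
 *   bytes 0..7 : uncompressed size, little-endian uint64
 *   bytes 8..  : raw DEFLATE data                          */
void put_size_header(unsigned char hdr[8], uint64_t uncompressed_size)
{
    for (int i = 0; i < 8; i++)
        hdr[i] = (unsigned char)(uncompressed_size >> (8 * i));
}

uint64_t get_size_header(const unsigned char hdr[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)hdr[i] << (8 * i);
    return v;
}
```

With the size known up front, the reader can allocate exactly that many bytes and make one call, `libdeflate_deflate_decompress(d, in + 8, in_len - 8, out, size, NULL)`. Passing NULL as the last argument makes libdeflate require that the data decompress to exactly `size` bytes.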

Also, I tried a very small (24-byte) test input compressed with DeflateStream, but libdeflate returns 0 bytes and the compressed file is empty.

Check libdeflate.h for how to use the API.
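
The thread doesn't say why 0 came back. One plausible cause, and only an assumption here, is an output buffer no larger than the input. `libdeflate_deflate_compress()` returns 0 when the result doesn't fit in `out_nbytes_avail`, and tiny inputs often don't shrink. The safe buffer size comes from `libdeflate_deflate_compress_bound()`. The sketch below (hypothetical name) only illustrates why: it computes the size of the stored-block fallback defined by RFC 1951.

```c
#include <stddef.h>

/* Size of `n` bytes emitted purely as DEFLATE stored blocks (RFC 1951):
 * each block holds at most 65535 bytes and costs 5 extra bytes (3 header
 * bits padded to a byte, plus LEN and NLEN), and even empty input needs
 * one block. */
size_t stored_deflate_size(size_t n)
{
    size_t blocks = n / 65535 + (n % 65535 != 0);
    if (blocks == 0)
        blocks = 1;
    return n + 5 * blocks;
}
```

So even the "don't compress at all" encoding of a 24-byte input needs 29 bytes.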

ebiggers commented on September 7, 2024

I guess there could be a function that simulates the decompression without actually writing any output. It might be useful for checking if a given block of data contains a valid deflate stream, and could report the number of input bytes consumed and output bytes produced, which would be useful here. This might be outside the scope of libdeflate though.

Sure, but that would encourage people to do the wrong thing (decompress the data twice) instead of the right thing (store the uncompressed size along with the compressed data). And I'd like to keep the API simple. So I think it's better to not add it, and instead just encourage people to do the right thing.

ebiggers commented on September 7, 2024

You'd need to guess some uncompressed size, allocate a buffer of that size, then try decompressing into it. If it fails with LIBDEFLATE_INSUFFICIENT_SPACE, enlarge the buffer and try again. And so on.
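
A sketch of that guess-and-grow loop. To stay self-contained it calls a decoder through a callback. The result codes, the callback type, and `toy_decode` are all hypothetical stand-ins: real code would wrap `libdeflate_deflate_decompress()` and map `LIBDEFLATE_INSUFFICIENT_SPACE` to `DECODE_NO_SPACE`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for LIBDEFLATE_SUCCESS,
 * LIBDEFLATE_INSUFFICIENT_SPACE and the other libdeflate errors. */
enum decode_result { DECODE_OK, DECODE_NO_SPACE, DECODE_BAD_DATA };

typedef enum decode_result (*decode_fn)(const unsigned char *in, size_t in_len,
                                        unsigned char *out, size_t out_avail,
                                        size_t *out_len);

/* Guess a size, try to decode, and double the buffer on "no space".
 * Every retry decodes the input again from the start. `max_out` caps the
 * allocation. Returns a malloc'd buffer (caller frees) or NULL. */
unsigned char *decode_unknown_size(decode_fn decode,
                                   const unsigned char *in, size_t in_len,
                                   size_t guess, size_t max_out,
                                   size_t *out_len)
{
    size_t cap = (guess == 0) ? 1 : guess;

    for (;;) {
        if (cap > max_out)
            cap = max_out;
        unsigned char *buf = malloc(cap ? cap : 1);
        if (buf == NULL)
            return NULL;
        enum decode_result r = decode(in, in_len, buf, cap, out_len);
        if (r == DECODE_OK)
            return buf;
        free(buf);
        if (r != DECODE_NO_SPACE || cap == max_out)
            return NULL;                 /* bad data, or size limit hit */
        cap = (cap > max_out / 2) ? max_out : cap * 2;
    }
}

/* Toy decoder for illustration only: each input byte "decompresses"
 * to 10 copies of itself. */
enum decode_result toy_decode(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_avail,
                              size_t *out_len)
{
    if (in_len > out_avail / 10)
        return DECODE_NO_SPACE;
    for (size_t i = 0; i < in_len; i++)
        memset(out + 10 * i, in[i], 10);
    *out_len = 10 * in_len;
    return DECODE_OK;
}
```

Doubling keeps the wasted work bounded: the failed attempts together use smaller buffers than the final one. Growing by a fixed increment instead would make the total work grow roughly quadratically with the output size.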

It's not really a good solution. It's much better to just store the uncompressed size along with the compressed data.

It only takes a few bytes to store the uncompressed size -- or less if you store an approximate size only.
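
Two ways to shrink that header, as a sketch with hypothetical names. A LEB128-style varint stores the exact size in 1 to 10 bytes. A one-byte "size class" stores only e = ceil(log2(size)), telling the reader to allocate 2^e bytes, at most twice the real size. With the approximate form, the decompress call should pass a non-NULL `actual_out_nbytes_ret` so that libdeflate accepts output shorter than the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* LEB128-style varint: 7 bits per byte, high bit set on all but the last.
 * Sizes below 128 take 1 byte, below 16 KiB 2 bytes, below 2 MiB 3 bytes. */
size_t put_varint(unsigned char *out, uint64_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (unsigned char)(v | 0x80);
        v >>= 7;
    }
    out[n++] = (unsigned char)v;
    return n;
}

/* Returns the number of bytes consumed, or 0 if truncated or too long. */
size_t get_varint(const unsigned char *in, size_t in_len, uint64_t *v)
{
    uint64_t r = 0;
    for (size_t i = 0; i < in_len && i < 10; i++) {
        r |= (uint64_t)(in[i] & 0x7f) << (7 * i);
        if (!(in[i] & 0x80)) {
            *v = r;
            return i + 1;
        }
    }
    return 0;
}

/* Approximate size in one byte: smallest e with 2^e >= size
 * (valid for sizes up to 2^63). */
unsigned char size_class(uint64_t size)
{
    unsigned char e = 0;
    while (e < 63 && ((uint64_t)1 << e) < size)
        e++;
    return e;
}
```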

Note that your code snippet only uses compression level 9, while libdeflate goes up to level 12. You probably could save much more than a few bytes by using a higher compression level.

Bit00009 commented on September 7, 2024

Hello @ebiggers, thanks for your fast reply.
Yes, I'm currently using guessing, but it's extremely slow when my data is larger than 5 MB.
I'm really curious how C#'s DeflateStream manages to handle the same situation, and it's fast enough; the source is available here.

I noticed there are buffer size and window size values in .NET's DeflateStream:

        internal const int DefaultBufferSize = 8192;
        private const int WindowSizeUpperBound = 47;

What do you think about it? How is it handling an unknown size?

It's much better to just store the uncompressed size along with the compressed data.

I want to avoid data splitting as much as I can, but if it's the last thing I can do, I'll do it. Still, I'm dying of curiosity about how .NET manages deflate compression/decompression.

Also, I tried a very small (24-byte) test input compressed with DeflateStream, but libdeflate returns 0 bytes and the compressed file is empty. I've attached the files.
InputData.txt

This is the DeflateStream output:
compressed-deflatestream.gz

Regards,
Ultran

dmitry-azaraev commented on September 7, 2024

@Bit00009 Internally, any deflate decoder relies on the BFINAL bit in the stream, so it generally knows when to stop. There is also a kind of natural chunking in deflate, so a decoder can be adapted to work with input/output buffers, which can then be presented behind a Stream-like facade.
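
A small illustration of where BFINAL lives, following RFC 1951 section 3.2.3 (the function name is hypothetical). A decoder loops over blocks and stops after the one whose BFINAL bit is 1, so it knows where the stream ends without knowing the output size in advance.

```c
#include <stddef.h>

/* Reads the 3-bit header of the FIRST block of a raw DEFLATE stream.
 * Bits are packed least-significant first: bit 0 is BFINAL, bits 1-2 are
 * BTYPE (0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman, 3 = reserved).
 * Later block headers are generally not byte-aligned, so finding them
 * requires decoding the preceding block. Returns 0, or -1 on empty input. */
int first_block_header(const unsigned char *in, size_t in_len,
                       int *bfinal, int *btype)
{
    if (in_len == 0)
        return -1;
    *bfinal = in[0] & 1;
    *btype = (in[0] >> 1) & 3;
    return 0;
}
```

For example, `{0x03, 0x00}` is a complete raw DEFLATE stream: one final fixed-Huffman block holding only the end-of-block code, so this reports BFINAL = 1 and BTYPE = 1.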

Technically, at least for decompression, libdeflate could be extended to work with chunks (I guess this needs injection points in decompress_template.h around next_block / block_done), but while it might look easy, it is not. (And I have no idea about compression at all.)

A Stream-like facade is in fact not as simple as it may look. Imagine you have a file with 80,000 small compressed blocks (say 500 bytes to 1.5 KB each) written to the stream one after another. Each DeflateStream instance buffers its input from the file stream and decodes from that buffer. As a result, after decoding the first block (say it was 500 bytes), it has actually consumed 8 KB (its internal buffer), and the underlying file stream no longer points at the next compressed block as it should (did you know this?). DeflateStream doesn't even report how many bytes it really consumed. (In my case I have the compressed data size, so I have two options: create another stream that limits how many bytes DeflateStream may read (just return EOF after N bytes), or use a codec like libdeflate that works with raw buffers. Skipping all this complex machinery avoids lots of allocations, resulting in at least twice the throughput.) Buffered streams that expose their internal buffer to the consumer (the consumer being something like DeflateStream) would be more flexible, but in .NET that is not an option, and in general it is not a popular design at all.

I'm generally trying to show that blindly reproducing a generic Stream-like interface is a way to end up with a library that has a stupid, hard-to-use interface. Which interface would be easy to use and satisfy all requirements? I don't know.
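
A minimal sketch of the "return EOF after N bytes" wrapper, assuming each record's compressed size is known, as in the case above. The struct and function names are hypothetical; a .NET version would subclass Stream instead. A read-ahead decoder can still buffer, but it cannot run past the record, and the leftover bytes are skipped afterwards so the file points at the next record. With raw-buffer APIs the problem does not arise in the first place: `libdeflate_deflate_decompress_ex()` reports the number of input bytes consumed through its `actual_in_nbytes_ret` argument.

```c
#include <stddef.h>
#include <stdio.h>

/* Lets a consumer read at most `remaining` more bytes from `fp`, then
 * reports end-of-data even if the file continues. */
struct limited_file {
    FILE *fp;
    size_t remaining;
};

/* Reads up to n bytes, never past the record boundary. Returns the number
 * of bytes read; 0 means the record (or the file) is exhausted. */
size_t limited_read(struct limited_file *lf, void *dst, size_t n)
{
    if (n > lf->remaining)
        n = lf->remaining;
    if (n == 0)
        return 0;
    size_t got = fread(dst, 1, n, lf->fp);
    lf->remaining -= got;
    return got;
}

/* After the decoder is done, skip whatever it left unread so that `fp`
 * points at the start of the next record. Returns 0 on success. */
int limited_finish(struct limited_file *lf)
{
    if (lf->remaining == 0)
        return 0;
    if (fseek(lf->fp, (long)lf->remaining, SEEK_CUR) != 0)
        return -1;
    lf->remaining = 0;
    return 0;
}
```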

jibsen commented on September 7, 2024

I guess there could be a function that simulates the decompression without actually writing any output. It might be useful for checking if a given block of data contains a valid deflate stream, and could report the number of input bytes consumed and output bytes produced, which would be useful here. This might be outside the scope of libdeflate though.

wegylexy commented on September 7, 2024

When the deflated stream is several MB, a partial read results in LIBDEFLATE_INSUFFICIENT_SPACE or DestinationTooSmall. It is unreasonable to require two contiguous blocks of memory that big.
For example, a 4 MB git pack may inflate to 13 MB.
