
Comments (10)

jimsmart commented on August 17, 2024

Possibly related?
facebook/zstd#206

from zstd.

jimsmart commented on August 17, 2024

OK, if you cannot guarantee reasonably large calls to Write, the best option is to wrap the zstd writer in a bufio.Writer, like this:

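package main

import (
	"bufio"
	"bytes"
	"fmt"

	"github.com/DataDog/zstd" // assumed import path for the library discussed in this thread
)

// CompressionLevel is not shown in the original post; this value is an assumption.
const CompressionLevel = 3

func main() {
	// Build 6500 bytes of repetitive input.
	b := &bytes.Buffer{}
	for i := 0; i < 500; i++ {
		b.Write([]byte("Hello World! "))
	}
	data1 := b.Bytes()
	fmt.Println("data len", len(data1))

	// Compress 1: a single large Write call.
	buffer1 := &bytes.Buffer{}
	w1 := zstd.NewWriterLevel(buffer1, CompressionLevel)
	w1.Write(data1)
	w1.Close()

	fmt.Println("Buffer1 len", buffer1.Len())

	// Compress 2: many small writes, coalesced by a bufio.Writer.
	buffer2 := &bytes.Buffer{}
	w2 := zstd.NewWriterLevel(buffer2, CompressionLevel)
	bw := bufio.NewWriter(w2) // default buffer size = 4k
	// bw := bufio.NewWriterSize(w2, 8192) // buffer size = 8k
	for i := 0; i < 500; i++ {
		bw.Write([]byte("Hello World! "))
	}
	bw.Flush() // push any remaining buffered bytes into the zstd writer
	w2.Close() // then close the zstd writer itself

	fmt.Println("Buffer2 len", buffer2.Len())
}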

Output:

data len 6500
Buffer1 len 33
Buffer2 len 44

It's not so elegant, but ¯\_(ツ)_/¯

Hope that helps someone!
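
To see why the bufio.Writer helps, here is a minimal standard-library sketch that counts how many bytes each Write call delivers to the wrapped writer. The names sizeRecorder and writeSizes are hypothetical, and the recorder is only a stand-in for the zstd writer; it does no compression.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// sizeRecorder is a hypothetical stand-in for the compressing writer.
// It only records how many bytes each Write call delivers.
type sizeRecorder struct{ sizes []int }

func (r *sizeRecorder) Write(p []byte) (int, error) {
	r.sizes = append(r.sizes, len(p))
	return len(p), nil
}

// writeSizes sends n copies of a 13-byte message to a sizeRecorder and
// returns the size of every Write call the recorder received.
// bufSize > 0 routes the writes through a bufio.Writer of that size;
// bufSize == 0 writes directly.
func writeSizes(n, bufSize int) []int {
	rec := &sizeRecorder{}
	var w io.Writer = rec
	var bw *bufio.Writer
	if bufSize > 0 {
		bw = bufio.NewWriterSize(rec, bufSize)
		w = bw
	}
	msg := []byte("Hello World! ")
	for i := 0; i < n; i++ {
		w.Write(msg)
	}
	if bw != nil {
		bw.Flush() // deliver whatever is still buffered
	}
	return rec.sizes
}

func main() {
	fmt.Println("unbuffered calls:", len(writeSizes(500, 0))) // 500
	fmt.Println("buffered sizes:", writeSizes(500, 4096))     // [4096 2404]
}
```

With the 4k buffer, the 500 small writes reach the wrapped writer as just two larger blocks, which is what gives a buffer-less compressor more data to work with per call.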


valyala commented on August 17, 2024

@jimsmart, try gozstd.Writer. It uses a different underlying zstd API, which should have lower overhead.


dmoklaf commented on August 17, 2024

The zstd bug referenced above (facebook/zstd#206) has been closed. Is this issue still ongoing? If so, the need to wrap the writer in a buffer should be documented, since this is fairly subtle usage advice.


Viq111 commented on August 17, 2024

Hi @rgeronimi,

I re-checked the earlier results, and you would indeed get the same results today.
zstd (this library) and gozstd (from @valyala above) use two slightly different C zstd APIs with slightly different design goals.

This zstd library uses ZSTD_compressContinue, which is buffer-less zstd streaming compression. That gives us complete control over memory, at the expense of needing to buffer on the Go side if you want to optimize the compressed size for small inputs.

gozstd uses ZSTD_compressStream, which moves that buffering logic into the C code (at the cost of less control over memory consumption on the C side).

Hope this helps!


valyala commented on August 17, 2024

The following limitations for ZSTD_compressContinue look scary:

  • ZSTD_compressContinue() presumes prior input is still accessible and unmodified (up to maximum distance size, see WindowLog). It remembers all previous contiguous blocks, plus one separated memory segment (which can itself consist of multiple contiguous blocks).
  • ZSTD_compressContinue() detects that prior input has been overwritten when src buffer overlaps. In which case, it will "discard" the relevant memory section from its history.

As I understand, they mean two things:

  • zstd could use garbage as dictionary data from buffers passed to earlier ZSTD_compressContinue calls if the addresses of those buffers don't match the address of the current buffer. This may also lead to a segmentation fault if the memory backing a previous buffer has been unmapped from the process address space.
  • zstd may achieve a poor compression ratio, since it discards dictionary data from the previously compressed block if the buffer passed to the function is reused.

cc'ing @Cyan4973 for further clarification.
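
The buffer-reuse hazard can be illustrated without zstd at all. The sketch below is only an analogy under stated assumptions: retainingSink and copyingSink are hypothetical types, not part of any library in this thread. The first keeps references to earlier inputs (roughly what an API that "presumes prior input is still accessible" relies on), while the second copies its input. Note that Go's io.Writer documentation says implementations must not retain p, so callers are free to reuse their buffers.

```go
package main

import "fmt"

// retainingSink keeps a reference to every slice it is given, so its
// history is only valid while callers leave those buffers untouched.
type retainingSink struct{ history [][]byte }

func (s *retainingSink) Write(p []byte) (int, error) {
	s.history = append(s.history, p) // aliases the caller's memory
	return len(p), nil
}

// copyingSink copies each input into memory it owns, so later changes to
// the caller's buffer cannot affect its history.
type copyingSink struct{ history [][]byte }

func (s *copyingSink) Write(p []byte) (int, error) {
	s.history = append(s.history, append([]byte(nil), p...))
	return len(p), nil
}

// aliasDemo writes "first" to both sinks, then reuses the same buffer for
// "later", and reports what each sink now believes its first input was.
func aliasDemo() (retained, copied string) {
	buf := make([]byte, 5)
	r, c := &retainingSink{}, &copyingSink{}

	copy(buf, "first")
	r.Write(buf)
	c.Write(buf)

	copy(buf, "later") // the caller reuses its buffer, as io.Writer allows

	return string(r.history[0]), string(c.history[0])
}

func main() {
	retained, copied := aliasDemo()
	fmt.Printf("retained history: %q\n", retained) // "later"
	fmt.Printf("copied history:   %q\n", copied)   // "first"
}
```

The retained history silently changes once the caller overwrites its buffer, which is the kind of stale or garbage history the two bullet points above are worried about.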


Cyan4973 commented on August 17, 2024

@Viq111's explanations are correct.

ZSTD_compressContinue() is a fairly low level function, designed for systems which need absolute control over memory allocation. It requires a fairly good control over buffer content and lifetime. To be fair, it's more targeted at embedded environments than managed languages, but I'm not a qualified expert to tell if this is a good fit or not for go.

When in doubt, prefer using ZSTD_compressStream(). It's safer to use, and abstracts all the machinery, at the cost of also managing its own internal buffers.


dmoklaf commented on August 17, 2024

> ZSTD_compressContinue() is a fairly low level function, designed for systems which need absolute control over memory allocation. It requires a fairly good control over buffer content and lifetime. To be fair, it's more targeted at embedded environments than managed languages, but I'm not a qualified expert to tell if this is a good fit or not for go.

This depends on what the Go wrapper code does. I just checked, and it passes the user-provided buffer directly to C as a pointer. If I understand what @valyala wrote, this could be a critical bug, as the zstd C code expects this buffer to remain accessible after the function returns. If true, this could cause data corruption, process crashes, and hard-to-reproduce failures.

> When in doubt, prefer using ZSTD_compressStream(). It's safer to use, and abstracts all the machinery, at the cost of also managing its own internal buffers.


Viq111 commented on August 17, 2024

Looking back at the code, we started implementing the Go wrapper at zstd v0.5, which indeed only had the ZBUFF_decompressContinue methods: https://github.com/facebook/zstd/blob/201433a7f713af056cc7ea32624eddefb55e10c8/lib/zstd_buffered.h#L79

It may actually also be the cause of #39.

If anyone could put up a PR for migrating to ZSTD_compressStream, we are accepting all contributions!

Otherwise I can also look into it, as it seems it could bite a couple of people using the streaming interface.


dmoklaf commented on August 17, 2024

We don't have the skill set to dig into that soon. For storage tasks (e.g., blob storage in a DB or compressed custom backups), this bug is unfortunately a showstopper.

