andybalholm / brotli
Pure Go Brotli encoder and decoder
License: MIT License
When resetting, encoderInitState is called, which resets params.quality.
I would expect the compression level to be retained after a Reset, as per the documentation.
I also suspect that any options provided to NewWriterOptions are discarded.
Both on i686 and armv7 with Go 1.12.6:
Testing in: /builddir/build/BUILD/brotli-71eb68cc467c35a70c4b3d0c46b590260fe3f303/_build/src
PATH: /builddir/build/BUILD/brotli-71eb68cc467c35a70c4b3d0c46b590260fe3f303/_build/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin
GOPATH: /builddir/build/BUILD/brotli-71eb68cc467c35a70c4b3d0c46b590260fe3f303/_build:/usr/share/gocode
GO111MODULE: off
command: go test -buildmode pie -compiler gc -ldflags "-X github.com/andybalholm/brotli/version.commit=71eb68cc467c35a70c4b3d0c46b590260fe3f303 -X github.com/andybalholm/brotli/version=0 -extldflags '-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld '"
testing: github.com/andybalholm/brotli
github.com/andybalholm/brotli
--- FAIL: TestDecodeFuzz (0.00s)
panic: runtime error: index out of range [-2] [recovered]
panic: runtime error: index out of range [-2]
goroutine 88 [running]:
testing.tRunner.func1(0x58cb0000)
/usr/lib/golang/src/testing/testing.go:874 +0x352
panic(0x56a80120, 0x58c163e0)
/usr/lib/golang/src/runtime/panic.go:663 +0x180
github.com/andybalholm/brotli.TestDecodeFuzz(0x58cb0000)
/builddir/build/BUILD/brotli-71eb68cc467c35a70c4b3d0c46b590260fe3f303/_build/src/github.com/andybalholm/brotli/brotli_test.go:322 +0x366
testing.tRunner(0x58cb0000, 0x56a9ecdc)
/usr/lib/golang/src/testing/testing.go:909 +0xae
created by testing.(*T).Run
/usr/lib/golang/src/testing/testing.go:960 +0x2d3
exit status 2
FAIL github.com/andybalholm/brotli 0.455s
@andybalholm The diff since the last release seems significant, with a bunch of improvements and optimizations. Would you consider creating a v1.1.0 release?
It would be nice to have some examples in the README showing best practices for compression and decompression; more specifically, how to reuse the writer.
Could you please provide simple encode and decode instructions for using this brotli package? I've only found how to encode; how do I decode?
https://gist.github.com/miguelmota/a0a3f18d2c6caee52508f1c393927ec6
Please put the instructions on the project's front page on GitHub. Thanks.
Safari sends gzip, deflate, br as its Accept-Encoding value. There are no weights.
I am not certain, but it seems the parsing expects something more complex?
Maybe here? https://github.com/andybalholm/brotli/blob/master/http.go#L74
Do I misunderstand?
If this is an issue, would you accept a patch to fix it?
This library seems to allocate a lot, even when using Reset() to reuse data structures wherever possible between requests. It creates around a thousand allocations (compared to compress/gzip, which appears to be zero-alloc across multiple uses) and performs tens of times slower (according to microbenchmarks at the default quality level).
Hi, this looks neat! What is the status of this library? I'd like to add this to archiver and possibly to Caddy next chance I get, if it's ready to use.
(Edit: disregard the license question, duh, for some reason I didn't see it on first pass)
I don't have a reproducer, but I have a stack trace:
panic: runtime error: slice bounds out of range [179264:0]
goroutine 9546228 [running]:
github.com/andybalholm/brotli.(*h5).FindLongestMatch(0xc005abc5a0, 0x0?, {0xc06d724002, 0x810007, 0x810007}, 0x7fffff, {0xc030a201c0, 0x10, 0x0?}, 0x0, ...)
github.com/andybalholm/brotli/h5.go:169 +0x7ff
github.com/andybalholm/brotli.createBackwardReferences(0x10000, 0x0, {0xc06d724002, 0x810007, 0x810007}, 0x63e05d?, 0xc030a20058, {0x1bd2e48, 0xc005abc5a0}, {0xc030a201c0, ...}, ...)
github.com/andybalholm/brotli/backward_references.go:77 +0x359
github.com/andybalholm/brotli.encodeData(0xc030a20028, 0x0, 0x83?)
github.com/andybalholm/brotli/encode.go:844 +0xae7
github.com/andybalholm/brotli.encoderCompressStream(0xc030a20028, 0x1, 0xc000a06e68, 0xc000a06e98)
github.com/andybalholm/brotli/encode.go:1190 +0x2d5
github.com/andybalholm/brotli.(*Writer).writeChunk(0xc030a20028, {0x0?, 0xd46220?, 0x1?}, 0xc030a20028?)
github.com/andybalholm/brotli/writer.go:78 +0xb9
github.com/andybalholm/brotli.(*Writer).Flush(0x6c595d?)
github.com/andybalholm/brotli/writer.go:97 +0x25
Hi Andy,
I understand this is a Go port, but I'm raising this here anyway. I am observing the following benchmark results from encoding and decoding a random character sequence. As the []byte size grows to 32MB, the encoder's execution time slows to a whole second. This is many orders of magnitude slower than similar algorithms in Go, and it scales poorly with input size.
I was wondering if you can confirm this and/or have any input on how to improve performance?
BenchmarkBrotliEncode128B-16 144399 7416 ns/op
BenchmarkBrotliDecode128B-16 398236 3124 ns/op
BenchmarkBrotliEncode1KB-16 74959 15411 ns/op
BenchmarkBrotliDecode1KB-16 354914 4664 ns/op
BenchmarkBrotliEncode64KB-16 2673 415396 ns/op
BenchmarkBrotliDecode64KB-16 143876 23852 ns/op
BenchmarkBrotliEncode128KB-16 1393 795543 ns/op
BenchmarkBrotliDecode128KB-16 46455 28896 ns/op
BenchmarkBrotliEncode1MB-16 181 5975186 ns/op 5ms
BenchmarkBrotliDecode1MB-16 14785 75049 ns/op
BenchmarkBrotliEncode2MB-16 98 11964768 ns/op 11ms
BenchmarkBrotliDecode2MB-16 2452 849694 ns/op
BenchmarkBrotliEncode4MB-16 45 25300261 ns/op
BenchmarkBrotliDecode4MB-16 5822 1166313 ns/op
BenchmarkBrotliEncode8MB-16 19 60138005 ns/op
BenchmarkBrotliDecode8MB-16 3024 720402 ns/op
BenchmarkBrotliEncode16MB-16 5 205008482 ns/op
BenchmarkBrotliDecode16MB-16 104 14655937 ns/op
BenchmarkBrotliEncode32MB-16 1 1146873803 ns/op 1s!!!
BenchmarkBrotliDecode32MB-16 10 103324444 ns/op
Source code: https://github.com/simonmittag/j8a/blob/166/brotli_test.go
In the latest google/brotli release (v1.0.9), they mention fixing this vulnerability: https://nvd.nist.gov/vuln/detail/CVE-2020-8927:
SECURITY: decoder: fix integer overflow when input chunk is larger than 2GiB (CVE-2020-8927)
I am wondering: is it mitigated here as well, in the latest release, v1.0.4?
Thanks!
Thanks for this library! It is great to have a pure Go brotli encoder.
I am seeing that Flush() calls do not completely process all data following a large Write(). Modifying your unit test TestEncoderFlush to use an input size of 32766 instead of 1000 demonstrates the behavior.
func TestEncoderFlush(t *testing.T) {
    input := make([]byte, 32766) // MODIFIED
    rand.Read(input)
    out := bytes.Buffer{}
    e := NewWriterOptions(&out, WriterOptions{Quality: 5})
    in := bytes.NewReader(input)
    _, err := io.Copy(e, in)
    if err != nil {
        t.Fatalf("Copy Error: %v", err)
    }
    if err := e.Flush(); err != nil {
        t.Fatalf("Flush(): %v", err)
    }
    if out.Len() == 0 {
        t.Fatalf("0 bytes written after Flush()")
    }
    decompressed := make([]byte, 32766) // MODIFIED
    reader := NewReader(bytes.NewReader(out.Bytes()))
    n, err := reader.Read(decompressed)
    if n != len(decompressed) || err != nil {
        t.Errorf("Expected <%v, nil>, but <%v, %v>", len(decompressed), n, err)
    }
    if !bytes.Equal(decompressed, input) {
        t.Errorf(""+
            "Decompress after flush: %v\n"+
            "%q\n"+
            "want:\n%q",
            err, decompressed, input)
    }
    if err := e.Close(); err != nil {
        t.Errorf("Close(): %v", err)
    }
}
results in:
brotli_test.go:241: Expected <32766, nil>, but <32765, <nil>>
brotli_test.go:244: Decompress after flush: <nil>
...
Is it expected that large Write() calls are supported? Or is the unit test somehow using the API incorrectly to ensure all data is both written and flushed?
With some data, I'm seeing this error while it works with the rest. What does this error mean and how can I solve it?
This is my code:
// Compress text with the brotli compression algorithm
func Compress(s string) []byte {
    var b bytes.Buffer
    bw := brotli.NewWriter(nil)
    b.Reset()
    // Reset the compressor and encode from some input stream.
    bw.Reset(&b)
    if _, err := io.WriteString(bw, s); err != nil {
        log.Fatal(err)
    }
    if err := bw.Close(); err != nil {
        log.Fatal("failed to compress:", err)
    }
    return b.Bytes()
}

// Decompress brotli-compressed data
func Decompress(data []byte) string {
    b := bytes.NewBuffer(data)
    br := brotli.NewReader(nil)
    // Reset the decompressor and decode to some output stream.
    if err := br.Reset(b); err != nil {
        log.Fatal(err)
    }
    // dst := os.Stdout
    dst := bytes.NewBuffer(nil)
    if _, err := io.Copy(dst, br); err != nil {
        log.Fatal("failed to decompress:", err)
    }
    return dst.String()
}
How can the output be cached, to avoid repeatedly compressing the same request every time? Thanks.
These are bugs introduced by c2go not understanding C's integer promotion rules:
encode.go:960:22: storage[1] (8 bits) too small for shift of 8
transform.go:546:55: (word[0] & 0x0F) (8 bits) too small for shift of 12
transform.go:556:55: (word[1] & 0x3F) (8 bits) too small for shift of 12
transform.go:556:76: (word[0] & 0x07) (8 bits) too small for shift of 18
transform.go:613:63: trans.params[transform_idx*2+1] (8 bits) too small for shift of 8
transform.go:616:63: trans.params[transform_idx*2+1] (8 bits) too small for shift of 8
In theory google/brotli has support for shared dictionaries. In practice I've not seen any example anywhere.
Does this package support shared dictionaries? Could you maybe provide an example?
Please check the attached zip of the JS file: marked.js.zip
1.0.0 is ok!
/usr/local/Cellar/go/1.16/libexec/src/runtime/panic.go:971 +0x499
github.com/andybalholm/brotli.compressFragmentFastImpl(0xc0002c1380, 0x31, 0x40, 0x31, 0x0, 0xc0002f8a68, 0x400, 0x400, 0x9, 0xc0002faa88, ...)
/pkg/mod/github.com/andybalholm/[email protected]/compress_fragment.go:382 +0x23fb
github.com/andybalholm/brotli.compressFragmentFast(0xc0002c1380, 0x31, 0x40, 0x31, 0xc0002f8a00, 0xc0002f8a68, 0x400, 0x400, 0x200, 0xc0002faa88, ...)
/pkg/mod/github.com/andybalholm/[email protected]/compress_fragment.go:673 +0x418
github.com/andybalholm/brotli.encoderCompressStreamFast(0xc0002f8800, 0x0, 0xc000816448, 0xc000816458, 0x16c4980)
/pkg/mod/github.com/andybalholm/[email protected]/encode.go:971 +0x314
github.com/andybalholm/brotli.encoderCompressStream(0xc0002f8800, 0x0, 0xc000816448, 0xc000816458, 0x10a5680)
/pkg/mod/github.com/andybalholm/[email protected]/encode.go:1101 +0x369
github.com/andybalholm/brotli.(*Writer).writeChunk(0xc0002f8800, 0xc0002c1380, 0x31, 0x40, 0x0, 0x1669a20, 0x0, 0x385f0b8)
/pkg/mod/github.com/andybalholm/[email protected]/writer.go:77 +0xba
github.com/andybalholm/brotli.(*Writer).Write(0xc0002f8800, 0xc0002c1380, 0x31, 0x40, 0xc0002f8800, 0x0, 0x3860b18)
/pkg/mod/github.com/andybalholm/[email protected]/writer.go:111 +0x52
github.com/ije/rex.(*responseWriter).Write(0xc0001de400, 0xc0002c1380, 0x31, 0x40, 0x163e6e0, 0x16bce00, 0x1c00000000000001)
Great work on this. I see that there have been some commits since the last release (which is now more than a year old).
Do you intend to tag a new release soon?
Hi,
I created a bunch of compressed files with the help of your package (version 1.0.4). I can decompress them with your library; however, I am failing to decompress them with the CLI (neither with version 1.0.7 nor with 1.0.9). Interestingly, your library is able to decompress brotli files that were created with the CLI.
The brotli files produced by your library and by the CLI have the same file size, but they differ in content (when I do a byte-wise comparison).
This is how I compress my files with your library:
bw := brotli.NewWriterLevel(&buf, brotli.BestCompression)
Using the CLI:
$ brotli foo.html -o foo.html.br
Decompressing them with your library:
br := brotli.NewReader(os.Stdin)
_, err := io.Copy(os.Stdout, br)
Decompressing with the CLI:
cat foo.html.br | brotli --decompress
Am I doing something wrong here?
Hello.
I wonder if this work could be integrated into google/brotli. As I understand it, this repo is generated by the c2go project.
But first things first: I'm asking for the author's blessing to do that.
I want to use brotli for stream compression.
I have a use case where I need to interrupt (stop) the stream compression for one week before continuing. For this I need to save the dictionary that has already been built: I want to get a serialized state of the dynamic dictionary once compression is stopped, so that later I can deserialize the state and continue compressing with that dictionary.
Is this feature/API available in this library?
If not, I want to add it; where should I start? Can you give me some hints?
Are you interested in adding this use case to the library?
github.com/golang/gddo is a large code base. All that code is pulled in because writer.go uses httputil.NegotiateContentEncoding.
It would be simple to make the library self-contained (no dependencies) by extracting NegotiateContentEncoding (and the code it uses) from github.com/golang/gddo (and putting it e.g. in an httputil.go file).
Hi, I'm getting a consistent panic when I test this function. Am I using it wrong?
func zip(b []byte) ([]byte, error) {
    buf := &bytes.Buffer{}
    w := brotli.NewWriterLevel(buf, 11)
    if n, err := w.Write(b); err != nil {
        return nil, err
    } else if n < len(b) {
        return nil, fmt.Errorf("n too small: %d vs %d for %s", n, len(b), string(b))
    }
    if err := w.Close(); err != nil {
        return nil, err
    }
    return ioutil.ReadAll(buf)
}

func TestCompBuffer(t *testing.T) {
    s := ""
    n := 1000
    for i := 0; i < n; i++ {
        t.Logf("running test #%d", i)
        s = uuid.Must(uuid.NewV4()).String()
        assert.NotPanics(t, func() {
            zipped, err := zip([]byte(s))
            assert.NoError(t, err)
            unzipped, err := ioutil.ReadAll(brotli.NewReader(bytes.NewReader(zipped)))
            assert.NoError(t, err)
            assert.Equal(t, s, string(unzipped))
        })
    }
}
Stack trace:
Panic value: runtime error: index out of range [0] with length 0
Panic stack: goroutine 10 [running]:
github.com/andybalholm/brotli.initBlockSplitIterator(...)
/pkg/mod/github.com/andybalholm/[email protected]/histogram.go:167
github.com/andybalholm/brotli.buildHistogramsWithContext(0xc0006c82c0, 0x1, 0x2c, 0xc000618820, 0xc000618870, 0xc0006188c0, 0xc00032e092, 0x2b, 0x2b, 0x0, ...)
/pkg/mod/github.com/andybalholm/[email protected]/histogram.go:191 +0x67f
github.com/andybalholm/brotli.buildMetaBlock(0xc00032e092, 0x2b, 0x2b, 0x0, 0x7fffff, 0xc00016d9a8, 0x0, 0xc0006c82c0, 0x1, 0x2c, ...)
/pkg/mod/github.com/andybalholm/[email protected]/metablock.go:220 +0x5bb
github.com/andybalholm/brotli.writeMetaBlockInternal(0xc00032e092, 0x2b, 0x2b, 0x7fffff, 0x0, 0x24, 0xc0001ac001, 0x2, 0xc0001ac030, 0xc0001a0000, ...)
/pkg/mod/github.com/andybalholm/[email protected]/encode.go:457 +0x823
github.com/andybalholm/brotli.encodeData(0xc0001ac000, 0x1, 0xc00032e060)
/pkg/mod/github.com/andybalholm/[email protected]/encode.go:859 +0x8ef
github.com/andybalholm/brotli.encoderCompressStream(0xc0001ac000, 0x2, 0xc00016dc20, 0xc00016dc30, 0xc00016dc01)
/pkg/mod/github.com/andybalholm/[email protected]/encode.go:1128 +0x1a7
github.com/andybalholm/brotli.(*Writer).writeChunk(0xc0001ac000, 0x0, 0x0, 0x0, 0x2, 0x24, 0x0, 0x0)
/pkg/mod/github.com/andybalholm/[email protected]/writer.go:77 +0xaf
github.com/andybalholm/brotli.(*Writer).Close(...)
/pkg/mod/github.com/andybalholm/[email protected]/writer.go:103
Issue description
On static analysis of the binary generated by compiling our Go Fiber app, we found a lot of text and links in the final binary, which turned out to be test data in this package.
Please check this issue:
Error from Chrome:
{"params":{"description":"Server reset stream.","net_error":"ERR_HTTP2_PROTOCOL_ERROR","stream_id":5},"phase":0,"source":{"id":1493828,"start_time":"732370299","type":1},"time":"732375561","type":224},
Code Responsible:
func ModifyReturnOfTagManager(r *http.Response) error {
    // Get the proxy domain from the URL
    proxyDomain := r.Request.URL.Query().Get("domain")
    fmt.Println(proxyDomain)
    bytesFromBody, err := ioutil.ReadAll(r.Body)
    defer r.Body.Close()
    if err != nil {
        return nil
    }
    if r.Header.Get("Content-Encoding") == "br" {
        logger.Logger.Info().Msg("modified gzip")
        bReader := brotli.NewReader(bytes.NewBuffer(bytesFromBody))
        readableBytes, err := ioutil.ReadAll(bReader)
        if err != nil {
            logger.Logger.Error().Msg(err.Error())
        }
        out := bytes.Buffer{}
        writer := brotli.NewWriterOptions(&out, brotli.WriterOptions{Quality: 1})
        defer writer.Close()
        in := bytes.NewReader(readableBytes)
        _, err = io.Copy(writer, in)
        if err != nil {
            logger.Logger.Error().Msg(err.Error())
        }
        r.ContentLength = int64(len(readableBytes))
        r.Header.Set("Content-Length", strconv.FormatInt(int64(len(readableBytes)), 10))
        r.Body = ioutil.NopCloser(&out)
        return nil
    }
    return nil
}
Which is called from:
proxy = httputil.ReverseProxy{
    Director: unblocker.Name,
}
proxy.ModifyResponse = unblocker.ModifyReturnOfTagManager
I have verified that the bytes received are correct and I can read properly, but trying to recompress causes the issue.
There's a bug fix provided in ec682fb that affects brotli.Reader.Reset. Would you be willing to release v1.0.4 so that a tagged release has this change? Thanks!
brotli.ensureRingBuffer is allocating more than 17 GB of memory, and it significantly impacts performance.
Total: 17.86GB 17.86GB (flat, cum) 68.44%
1307 . . var old_ringbuffer []byte = s.ringbuffer
1308 . . if s.ringbuffer_size == s.new_ringbuffer_size {
1309 . . return true
1310 . . }
1311 . .
1312 17.86GB 17.86GB s.ringbuffer = make([]byte, uint(s.new_ringbuffer_size)+uint(kRingBufferWriteAheadSlack))
1313 . . if s.ringbuffer == nil {
1314 . . /* Restore previous value. */
1315 . . s.ringbuffer = old_ringbuffer
1316 . .
1317 . . return false
I suggest reusing the buffer with a sync.Pool.
When I use your library in a Go webserver to serve a larger wasm file, compressed at level 10 or 11, the file gets truncated when downloaded via nginx.
This happens with both old nginx 1.18.0 and new nginx 1.24, with the brotli plugin installed/enabled/disabled.
Reverting to compression level 9 works around the problem.
The file: https://core.haircomber.com/ui/combfullui.wasm, currently served with level 9 compression.
The client requests gzip / br / deflate, and my Go server activates br.
Note: I don't use Content-Length, and I want the file to simply pass through nginx, which we use to add the SSL.
Hi,
I'd like to use brotli in conjunction with https://github.com/mus-format/mus-stream-go. However, mus-stream-go needs the readers and writers to implement these interfaces:
package muss

import "io"

// Writer is the interface that groups the WriteByte, Write and WriteString
// methods.
type Writer interface {
    io.ByteWriter
    io.Writer
    io.StringWriter
}

// Reader is the interface that groups the basic ReadByte and Read methods.
type Reader interface {
    io.ByteReader
    io.Reader
}
Regards.
Thank you for creating this library and updating it for Go modules!
Would it be possible to tag a release version (whether it be v0.0.1, v0.1.0, v1.0.0 etc.?) of the repository to make it easier to consume this module?
As it stands now, importing this module results in using a pseudo-version:
➜ go get github.com/andybalholm/brotli
go: finding github.com/andybalholm/brotli latest
go: downloading github.com/andybalholm/brotli v0.0.0-20190821151343-b60f0d972eeb
go: extracting github.com/andybalholm/brotli v0.0.0-20190821151343-b60f0d972eeb
➜ cat go.mod
module foo
go 1.13
require github.com/andybalholm/brotli v0.0.0-20190821151343-b60f0d972eeb // indirect
Having a tagged version would help to maintain more readable module dependencies.
Hi, I see you added a commit with the Reader reset fix; can you release the library with this fix?
I believe I've found a bug; please let me know if I'm using the API wrong. This repro consists of a small test runner (code pasted below) plus a binary input file (attached). Note that locally I never actually get to the "FAIL!" case - instead I get an error back from the brotli reader.
download this to the same directory: repro.data.gz
package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
    "io/ioutil"
    "os"

    "github.com/andybalholm/brotli"
)

func main() {
    zipbytes, err := ioutil.ReadFile("repro.data.gz")
    if err != nil {
        panic(err)
    }
    r, err := gzip.NewReader(bytes.NewReader(zipbytes))
    if err != nil {
        panic(err)
    }
    inbytes, err := ioutil.ReadAll(r)
    if err != nil {
        panic(err)
    }
    if len(inbytes) != 2851073 {
        panic("something wrong with input file")
    }
    var buf bytes.Buffer
    w := brotli.NewWriter(&buf)
    if n, err := w.Write(inbytes); err != nil {
        panic(err) // theoretically should never happen
    } else if n != len(inbytes) {
        panic(fmt.Sprintf("bad write, want %d got %d", len(inbytes), n))
    }
    if err := w.Close(); err != nil {
        panic(err) // theoretically should never happen
    }
    encbytes := buf.Bytes()
    rdr := brotli.NewReader(bytes.NewReader(encbytes))
    rawbytes, err := ioutil.ReadAll(rdr)
    if err != nil {
        panic(err)
    }
    if bytes.Equal(inbytes, rawbytes) {
        fmt.Printf("PASS!\n")
    } else {
        fmt.Printf("FAIL!\n")
        os.Exit(1)
    }
}