segmentio / encoding

Go package containing implementations of efficient encoding, decoding, and validation APIs.

License: MIT License

encoding's Introduction

Go package containing implementations of encoders and decoders for various data formats.

Motivation

At Segment, we do a lot of marshaling and unmarshaling of data when sending, queuing, or storing messages. The resources we need to provision on the infrastructure are directly related to the type and amount of data that we are processing. At the scale we operate at, the tools we choose to build programs can have a large impact on the efficiency of our systems. It is important to explore alternative approaches when we reach the limits of the code we use.

This repository includes experiments for Go packages for marshaling and unmarshaling data in various formats. While the focus is on providing a high performance library, we also aim for very low development and maintenance overhead by implementing APIs that can be used as drop-in replacements for the default solutions.

Requirements and Maintenance Schedule

This package has no dependencies outside of the core runtime of Go. It requires a recent version of Go.

This package follows the same maintenance schedule as the Go project, meaning that issues relating to versions of Go which aren't supported by the Go team, or versions of this package which are older than 1 year, are unlikely to be considered.

Additionally, we have fuzz tests whose dependencies are not required at runtime but will be pulled in when running go mod tidy. Please don't include these go.mod updates in change requests.

encoding/json

More details about how this package achieves a lower CPU and memory footprint can be found in the package README.

The json sub-package provides a re-implementation of the functionalities offered by the standard library's encoding/json package, with a focus on lowering the CPU and memory footprint of the code.

The exported API of this package mirrors the standard library's encoding/json package. The only change needed to take advantage of the performance improvements is the import path of the json package, from:

import (
    "encoding/json"
)

to

import (
    "github.com/segmentio/encoding/json"
)
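As a minimal sketch of what the swap looks like in practice, the program below round-trips a small value through Marshal and Unmarshal. The `event` type and `roundTrip` helper are hypothetical names made up for illustration, and the standard library import is used so the sketch runs without extra modules; under the drop-in claim above, changing that one import line should be the only edit needed.

```go
package main

// Replacing this import path with "github.com/segmentio/encoding/json" is the
// only change the text describes. The standard library is used here so the
// sketch has no external dependencies.
import (
	"encoding/json"
	"fmt"
)

// event is a hypothetical payload type used only for illustration.
type event struct {
	UserID string `json:"userId"`
	Count  int    `json:"count"`
}

// roundTrip marshals an event to JSON and unmarshals it back.
func roundTrip(in event) (event, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return event{}, err
	}
	var out event
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	out, err := roundTrip(event{UserID: "blah", Count: 3})
	fmt.Println(out, err)
}
```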

The improvement can be significant for code that heavily relies on serializing and deserializing JSON payloads. The CI pipeline runs benchmarks to compare the performance of the package with the standard library and other popular alternatives; here's an overview of the results:

Comparing to encoding/json (v1.16.2)

name                           old time/op    new time/op     delta
Marshal/*json.codeResponse2      6.40ms ± 2%     3.82ms ± 1%   -40.29%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2    28.1ms ± 3%      5.6ms ± 3%   -80.21%  (p=0.008 n=5+5)

name                           old speed      new speed       delta
Marshal/*json.codeResponse2     303MB/s ± 2%    507MB/s ± 1%   +67.47%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2  69.2MB/s ± 3%  349.6MB/s ± 3%  +405.42%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op    delta
Marshal/*json.codeResponse2       0.00B           0.00B           ~     (all equal)
Unmarshal/*json.codeResponse2    1.80MB ± 1%     0.02MB ± 0%   -99.14%  (p=0.016 n=5+4)

name                           old allocs/op  new allocs/op   delta
Marshal/*json.codeResponse2        0.00            0.00           ~     (all equal)
Unmarshal/*json.codeResponse2     76.6k ± 0%       0.1k ± 3%   -99.92%  (p=0.008 n=5+5)

Benchmarks were run on a Core i9-8950HK CPU @ 2.90GHz.

Comparing to github.com/json-iterator/go (v1.1.10)

name                           old time/op    new time/op    delta
Marshal/*json.codeResponse2      6.19ms ± 3%    3.82ms ± 1%   -38.26%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2    8.52ms ± 3%    5.55ms ± 3%   -34.84%  (p=0.008 n=5+5)

name                           old speed      new speed      delta
Marshal/*json.codeResponse2     313MB/s ± 3%   507MB/s ± 1%   +61.91%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2   228MB/s ± 3%   350MB/s ± 3%   +53.50%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op   delta
Marshal/*json.codeResponse2       8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2    1.05MB ± 0%    0.02MB ± 0%   -98.53%  (p=0.000 n=5+4)

name                           old allocs/op  new allocs/op  delta
Marshal/*json.codeResponse2        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2     37.2k ± 0%      0.1k ± 3%   -99.83%  (p=0.008 n=5+5)

Although this package aims to be a drop-in replacement of encoding/json, it does not guarantee the same error messages. It will error in the same cases as the standard library, but the exact error message may be different.
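Because error text may differ, callers that need to recognize a failure are probably better off matching on the error type than on the message. The sketch below shows this with the standard library; `isUnsupportedValue` and `marshalNaN` are hypothetical helpers, and whether the replacement package returns the same error types is an assumption that should be checked before relying on it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math"
)

// isUnsupportedValue reports whether err is (or wraps) a
// *json.UnsupportedValueError, without looking at the message text.
func isUnsupportedValue(err error) bool {
	var uve *json.UnsupportedValueError
	return errors.As(err, &uve)
}

// marshalNaN triggers an encoding error: NaN has no JSON representation.
func marshalNaN() error {
	_, err := json.Marshal(struct{ B float64 }{B: math.NaN()})
	return err
}

func main() {
	err := marshalNaN()
	// Branch on the error type rather than comparing err.Error() strings.
	fmt.Println(isUnsupportedValue(err))
}
```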

encoding/iso8601

The iso8601 sub-package exposes APIs to efficiently deal with string representations of iso8601 dates.

Data formats like JSON have no syntax to represent dates, so dates are usually serialized as string values. In our experience, we often have to check whether a string value looks like a date, and either construct a time.Time by parsing it or simply treat it as a string. This check can be done by attempting to parse the value and, if that fails, falling back to the raw string. Unfortunately, while the happy path for time.Parse is fairly efficient, constructing errors is much slower and has a much bigger memory footprint.

We've developed fast iso8601 validation functions that cause no heap allocations to remediate this problem. We added a validation step to determine whether the value is a date representation or a simple string. This reduced CPU and memory usage by 5% in some programs that were doing time.Parse calls on very hot code paths.
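The validate-then-parse idea can be sketched with the standard library alone. The helpers below (`looksLikeUTCTimestamp`, `parseIfDate`) are hypothetical and only recognize the single shape YYYY-MM-DDTHH:MM:SSZ; they are not the package's API, just an illustration of rejecting non-dates cheaply so that time.Parse rarely has to build an error.

```go
package main

import (
	"fmt"
	"time"
)

// isDigits reports whether s is non-empty and made only of ASCII digits.
func isDigits(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] < '0' || s[i] > '9' {
			return false
		}
	}
	return len(s) > 0
}

// looksLikeUTCTimestamp is a hypothetical shape check for
// "YYYY-MM-DDTHH:MM:SSZ". It only slices and compares bytes, so it does not
// allocate and never constructs an error value.
func looksLikeUTCTimestamp(s string) bool {
	if len(s) != 20 {
		return false
	}
	return isDigits(s[0:4]) && s[4] == '-' &&
		isDigits(s[5:7]) && s[7] == '-' &&
		isDigits(s[8:10]) && s[10] == 'T' &&
		isDigits(s[11:13]) && s[13] == ':' &&
		isDigits(s[14:16]) && s[16] == ':' &&
		isDigits(s[17:19]) && s[19] == 'Z'
}

// parseIfDate parses s only when the cheap check passes; any other input is
// treated as a plain string and time.Parse is never called for it.
func parseIfDate(s string) (time.Time, bool) {
	if !looksLikeUTCTimestamp(s) {
		return time.Time{}, false
	}
	t, err := time.Parse(time.RFC3339, s)
	return t, err == nil
}

func main() {
	for _, s := range []string{"hello", "2021-03-04T05:06:07Z"} {
		t, ok := parseIfDate(s)
		fmt.Println(s, ok, t)
	}
}
```

The shape check is deliberately loose (it accepts month 99, for example), so time.Parse remains the final authority on values that pass it.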

encoding's People

Contributors

achille-roussel, chriso, dferstay, extemporalgenome, fenny, jerska, jnjackins, johngillott, kalamay, kelp, kevinburkesegment, lab176, maggieyu-segment, pelletier, succo, tysonmote, varunwachaspati, yolken-segment

encoding's Issues

SIGILL on v0.2.10 and above on older hardware due to #66

On an x86-64 machine that does not support AVX2 (introduced in 2013), segmentio/encoding/ascii now crashes with:

SIGILL: illegal instruction
PC=0x6cf18b m=7 sigcode=2
instruction bytes: 0xc4 0xe2 0x7d 0x59 0xd0 0xc4 0xe2 0x7d 0x59 0xd9 0x48 0x83 0xf9 0x4 0xf 0x8c
goroutine 458 [running]:
github.com/segmentio/encoding/ascii.validPrint16(0xc00134e8ab, 0x2, 0x1)
        github.com/segmentio/[email protected]/ascii/valid_amd64.s:23 +0x4b fp=0xc00311ee48 sp=0xc00311ee40 pc=0x6cf18b

Line 23 of valid_amd64.s is "VPBROADCASTQ X0, Y2".

This is due to the use of AVX2 instructions on machines that don't support them; there's no guard here checking CPUID to fall back to older behavior that I can see. Can that be added?

Or if the intent is to kill off support for x86-64 hardware produced between 2003 and 2013, please add this to the release notes and README.

Thanks!
-- Aaron
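The kind of guard requested above can be sketched as a runtime dispatch between an optimized routine and a portable fallback. Everything here is hypothetical: `hasAVX2` is a hard-coded stand-in for a real CPUID check (the `X86.HasAVX2` field of golang.org/x/sys/cpu is one possible source), and `validPrintFast` is a placeholder for the assembly routine, not the package's actual code.

```go
package main

import "fmt"

// hasAVX2 stands in for a real CPU feature check performed once at startup.
// It is hard-coded here so the sketch has no external dependencies.
var hasAVX2 = false

// validPrintGeneric is the portable path: it reports whether s contains only
// printable ASCII characters (0x20 through 0x7e).
func validPrintGeneric(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] < 0x20 || s[i] > 0x7e {
			return false
		}
	}
	return true
}

// validPrintFast is a placeholder for a vectorized implementation. It simply
// delegates to the generic version in this sketch.
func validPrintFast(s string) bool {
	return validPrintGeneric(s)
}

// validPrint only takes the optimized path when the CPU feature is present,
// so older hardware never executes unsupported instructions.
func validPrint(s string) bool {
	if hasAVX2 {
		return validPrintFast(s)
	}
	return validPrintGeneric(s)
}

func main() {
	fmt.Println(validPrint("hello"), validPrint("tab\there"))
}
```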

proto: unexpected fault address when deref struct field

func TestIssue110(t *testing.T) {
	type message struct {
		A *uint32 `protobuf:"fixed32,1,opt"`
	}

	var a uint32 = 0x41c06db4
	data, _ := Marshal(message{
		A: &a,
	})

	var m message
	err := Unmarshal(data, &m)
	if err != nil {
		t.Fatal(err)
	}
	// Dereference the pointer so the decoded value, not its address, is compared and printed.
	if *m.A != 0x41c06db4 {
		t.Errorf("m.A mismatch, want 0x41c06db4 but got %x", *m.A)
	}
}

I expected the test to pass, but got:

unexpected fault address 0x41c06db4
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0x41c06db4 pc=0x86ec3c]

goroutine 6 [running]:
runtime.throw({0x8be436, 0x89ce00})
	C:/Users/wdvxdr/sdk/go1.17.1/src/runtime/panic.go:1198 +0x76 fp=0xc00004deb8 sp=0xc00004de88 pc=0x738896
runtime.sigpanic()
	C:/Users/wdvxdr/sdk/go1.17.1/src/runtime/signal_windows.go:260 +0x10c fp=0xc00004df00 sp=0xc00004deb8 pc=0x74b52c
github.com/segmentio/encoding/proto.TestFix32Decode(0xc000037d40)
	C:/Users/wdvxdr/Documents/Project/encoding/proto/decode_test.go:164 +0xbc fp=0xc00004df70 sp=0xc00004df00 pc=0x86ec3c
testing.tRunner(0xc000037d40, 0x8cd970)
	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1259 +0x102 fp=0xc00004dfc0 sp=0xc00004df70 pc=0x7f9dc2
testing.(*T).Run·dwrap·21()
	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1306 +0x2a fp=0xc00004dfe0 sp=0xc00004dfc0 pc=0x7faaca
runtime.goexit()
	C:/Users/wdvxdr/sdk/go1.17.1/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc00004dfe8 sp=0xc00004dfe0 pc=0x767801
created by testing.(*T).Run
	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1306 +0x35a

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000037ba0, {0x8c098e, 0x76a0d3}, 0x8cd970)
	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1307 +0x375
testing.runTests.func1(0xc0000706c0)
	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1598 +0x6e
testing.tRunner(0xc000037ba0, 0xc000079d18)
	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1259 +0x102
testing.runTests(0xc000102080, {0xa1a4e0, 0x17, 0x17}, {0x78a74d, 0x8c0705, 0x0})
	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1596 +0x43f
testing.(*M).Run(0xc000102080)
	C:/Users/wdvxdr/sdk/go1.17.1/src/testing/testing.go:1504 +0x51d
main.main()
	_testmain.go:119 +0x14b


Process finished with the exit code 1

Decoder hangs when decoding invalid json

func TestHangingDecoder(t *testing.T) {
	b := []byte(`{
	"userId": "blah",
	}`)

	d := NewDecoder(bytes.NewReader(b))

	var a struct {
		UserId string `json:"userId"`
	}
	err := d.Decode(&a)
	if err == nil {
		t.Fatal("should have errored")
	}
}

This test hangs on master. Light on details for now, will add more when I have them.

Looks like it's an issue in Decoder. It works fine with Unmarshal:

// this works
func TestHangingDecoder(t *testing.T) {
	b := []byte(`{
	"userId": "blah",
	}`)
	var a struct {
		UserId string `json:"userId"`
	}
	err := Unmarshal(b, &a)
	if err == nil {
		t.Fatal("should have errored")
	}
}

Marshal ignores fields on embedded struct when using an omitempty tag

If you embed a struct via a pointer and put an omitempty tag on one of the outer struct's fields, fields of the embedded struct that are also tagged with omitempty are omitted from the output JSON, even if they have a value.

Example code:

package main

import (
	"encoding/json"
	"fmt"
	segjson "github.com/segmentio/encoding/json"
)

type A struct {
	Surname string `json:"surname,omitempty"`
}

type B struct {
	*A
	MiddleName string `json:"middle-name,omitempty"`
	Name       string `json:"name"`
}

func main() {
	a := &A{Surname: "surname"}
	b := B{A: a, Name: "name"}

	r1, err := json.Marshal(b)
	fmt.Printf("json: %s, err: %v\n", r1, err)

	r2, err := segjson.Marshal(b)
	fmt.Printf("segmentio: %s, err: %v\n", r2, err)
}

Example output:

json: {"surname":"surname","name":"name"}, err: <nil>
segmentio: {"name":"name"}, err: <nil>

Notes:

  • Not reproduced if structure A is used by value and not by reference
  • Not reproduced if MiddleName field is declared after Name field
  • Not reproduced if MiddleName field has value

Go: go1.20
segmentio/encoding/json: v0.3.6

Proposal: ability to marshal with flags

Hi! Me again with another proposal!

json.Parse offers a readily available alternative to json.Unmarshal with the simple addition of flags.

In comparison, json.Append takes two extra parameters compared to json.Marshal: a []byte to fill and the flags.
This means that unfortunately, you can't benefit from the encoderBufferPool if you just want to add some AppendFlags.

This would give 3 functions, from most to least flexible:

  • Append(b []byte, x interface{}, flags AppendFlags) - can use flags, no extra copy, but no buffer pool
  • MarshalWithFlags(x interface{}, flags AppendFlags) - can use flags, buffer pool & one extra copy
  • Marshal(x interface{}) - cannot use flags, buffer pool & one extra copy

Having to pre-allocate a buffer for Append might be quite a challenge, because you often don't know how long the resulting buffer will be.
For instance, allocating a full page like the buffer pool does is not necessarily a great option for small payloads.
Reimplementing a similar pool in our code is definitely an option, but it'd be great to simply be able to reuse the logic that's already there in the lib.

Would this be something you'd consider adding in? Again, happy to propose a PR.
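To make the "buffer pool & one extra copy" trade-off concrete, here is a rough sketch of the same pattern built on the standard library. The function `marshalNoHTMLEscape` and the pool are hypothetical, and the encoder's SetEscapeHTML option merely stands in for a flag; this is not the library's MarshalWithFlags, which does not exist in the text above beyond the proposal.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// bufferPool recycles scratch buffers between calls so callers do not have
// to guess an initial size themselves.
var bufferPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// marshalNoHTMLEscape is a hypothetical "marshal with an option" helper.
// It encodes into a pooled buffer, then copies the result out once so the
// buffer can safely go back to the pool.
func marshalNoHTMLEscape(v interface{}) ([]byte, error) {
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufferPool.Put(buf)

	enc := json.NewEncoder(buf)
	enc.SetEscapeHTML(false) // stands in for a flag
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a newline; drop it, then make the one extra copy.
	out := bytes.TrimSuffix(buf.Bytes(), []byte("\n"))
	return append([]byte(nil), out...), nil
}

func main() {
	data, err := marshalNoHTMLEscape("<a>")
	fmt.Println(string(data), err)
}
```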

json: Go 1.16 allows for ; in json tags

https://golang.org/doc/go1.16#encoding/json

package main

import (
	"encoding/json"
	"fmt"

	sjson "github.com/segmentio/encoding/json"
)

type S struct {
	F string `json:"a;b"`
}

func main() {
	data := []byte(`{"a;b": "hello"}`)

	var s1 S
	err1 := json.Unmarshal(data, &s1)
	fmt.Printf("stdlib: err=%s s=%s\n", err1, s1)

	var s2 S
	err2 := sjson.Unmarshal(data, &s2)
	fmt.Printf("segment: err=%s s=%s\n", err2, s2)

	if err1 != err2 {
		panic("err mismatch")
	}
	if s1 != s2 {
		panic("value mismatch")
	}
}

https://play.golang.com/p/l3RoNt3FYnB

~/tmp$ go version && go run main.go
go version go1.16.2 linux/amd64
stdlib: err=%!s(<nil>) s={hello}
segment: err=%!s(<nil>) s={}
panic: value mismatch

goroutine 1 [running]:
main.main()
	/home/thomas/tmp/main.go:29 +0x389
exit status 2
~/tmp$ go1.15 version && go1.15 run main.go
go version go1.15.10 linux/amd64
stdlib: err=%!s(<nil>) s={}
segment: err=%!s(<nil>) s={}

map: invalid keys sort

@achille-roussel Hello,

As I was reading the code of the package, I noticed the sortKeys function that is created based on the map key type to encode, and saw that you sort on the reflect.Value value rather than the string representation of the key.

I was perplexed to find that this is an opt-in functionality, since the package aims to be compliant with the standard library, unless I misunderstood the README.

The following example shows that the output is different when using a map with keys of type int: https://play.golang.org/p/1ddbUssdbaf

The only reference to a map with keys of signed/unsigned-integer kind I could find in the tests is at

map[int]bool{1: false, 42: true},
but the keys are already lexicographically sorted.
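A test case whose keys are not already in lexicographic order makes the difference visible. The sketch below uses the standard library, which sorts integer map keys by their string form, so 10 sorts before 2; `encodeIntKeys` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeIntKeys marshals a map whose integer keys differ in numeric and
// lexicographic order (2 < 10 numerically, but "10" < "2" as strings).
func encodeIntKeys() string {
	data, err := json.Marshal(map[int]bool{2: true, 10: false})
	if err != nil {
		return err.Error()
	}
	return string(data)
}

func main() {
	fmt.Println(encodeIntKeys()) // prints {"10":false,"2":true}
}
```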

How to benchmark against other libs?

Hello,

I just discovered this repository by chance while researching various Go JSON encoder/decoder implementations. I am however unsure how to benchmark this against other libs such as easyjson.

I noticed the benchmarks folder, however I am unsure how I am supposed to use it if at all. I am not really familiar with Makefile syntax.

Running make in benchmarks

dev@dev-u20d-a:~/tmp/encoding/benchmarks$ make
make: Nothing to be done for 'all'.
dev@dev-u20d-a:~/tmp/encoding/benchmarks$ make bench
go build -o results/benchmark ./cmd/benchmark
stat /home/dev/tmp/encoding/benchmarks/cmd/benchmark: directory not found
make: *** [Makefile:101: results/benchmark] Error 1

Any ideas?

marshal of nested structs results in panic

Description:
Attempting to marshal a nested struct of type struct{map[string]struct{map[string]string}} and similarly nested structs results in a panic.

Environment:
Architecture: amd64
OS: darwin and linux
go version: 1.13.5
json module version: 0.1.6

Example:
https://play.golang.org/p/UoCOVM8Lh8g

package main

import (
	"github.com/segmentio/encoding/json"
)

type testInner struct {
	InnerMap map[string]string `json:"inner_map"`
}

type testOuter struct {
	OuterMap map[string]testInner `json:"outer_map"`
}

func main() {
	t := testOuter{
		map[string]testInner{
			"outer": {
				map[string]string{
					"inner": "value",
				},
			},
		},
	}
	json.Marshal(t)
}

resulting stack trace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1 pc=0x100f42a]

goroutine 1 [running]:
reflect.maplen(0x1, 0xc000070100)
        /usr/local/go/src/runtime/map.go:1369 +0xa
reflect.Value.MapKeys(0x10f0e20, 0xc0000761b0, 0x195, 0x10f0e20, 0xc0000761b0, 0x195)
        /usr/local/go/src/reflect/value.go:1200 +0x20e
github.com/segmentio/encoding/json.encoder.encodeMap(0x3, 0xc0000a2000, 0x23, 0x1000, 0xc0000761b0, 0x1139600, 0x10f0e20, 0x111d790, 0x111d790, 0x111d5b8, ...)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:323 +0xe2
github.com/segmentio/encoding/json.constructMapEncodeFunc.func1(0x3, 0xc0000a2000, 0x23, 0x1000, 0xc0000761b0, 0x100cfab, 0xc000074310, 0xc000080c38, 0x68779c2f88049301, 0x680000c00008c160)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:412 +0x93
github.com/segmentio/encoding/json.encoder.encodeStruct(0x3, 0xc0000a2000, 0x16, 0x1000, 0xc0000761b0, 0xc0000700c0, 0x14, 0x5, 0x1115408, 0x1, ...)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:557 +0x1de
github.com/segmentio/encoding/json.constructStructEncodeFunc.func1(0x3, 0xc0000a2000, 0x16, 0x1000, 0xc0000761b0, 0xc0000a2000, 0x15, 0x1000, 0x0, 0x0)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:467 +0x66
github.com/segmentio/encoding/json.encoder.encodeMap(0x3, 0xc0000a2000, 0xd, 0x1000, 0xc0000b7e10, 0x1139600, 0x10f0dc0, 0x111d790, 0xc000074280, 0x111d5b8, ...)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:345 +0x2ec
github.com/segmentio/encoding/json.constructMapEncodeFunc.func1(0x3, 0xc0000a2000, 0xd, 0x1000, 0xc0000b7e10, 0x1139600, 0x10f5ae0, 0xc000074300, 0xc0000742f0, 0x0)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:412 +0x93
github.com/segmentio/encoding/json.encoder.encodeStruct(0x3, 0xc0000a2000, 0x0, 0x1000, 0xc0000b7e10, 0xc000070080, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:557 +0x1de
github.com/segmentio/encoding/json.constructStructEncodeFunc.func1(0x3, 0xc0000a2000, 0x0, 0x1000, 0xc0000b7e10, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:467 +0x66
github.com/segmentio/encoding/json.constructInlineValueEncodeFunc.func1(0x3, 0xc0000a2000, 0x0, 0x1000, 0xc000076180, 0x1000, 0xc0000a2000, 0xc0000a2000, 0xc00007ce78, 0x10609c6)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:766 +0x5e
github.com/segmentio/encoding/json.Append(0xc0000a2000, 0x0, 0x1000, 0x10f5ae0, 0xc000076180, 0x3, 0xc000080c30, 0x680000000110a2a0, 0x1, 0x68779c2f880493bd, ...)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/json.go:135 +0x100
github.com/segmentio/encoding/json.Marshal(0x10f5ae0, 0xc000076180, 0x1115408, 0x5, 0xc000080cb8, 0x0, 0x0)
        /Users/koreybarton/go/pkg/mod/github.com/segmentio/[email protected]/json/json.go:160 +0x91
main.main()
        /Users/koreybarton/Downloads/seg-test/main.go:25 +0xec

Unmarshaling into an interface{} instead of structs

I'm trying to work with Amazon S3 Bucket Policies, which are polymorphic enough that not even the AWS SDK team has attempted to model the rules for them with Go structs. So I'm trying to decode the S3 Bucket Policy JSON into an interface{}.

Here's some rough code, extracted from my project.

type S3BucketPolicy interface{}

var bucketPolicy S3BucketPolicy

request := client.GetBucketPolicyRequest(&s3.GetBucketPolicyInput{
    Bucket: &bucketname,
})

// Errors are acceptable
response, _ := request.Send(ctx)

if response != nil && response.Policy != nil && *response.Policy != emptyString {
    if err := json.Unmarshal([]byte(*response.Policy), &bucketPolicy); err != nil {
        exitErrorf(err)
    }
}

Using encoding/json, this works just fine. However, after dropping in github.com/segmentio/encoding/json as a replacement, I receive the following error:

Error: json: cannot unmarshal {"Version":"2012-10-17","Statement":[{"Sid":"AWSCloudTrailWrite","Effect":"Allow","Principal":{"Service":"cloudtrail.amazonaws.com"},"Action":"s3:PutObject","Resource":"arn:aws:s3:::BUCKET/AWSLogs/ACCOUNTID/*","Condition":{"StringEquals":{"s3:x-amz-acl":"bucket-owner-full-control"}}},{"Sid":"AWSLogDeliveryWrite","Effect":"Allow","Principal":{"Service":"delivery.logs.amazonaws.com"},"Action":"s3:PutObject","Resource":"arn:aws:s3:::BUCKET/AWSLogs/ACCOUNTID/*","Condition":{"StringEquals":{"s3:x-amz-acl":"bucket-owner-full-control"}}},{"Sid":"AWSLogDeliveryAclCheck","Effect":"Allow","Principal":{"Service":"delivery.logs.amazonaws.com"},"Action":"s3:GetBucketAcl","Resource":"arn:aws:s3:::BUCKET"},{"Sid":"AWSELBWrite","Effect":"Allow","Principal":{"AWS":"arn:aws:iam::ACCOUNTID:root"},"Action":"s3:PutObject","Resource":"arn:aws:s3:::BUCKET/*"},{"Sid":"AWSAclCheck","Effect":"Allow","Principal":{"Service":"cloudtrail.amazonaws.com","AWS":"arn:aws:iam::ACCOUNTID:user/logs"},"Action":"s3:GetBucketAcl","Resource":"arn:aws:s3:::BUCKET"},{"Sid":"AWSRedshiftWrite","Effect":"Allow","Principal":{"AWS":"arn:aws:iam::ACCOUNTID:user/logs"},"Action":"s3:PutObject","Resource":"arn:aws:s3:::BUCKET/*"}]} into Go value of type main.S3BucketPolicy

Here is the JSON (pretty printed), which is valid:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSCloudTrailWrite",
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudtrail.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::BUCKET/AWSLogs/ACCOUNTID/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    },
    {
      "Sid": "AWSLogDeliveryWrite",
      "Effect": "Allow",
      "Principal": {
        "Service": "delivery.logs.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::BUCKET/AWSLogs/ACCOUNTID/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    },
    {
      "Sid": "AWSLogDeliveryAclCheck",
      "Effect": "Allow",
      "Principal": {
        "Service": "delivery.logs.amazonaws.com"
      },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::BUCKET"
    },
    {
      "Sid": "AWSELBWrite",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNTID:root"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::BUCKET/*"
    },
    {
      "Sid": "AWSAclCheck",
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudtrail.amazonaws.com",
        "AWS": "arn:aws:iam::ACCOUNTID:user/logs"
      },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::BUCKET"
    },
    {
      "Sid": "AWSRedshiftWrite",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNTID:user/logs"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::BUCKET/*"
    }
  ]
}

Is this use-case supported?
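One possible workaround, sketched here with the standard library, is to decode into a plain interface{} and convert to the named type afterwards. The `decodePolicy` helper is hypothetical, and whether this avoids the error with the replacement package is an assumption that has not been verified.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// S3BucketPolicy mirrors the named empty-interface type from the report.
type S3BucketPolicy interface{}

// decodePolicy decodes into an unnamed interface{} first, then converts the
// result to the named type, so the decoder never sees S3BucketPolicy itself.
func decodePolicy(data []byte) (S3BucketPolicy, error) {
	var raw interface{}
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, err
	}
	return S3BucketPolicy(raw), nil
}

func main() {
	policy, err := decodePolicy([]byte(`{"Version":"2012-10-17","Statement":[]}`))
	fmt.Println(policy, err)
}
```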

Proposal: add an option to prevent encoder from adding a newline on Encode

Hi there!

Thanks for this amazing lib that allowed me to get close to 10x perf improvements on a project for both Marshaling & Unmarshaling use-cases.

A small suggestion I'd have after using the library would be to add an option to prevent adding a newline at the end of Encode.
When manually crafting JSON without newlines, this behavior requires adding a Truncate call on the byte buffer, which is both easy to forget & confusing for readers.

val := "foo"

var buf bytes.Buffer
buf.WriteByte('[')
encoder := json.NewEncoder(&buf)
encoder.Encode(val)
buf.Truncate(buf.Len() - 1) // Remove useless newline
buf.WriteByte(']')

// Outputs:
// ["foo"]

// Without the truncate, outputs:
// ["foo"
// ]

https://go.dev/play/p/4Jkoe_A6sk0

Your version of Encoder already has a few non-standard options (e.g. SetTrustRawMessage), so I believe this could be another one, e.g. SetNoExtraNewline.
Is this an addition you'd be open to? I'd be happy to open a PR.
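Until such an option exists, one alternative to the Truncate call is to marshal the value separately and write the bytes into the buffer, since Marshal does not append a newline. The sketch below uses the standard library, and `writeArrayOf` is a hypothetical helper; note that it gives up the streaming behavior of Encode.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// writeArrayOf wraps a single encoded value in brackets. Marshal returns the
// encoding without a trailing newline, so nothing needs to be truncated.
func writeArrayOf(val interface{}) (string, error) {
	var buf bytes.Buffer
	buf.WriteByte('[')
	data, err := json.Marshal(val)
	if err != nil {
		return "", err
	}
	buf.Write(data)
	buf.WriteByte(']')
	return buf.String(), nil
}

func main() {
	out, err := writeArrayOf("foo")
	fmt.Println(out, err)
}
```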

Marshal silently encodes bad JSON for embedded null pointer structs (works in encoding/json)

While playing around with struct embedding and omitempty I discovered Marshal emits an erroneous comma for embedded null pointer structs (unless they're first, maybe in other situations too).

See https://play.golang.org/p/21NWHuHchQq or the example below.

package main

import (
	"encoding/json"
	"fmt"
	segjson "github.com/segmentio/encoding/json"
)

type N struct {
	Zero int
	*One
	Two int
}

type One struct {
	One int
}

func main() {
	data, err := segjson.Marshal(N{Two: 2})
	fmt.Println(string(data))
	fmt.Println(err)
	
	data, err = json.Marshal(N{Two: 2})
	fmt.Println(string(data))
	fmt.Println(err)
}

The above produces this output:

{"Zero":0,,"Two":2}
<nil>
{"Zero":0,"Two":2}
<nil>

Different output when using anonymous fields and omitempty tags

Hello,

I recently tried using your library in one of our software projects, and I'd like to report an issue that we ran into when doing so.

The output when calling Marshal on encoding/json and segmentio/encoding/json differs.

I was able to reproduce the error using the following code:

package main

import (
	"encoding/json"
	"fmt"
	segmentio "github.com/segmentio/encoding/json"
)

type MyStruct struct {
	MyField string `json:"my_field,omitempty"`
}

type MyStruct2 struct {
	*MyStruct
	Code int `json:"code"`
}

func main() {
	value := MyStruct2{
		MyStruct: &MyStruct{
			MyField: "test",
		},
		Code: 0,
	}

	res, _ := json.Marshal(value)
	fmt.Println(string(res)) // Prints {"my_field":"test","code":0}

	res, _ = segmentio.Marshal(value)
	fmt.Println(string(res)) // Prints {"code":0} - Not correct

	value2 := MyStruct2{
		MyStruct: &MyStruct{
			MyField: "test",
		},
		Code:     10,
	}

	res, _ = json.Marshal(value2)
	fmt.Println(string(res)) // Prints {"my_field":"test","code":10}

	res, _ = segmentio.Marshal(value2)
	fmt.Println(string(res)) // Prints {"my_field":"test","code":10}
}

Note that when removing the omitempty tag from MyStruct.MyField, the output is correct.

Segmentio Encoding Version: 0.2.7
Golang Version: 1.15.7
OS: macOS Catalina 10.15.7

General Library

Hi, in order to reduce complexity, would it be feasible for you to build a generic library?
The idea is to convert a structure into an n-ary tree representation, so that subsequent implementations of new codec algorithms only need to traverse the n-ary tree, which would greatly reduce their complexity.

json.Unescape has issues with UNC paths

I have a test string that has a value with an escaped UNC path in it.
It validates ok according to: https://jsonlint.com/

"{\"dataPath\":\"\\\\bnk11977fs\\bnk11977\\xyz11146682\\xyzdata\\\"}"

When you pass it through json.Unescape the result is invalid json

See: playground

Highlights from the stdout are:

raw sample0 "\"{\\\"dataPath\\\":\\\"\\\\\\\\bnk11977fs\\\\bnk11977\\\\be11146682\\\\encompassdata\\\\\\\"}\""

unescape for sample0 was invalidated "{\"dataPath\":\"\\\\bnk11977fs\\bnk11977\\be11146682\\encompassdata\\\"}" ; error=json: invalid character 'e' in string escape code: "\\bnk11977fs\bnk11977\be1114668...

You may ask, "If the original is valid json, then why do you need to unescape it anyway?"

Answer: The raw escaped form comes from reading a logging entry.
A log processor service (lambda) needs to create output without all the escapes, else another downstream reader (logstash ingest) sees the document event as one big ass string and not a nice nested object.

So the goal is to be able to have the lambda processor write the line something like the following.
Yes, the UNC and escapes are a bitch.

And this is a simple example. Imagine a raw json with 100 deeply nested fields.
You certainly can't just blindly replace backslash-quote with quote!

{"dataPath":"\\bnk11977fs\bnk11977\be11146682\encompassdata\"}
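For comparison, one way to peel off a single layer of escaping with the standard library is to decode the quoted log entry as a JSON string and validate the result. In this sketch the `unwrap` helper and the `host` and `share` path components are made up for illustration. After one layer is removed, the backslashes inside the inner document are still doubled, which is what keeps it valid JSON.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// unwrap treats quoted as a JSON string literal whose content is itself a
// JSON document, removes exactly one layer of escaping, and validates the
// inner document before returning it.
func unwrap(quoted []byte) (string, error) {
	var inner string
	if err := json.Unmarshal(quoted, &inner); err != nil {
		return "", err
	}
	if !json.Valid([]byte(inner)) {
		return "", errors.New("inner document is not valid JSON")
	}
	return inner, nil
}

func main() {
	// A JSON string containing an escaped JSON object with a UNC path.
	quoted := []byte(`"{\"dataPath\":\"\\\\\\\\host\\\\share\"}"`)

	inner, err := unwrap(quoted)
	fmt.Println(inner, err) // prints {"dataPath":"\\\\host\\share"} <nil>
}
```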

Different error message with encoding/json

Here is an example
json.go

package main

import (
	"encoding/json"
	"fmt"
	"math"
)

type A struct {
	B float64
}

func main() {
	c := A{
		B: math.NaN(),
	}
	_, err := json.Marshal(c)
	fmt.Println(err)
}
diff json.go json_segmentio.go
< 	"encoding/json"
---
> 	"github.com/segmentio/encoding/json"

The output of json.go:

json: unsupported value: NaN

The output of json_segmentio.go:

json: unsupported value: unsupported value

"checkptr: pointer arithmetic result points to invalid allocation" caused by #81

Using segmentio/encoding v0.2.16 on both MacOS and Linux with Go 1.16.2, I get fatal error: checkptr: pointer arithmetic result points to invalid allocation from Go's race detector pointing a finger at ascii/valid_print.go:31 with certain input to json.Marshal(), suggesting it is related to #81.

Here's simplified test code to trigger it:

package main

import (
	"testing"

	"github.com/segmentio/encoding/json"
)

type Foo struct {
	Source struct {
		Table string
	}
}

func TestUnmarshal(t *testing.T) {
	input := []byte(`{"source": {"table": "1234567"}}`)
	r := &Foo{}
	json.Unmarshal(input, r)
}

Run:

go mod init segtest
go mod tidy
go test -v -race -trimpath ./...

And you should see:

fatal error: checkptr: pointer arithmetic result points to invalid allocation

goroutine 19 [running]:
runtime.throw(0x12669e8, 0x40)
	runtime/panic.go:1117 +0x72 fp=0xc000042b28 sp=0xc000042af8 pc=0x1078352
runtime.checkptrArithmetic(0xc00012a060, 0xc000042bc8, 0x1, 0x1)
	runtime/checkptr.go:43 +0xbe fp=0xc000042b58 sp=0xc000042b28 pc=0x1047b7e
github.com/segmentio/encoding/ascii.ValidPrintString(0xc00012a040, 0x20, 0x1)
	github.com/segmentio/[email protected]/ascii/valid_print.go:31 +0x4e5 fp=0xc000042be0 sp=0xc000042b58 pc=0x11d2725
github.com/segmentio/encoding/ascii.ValidPrint(...)
	github.com/segmentio/[email protected]/ascii/valid_print.go:8
github.com/segmentio/encoding/json.internalParseFlags(0xc00012a040, 0x20, 0x20, 0xc000121380)
	github.com/segmentio/[email protected]/json/parse.go:33 +0xcb fp=0xc000042c68 sp=0xc000042be0 pc=0x120420b
github.com/segmentio/encoding/json.Parse(0xc00012a040, 0x20, 0x20, 0x1223620, 0xc00011a520, 0x0, 0x10, 0xc000030640, 0xc00011a520, 0x138f720, ...)
	github.com/segmentio/[email protected]/json/json.go:303 +0x114 fp=0xc000042dd0 sp=0xc000042c68 pc=0x1203a94
github.com/segmentio/encoding/json.Unmarshal(0xc00012a040, 0x20, 0x20, 0x1223620, 0xc00011a520, 0x0, 0x0)
	github.com/segmentio/[email protected]/json/json.go:285 +0x93 fp=0xc000042e78 sp=0xc000042dd0 pc=0x1203833
segtest.TestUnmarshal(0xc000102900)
	segtest/main_test.go:18 +0x108 fp=0xc000042ed0 sp=0xc000042e78 pc=0x1213b68

If you can't reproduce, I'd be happy to provide more detail.

json: stack overflow when encoding cyclic structures

Encoding a cyclic data structure results in a stack overflow. This is an invalid operation, but the behavior differs from the stdlib, which detects the cycle and returns an error.


Program to reproduce:

package main

import (
	"fmt"

	"github.com/segmentio/encoding/json"
)

type Foo struct {
	Bar *Bar
}

type Bar struct {
	Foo *Foo
}

func main() {
	f := &Foo{}
	b := &Bar{}
	f.Bar = b
	b.Foo = f

	_, err := json.Marshal(f)
	fmt.Println(err)
}

Crash in that example:

$ go run main.go           
runtime: goroutine stack exceeds 1000000000-byte limit
runtime: sp=0xc0201603e0 stack=[0xc020160000, 0xc040160000]
fatal error: stack overflow

runtime stack:
runtime.throw({0x4eafe9, 0x59f520})
	/home/thomas/src/github.com/golang/go/src/runtime/panic.go:1198 +0x71
runtime.newstack()
	/home/thomas/src/github.com/golang/go/src/runtime/stack.go:1088 +0x5ac
runtime.morestack()
	/home/thomas/src/github.com/golang/go/src/runtime/asm_amd64.s:461 +0x8b

goroutine 1 [running]:
github.com/segmentio/encoding/json.encoder.encodeStruct({0x3}, {0xc002568000, 0x873eaf, 0x8ac000}, 0xc00000e030, 0xc00007e240)
	/home/thomas/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:758 +0x47a fp=0xc0201603f0 sp=0xc0201603e8 pc=0x4c185a
github.com/segmentio/encoding/json.constructStructEncodeFunc.func1({0x0}, {0xc002568000, 0x0, 0x0}, 0x0)
	/home/thomas/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:497 +0x25 fp=0xc020160430 sp=0xc0201603f0 pc=0x4ada05
github.com/segmentio/encoding/json.encoder.encodePointer({0x0}, {0xc002568000, 0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0)
	/home/thomas/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:818 +0x7c fp=0xc020160470 sp=0xc020160430 pc=0x4c19fc
github.com/segmentio/encoding/json.constructPointerEncodeFunc.func1({0x0}, {0xc002568000, 0x0, 0x0}, 0x0)
	/home/thomas/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:738 +0x2a fp=0xc0201604c0 sp=0xc020160470 pc=0x4af5ea
github.com/segmentio/encoding/json.encoder.encodeStruct({0x0}, {0xc002568000, 0x873ea8, 0x0}, 0xc00000e028, 0xc00007e1e0)
	/home/thomas/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:789 +0x307 fp=0xc020160598 sp=0xc0201604c0 pc=0x4c16e7
github.com/segmentio/encoding/json.constructStructEncodeFunc.func1({0x0}, {0xc002568000, 0x0, 0x0}, 0x0)
	/home/thomas/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:497 +0x25 fp=0xc0201605d8 sp=0xc020160598 pc=0x4ada05
github.com/segmentio/encoding/json.encoder.encodePointer({0x0}, {0xc002568000, 0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0)
	/home/thomas/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:818 +0x7c fp=0xc020160618 sp=0xc0201605d8 pc=0x4c19fc
...additional frames elided...
exit status 2

Output using stdlib's encoding/json:

$ go run main.go           
json: unsupported value: encountered a cycle via *main.Foo

Stdlib implementation: https://cs.opensource.google/go/go/+/refs/tags/go1.17.3:src/encoding/json/encode.go;l=784-793;drc=refs%2Ftags%2Fgo1.17.3

Missing Function Body

I got this error with go version go1.11beta2 while using github.com/go-pg/pg.
It ran fine before, but after updating, this error shows up:
src/github.com/segmentio/encoding/json/reflect_optimize.go:15:6: missing function body

Can't reproduce speed/memory benefits in benchmarks

I tried this library as a drop in stdlib replacement and found that with our data, it was slightly worse than stdlib in both memory and speed.

So I thought OK, benchmarks are highly dependent on the specific data used, I'll try with the sample data used by this project. To my surprise the results were even worse -- this library seems to use more memory than stdlib and perform more slowly.

Then I noticed your README benchmarks were with Go 1.16.2, so I tried with that. Same outcome.

I feel like I must be doing something really wrong, so I've put together a repo with the code and some of the benchmark stats I got at https://github.com/lpar/segmentio

JSON tests fail on Go 1.13

Hi @achille-roussel, first I want to say you did an awesome job with this library 💯

According to the go.mod file, this package should be compatible with Go 1.13, but when I run go test ./... -v -race on that version, the tests fail. Is this expected, given that CI only tests on Go 1.14 (https://github.com/segmentio/encoding/blob/master/.circleci/config.yml#L5)?

=== RUN   TestCompact
--- FAIL: TestCompact (0.00s)
##[error]    golang_scanner_test.go:73: Compact(`{"":"<>&

"}`) = `{"":"<>&\u2028\u2029"}`, want original
##[error]    golang_scanner_test.go:81: Compact("{\n\t\"\": \"<>&\u2028\u2029\"\n}") = `{"":"<>&\u2028\u2029"}`, want `{"":"<>&

"}`
=== RUN   TestCompactSeparators
--- FAIL: TestCompactSeparators (0.00s)
##[error]    golang_scanner_test.go:100: Compact("{\"\u2028\": 1}") = "{\"\\u2028\":1}", want "{\"\u2028\":1}"
##[error]    golang_scanner_test.go:100: Compact("{\"\u2029\" :2}") = "{\"\\u2029\":2}", want "{\"\u2029\":2}"

=== RUN   TestGithubIssue11
--- PASS: TestGithubIssue11 (0.00s)
##[error]    json_test.go:1228: json: unsupported value: NaN
=== RUN   TestGithubIssue13
--- PASS: TestGithubIssue13 (0.00s)
##[error]    json_test.go:1249: {"Stringer":null,"MyInt":0}

--- PASS: TestDecodeLines (0.00s)
    --- PASS: TestDecodeLines/bare_object (0.00s)
    --- PASS: TestDecodeLines/multiple_objects_on_one_line (0.00s)
    --- PASS: TestDecodeLines/object_spanning_multiple_lines (0.00s)
    --- PASS: TestDecodeLines/trailing_newline (0.00s)
    --- PASS: TestDecodeLines/multiple_trailing_newlines (0.00s)
    --- PASS: TestDecodeLines/blank_lines (0.00s)
    --- PASS: TestDecodeLines/no_trailing_newline (0.00s)
    --- PASS: TestDecodeLines/leading_whitespace (0.00s)
    --- PASS: TestDecodeLines/one_object,_multiple_reads (0.00s)
    --- PASS: TestDecodeLines/one_object_+_EOF (0.00s)
    --- PASS: TestDecodeLines/leading_whitespace_+_EOF (0.00s)
    --- PASS: TestDecodeLines/multiple_objects_+_EOF (0.00s)
    --- PASS: TestDecodeLines/one_object_+_multiple_reads_+_EOF (0.00s)
    --- PASS: TestDecodeLines/multiple_objects_+_multiple_reads_+_EOF (0.00s)
    --- PASS: TestDecodeLines/unmarshal_error_while_decoding (0.00s)
##[error]        json_test.go:735: unmarshal error json: cannot unmarshal "42}" into Go struct field json.obj.Good. of type bool
    --- PASS: TestDecodeLines/unmarshal_error_while_decoding_last_object (0.00s)
##[error]        json_test.go:735: unmarshal error json: cannot unmarshal "42}" into Go struct field json.obj.Good. of type bool

Encoding a struct with error type field fails in various ways

When encoding a struct that contains a field of type error, I get either a panic or an error (json: unsupported type: ).

Code:

package main

import (
	"bytes"
	stdjson "encoding/json"
	"errors"
	"fmt"

	segjson "github.com/segmentio/encoding/json"
)

type Column struct {
	Name      string `json:"name"`
	FieldName string `json:"fieldName"`
}

type QueryResult struct {
	Columns []Column                 `json:"columns"`
	Rows    []map[string]interface{} `json:"rows"`
}

type QueryError struct {
	Message string `json:"message"`
	Err     error  `json:"err"`
}

type GroupQueryResult struct {
	GroupId int          `json:"groupId"`
	Data    *QueryResult `json:"data"`
	Error   *QueryError  `json:"error"`
}

func main() {
	v := GroupQueryResult{GroupId: 1, Error: &QueryError{Message: "error message", Err: errors.New("will fail")}}
	fmt.Println("stdEncode: " + stdEncode(v))
	fmt.Println("stdMarshal: " + stdMarshal(v))
	fmt.Println("segEncode: " + segEncode(v))
	fmt.Println("segMarshal: " + segMarshal(v))
}

func stdEncode(v GroupQueryResult) string {
	buf := &bytes.Buffer{}
	enc := stdjson.NewEncoder(buf)
	enc.SetEscapeHTML(true)
	if err := enc.Encode(v); err != nil {
		return err.Error()
	}
	return string(buf.Bytes())
}

func stdMarshal(v GroupQueryResult) string {
	jsonString, err := stdjson.Marshal(v)
	if err != nil {
		return err.Error()
	}
	return string(jsonString)
}

func segEncode(v GroupQueryResult) string {
	buf := &bytes.Buffer{}
	enc := segjson.NewEncoder(buf)
	enc.SetEscapeHTML(true)
	if err := enc.Encode(v); err != nil {
		return err.Error()
	}
	return string(buf.Bytes())
}

func segMarshal(v GroupQueryResult) string {
	jsonString, err := segjson.Marshal(v)
	if err != nil {
		return err.Error()
	}
	return string(jsonString)
}

When running the given code (Go version 1.13.6, library version 0.1.9)
in the Go playground, I get:

stdEncode: {"groupId":1,"data":null,"error":{"message":"error message","err":{}}}
stdMarshal: {"groupId":1,"data":null,"error":{"message":"error message","err":{}}}
segEncode: json: unsupported type: 
segMarshal: json: unsupported type:

On my PC (Windows, amd64), I get:

stdEncode: {"groupId":1,"data":null,"error":{"message":"error message","err":{}}}
stdMarshal: {"groupId":1,"data":null,"error":{"message":"error message","err":{}}}
runtime: nameOff 0x50e5e0 out of range 0x4ee000 - 0x55cfe3
fatal error: runtime: name offset out of range

goroutine 1 [running]:
runtime.throw(0x53900d, 0x21)
	C:/Go/src/runtime/panic.go:774 +0x79 fp=0xc000089598 sp=0xc000089568 pc=0x42e639
runtime.resolveNameOff(0x557cc0, 0x1000050e5e0, 0x4f7f78)
	C:/Go/src/runtime/type.go:190 +0x2ee fp=0xc000089600 sp=0xc000089598 pc=0x44e68e
reflect.resolveNameOff(0x557cc0, 0x50e5e0, 0xc0000426d0)
	C:/Go/src/runtime/runtime1.go:478 +0x3a fp=0xc000089628 sp=0xc000089600 pc=0x43ccda
reflect.(*rtype).nameOff(...)
	C:/Go/src/reflect/type.go:691
reflect.(*rtype).String(0x557cc0, 0x52b980, 0x557cc0)
	C:/Go/src/reflect/type.go:761 +0x3d fp=0xc000089660 sp=0xc000089628 pc=0x47914d
reflect.(*rtype).ptrTo(0x557cc0, 0x557cc0)
	C:/Go/src/reflect/type.go:1398 +0x90 fp=0xc000089708 sp=0xc000089660 pc=0x47c5d0
reflect.PtrTo(...)
	C:/Go/src/reflect/type.go:1384
github.com/segmentio/encoding/json.constructCodec(0x55a140, 0x557cc0, 0xc0000897b0, 0x0, 0x0, 0x0)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:180 +0x14d fp=0xc000089760 sp=0xc000089708 pc=0x4c9d6d
github.com/segmentio/encoding/json.constructCachedCodec(0x55a140, 0x557cc0, 0xc00006ac00, 0x6320e0, 0x100000000000000)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:62 +0xe4 fp=0xc0000898c0 sp=0xc000089760 pc=0x4c9b54
github.com/segmentio/encoding/json.Append(0xc00009c000, 0x42, 0x1000, 0x557cc0, 0xc000042230, 0x3, 0x3a, 0xd, 0x5339e9, 0xc00000a517, ...)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/json.go:132 +0x187 fp=0xc000089958 sp=0xc0000898c0 pc=0x4e3107
github.com/segmentio/encoding/json.encoder.encodeInterface(0x3, 0xc00009c000, 0x42, 0x1000, 0xc0000044f0, 0xc00009c000, 0x3b, 0x1000, 0x0, 0x0)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:592 +0x70 fp=0xc0000899c0 sp=0xc000089958 pc=0x4e18e0
github.com/segmentio/encoding/json.encoder.encodeStruct(0x3, 0xc00009c000, 0x21, 0x1000, 0xc0000044e0, 0xc00001a280, 0x51cae0, 0x403b34, 0x6132f0, 0xc00006ac00, ...)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:557 +0x1e5 fp=0xc000089a80 sp=0xc0000899c0 pc=0x4e1215
github.com/segmentio/encoding/json.constructStructEncodeFunc.func1(0x3, 0xc00009c000, 0x21, 0x1000, 0xc0000044e0, 0x50f8c0, 0x4fd500, 0x0, 0x0, 0x0)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:466 +0x6d fp=0xc000089ae8 sp=0xc000089a80 pc=0x4eb35d
github.com/segmentio/encoding/json.encoder.encodePointer(0x3, 0xc00009c000, 0x21, 0x1000, 0xc000004950, 0x55a140, 0x514e80, 0xc000042740, 0xc00009c000, 0x18, ...)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:586 +0x10c fp=0xc000089b50 sp=0xc000089ae8 pc=0x4e180c
github.com/segmentio/encoding/json.constructPointerEncodeFunc.func1(0x3, 0xc00009c000, 0x21, 0x1000, 0xc000004950, 0xc00009c000, 0x18, 0x1000, 0x0, 0x0)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:704 +0x85 fp=0xc000089bc8 sp=0xc000089b50 pc=0x4eb765
github.com/segmentio/encoding/json.encoder.encodeStruct(0x3, 0xc00009c000, 0x0, 0x1000, 0xc000004940, 0xc00001a180, 0x0, 0x0, 0x0, 0x0, ...)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/encode.go:557 +0x1e5 fp=0xc000089c88 sp=0xc000089bc8 pc=0x4e1215
github.com/segmentio/encoding/json.constructStructEncodeFunc.func1(0x3, 0xc00009c000, 0x0, 0x1000, 0xc000004940, 0x1000, 0xc00009c000, 0xc00009c000, 0xc000089d78, 0x4653ed)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/codec.go:466 +0x6d fp=0xc000089cf0 sp=0xc000089c88 pc=0x4eb35d
github.com/segmentio/encoding/json.Append(0xc00009c000, 0x0, 0x1000, 0x519de0, 0xc000004940, 0x3, 0xc000089f90, 0xc00008a280, 0xc000089e00, 0x409926, ...)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/json.go:135 +0x107 fp=0xc000089d88 sp=0xc000089cf0 pc=0x4e3087
github.com/segmentio/encoding/json.(*Encoder).Encode(0xc000089e60, 0x519de0, 0xc000004940, 0xc000004940, 0x60cf40)
	C:/Users/Mynde/go/pkg/mod/github.com/segmentio/[email protected]/json/json.go:372 +0xb9 fp=0xc000089e10 sp=0xc000089d88 pc=0x4e38a9
main.segEncode(0x1, 0x0, 0xc0000044e0, 0x1, 0x1)
	C:/WorkGo/src/segmenio-json-test/main.go:63 +0x119 fp=0xc000089ec0 sp=0xc000089e10 pc=0x4eda09
main.main()
	C:/WorkGo/src/segmenio-json-test/main.go:37 +0x243 fp=0xc000089f60 sp=0xc000089ec0 pc=0x4ed473
runtime.main()
	C:/Go/src/runtime/proc.go:203 +0x21e fp=0xc000089fe0 sp=0xc000089f60 pc=0x43001e
runtime.goexit()
	C:/Go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc000089fe8 sp=0xc000089fe0 pc=0x456841

proto: StartGroup

I see that StartGroup is not defined:

encoding/proto/message.go

Lines 142 to 149 in 101dc9c

Varint WireType = 0
Fixed64 WireType = 1
Varlen WireType = 2
Fixed32 WireType = 5
// Wire types 3 and 4 were used for StartGroup and EndGroup, but are
// deprecated so we don't expose them here.
//
// https://developers.google.com/protocol-buffers/docs/encoding#structure

I think this is a problem, as some servers are still using it, for example android.clients.google.com.
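
To make the numbering concrete, here is a small sketch of how a protobuf field key splits into a field number and a wire type, with the two deprecated group values included. The StartGroup and EndGroup constants and the decodeTag helper are assumptions made for this example and do not exist in the package as quoted above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// WireType mirrors the protobuf wire-type numbering.
type WireType uint8

const (
	Varint     WireType = 0
	Fixed64    WireType = 1
	Varlen     WireType = 2
	StartGroup WireType = 3 // deprecated; hypothetical constant for this sketch
	EndGroup   WireType = 4 // deprecated; hypothetical constant for this sketch
	Fixed32    WireType = 5
)

// decodeTag splits a varint-encoded field key into its field number (upper
// bits) and wire type (lowest three bits). n is the number of bytes read,
// or 0 if b does not start with a valid varint.
func decodeTag(b []byte) (field uint64, wt WireType, n int) {
	key, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, 0, 0
	}
	return key >> 3, WireType(key & 7), n
}

func main() {
	// 0x0b is (1 << 3) | 3: field 1 opening a group.
	// 0x0c is (1 << 3) | 4: field 1 closing that group.
	for _, b := range [][]byte{{0x0b}, {0x0c}} {
		field, wt, n := decodeTag(b)
		fmt.Println(field, wt, n)
	}
}
```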

json.Unmarshaler difference: UnmarshalJSON([]byte("null")) skipped

Go 1.14.8 linux/amd64

v0.1.15 does not use my custom UnmarshalJSON() for []byte("null")

Documentation says:

By convention, to approximate the behavior of Unmarshal itself, Unmarshalers implement UnmarshalJSON([]byte("null")) as a no-op.

encoding/json calls it, github.com/segmentio/encoding/json skips it.

package main

import (
	//"encoding/json"
	"fmt"
	"github.com/segmentio/encoding/json"
)

type RawJsonString string

func (r *RawJsonString) UnmarshalJSON(b []byte) error {
	fmt.Printf("UnmarshalJSON: %s\n", string(b))
	if len(b) == 0 {
		*r = "null"
	} else {
		*r = RawJsonString(b)
	}
	return nil
}

func main() {
	var out RawJsonString
	if err := json.Unmarshal([]byte("null"), &out); err != nil {
		panic(err)
	}
	fmt.Printf("out = %#v\n", out)
}

encoding/json:

  UnmarshalJSON: null
  out = "null"

github.com/segmentio/encoding/json:

  out = ""

json.UnmarshalTypeError appends extra dot

Hey, guys! We ran into an issue (gofiber/fiber#1012), and it turns out that json.UnmarshalTypeError does not behave the same way as in the standard "encoding/json".

Here is the reproduction:

With "encoding/json" playground

package main

import (
	"encoding/json"
	"fmt"
)

type Cc struct {
	Id   int    `json:"id" xml:"id" form:"id"`
	Name string `json:"name" xml:"name" form:"name"`
}

func main() {
	var s Cc
	err := json.Unmarshal([]byte(`{ "id":"123" }`), &s)
	switch err.(type) {
	case *json.UnmarshalTypeError:
		unmarshalTypeError := err.(*json.UnmarshalTypeError)
		fmt.Printf("%v\n", unmarshalTypeError.Field)
	}
}

Output is id

With "github.com/segmentio/encoding/json" playground

package main

import (
	"github.com/segmentio/encoding/json"
	"fmt"
)

type Cc struct {
	Id   int    `json:"id" xml:"id" form:"id"`
	Name string `json:"name" xml:"name" form:"name"`
}

func main() {
	var s Cc
	err := json.Unmarshal([]byte(`{ "id":"123" }`), &s)
	switch err.(type) {
	case *json.UnmarshalTypeError:
		unmarshalTypeError := err.(*json.UnmarshalTypeError)
		fmt.Printf("%v\n", unmarshalTypeError.Field)
	}
}

Output is id.

Edit:
I'm sorry, this seems to be a duplicate of #40.

unable to run benchmarks

I'm attempting to run bench-simple but for some reason it's not producing any output.

$ make bench-simple
GO111MODULE=off go install golang.org/x/tools/cmd/benchcmp
PASS
ok      github.com/segmentio/encoding/json      17.447s
PASS
ok      github.com/segmentio/encoding/json      18.226s
benchcmp encoding-json.txt segmentio-encoding-json.txt
benchcmp: no repeated benchmarks
make: *** [Makefile:26: bench-simple] Error 1

If I replace -bench /codeResponse with, for example, -bench BenchmarkMarshal/int, it works fine: I get the benchmark output, and benchcmp works too.

marshaling byte slice ignores json.Marshaler/encoding.TextMarshaler interfaces

When a codec is created for a byte-slice type at

case bytesType:

it ignores the fact that the type could implement the json.Marshaler or encoding.TextMarshaler interfaces. These interfaces should take priority, to be compliant with the standard library.

The following playground demonstrates the issue:
https://play.golang.org/p/Bm0d3jsWkZf

The relevant code can be found in the standard library at https://github.com/golang/go/blob/da4d58587e0e4028ea384580053c3c455127e446/src/encoding/json/encode.go#L415
The function newTypeEncoder first checks whether the type implements one of these interfaces, before dispatching on its kind.

handling of time.Duration deviates from stdlib

This library currently encodes and decodes time.Duration values as strings. This deviates from the stdlib, where time.Duration does not implement json.Marshaler and is therefore treated as an int64.

Since this library implements this custom logic for both encode and decode, using it on both sides of an exchange of data is no problem. However, when interacting with other decoders (including the stdlib, but also extends to other languages/tools) the difference can be breaking.

My understanding is that this particular behavior is relied on heavily within Segment, so simply changing it to adhere strictly to the stdlib will be breaking for us.

There is currently no way to turn this functionality off, so in the short-term we should expose a config flag to support this. Over the long-term though, matching the stdlib should be the default and the flag should instead enable this additional functionality.

Marshalling fails with Interface members when they are nil

I was migrating some marshalling code to use this library over the standard library (thank you for making it!) and encountered this error. Note that the stdlib returns the data with a nil error, while segmentio/encoding/json returns an error.

Minimal repro:

package main

import (
	"fmt"

	"github.com/segmentio/encoding/json"
)

type segmentJSON struct {
	Stringer fmt.Stringer
	Field    int `json:"MyInt"`
}

func main() {
	fmt.Println("segment-error")
	js := segmentJSON{Field: 2}
	fmt.Println(json.Marshal(js))
}

Output when using encoding/json:

❯ go run main.go
segment-error
[123 34 83 116 114 105 110 103 101 114 34 58 110 117 108 108 44 34 77 121 73 110 116 34 58 50 125] <nil>

with github.com/segmentio/encoding/json:

❯ go run main.go
segment-error
[] json: unsupported type: fmt.Stringer

Found pointer to free object

We've been seeing some occasional restarts in our orchestrator service with added load. Looking at the logs, it seems to be a free object pointer:

0xc01f75bff0 alloc unmarked
fatal error: found pointer to free object

This matches the issue identified in https://github.com/segmentio/webhook-functions-consumer/pull/173 and I can also find the segmentio/encoding library in my stack trace. I updated my service to use the stdlib json library and have not seen this issue since.

decoding of 'missing' json values goes totally undetected

problem

In the example below there is an array of Example structs, the second of which contains no values ({}). The decoder does not complain and constructs the missing values with the zero values of the respective types.

To me this goes against the programmer's intuition: a missing field should result in a decoder error, such as: Field "event" was expected but not given.

func TestDecoderUndefinedJsonKeys(t *testing.T) {
	eventJSON := `[{"event":"foobar", "id": 55}, {}]`

	type Example struct {
		Event string `json:"event"`
		ID    uint64 `json:"id"`
	}

	var events []Example

	reader := strings.NewReader(eventJSON)
	decoder := json.NewDecoder(reader)
	decoder.UseNumber()
	decErr := decoder.Decode(&events)
	if decErr != nil {
		assert.NoError(t, decErr)
		return
	}
	assert.Equal(t, int(events[0].ID), 55)
	assert.Equal(t, events[0].Event, "foobar")

	// this confuses me
	assert.Equal(t, int(events[1].ID), 0)
	assert.Equal(t, events[1].Event, "")
}

this will execute as:

TestDecoderUndefinedJsonKeys
=== RUN   TestDecoderUndefinedJsonKeys
--- PASS: TestDecoderUndefinedJsonKeys (0.00s)
PASS

This test will run without an issue.

The decoder.DisallowUnknownFields() setting makes unknown fields an error. But as far as I can tell, there is no equivalent for 'expected' fields that were not set.

solution

I've checked the implementation and your issue tracker, but to my surprise this hasn't been discussed so far.

validator

Automated detection would require a validator, such as:

"github.com/alecthomas/jsonschema"
"github.com/xeipuuv/gojsonschema"

https://stackoverflow.com/questions/19633763/unmarshaling-json-in-go-required-field

manual checks

If the fields were pointers, one could manually check for nil. This is also illustrated in:

https://stackoverflow.com/questions/19633763/unmarshaling-json-in-go-required-field
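
A minimal sketch of the pointer approach, using the standard library and the Example type from the test above with its fields changed to pointers. The validate and decodeEvents names are hypothetical. Note that an explicit JSON null also leaves a pointer nil, so this check cannot tell "missing" from "null".

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Example uses pointer fields so that an absent key stays nil after decoding.
type Example struct {
	Event *string `json:"event"`
	ID    *uint64 `json:"id"`
}

// validate reports the first required field that was not set.
func (e Example) validate() error {
	if e.Event == nil {
		return errors.New(`field "event" was expected but not given`)
	}
	if e.ID == nil {
		return errors.New(`field "id" was expected but not given`)
	}
	return nil
}

// decodeEvents decodes a JSON array and rejects elements with unset fields.
func decodeEvents(data []byte) ([]Example, error) {
	var events []Example
	if err := json.Unmarshal(data, &events); err != nil {
		return nil, err
	}
	for i, e := range events {
		if err := e.validate(); err != nil {
			return nil, fmt.Errorf("element %d: %w", i, err)
		}
	}
	return events, nil
}

func main() {
	_, err := decodeEvents([]byte(`[{"event":"foobar", "id": 55}, {}]`))
	fmt.Println(err)
}
```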

go playground validator

https://github.com/go-playground/validator

strict decoding (wish)

I'd love it if we had a

decoder.StrictDecoding() 

or

decoder.DisallowUnsetFields()

option that would require every struct field to be present in the JSON and otherwise fail with an error.

thoughts

I'm interested in your thoughts on the matter.
