planetscale / vtprotobuf

A Protocol Buffers compiler that generates optimized marshaling & unmarshaling Go code for ProtoBuf APIv2

License: BSD 3-Clause "New" or "Revised" License

Go 97.14% Makefile 2.49% Shell 0.37%

go protobuf grpc codegen vitess

vtprotobuf's Issues

Performance regression for int32 and sfixed32 lists

While benchmarking vtprotobuf in our projects, we noticed a performance regression in case of lists of int32 and sfixed32 numbers.

Marshaling and unmarshaling both seem to be slower with vtprotobuf in case of repeated int32 fields.

Although unmarshaling is faster with vtprotobuf for repeated sfixed32, marshaling is slower.

This repository contains samples of these microbenchmarks: https://github.com/themreza/vtprotobuf-bench/tree/main

What could be causing this? Is there a way to improve the performance?

It would be helpful to have automated benchmarks for different data types comparing vtprotobuf with the built-in proto.Marshal and proto.Unmarshal.
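When designing such benchmarks, it may help to keep the wire encodings in mind: int32 values are varint-encoded (negative values are sign-extended to 64 bits and take 10 bytes), while sfixed32 values always take 4 bytes. The sketch below uses only the Go standard library to compare packed payload sizes. The helper names are made up for illustration, this is not vtprotobuf code, and it does not explain where the reported regression comes from.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packedInt32 lays out values like the payload of a packed repeated int32
// field: each value is sign-extended to 64 bits and written as a varint.
func packedInt32(vals []int32) []byte {
	var out []byte
	for _, v := range vals {
		out = binary.AppendUvarint(out, uint64(int64(v)))
	}
	return out
}

// packedSfixed32 lays out values like the payload of a packed repeated
// sfixed32 field: each value is a 4-byte little-endian word.
func packedSfixed32(vals []int32) []byte {
	out := make([]byte, 0, 4*len(vals))
	for _, v := range vals {
		out = binary.LittleEndian.AppendUint32(out, uint32(v))
	}
	return out
}

func main() {
	vals := []int32{1, 300, -1}
	fmt.Println(len(packedInt32(vals)))    // 1 + 2 + 10 = 13 bytes
	fmt.Println(len(packedSfixed32(vals))) // 3 * 4 = 12 bytes
}
```

A benchmark suite that mixes small, large, and negative values per field type would exercise both the short and the long varint paths.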

How to use vtprotobuf with bufbuild/buf should be in the README

A guide to using vtprotobuf with bufbuild/buf on macOS:

Step 1:

Set up PATH:

export GOBIN=/Users/xxxx/go/bin
export PATH=$PATH:$GOBIN

Step 2:

Install vtprotobuf as described in the README:

go install github.com/planetscale/vtprotobuf/cmd/protoc-gen-go-vtproto@latest

Step 3:

Install buf by following https://docs.buf.build/installation. On macOS, for example:

brew install bufbuild/buf/buf

Step 4:

Add vtprotobuf to buf.gen.yaml:

version: v1
managed:
  enabled: true
  go_package_prefix:
    default: github.com/your/grpc-project
plugins:
  - plugin: buf.build/bufbuild/connect-go
    out: ./
    opt: paths=source_relative
  - plugin: buf.build/protocolbuffers/go
    out: ./
    opt: paths=source_relative
  - plugin: go-vtproto
    out: ./
    opt: paths=source_relative

The relevant part is the go-vtproto entry in the plugins list:

  - plugin: go-vtproto
    out: ./
    opt: paths=source_relative

Finally, generate the code with vtprotobuf support:

buf generate

That's all.

Unmarshaling empty messages is incompatible with proto.Unmarshal

When unmarshaling an empty message embedded in another message, vtprotobuf is allocating the message, but proto.Unmarshal is using a typed nil.

Steps to reproduce:

syntax = "proto3";
package repro;

message TopLevel {
  message Empty {}
  Empty embedded = 1;
}

func TestEmbeddedEmpty(t *testing.T) {
	m := &pb.TopLevel{Embedded: &pb.TopLevel_Empty{}}

	// Marshal it with protobuf.
	protobuf, err := proto.Marshal(m)
	if err != nil {
		t.Fatal(err)
	}

	// Unmarshal it with protobuf.
	pbm := &pb.TopLevel{}
	if err := proto.Unmarshal(protobuf, m); err != nil {
		t.Fatal(err)
	}

	// Unmarshal it with vtprotobuf.
	vtm := &pb.TopLevel{}
	if err := vtm.UnmarshalVT(protobuf); err != nil {
		t.Fatal(err)
	}

	fmt.Printf("%#v\n\n", pbm)
	fmt.Printf("%#v\n", vtm)

	require.True(t, pbm.EqualVT(vtm), "EqualVT")
	require.True(t, proto.Equal(pbm, vtm), "proto.Equal")
}

Output:

&proto.TopLevel{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), Embedded:(*proto.TopLevel_Empty)(nil)}

&proto.TopLevel{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), Embedded:(*proto.TopLevel_Empty)(0xc00011fbc0)}

--- FAIL: TestEmbeddedEmpty (0.00s)
    main_test.go:37: 
        	Error Trace:	main_test.go:37
        	Error:      	Should be true
        	Test:       	TestEmbeddedEmpty
        	Messages:   	EqualVT
FAIL
exit status 1

Proto2 Support

Is it expected that the vtproto files aren't currently being generated for proto2 files or do I have something misconfigured?

Marshaling empty message in oneof is incompatible with proto.Marshal

An empty message used in a oneof is not marshaled the same as proto.Marshal().

Steps to reproduce:

syntax = "proto3";
package repro;

message Repro {
  message Empty {}
  oneof str_or_empty {
    string str = 1;
    Empty empty = 2;
  }
}

func TestEmptyOneOf(t *testing.T) {
	m := &pb.Repro{StrOrEmpty: &pb.Repro_Empty_{}}

	protobuf, err := proto.Marshal(m)
	if err != nil {
		t.Fatal(err)
	}

	vtprotobuf, err := m.MarshalVT()
	if err != nil {
		t.Fatal(err)
	}

	fmt.Printf("protobuf: %#v\n", protobuf)
	fmt.Printf("vtprotobuf: %#v\n", vtprotobuf)

	require.True(t, bytes.Equal(protobuf, vtprotobuf))
}

Output:

protobuf: []byte{0x12, 0x0}
vtprotobuf: []byte{}
--- FAIL: TestEmptyOneOf (0.00s)
    main_test.go:129: 
        	Error Trace:	main_test.go:129
        	Error:      	Should be true
        	Test:       	TestEmptyOneOf
FAIL
exit status 1
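The two bytes produced by proto.Marshal follow directly from the wire format: a tag byte of (field number << 3) | wire type, with wire type 2 for length-delimited fields, followed by a zero length. A minimal sketch using only the standard library (the helper name is made up for illustration):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wireTypeLen is the protobuf wire type for length-delimited fields.
const wireTypeLen = 2

// emptyMessageField returns the encoding of a present but empty embedded
// message stored in the given field: a tag followed by a zero length.
func emptyMessageField(fieldNum uint64) []byte {
	out := binary.AppendUvarint(nil, fieldNum<<3|wireTypeLen) // tag
	return binary.AppendUvarint(out, 0)                        // length 0
}

func main() {
	fmt.Printf("%#v\n", emptyMessageField(2)) // []byte{0x12, 0x0}
}
```

Omitting these bytes drops the information about which oneof member was set, which is why the two encodings are not interchangeable.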

Unable to get pooling methods to gen

I am currently attempting to put together a POC for using vtprotobuf in an application I work on. However the most promising feature (pooling) does not seem to exist in the generated code.

My command line:

protoc \
  -Ipkg/.patched-proto \
  --go_out=paths=source_relative:./pkg/tempopb/ \
  --go-grpc_out=paths=source_relative:./pkg/tempopb/ \
  --go-vtproto_out=paths=source_relative:./pkg/tempopb/ \
  --go-vtproto_opt=features=marshal+unmarshal+size+pool \
  pkg/.patched-proto/trace/v1/trace.proto

The output files seem to be generated correctly and there are no errors:

But I'm not seeing ResetVT, ReturnToVTPool, or *FromVTPool generated. I have tried with 0.2.0 as well as the tip of main. I have also tried omitting --go-vtproto_opt, without luck.

I am seeing MarshalVT, MarshalToVT, SizeVT, UnmarshalVT, ... generated.

Thanks for your time!

Support pooling repeated fields

As far as I can tell single fields are pooled correctly but repeated fields are not. Using the following proto:

message Parent {
  option (vtproto.mempool) = true;
  repeated Child children = 1;
  Child one = 2;
}

message Child {
  option (vtproto.mempool) = true;
  uint32 field = 1;
}

I can see that the ResetVT and UnmarshalVT methods correctly handle the "one" field but not the "children" field.

func (m *Parent) ResetVT() {
	for _, mm := range m.Children {
		mm.ResetVT()  // does not return the slice pointers to the pool
	}
	m.One.ReturnToVTPool()   // correctly returns this pointer to the pool
	m.Reset()
}

func (m *Parent) UnmarshalVT(dAtA []byte) error {
...
		switch fieldNum {
		case 1:
...
			if len(m.Children) == cap(m.Children) {
				m.Children = append(m.Children, &Child{}) // allocates new object for slice
			} else {
				m.Children = m.Children[:len(m.Children)+1]
				if m.Children[len(m.Children)-1] == nil {
					m.Children[len(m.Children)-1] = &Child{} // allocates new object for slice
				}
			}
...
		case 2:
...
			if m.One == nil {
				m.One = ChildFromVTPool() // correctly pulls from pool
			}
...

Is there a way to do this that I'm not seeing?
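For illustration, here is a hand-written sketch of what pool-aware handling of a repeated field could look like. It is not the code vtprotobuf generates: the types, the pool variable, and the method names (ChildFromPool, ReturnToPool, ResetPooled) are all hypothetical, and only sync.Pool from the standard library is used.

```go
package main

import (
	"fmt"
	"sync"
)

// Child and Parent are hand-written stand-ins for generated messages.
type Child struct{ Field uint32 }

type Parent struct {
	Children []*Child
	One      *Child
}

var childPool = sync.Pool{New: func() any { return new(Child) }}

// ChildFromPool fetches a Child from the pool (hypothetical helper).
func ChildFromPool() *Child { return childPool.Get().(*Child) }

// ReturnToPool clears the Child and puts it back; nil receivers are ignored.
func (c *Child) ReturnToPool() {
	if c == nil {
		return
	}
	*c = Child{}
	childPool.Put(c)
}

// ResetPooled returns every element of the repeated field to the pool,
// skipping nil entries, and keeps the slice's backing array for reuse.
func (p *Parent) ResetPooled() {
	for i, c := range p.Children {
		c.ReturnToPool()
		p.Children[i] = nil
	}
	p.Children = p.Children[:0]
	p.One.ReturnToPool()
	p.One = nil
}

func main() {
	p := &Parent{One: ChildFromPool()}
	p.Children = append(p.Children, ChildFromPool(), nil)
	p.ResetPooled()
	fmt.Println(len(p.Children), p.One == nil) // 0 true
}
```

The nil check in ReturnToPool matters because a repeated field may legitimately contain nil entries.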

equal: code doesn't correctly differentiate between absence and zero values in maps

The generated code doesn't check for presence in the map in other. This may lead to spuriously returning true if:

both maps are of equal size,
keys present in both maps map to the same respective values, and
keys present in only the first map map to the zero value for the respective type.

Failing test case:

func TestEqualVT_Map_AbsenceVsZeroValue(t *testing.T) {
	a := &TestAllTypesProto3{
		MapInt32Int32: map[int32]int32{
			1: 0,
			2: 37,
		},
	}
	b := &TestAllTypesProto3{
		MapInt32Int32: map[int32]int32{
			2: 37,
			3: 42,
		},
	}

	aJson, err := protojson.Marshal(a)
	require.NoError(t, err)
	bJson, err := protojson.Marshal(b)
	require.NoError(t, err)

	if a.EqualVT(b) {
		assert.JSONEq(t, string(aJson), string(bJson))
		err := fmt.Errorf("these %T should not be equal:\nmsg = %+v\noriginal = %+v", a, a, b)
		require.NoError(t, err)
	}
}
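A comparison that avoids this pitfall uses the two-value map lookup, so a missing key is never confused with a zero value. A minimal sketch for the map type in the test above (equalInt32Maps is a made-up helper, not the generated code):

```go
package main

import "fmt"

// equalInt32Maps reports whether a and b hold exactly the same entries.
// The two-value lookup distinguishes "key absent" from "key maps to zero".
func equalInt32Maps(a, b map[int32]int32) bool {
	if len(a) != len(b) {
		return false
	}
	for k, av := range a {
		bv, ok := b[k]
		if !ok || av != bv {
			return false
		}
	}
	return true
}

func main() {
	a := map[int32]int32{1: 0, 2: 37}
	b := map[int32]int32{2: 37, 3: 42}
	fmt.Println(equalInt32Maps(a, b)) // false
}
```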

Avoid proto reflection for builtin types

The optimized generated code for marshal, unmarshal, clone etc. still resorts to generic, protoreflect-based logic for builtin/well-known types such as google.protobuf.Timestamp, google.protobuf.Duration etc. Since these types are well known, it would be nice if optimized unrolled code could be used for operations on these types as well - especially as there seems to be a significant performance penalty for the "context-switch" to reflection with each individual proto.Clone invocation (benchmark).

I'm happy to send a PR, but one thing I'd ask the library maintainers to chime in on is whether it's acceptable to have the generated code reference global helper functions from a package within this module (e.g., github.com/planetscale/vtprotobuf/support/...), or whether it would be preferable to just generate package-private helper functions for all referenced types on demand.

Using stack trace feature for returned errors

Hello.
The github.com/pkg/errors package has a stack trace feature.
I'd suggest wrapping the errors returned in generated files with errors.WithStack(err) from this package.
In other words, let's turn this generated code:

if err != nil {
	return err
}
if (skippy < 0) || (iNdEx+skippy) < 0 {
	return ErrInvalidLength
}
if (iNdEx + skippy) > l {
	return io.ErrUnexpectedEOF
}

into that:

if err != nil {
	return errors.WithStack(err)
}
if (skippy < 0) || (iNdEx+skippy) < 0 {
	return errors.WithStack(ErrInvalidLength)
}
if (iNdEx + skippy) > l {
	return errors.WithStack(io.ErrUnexpectedEOF)
}

That would be extremely useful for debug purposes.
I could prepare pull request if you wish.

Offering assistance and discussing upkeep / releases for vtprotobuf

Hello everyone,

I'd like to discuss the future maintenance of vtprotobuf. As an avid user from Datadog, I've recognized its potential in enhancing protobuf message operations. However, recent activity in the repository has been limited, with the last release dating back to January.

To bolster vtprotobuf's capabilities for developers, I propose exploring a more active maintenance and release cycle. Acknowledging the demands of open-source projects, I'm here to extend a hand, along with some of my coworkers from Datadog, to offer assistance.

Although notable bugs and promising ideas exist in pull requests and issues (like #83 and #54), no PRs have been merged since the beginning of the year. We're enthusiastic about bridging this gap and improving vtprotobuf's efficiency and robustness.

In this regard, I suggest opening a discussion on project maintenance and future releases. Our goal is collaborative growth, not imposition. Whether it's contributing code, managing issues, updating documentation, or handling releases, we are more than willing to step in.

Let's collectively ensure that vtprotobuf remains a valuable resource for developers. Your insights are vital, and I'm excited about the potential improvements we can achieve.

Looking forward to your thoughts and suggestions. Thank you for your time and consideration.

Codec with GRPC Server Support

Thought/Question - Why does the GRPC codec not try to ReturnToVTPool on the way out? Is anyone else doing this? My thought is to use this for optimising the return of complex payloads: build them in the handler with ...FromVTPool(), then let the codec release them once the wire work has finished and the bytes are sent.

Are there reasons for this not being in the Codec, and if not, is there an opening for a PR with a PoolAwareCodec to be added to the project?

Proto3 optional field support

Hello, I have just tried running this generator on my proto3 files and it failed with:

myproto.proto is a proto3 file that contains optional fields, but code generator protoc-gen-go-vtproto hasn't been updated to support optional fields in proto3. Please ask the owner of this code generator to support proto3 optional.--go-vtproto_out:

Are optional fields supported by this generator?

Thanks

Repeated field with nulls break pool

It appears there are no nil checks for repeated fields when returning to the pool, which causes a panic. Is this not supported, or am I missing how to handle it properly?

equal: generated code does not check reference equality

Expected: Generated code returns true if this and that instances are the same, i.e. msg.EqualVT(msg) should be super fast regardless of the fields in the message

Actual: Generated code still walks the whole message hierarchy and compares field-by-field

I believe implementing this should be as simple as replacing the currently generated code

if this == nil {
    return that == nil
} else if that == nil {
    return false
}

with

if this == that {
    return true
} else if this == nil || that == nil {
    return false
}

Could not find a contribution guide, so let me know if you'd like a PR for this or if you would rather make the change yourselves. Thanks for the great project.
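To illustrate the effect of the proposed check, here is a small hand-written sketch. Msg and its Equal method are hypothetical stand-ins for a generated message and EqualVT.

```go
package main

import "fmt"

// Msg is a hand-written stand-in for a generated message type.
type Msg struct {
	Name  string
	Child *Msg
}

// Equal short-circuits on pointer identity before comparing fields, so
// comparing a message with itself never walks the hierarchy. The first
// check also covers the case where both pointers are nil.
func (this *Msg) Equal(that *Msg) bool {
	if this == that {
		return true
	} else if this == nil || that == nil {
		return false
	}
	return this.Name == that.Name && this.Child.Equal(that.Child)
}

func main() {
	m := &Msg{Name: "a", Child: &Msg{Name: "b"}}
	fmt.Println(m.Equal(m))                // true, via the pointer check
	fmt.Println(m.Equal(&Msg{Name: "a"})) // false, children differ
}
```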

Discussion on the use of codec

Does vtprotobuf support using vtprotobuf-generated messages and standard proto.Message types in the same project (for example, when relying on .pb.go files generated by other projects)?
In our case, the internal proto files use vtprotobuf to improve performance, while some other functionality relies on third-party .pb.go files. The following error occurred during processing:

stream is closed with error: xds: stream.Recv() failed: rpc error: code = Internal desc = grpc: error while marshaling: failed to marshal, message is *envoy_service_discovery_v3.DiscoveryRequest (missing vtprotobuf helpers)" func=goexit
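One way to mix both kinds of messages is a codec that uses the fast path only when the message provides it and otherwise falls back to the standard marshaler. The sketch below shows the idea with stub types. marshalWithFallback and its fallback parameter are hypothetical; in a real codec the fallback would presumably be proto.Marshal, which is left out here to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// vtMarshaler matches messages that carry a generated MarshalVT method.
type vtMarshaler interface {
	MarshalVT() ([]byte, error)
}

// marshalWithFallback uses MarshalVT when available and otherwise defers to
// fallback, a stand-in for the standard protobuf marshaler.
func marshalWithFallback(v any, fallback func(any) ([]byte, error)) ([]byte, error) {
	if m, ok := v.(vtMarshaler); ok {
		return m.MarshalVT()
	}
	if fallback == nil {
		return nil, errors.New("message has no MarshalVT and no fallback is set")
	}
	return fallback(v)
}

// fastMsg stands in for a message generated with vtprotobuf helpers.
type fastMsg struct{}

func (fastMsg) MarshalVT() ([]byte, error) { return []byte("vt"), nil }

// thirdPartyMsg stands in for a message from a third-party .pb.go file.
type thirdPartyMsg struct{}

func main() {
	std := func(any) ([]byte, error) { return []byte("std"), nil }
	a, _ := marshalWithFallback(fastMsg{}, std)
	b, _ := marshalWithFallback(thirdPartyMsg{}, std)
	fmt.Println(string(a), string(b)) // vt std
}
```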

Marshalling of an optional empty bytes field differs from proto.Marshal

syntax = "proto3";  
package tutorial;  
message OptionalByte{  
    optional bytes value = 1;  
}

Output of proto.Marshal:
[]uint8{0xa, 0x0}

Output of MarshalVT:
[]uint8{}

Error compiling generated code with third party imports

Hello,

I was looking to experiment with this and ran into the following error:

type *"google.golang.org/genproto/googleapis/rpc/status".Status has no field or method MarshalToSizedBufferVT

The protobuf files that are complaining import files like:

import "google/rpc/status.proto";

with messages that look like:

message TestMessage {
   google.rpc.Status status = 2;
}

The google/rpc/status.proto file is copied locally for the code generation, but the generated code is importing the Go module from google.golang.org/genproto/googleapis/rpc/status so it's not part of the vtproto generation steps.

Is this an issue that you've had to resolve or any suggestions on how to approach this?

Make proto.Marshal use MarshalVT under the hood

This fast-marshaling is cool, but I would like to avoid having the VT suffix in my codebase, and would like to simply continue using proto.Marshal(...).

I read that protoreflect's Message supports an optional ProtoMethods method, and quoting the docs:

ProtoMethods returns optional fast-path implementions of various operations.

Is there a way vtprotobuf's fast-marshaling could be added to those methods, instead of new (Un)MarshalVT methods?

Nil pointer panic marshalling nil oneof field

Hi 👋 Ran into a bit of a weird one. Given the following proto:

syntax = "proto3";

package proto;

option go_package = "github.com/pfouilloux/vttest/proto";

message TestMsg {
  oneof Test {
    string a = 1;
    string b = 2;
    string c = 3;
  }
}

and the following code:

package main_test

import (
	"testing"

	"github.com/planetscale/vtprotobuf/codec/grpc"
	_ "google.golang.org/grpc/encoding/proto"
	"vttest/proto/github.com/pfouilloux/vttest/proto"
)

//go:generate protoc --proto_path=proto --go_out=proto --go_opt=paths=source_relative --go-vtproto_out=proto --go-vtproto_opt=features=marshal+unmarshal+size oneof.proto

func TestMarshal(t *testing.T) {
	test := &proto.TestMsg{Test: getA()}
	_, err := grpc.Codec{}.Marshal(test)
	if err != nil {
		panic(err)
	}
}

func getA() *proto.TestMsg_A {
	return nil
}

I'm seeing the following error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1239f1d]

goroutine 4 [running]:
testing.tRunner.func1.2({0x126e880, 0x14b3f20})
	/usr/local/opt/go/libexec/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	/usr/local/opt/go/libexec/src/testing/testing.go:1399 +0x39f
panic({0x126e880, 0x14b3f20})
	/usr/local/opt/go/libexec/src/runtime/panic.go:884 +0x212
vttest/proto/github.com/pfouilloux/vttest/proto.(*TestMsg_A).MarshalToSizedBufferVT(0x126e580?, {0x14efc18?, 0xc000057601?, 0x0?})
	/Users/pfouilloux/code/vttest/proto/github.com/pfouilloux/vttest/proto/oneof_vtproto.pb.go:72 +0x1d
vttest/proto/github.com/pfouilloux/vttest/proto.(*TestMsg_A).MarshalToVT(0x1240e01?, {0x14efc18?, 0x0?, 0x123a5a4?})
	/Users/pfouilloux/code/vttest/proto/github.com/pfouilloux/vttest/proto/oneof_vtproto.pb.go:68 +0x6a
vttest/proto/github.com/pfouilloux/vttest/proto.(*TestMsg).MarshalToSizedBufferVT(0xc0001049c0?, {0x14efc18, 0x0, 0x0})
	/Users/pfouilloux/code/vttest/proto/github.com/pfouilloux/vttest/proto/oneof_vtproto.pb.go:58 +0x133
vttest/proto/github.com/pfouilloux/vttest/proto.(*TestMsg).MarshalVT(0xc0001049c0)
	/Users/pfouilloux/code/vttest/proto/github.com/pfouilloux/vttest/proto/oneof_vtproto.pb.go:27 +0x58
github.com/planetscale/vtprotobuf/codec/grpc.Codec.Marshal({}, {0x12a20c0, 0xc0001049c0})
	/Users/pfouilloux/go/pkg/mod/github.com/planetscale/[email protected]/codec/grpc/grpc_codec.go:20 +0x42
vttest_test.TestMarshal(0x0?)
	/Users/pfouilloux/code/vttest/oneof_test.go:15 +0x47
testing.tRunner(0xc0000076c0, 0x12c8b20)
	/usr/local/opt/go/libexec/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/usr/local/opt/go/libexec/src/testing/testing.go:1493 +0x35f


Process finished with the exit code 1

It looks like there is a nil check missing in the implementation of MarshalToVT for *TestMsg_A

func (m *TestMsg_A) MarshalToVT(dAtA []byte) (int, error) {
	size := m.SizeVT()
	return m.MarshalToSizedBufferVT(dAtA[:size])
}

I'm more than happy to raise a PR to address this if you could give me some guidance on where to add the appropriate tests.

Kind regards & thanks for sharing your work with the community!

Per Msg/Field Features

👋

Are there any plans to support adding features per-msg or per-field?

For example, for some of our string or []byte fields, we prefer to unmarshal them as "unsafe" so as to avoid an allocation if we don't plan to keep the data in memory past the lifetime of the original message.

I was thinking that type of behavior could be added as an annotation in the proto, similar to how the extensions in gogo currently work.

Is that something that is 1.) feasible with this project 2.) something that you would be interested in?

If so, I could try to come up with a POC

Thanks

unknown feature: "clone"

Works fine...

--go-vtproto_opt=features=marshal+unmarshal+size+pool \

Making sure to get latest go install github.com/planetscale/vtprotobuf/cmd/protoc-gen-go-vtproto@latest

--go-vtproto_opt=features=marshal+unmarshal+size+pool+clone \

Gives --go-vtproto_out: unknown feature: "clone"

Support DiscardUnknown fields with UnmarshalVT

https://pkg.go.dev/google.golang.org/protobuf/proto#UnmarshalOptions

Currently UnmarshalVT tracks unknownFields:

default:
 			iNdEx = preIndex
 			skippy, err := skip(dAtA[iNdEx:])
 			if err != nil {
 				return err
 			}
 			if (skippy < 0) || (iNdEx+skippy) < 0 {
 				return ErrInvalidLength
 			}
 			if (iNdEx + skippy) > l {
 				return io.ErrUnexpectedEOF
 			}
 			m.unknownFields = append(m.unknownFields, dAtA[iNdEx:iNdEx+skippy]...) <----
 			iNdEx += skippy

It would be helpful to have an option to discard these.

Provide clarification on reusing serialization buffers

Documentation says:

MarshalToVT() ... This function is useful e.g. when using memory pooling to re-use serialization buffers.

Could you please provide clarification on how to use it? I'm asking because protobufs are typically used with gRPC, and grpc-go's SendMsg() returns before the buffer gets put on the wire. Hence, you cannot reuse it. Here is a relevant issue: grpc/grpc-go#2159. Someone has even attempted this http://www.golangdevops.com/2019/12/31/autopool/ but it's not a solution and the finalizers have poor performance. You could find even more details here thanos-io/thanos#4609.

Are there any examples of the usage of this function? I couldn't find anything.

If I am correct then the recommendation in the README seems dangerous.

How to generate pool methods from buf cli

So I've got a slightly different method of generating protobufs. I used the buf cli to generate protos and my config file looks like the following:

version: v1
plugins:
  - name: go
    out: ./generated/
    opt: paths=source_relative
  - plugin: buf.build/grpc/go:v1.3.0
    out: ./
    opt:
      - paths=source_relative
  - plugin: go-vtproto
    out: ./
    opt:
      - paths=source_relative
      - features=marshal+unmarshal+size+equal+pool+clone

I have both used and not used features, which ends up having the same outcome. None of the pool methods are generated on the types, such as ReturnToVTPool or ResetVT. All other methods are generated.

I'm using the latest version of the plugin.

I noticed that in a bug, someone was doing:

message Parent {
  option (vtproto.mempool) = true;
  repeated Child children = 1;
  Child one = 2;
}

message Child {
  option (vtproto.mempool) = true;
  uint32 field = 1;
}

with option (vtproto.mempool) = true;

This didn't work for me, giving an error about an unsupported option. Also, the only reference I found was in the bug, I never saw it in regular documentation.

I'm sure I'm missing something simple, but not sure what it is. Any help would be appreciated. And thanks for the hard work on this project. We certainly needed something to take up the slack with gogoproto being deprecated.

Support unconditional file generation

To use a protoc plugin in Bazel rules, the plugin must always generate one file per input file.

Would it be fine to add a feature that forces file generation? For example, a force feature that makes GenerateFile(...) unconditionally return true.

Duplicate Functions and Variables in Generated Package

First, just want to say thank you for working on and releasing vtprotobuf. We're working on transitioning out of gogo/protobuf and read your great blog post announcing this alternative.

We've found a slight issue with our use case. We have a few proto packages that have multiple files in them. The generated _vtproto.pb.go files redeclare some utility functions/variables (e.g. sov, skip, ErrInvalidLength, etc.). As an example:

hellopb/service.proto:

syntax = "proto3";

package hellopb;

...

message HelloRequest {
  string q = 1;
}

message HelloResponse {
  string response = 2;
}

service HelloService {
  rpc Hello(HelloRequest) returns (HelloResponse) {}
}

hellopb/db.proto:

syntax = "proto3";

package hellopb;

...

message MessageEntry {
  string q = 1;
  ...
}

We then run:

protoc --proto_path=. --proto_path=../../../  \
  --go_out=../../../ --plugin protoc-gen-go=/go/bin/protoc-gen-go \
  --go-grpc_out=../../../ --plugin protoc-gen-go-grpc=/go/bin/protoc-gen-go-grpc \
  --grpc-gateway_out=../../../ \
  --go-vtproto_out=../../../ --plugin protoc-gen-go-vtproto=/go/bin/protoc-gen-go-vtproto \
  --go-vtproto_opt=features=marshal+unmarshal+size hellopb/service.proto

and

protoc --proto_path=. --proto_path=../../../  \
  --go_out=../../../ --plugin protoc-gen-go=/go/bin/protoc-gen-go \
  --go-grpc_out=../../../ --plugin protoc-gen-go-grpc=/go/bin/protoc-gen-go-grpc \
  --grpc-gateway_out=../../../ \
  --go-vtproto_out=../../../ --plugin protoc-gen-go-vtproto=/go/bin/protoc-gen-go-vtproto \
  --go-vtproto_opt=features=marshal+unmarshal+size hellopb/db.proto

If we run go vet we get:

service_vtproto.pb.go:214:6: encodeVarint redeclared in this block
    db_vtproto.pb.go:125:54: previous declaration
service_vtproto.pb.go:308:6: sov redeclared in this block
    db_vtproto.pb.go:188:23: previous declaration
service_vtproto.pb.go:311:6: soz redeclared in this block
    db_vtproto.pb.go:191:23: previous declaration
service_vtproto.pb.go:694:6: skip redeclared in this block
    db_vtproto.pb.go:545:36: previous declaration
service_vtproto.pb.go:774:2: ErrInvalidLength redeclared in this block
    db_vtproto.pb.go:625:2: previous declaration
service_vtproto.pb.go:775:2: ErrIntOverflow redeclared in this block
    db_vtproto.pb.go:626:2: previous declaration
service_vtproto.pb.go:776:2: ErrUnexpectedEndOfGroup redeclared in this block
    db_vtproto.pb.go:627:2: previous declaration

Is there any way to avoid this? Post-protoc cleanup for this is pretty tough. Is there any way those functions/variables could just be imported from vtprotobuf?

Provide ability to use golang (v1) marshal/unmarshal under the hood

Hello 👋
We have a large protobuf repo, using golang codegen, most of which is based on the V1 go message format. We are now starting to move to using the V2 message format, and are using vtprotobuf for fast (de)serialization. However, we cannot do the migration all at once. Due to this, we have V2 generated code, that can depend on V1 format generated code during the migration. So, we need the vtproto-generated code to be compatible with the older code.

To handle such scenarios, it would be very helpful if there is an ability to substitute google.golang.org/protobuf/proto with github.com/golang/protobuf/proto. This would make our migration a lot smoother.

This could be exposed via an option/flag at codegen time.

cc @euroelessar

Better support for maps

Hello,
While testing this library in our code base, I noticed that map support with pooling is not perfect:

- maps themselves are not pooled
- ResetVT does not return values from a map<key, MessagePooled> to the pool
- UnmarshalVT allocates a new message instead of using the pool for map<key, MessagePooled>

Making these changes manually in the .pb.go file greatly reduces allocations and speeds up unmarshaling.

I can provide a sample proto file and modifications made if needed.

Pre changes:
BenchmarkUnmarshalStdProto-12 202058 4971 ns/op 3840 B/op 54 allocs/op
BenchmarkUnmarshalVTProto-12 228591 5238 ns/op 3429 B/op 47 allocs/op
BenchmarkUnmarshalVTProtoWithPool-12 238689 4967 ns/op 2605 B/op 44 allocs/op

Post changes:
BenchmarkUnmarshalStdProto-12 203602 5240 ns/op 3840 B/op 54 allocs/op
BenchmarkUnmarshalVTProto-12 199917 5864 ns/op 3433 B/op 47 allocs/op
BenchmarkUnmarshalVTProtoWithPool-12 601562 2009 ns/op 302 B/op 5 allocs/op

Twirp Integration

In response to "I actually have no idea of how to switch encoders in Twirp. Maybe it's not even possible.", this can't be done out of the box with the code generation, but can be achieved (server-side) with a simple find and replace.

The changes are below:

proto.Marshal(respContent) >> respContent.MarshalVT()
proto.Unmarshal(buf, reqContent) >> reqContent.UnmarshalVT(buf)

I'm using make to control all code gen, so I added the below as a final step:

for twirp in $${dir}/*.twirp.go; \
do \
  echo 'Updating' $${twirp}; \
  sed -i '' -e 's/respBytes, err := proto.Marshal(respContent)/respBytes, err := respContent.MarshalVT()/g' $${twirp}; \
  sed -i '' -e 's/if err = proto.Unmarshal(buf, reqContent); err != nil {/if err = reqContent.UnmarshalVT(buf); err != nil {/g' $${twirp}; \
done; \

Add optional unsafe operations

Context

For messages that contain many string fields (e.g. repeated string with many elements coming in), UnmarshalVT can spend a lot of CPU time in runtime.slicetobytestring. Indeed, when decoding the []byte data, it does a string(bytes) cast (e.g. m.Foo1 = string(dAtA[iNdEx:postIndex])). Since []byte is mutable and string is not, this cast requires an allocation for safety. This allocation, repeated many times, sometimes turns out to be expensive.

Feature request

We could avoid this by using the unsafe package that allows us to perform this cast without an allocation:

func unsafeBytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

Of course, the user has to be careful because if they overwrite the []byte data they received from the wire, then the string is mutated. So, this feature should be opt-in in my opinion.
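The aliasing hazard can be shown in a few lines. This sketch assumes Go 1.20 or later and uses unsafe.String with unsafe.SliceData instead of the pointer cast above. Mutating the bytes afterwards violates the documented contract of unsafe.String, which is exactly the risk being illustrated. The aliasDemo helper is made up for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeBytesToString returns a string that shares memory with b.
// The caller must not modify b while the string is still in use.
func unsafeBytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// aliasDemo converts the same buffer both ways, then overwrites the
// buffer the way a reused receive buffer would be overwritten.
func aliasDemo() (safe, aliased string) {
	wire := []byte("hello")
	safe = string(wire)                 // copies the bytes
	aliased = unsafeBytesToString(wire) // shares the bytes
	wire[0] = 'j'
	return safe, aliased
}

func main() {
	safe, aliased := aliasDemo()
	// The copy keeps its value; the aliased string follows the buffer.
	fmt.Println(safe, aliased)
}
```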

This feature is mentioned in another issue: Per Msg/Field Features. I think having per-message / per-field features is not mandatory to implement unsafe casting, though.

Note that this feature applies to bytes fields as well where we could reference data instead of copying it.

Proposal

I think this feature is worth implementing and I'm open to ideas. In my opinion, a simple and pragmatic approach to implement it would be to add unsafe functions, such as UnmarshalVTUnsafe, which perform such operations for all applicable fields. I've actually started such an implementation on my personal fork and it seems to work well (diff for anyone curious, although it doesn't work yet for bytes fields). Now, before spending more time on it, I would love to gauge interest and hear considerations from others!

A few considerations I had on my side:

I think having a different function from UnmarshalVT is mandatory, because several applications can use the same generated code for a message and we cannot ask all of them to be careful not to overwrite data received from the wire.
I considered adding a feature but it sounds weird because it would be transversal to other features.
- Either we could just always generate both the safe and unsafe versions of the function;
- or I think we could add an --unsafe flag (different from features) to trigger the generation of unsafe functions.

Let me know what you folks think about all this!

Wrong pool unmarshal size

wrong pool unmarshal size.

my proto file:

// protoc  --go_out=. --plugin protoc-gen-go="/Users/jie.yang05/go/bin/protoc-gen-go"  --go-vtproto_out=.  --plugin protoc-gen-go-vtproto="/Users/jie.yang05/go/bin/protoc-gen-go-vtproto"  --go-vtproto_opt=features=marshal+unmarshal+size+pool ./lineentry.proto
syntax = "proto3";

package index;

option go_package="./proto";

import "github.com/planetscale/vtprotobuf/vtproto/ext.proto";

message lineEntries {
  option (vtproto.mempool) = true; // Enable memory pooling
  repeated lineEntry lineEntries = 1;
}

message lineEntry {
  uint64 address = 1;
  uint32 line = 2;
  uint32 file = 3;
}

Command:

yangjie05-mac:index jie.yang05$ protoc  --go_out=. --plugin protoc-gen-go="/Users/jie.yang05/go/bin/protoc-gen-go"  --go-vtproto_out=.  --plugin protoc-gen-go-vtproto="/Users/jie.yang05/go/bin/protoc-gen-go-vtproto"   -I /Users/jie.yang05/go/pkg/mod/github.com/planetscale/vtprotobuf\@v0.4.0/include -I ./ ./lineentry.proto

Generated code:

func (m *LineEntries) ResetVT() {
	for _, mm := range m.LineEntries {
		mm.ResetVT()
	}
	m.Reset()
}

func (m *LineEntries) ReturnToVTPool() {
	if m != nil {
		m.ResetVT()
		vtprotoPool_LineEntries.Put(m)
	}
}

The generated code has the wrong ResetVT.

New extension: scrub

Proposal: new extension called scrub which adds a function Scrub() to messages.

Similar to Reset(), except it recursively overwrites all buffers & fields in the message with zeros.
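As a rough sketch of the idea, a hand-written Scrub for a small struct could look like the following. Secret and its fields are hypothetical and this is not generated code. Note that Go strings are immutable, so a string field can only be dropped, not overwritten in place.

```go
package main

import "fmt"

// Secret is a hand-written stand-in for a generated message.
type Secret struct {
	Token  []byte
	Name   string
	Nested *Secret
}

// Scrub overwrites byte buffers with zeros before dropping them and
// recurses into nested messages. Nil receivers are ignored.
func (s *Secret) Scrub() {
	if s == nil {
		return
	}
	for i := range s.Token {
		s.Token[i] = 0
	}
	s.Token = nil
	s.Name = "" // drops the reference; the old bytes are not overwritten
	s.Nested.Scrub()
	s.Nested = nil
}

func main() {
	token := []byte("hunter2")
	s := &Secret{Token: token, Nested: &Secret{Name: "inner"}}
	s.Scrub()
	fmt.Println(token) // the original backing array now holds only zeros
}
```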

TinyGo support

First off, wanted to say thanks for such a great project. While the issue title may sound like a feature request, the code generated by vtprotobuf already works with TinyGo. You can see an example of that here: https://github.com/kyleconroy/go-wasm-plugins

What I'd like to ask about is making TinyGo support explicit via a CI job. Is this something you'd be interested in supporting? If so I can take a first pass.

Deterministic mode

Upstream proto has a "deterministic" mode: https://github.com/protocolbuffers/protobuf-go/blob/f221882bfb484564f1714ae05f197dea2c76898d/proto/encode.go#L50

I think this is stricter than marshal_strict (but I could be wrong) in that it also sorts maps.

Interestingly, there is a "Stable" field here:

vtprotobuf/features/marshal/marshalto.go

Line 43 in 96ede25

Stable, once, strict bool

but it's never used.

It might be nice to add this option or, if it's already possible, to document it.
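
To illustrate why map ordering matters for deterministic output, here is a small sketch that is independent of vtprotobuf. The encodeMap function and its byte layout are hypothetical and are not the protobuf wire format. It only shows that sorting keys before encoding makes equal maps produce identical bytes, whereas Go's map iteration order is randomized.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// encodeMap writes each entry as len(key), key, value (a simplified,
// made-up layout). With stable=true the keys are sorted first, so the
// output no longer depends on Go's randomized map iteration order.
func encodeMap(m map[string]uint64, stable bool) []byte {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	if stable {
		sort.Strings(keys)
	}
	var out []byte
	for _, k := range keys {
		out = binary.AppendUvarint(out, uint64(len(k)))
		out = append(out, k...)
		out = binary.AppendUvarint(out, m[k])
	}
	return out
}

func main() {
	m := map[string]uint64{"b": 2, "a": 1, "c": 3}
	first := encodeMap(m, true)
	for i := 0; i < 100; i++ {
		if !bytes.Equal(first, encodeMap(m, true)) {
			panic("stable encoding changed between runs")
		}
	}
	fmt.Printf("%x\n", first)
}
```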

Adding support for proto extensions?

I'm using protobuf and trying this plugin, but it seems to lack support for extensionFields 🤔. Am I right, or am I misunderstanding something here?

buf.build support?

Unfortunately, when using buf.build, #19 pops up again, except that since buf is doing the work there's no easy way to just run a single protoc invocation.

I'm not sure if this is something vtprotobuf can work around or if it requires a feature in buf, but it's a shame that these two tools don't play nicely together.

Replace sync.Pool with zeropool

Generated pool code currently uses sync.Pool which is nice because there are no external dependencies.
However, there is a small dependency we use instead that avoids a known allocation issue with sync.Pool (and introduces type safety, but it doesn't really matter since this is generated code).

ReturnToVTPool() recursive?

When func (p *YourProto) ReturnToVTPool() is called, children of YourProto that implement method ReturnToVTPool() should also be returned to the pool.
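
The requested behavior can be sketched with hand-written types and plain sync.Pool values. Parent, Child, the pool variables, and the ReturnToPool methods below are hypothetical names chosen for illustration. This is not the code the generator emits.

```go
package main

import (
	"fmt"
	"sync"
)

type Child struct{ Field uint32 }

type Parent struct {
	One      *Child
	Children []*Child
}

var (
	childPool  = sync.Pool{New: func() any { return new(Child) }}
	parentPool = sync.Pool{New: func() any { return new(Parent) }}
)

// ReturnToPool clears the child and hands it back to its pool.
func (c *Child) ReturnToPool() {
	if c != nil {
		*c = Child{}
		childPool.Put(c)
	}
}

// ReturnToPool first returns every poolable child, then the parent itself,
// so no pooled object keeps a reference to another pooled object.
func (p *Parent) ReturnToPool() {
	if p == nil {
		return
	}
	p.One.ReturnToPool() // safe on nil: the method checks its receiver
	for _, c := range p.Children {
		c.ReturnToPool()
	}
	*p = Parent{}
	parentPool.Put(p)
}

func main() {
	c := &Child{Field: 7}
	p := &Parent{One: c, Children: []*Child{{Field: 1}, {Field: 2}}}
	p.ReturnToPool()
	fmt.Println(c.Field, p.One == nil, len(p.Children))
}
```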

equal: code doesn't distinguish between oneof fields when zero-valued

Because the comparison logic for oneof fields relies on the getters for the individual fields, it cannot differentiate between a field not being set, and a field being set to the zero value. While the nil checks allow distinguishing protos where one has a oneof field set to a zero value, while the other doesn't have any field in the oneof set, the code fails to distinguish protos where different fields in a oneof are set to the respective zero value.

Test case:

func TestEqualVT_Oneof_AbsenceVsZeroValue(t *testing.T) {
	a := &TestAllTypesProto3{
		OneofField: &TestAllTypesProto3_OneofUint32{
			OneofUint32: 0,
		},
	}
	b := &TestAllTypesProto3{
		OneofField: &TestAllTypesProto3_OneofString{
			OneofString: "",
		},
	}

	aJson, err := protojson.Marshal(a)
	require.NoError(t, err)
	bJson, err := protojson.Marshal(b)
	require.NoError(t, err)

	if a.EqualVT(b) {
		assert.JSONEq(t, string(aJson), string(bJson))
		err := fmt.Errorf("these %T should not be equal:\nmsg = %+v\noriginal = %+v", a, a, b)
		require.NoError(t, err)
	}
}

This is similar to #48 , but applies to oneofs and exercises different paths in the code generation.
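
The failure mode can be reproduced without generated code. In the sketch below, Msg and its oneof wrappers are hypothetical hand-written stand-ins. equalViaGetters mirrors the getter-based comparison described above, and equalViaTypeSwitch shows one possible fix that compares the concrete wrapper type before comparing values.

```go
package main

import "fmt"

// Hand-written stand-ins for a generated oneof with two alternatives.
type isMsgValue interface{ isMsgValue() }

type MsgUint32 struct{ V uint32 }
type MsgString struct{ V string }

func (*MsgUint32) isMsgValue() {}
func (*MsgString) isMsgValue() {}

type Msg struct{ Value isMsgValue }

func (m *Msg) GetUint32() uint32 {
	if x, ok := m.Value.(*MsgUint32); ok {
		return x.V
	}
	return 0
}

func (m *Msg) GetString() string {
	if x, ok := m.Value.(*MsgString); ok {
		return x.V
	}
	return ""
}

// equalViaGetters cannot tell "uint32 set to 0" from "string set to empty":
// both getters return the zero value for whichever alternative is not set.
func equalViaGetters(a, b *Msg) bool {
	if (a.Value == nil) != (b.Value == nil) {
		return false
	}
	return a.GetUint32() == b.GetUint32() && a.GetString() == b.GetString()
}

// equalViaTypeSwitch requires both sides to hold the same alternative.
func equalViaTypeSwitch(a, b *Msg) bool {
	switch x := a.Value.(type) {
	case nil:
		return b.Value == nil
	case *MsgUint32:
		y, ok := b.Value.(*MsgUint32)
		return ok && x.V == y.V
	case *MsgString:
		y, ok := b.Value.(*MsgString)
		return ok && x.V == y.V
	}
	return false
}

func main() {
	a := &Msg{Value: &MsgUint32{V: 0}}
	b := &Msg{Value: &MsgString{V: ""}}
	fmt.Println(equalViaGetters(a, b), equalViaTypeSwitch(a, b))
}
```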

Incorrect generation of "optional" parameters in the "response" message

I tried to return an optional field in an RPC response message, and the generated code was wrong.

Add support for non nullable objects - similar to gogoproto.nullable

Thanks for putting this library together!

Would it be possible to add support for non-nullable objects, similar to the gogo annotation gogoproto.nullable?

Custom Tag Support

Does vtprotobuf provide custom tag support like gogoprotobuf? I can't seem to find any information about this online.

In a .proto file, gogoprotobuf lets me attach custom tags like this:

import "github.com/gogo/protobuf/gogoproto/gogo.proto";

option go_package                  = "events";
option (gogoproto.unmarshaler_all) = true;
option (gogoproto.sizer_all)       = true;
option (gogoproto.marshaler_all)   = true;

message Event {
  string AuctionID                               = 1 [(gogoproto.moretags) = 'gorm:"column:auction_id;type:VARCHAR(64);primary_key"'];
  int64  CampaignID                              = 2 [(gogoproto.moretags) = 'gorm:"column:campaign_id;type:BIGINT;index"'];
  int64  ImpIndex                                = 3 [(gogoproto.moretags) = 'gorm:"column:imp_index;type:INT"'];
  string DomainKey                               = 5 [(gogoproto.moretags) = 'gorm:"column:domain_key;type:VARCHAR(1024)"'];

Does vtprotobuf provide any support for anything like gogoproto.moretags?

Question: grpc

One of the features (which is on by default if features are not specified) is grpc. From a quick look at the code, I can't find any difference between the output of the regular grpc plugin and the code generated by vtprotobuf. Is there a plan to extend this, for example to use a pool for object creation in the future?

equal: presence of bytes fields not honored

All fields in proto2 and fields explicitly marked optional in proto3 have a presence property, i.e., the field not being set is always different from the field being set, even when set to the zero value. The generated code for equal correctly checks for equality of this presence property for optional scalar and message fields (i.e., fields with a pointer Go type), but not for bytes (which, in the proto world, is a scalar type, but in Go maps to a nilable reference type).

It is correct to not differentiate between []byte(nil) and []byte{} for fields without presence (i.e., fields in a oneof, repeated fields, and fields with neither optional nor repeated in proto3), however, for all the other cases, this difference should be taken into account as it indicates presence/absence.

Example test case:

func TestEqualVT_Proto2_BytesPresence(t *testing.T) {
	a := &TestAllTypesProto2{
		OptionalBytes: nil,
	}
	b := &TestAllTypesProto2{
		OptionalBytes: []byte{},
	}

	require.False(t, proto.Equal(a, b))

	aJson, err := protojson.Marshal(a)
	require.NoError(t, err)
	bJson, err := protojson.Marshal(b)
	require.NoError(t, err)

	if a.EqualVT(b) {
		assert.JSONEq(t, string(aJson), string(bJson))
		err := fmt.Errorf("these %T should not be equal:\nmsg = %+v\noriginal = %+v", a, a, b)
		require.NoError(t, err)
	}
}
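
The rule described above can be condensed into a small helper. This is a sketch only: equalBytes and its presence parameter are hypothetical names, not part of the generated code.

```go
package main

import (
	"bytes"
	"fmt"
)

// equalBytes compares two bytes fields. With presence=true (proto2 fields
// and proto3 fields marked optional), nil means "unset" and must differ
// from an empty but set value. Without presence, nil and empty are equal.
func equalBytes(a, b []byte, presence bool) bool {
	if presence && (a == nil) != (b == nil) {
		return false
	}
	return bytes.Equal(a, b)
}

func main() {
	fmt.Println(equalBytes(nil, []byte{}, true))  // field with presence
	fmt.Println(equalBytes(nil, []byte{}, false)) // field without presence
}
```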

PR coming

Question about behavior of `ReturnToVTPool()` and `Reset()`

Using this proto:

message Parent {
  option (vtproto.mempool) = true;
  repeated Child children = 1;
  Child one = 2;
}

message Child {
  option (vtproto.mempool) = true;
  uint32 field = 1;
}

When calling ReturnToVTPool() on Parent it calls ResetVT on all children and then calls m.Reset()

func (m *Parent) ResetVT() {
	for _, mm := range m.Children {
		mm.ResetVT()
	}
	m.One.ReturnToVTPool()
	m.Reset()
}

However m.Reset() allocates a new object and overwrites the existing object entirely:

func (x *Parent) Reset() {
	*x = Parent{}

This nils out all fields on the parent throwing away the slice for the GC to handle. Am I missing something? Is there some way to put back into the pool, call ResetVT() but not call Reset()?
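
One way to keep the slice's backing array across a reset is sketched below with hand-written types. resetKeepingSlice is a hypothetical method for illustration and is not what the generator produces.

```go
package main

import "fmt"

type Child struct{ Field uint32 }

type Parent struct {
	Children []*Child
	One      *Child
}

// resetKeepingSlice clears every field but hands the Children backing
// array back to the zeroed struct, so a pooled Parent can refill it
// without allocating a new slice.
func (p *Parent) resetKeepingSlice() {
	for i := range p.Children {
		p.Children[i] = nil // drop references held by the old elements
	}
	kept := p.Children[:0]
	*p = Parent{}
	p.Children = kept
}

func main() {
	p := &Parent{Children: make([]*Child, 3, 8), One: &Child{Field: 1}}
	p.resetKeepingSlice()
	fmt.Println(len(p.Children), cap(p.Children), p.One == nil)
}
```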

Inconsistent (de)serialization behavior

I am building an application protocol with protobufs, and I'm using vtprotobuf exclusively to marshal and unmarshal the messages. Currently, I'm experiencing strange behavior I'm not understanding that I think is related to vtprotobuf.

Here are my message definitions:

message Header {
  fixed32 Size = 1; // Size of the next message
  fixed32 Checksum = 2; // Checksum of the serialized message
}

message RaftControlPayload {
  oneof Types {
    GetLeaderIDRequest GetLeaderIdRequest = 1;
    GetLeaderIDResponse GetLeaderIdResponse = 2;
    IdRequest IdRequest = 3;
    IdResponse IdResponse = 4;
    IndexState IndexState = 5;
    ModifyNodeRequest ModifyNodeRequest = 6;
    ReadIndexRequest ReadIndexRequest = 7;
    ReadLocalNodeRequest ReadLocalNodeRequest = 8;
    RequestLeaderTransferResponse RequestLeaderTransferResponse = 9;
    RequestSnapshotRequest RequestSnapshotRequest = 10;
    SnapshotOption SnapshotOption = 12;
    StopNodeResponse StopNodeResponse = 13;
    StopRequest StopRequest = 14;
    StopResponse StopResponse = 15;
    SysOpState SysOpState = 16;
    DBError Error = 17;
  }
  enum MethodName {
      ADD_NODE = 0;
      ADD_OBSERVER = 1;
      ADD_WITNESS = 2;
      GET_ID = 3;
      GET_LEADER_ID = 4;
      READ_INDEX = 5;
      READ_LOCAL_NODE = 6;
      REQUEST_COMPACTION = 7;
      REQUEST_DELETE_NODE = 8;
      REQUEST_LEADER_TRANSFER = 9;
      REQUEST_SNAPSHOT = 10;
      STOP = 11;
      STOP_NODE = 12;
  }
  MethodName Method = 18;
}

The Header message serializes to 10 bytes, which I send across a network stream as a header for whatever unknown message payload is coming next. This allows me to simply pass raw protobuf messages across a network stream without having to leverage gRPC or other RPC frameworks.

Sending a message across the network stream is pretty straightforward. I prepare a message, serialize the message, create a header with all of the appropriate values, serialize the header, send the header, then send the message.

idReqPayload := &database.RaftControlPayload{
	Method: database.RaftControlPayload_GET_ID,
	Types: &database.RaftControlPayload_IdRequest{
		IdRequest: &database.IdRequest{},
	},
}
payloadBuf, _ := idReqPayload.MarshalVT()

initialHeader := &transportv1.Header{
	Size: uint32(len(payloadBuf)),
	Checksum: crc32.ChecksumIEEE(payloadBuf),
}
headerBuf, _ := initialHeader.MarshalVT()

stream.Write(headerBuf)
stream.Write(payloadBuf)

Receiving a message on the network stream is also pretty straightforward. I read the header into a buffer, deserialize it, read the next N bytes from the stream based on the Size field in the header message, verify the checksum, then deserialize the byte array into the equivalent message.

headerBuf := make([]byte, 10)
if _, err := io.ReadFull(stream, headerBuf); err != nil {
	logger.Error().Err(err).Msg("cannot readAndHandle raft control header")
	continue
}

// unmarshal the header
header := &transportv1.Header{}
if err := header.UnmarshalVT(headerBuf); err != nil {
	logger.Error().Err(err).Msg("cannot unmarshal header")
	return
}

// prep the message buffer
msgBuf := make([]byte, header.Size)
if _, err := io.ReadFull(stream, msgBuf); err != nil {
	logger.Error().Err(err).Msg("cannot read message payload")
	return
}

// verify the message is intact
checked := crc32.ChecksumIEEE(msgBuf)
if checked != header.GetChecksum() {
	logger.Error().Msg("checksums do not match")
}

// unmarshal the payload
msg := &database.RaftControlPayload{}
if err := msg.UnmarshalVT(msgBuf); err != nil {
	logger.Error().Err(err).Msg("cannot unmarshal payload")
}

Here's where things start to get confusing. When I serialize idReqPayload via MarshalVT() and run a checksum against it, I'll get uint32(1298345897); when I send the header as you see here, the Size field is uint32(5) and Checksum is uint32(1298345897). When the header message gets deserialized on the receiving end of a localhost connection, it looks very different.

The header message gets deserialized with the Size field being uint32(5) and the Checksum field being uint(1). That's the first strange thing.

When I run a checksum against the next 5 bytes of the serialized idReqPayload payload which followed, it checksums to uint32(737000948) even though there was no change to the byte array from the time it was serialized to the time it was received. That's the second strange thing.

When I run an equality check against the value of the deserialised header Checksum field against a local checksum of the serialized idReqPayload payload with checked := crc32.ChecksumIEEE(msgBuf); if checked != header.GetChecksum() { // ... }, it passes an equality check - the deserialized header Checksum field's value is uint(1) whereas the calculated checksum of the received message is uint32(737000948). That's the third strange thing.

When I deserialize the serialized idReqPayload byte array, it deserializes without an error. However, the resulting message contents are wrong. When I serialize a message with this configuration:

idReqPayload := &database.RaftControlPayload{
	Method: database.RaftControlPayload_GET_ID,
	Types: &database.RaftControlPayload_IdRequest{
		IdRequest: &database.IdRequest{},
	},
}

It deserializes into this equivalent:

msg := &database.RaftControlPayload{
	Method: database.RaftControlPayload_ADD_NODE,
	Types: nil,
}

The Method field is reset so the enum is defaulted to 0, and the Types field is nil.

I'm fairly positive this could partially be related to #51, but I updated my local protoc-gen-go-vtproto binary to 0ae748f and the problem still persists. I've also eliminated the network stream as it's a localhost network stream, so nothing is intercepting it or modifying it in transit.

Am I doing something wrong or is this a bug of some kind?

Cloning messages

"clone" would generate a VTClone() function to duplicate a message (like proto.Clone).

"copy" would generate "VTCopy(dst)" to copy the contents of a message to a target message.

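The difference between the two proposed helpers can be sketched on a hand-written type. Node, cloneSketch, and copySketch are hypothetical names for illustration. The proposal itself does not specify how the generated functions would be implemented.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a generated message.
type Node struct {
	Name string
	Data []byte
	Next *Node
}

// cloneSketch returns a deep copy that shares no memory with m.
func (m *Node) cloneSketch() *Node {
	if m == nil {
		return nil
	}
	out := &Node{}
	m.copySketch(out)
	return out
}

// copySketch deep-copies m into an existing dst, reusing dst's Data buffer
// when it has enough capacity. This simplification does not preserve the
// difference between a nil and an empty Data slice.
func (m *Node) copySketch(dst *Node) {
	dst.Name = m.Name
	dst.Data = append(dst.Data[:0], m.Data...)
	dst.Next = m.Next.cloneSketch()
}

func main() {
	a := &Node{Name: "a", Data: []byte{1, 2}, Next: &Node{Name: "b"}}
	c := a.cloneSketch()
	c.Data[0] = 9
	c.Next.Name = "z"
	fmt.Println(a.Data, a.Next.Name, c.Data, c.Next.Name)
}
```
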
gRPC and pooling

I have a couple of questions regarding pooling and gRPC that I could not fully understand from existing issues or the readme. (happy to do a PR to clarify in the readme after understanding for the next person)

In #16 (comment) it was mentioned that memory-pooled objects are automatically unmarshaled using objects from a pool, which I could get to work. How is this intended to work the other way around, though? Does it happen automatically somewhere, such as during marshaling (if so, I can't find the code that does this)? If not, is there a recommended or suggested place where this could be done (maybe in the codec)?

My use case is I have a lot of objects that I read from a key-value store that eventually end up in a gRPC response, but after marshaling the response I would like to return the objects to the pool.

cut new release?

Hey all, any chance of cutting a new release?

The latest version is 0.3.0 and a bunch of stuff has been added since then such as some of the documented options (pool and so on). Is the current main stable? Can a new release be made?