planetscale / vtprotobuf
A Protocol Buffers compiler that generates optimized marshaling & unmarshaling Go code for ProtoBuf APIv2
License: BSD 3-Clause "New" or "Revised" License
While benchmarking vtprotobuf in our projects, we noticed a performance regression for lists of int32 and sfixed32 numbers.
Marshaling and unmarshaling both seem to be slower with vtprotobuf for repeated int32 fields.
Although unmarshaling is faster with vtprotobuf for repeated sfixed32, marshaling is slower.
This repository contains samples of these microbenchmarks: https://github.com/themreza/vtprotobuf-bench/tree/main
What could be causing this? Is there a way to improve the performance?
It would be helpful to have automated benchmarks for different data types comparing vtprotobuf with the built-in proto.Marshal and proto.Unmarshal.
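One factor that may be relevant when comparing these two field types (an assumption, not a diagnosis of the regression): on the wire, int32 elements are varint-encoded, while sfixed32 elements always take four little-endian bytes. A minimal stdlib-only sketch, using the hypothetical helper names packVarint and packFixed32, shows how the payload of a packed field differs:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packVarint lays out values like the payload of a packed repeated int32
// field: each value is sign-extended to 64 bits and written as a varint.
func packVarint(vals []int32) []byte {
	var out []byte
	for _, v := range vals {
		out = binary.AppendUvarint(out, uint64(int64(v)))
	}
	return out
}

// packFixed32 lays out values like the payload of a packed repeated sfixed32
// field: always four little-endian bytes per value.
func packFixed32(vals []int32) []byte {
	out := make([]byte, 0, 4*len(vals))
	for _, v := range vals {
		out = binary.LittleEndian.AppendUint32(out, uint32(v))
	}
	return out
}

func main() {
	vals := []int32{1, 300, -1}
	// 1 takes 1 byte, 300 takes 2 bytes, -1 takes 10 bytes as a varint.
	fmt.Println(len(packVarint(vals)), len(packFixed32(vals))) // 13 12
}
```

Varint elements need a data-dependent loop per value, so benchmarks for the two types are worth keeping separate and worth running with both small and negative values.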
How to use vtprotobuf with bufbuild/buf, on macOS:
Set up PATH:
export GOBIN=/Users/xxxx/go/bin
export PATH=$PATH:$GOBIN
Following the README, install vtprotobuf:
go install github.com/planetscale/vtprotobuf/cmd/protoc-gen-go-vtproto@latest
Install buf on macOS as described at https://docs.buf.build/installation, for example:
brew install bufbuild/buf/buf
Add vtprotobuf to buf.gen.yaml:
version: v1
managed:
  enabled: true
  go_package_prefix:
    default: github.com/your/grpc-project
plugins:
  - plugin: buf.build/bufbuild/connect-go
    out: ./
    opt: paths=source_relative
  - plugin: buf.build/protocolbuffers/go
    out: ./
    opt: paths=source_relative
  - plugin: go-vtproto
    out: ./
    opt: paths=source_relative
The entry added to the plugins list is:
  - plugin: go-vtproto
    out: ./
    opt: paths=source_relative
Then run:
buf generate
That's all.
When unmarshaling an empty message embedded in another message, vtprotobuf is allocating the message, but proto.Unmarshal is using a typed nil.
Steps to reproduce:
syntax = "proto3";
package repro;
message TopLevel {
  message Empty {}
  Empty embedded = 1;
}
func TestEmbeddedEmpty(t *testing.T) {
m := &pb.TopLevel{Embedded: &pb.TopLevel_Empty{}}
// Marshal it with protobuf.
protobuf, err := proto.Marshal(m)
if err != nil {
t.Fatal(err)
}
// Unmarshal it with protobuf.
pbm := &pb.TopLevel{}
if err := proto.Unmarshal(protobuf, m); err != nil {
t.Fatal(err)
}
// Unmarshal it with vtprotobuf.
vtm := &pb.TopLevel{}
if err := vtm.UnmarshalVT(protobuf); err != nil {
t.Fatal(err)
}
fmt.Printf("%#v\n\n", pbm)
fmt.Printf("%#v\n", vtm)
require.True(t, pbm.EqualVT(vtm), "EqualVT")
require.True(t, proto.Equal(pbm, vtm), "proto.Equal")
}
Output:
&proto.TopLevel{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), Embedded:(*proto.TopLevel_Empty)(nil)}
&proto.TopLevel{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), Embedded:(*proto.TopLevel_Empty)(0xc00011fbc0)}
--- FAIL: TestEmbeddedEmpty (0.00s)
main_test.go:37:
Error Trace: main_test.go:37
Error: Should be true
Test: TestEmbeddedEmpty
Messages: EqualVT
FAIL
exit status 1
Is it expected that the vtproto files aren't currently being generated for proto2 files or do I have something misconfigured?
An empty message used in a oneof is not marshaled the same as proto.Marshal().
Steps to reproduce:
syntax = "proto3";
package repro;
message Repro {
  message Empty {}
  oneof str_or_empty {
    string str = 1;
    Empty empty = 2;
  }
}
func TestEmptyOneOf(t *testing.T) {
	m := &pb.Repro{StrOrEmpty: &pb.Repro_Empty_{}}
	protobuf, err := proto.Marshal(m)
	if err != nil {
		t.Fatal(err)
	}
	vtprotobuf, err := m.MarshalVT()
	if err != nil {
		t.Fatal(err)
	}
	fmt.Printf("protobuf: %#v\n", protobuf)
	fmt.Printf("vtprotobuf: %#v\n", vtprotobuf)
	require.True(t, bytes.Equal(protobuf, vtprotobuf))
}
Output:
protobuf: []byte{0x12, 0x0}
vtprotobuf: []byte{}
--- FAIL: TestEmptyOneOf (0.00s)
main_test.go:129:
Error Trace: main_test.go:129
Error: Should be true
Test: TestEmptyOneOf
FAIL
exit status 1
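For readers wondering where the two bytes in the proto.Marshal output come from: 0x12 is the tag for field 2 with the length-delimited wire type, and 0x00 is the length of the empty message. A small sketch of that arithmetic follows; appendEmptyLenField is a hypothetical helper written for illustration, not part of either library:

```go
package main

import "fmt"

// Wire type 2 is "length-delimited" in the protobuf encoding (strings, bytes,
// embedded messages).
const wireBytes = 2

// appendEmptyLenField appends a length-delimited field with an empty payload:
// one tag byte followed by a zero length. It assumes fieldNum < 16 so that the
// tag fits in a single byte.
func appendEmptyLenField(dst []byte, fieldNum int) []byte {
	tag := byte(fieldNum<<3 | wireBytes)
	return append(dst, tag, 0x00)
}

func main() {
	fmt.Printf("%#v\n", appendEmptyLenField(nil, 2)) // []byte{0x12, 0x0}
	fmt.Printf("%#v\n", appendEmptyLenField(nil, 1)) // []byte{0xa, 0x0}
}
```

Emitting the tag even when the payload is empty is what lets a decoder tell "set to an empty message" apart from "not set".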
I am currently attempting to put together a POC for using vtprotobuf in an application I work on. However the most promising feature (pooling) does not seem to exist in the generated code.
My command line:
protoc \
-Ipkg/.patched-proto \
--go_out=paths=source_relative:./pkg/tempopb/ \
--go-grpc_out=paths=source_relative:./pkg/tempopb/ \
--go-vtproto_out=paths=source_relative:./pkg/tempopb/ \
--go-vtproto_opt=features=marshal+unmarshal+size+pool \
pkg/.patched-proto/trace/v1/trace.proto
The output files seem to be generated correctly and there are no errors.
But I'm not seeing ResetVT, ReturnToVTPool, or *FromVTPool generated. I have tried with 0.2.0 as well as tip of main. I have also tried not specifying --go-vtproto_opt, without luck.
I am seeing MarshalVT, MarshalToVT, SizeVT, UnmarshalVT, ... generated.
Thanks for your time!
As far as I can tell single fields are pooled correctly but repeated fields are not. Using the following proto:
message Parent {
  option (vtproto.mempool) = true;
  repeated Child children = 1;
  Child one = 2;
}
message Child {
  option (vtproto.mempool) = true;
  uint32 field = 1;
}
I can see that the ResetVT and UnmarshalVT methods correctly handle the "one" field but not the "children" field.
func (m *Parent) ResetVT() {
	for _, mm := range m.Children {
		mm.ResetVT() // does not return the slice pointers to the pool
	}
	m.One.ReturnToVTPool() // correctly returns this pointer to the pool
	m.Reset()
}
func (m *Parent) UnmarshalVT(dAtA []byte) error {
	...
	switch fieldNum {
	case 1:
		...
		if len(m.Children) == cap(m.Children) {
			m.Children = append(m.Children, &Child{}) // allocates new object for slice
		} else {
			m.Children = m.Children[:len(m.Children)+1]
			if m.Children[len(m.Children)-1] == nil {
				m.Children[len(m.Children)-1] = &Child{} // allocates new object for slice
			}
		}
		...
	case 2:
		...
		if m.One == nil {
			m.One = ChildFromVTPool() // correctly pulls from pool
		}
		...
Is there a way to do this that I'm not seeing?
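To make the question concrete, here is a rough sketch of what pooling the repeated children could look like. It is not what vtprotobuf generates: Child, Parent, childPool, returnChild, resetPooled and appendChild are all hypothetical stand-ins built on sync.Pool.

```go
package main

import (
	"fmt"
	"sync"
)

// Child and Parent are hypothetical stand-ins for the generated message types.
type Child struct{ Field uint32 }

type Parent struct {
	Children []*Child
	One      *Child
}

var childPool = sync.Pool{New: func() any { return new(Child) }}

// returnChild zeroes a Child and hands it back to the pool; nil is ignored.
func returnChild(c *Child) {
	if c == nil {
		return
	}
	*c = Child{}
	childPool.Put(c)
}

// resetPooled returns every child to the pool and keeps the slice's backing
// array (with nil entries) so a later unmarshal can reuse its capacity.
func (p *Parent) resetPooled() {
	for i, c := range p.Children {
		returnChild(c)
		p.Children[i] = nil
	}
	p.Children = p.Children[:0]
	returnChild(p.One)
	p.One = nil
}

// appendChild grows the slice by one element taken from the pool.
func (p *Parent) appendChild() *Child {
	c := childPool.Get().(*Child)
	p.Children = append(p.Children, c)
	return c
}

func main() {
	p := &Parent{}
	p.appendChild().Field = 7
	p.resetPooled()
	fmt.Println(len(p.Children), cap(p.Children) > 0) // 0 true
}
```

Clearing the slice entries after returning them matters: otherwise the parent would keep pointers to objects that the pool may hand to someone else.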
The generated code doesn't check for presence in the map in other. This may lead to spuriously returning true if:
Failing test case:
func TestEqualVT_Map_AbsenceVsZeroValue(t *testing.T) {
	a := &TestAllTypesProto3{
		MapInt32Int32: map[int32]int32{
			1: 0,
			2: 37,
		},
	}
	b := &TestAllTypesProto3{
		MapInt32Int32: map[int32]int32{
			2: 37,
			3: 42,
		},
	}
	aJson, err := protojson.Marshal(a)
	require.NoError(t, err)
	bJson, err := protojson.Marshal(b)
	require.NoError(t, err)
	if a.EqualVT(b) {
		assert.JSONEq(t, string(aJson), string(bJson))
		err := fmt.Errorf("these %T should not be equal:\nmsg = %+v\noriginal = %+v", a, a, b)
		require.NoError(t, err)
	}
}
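For illustration, a presence-aware comparison could look roughly like the sketch below. equalInt32Map is a hypothetical helper, not the generated code; the point is only the two-value map lookup.

```go
package main

import "fmt"

// equalInt32Map reports whether a and b hold the same keys and values.
// The "ok" result distinguishes an absent key from a key mapped to zero.
func equalInt32Map(a, b map[int32]int32) bool {
	if len(a) != len(b) {
		return false
	}
	for k, av := range a {
		bv, ok := b[k]
		if !ok || av != bv {
			return false
		}
	}
	return true
}

func main() {
	a := map[int32]int32{1: 0, 2: 37}
	b := map[int32]int32{2: 37, 3: 42}
	fmt.Println(equalInt32Map(a, b)) // false
}
```

Without the ok check, the lookup b[1] yields 0, which matches a[1], and since both maps have two entries the comparison would report equality.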
The optimized generated code for marshal, unmarshal, clone etc. still resorts to generic, protoreflect-based logic for builtin/well-known types such as google.protobuf.Timestamp, google.protobuf.Duration etc. Since these types are well known, it would be nice if optimized unrolled code could be used for operations on these types as well, especially as there seems to be a significant performance penalty for the "context-switch" to reflection with each individual proto.Clone invocation (benchmark).
I'm happy to send a PR, but one thing I'd ask the library maintainers to chime in on is whether it's acceptable to have the generated code reference global helper functions from a package within this module (e.g., github.com/planetscale/vtprotobuf/support/...), or whether it would be preferable to just generate package-private helper functions for all referenced types on demand.
Hello.
The github.com/pkg/errors package has a stack trace feature.
I'd suggest wrapping the errors returned in generated files with errors.WithStack(err) from this package.
In other words, let's turn this generated code:
if err != nil {
return err
}
if (skippy < 0) || (iNdEx+skippy) < 0 {
return ErrInvalidLength
}
if (iNdEx + skippy) > l {
return io.ErrUnexpectedEOF
}
into that:
if err != nil {
return errors.WithStack(err)
}
if (skippy < 0) || (iNdEx+skippy) < 0 {
return errors.WithStack(ErrInvalidLength)
}
if (iNdEx + skippy) > l {
return errors.WithStack(io.ErrUnexpectedEOF)
}
That would be extremely useful for debugging purposes.
I could prepare a pull request if you wish.
Hello everyone,
I'd like to discuss the future maintenance of vtprotobuf. As an avid user from Datadog, I've recognized its potential in enhancing protobuf message operations. However, recent activity in the repository has been limited, with the last release dating back to January.
To bolster vtprotobuf's capabilities for developers, I propose exploring a more active maintenance and release cycle. Acknowledging the demands of open-source projects, I'm here to extend a hand, along with some of my coworkers from Datadog, to offer assistance.
Although notable bugs and promising ideas exist in pull requests and issues (like #83 and #54), no PRs have been merged since the beginning of the year. We're enthusiastic about bridging this gap and improving vtprotobuf's efficiency and robustness.
In this regard, I suggest opening a discussion on project maintenance and future releases. Our goal is collaborative growth, not imposition. Whether it's contributing code, managing issues, updating documentation, or handling releases, we are more than willing to step in.
Let's collectively ensure that vtprotobuf remains a valuable resource for developers. Your insights are vital, and I'm excited about the potential improvements we can achieve.
Looking forward to your thoughts and suggestions. Thank you for your time and consideration.
Thought/question: why does the gRPC codec not try to call ReturnToVTPool on the way out? Is anyone else doing this? My thinking is to use this to optimise the return of complex payloads: build them in the handler with ...FromVTPool(), then let the codec release them once the wire work has finished and the bytes are sent.
Are there reasons for this not being in the codec, and if not, is there an opening for a PR adding a PoolAwareCodec to the project?
Hello, I have just tried running this generator on my proto3 files and it failed with:
myproto.proto is a proto3 file that contains optional fields, but code generator protoc-gen-go-vtproto hasn't been updated to support optional fields in proto3. Please ask the owner of this code generator to support proto3 optional.--go-vtproto_out:
Are optional fields supported by this generator?
Thanks
It appears there are no nil checks for repeated fields when returning to the pool, which causes a panic. Is this not supported, or am I missing how to handle it properly?
Expected: Generated code returns true if this and that are the same instance, i.e. msg.EqualVT(msg) should be super fast regardless of the fields in the message
Actual: Generated code still walks the whole message hierarchy and compares field-by-field
I believe implementing this should be as simple as replacing the currently generated code
if this == nil {
	return that == nil
} else if that == nil {
	return false
}
with
if this == that {
	return true
} else if this == nil || that == nil {
	return false
}
Could not find a contribution guide, so let me know if you'd like a PR for this or if you would rather make the change yourselves. Thanks for the great project.
Does vtprotobuf support using vtprotobuf-generated messages and standard proto.Message types in the same project (for example, relying on pb.go files generated by other projects)?
In our real scenario, the internal proto files use vtprotobuf to improve performance, while some other functionality relies on third-party pb.go files. The following error occurred during processing:
stream is closed with error: xds: stream.Recv() failed: rpc error: code = Internal desc = grpc: error while marshaling: failed to marshal, message is *envoy_service_discovery_v3.DiscoveryRequest (missing vtprotobuf helpers)" func=goexit
syntax = "proto3";
package tutorial;
message OptionalByte{
optional bytes value = 1;
}
Output of proto.Marshal:
[]uint8{0xa, 0x0}
Output of MarshalVT:
[]uint8{}
Hello,
I was looking to experiment with this and ran into the following error:
type *"google.golang.org/genproto/googleapis/rpc/status".Status has no field or method MarshalToSizedBufferVT
The protobuf files that are complaining import files like:
import "google/rpc/status.proto";
with messages that look like:
message TestMessage {
google.rpc.Status status = 2;
}
The google/rpc/status.proto file is copied locally for the code generation, but the generated code is importing the Go package from google.golang.org/genproto/googleapis/rpc/status, so it's not part of the vtproto generation steps.
Is this an issue that you've had to resolve or any suggestions on how to approach this?
This fast-marshaling is cool, but I would like to avoid having the VT suffix in my codebase, and would like to simply continue using proto.Marshal(...).
I read that protoreflect's Message supports an optional ProtoMethods method, and quoting the docs:
ProtoMethods returns optional fast-path implementions of various operations.
Is there a way vtprotobuf's fast-marshaling could be added to those methods, instead of new (Un)MarshalVT methods?
Hi 👋 Ran into a bit of a weird one. Given the following proto:
syntax = "proto3";
package proto;
option go_package = "github.com/pfouilloux/vttest/proto";
message TestMsg {
oneof Test {
string a = 1;
string b = 2;
string c = 3;
}
}
and the following code:
package main_test
import (
"testing"
"github.com/planetscale/vtprotobuf/codec/grpc"
_ "google.golang.org/grpc/encoding/proto"
"vttest/proto/github.com/pfouilloux/vttest/proto"
)
//go:generate protoc --proto_path=proto --go_out=proto --go_opt=paths=source_relative --go-vtproto_out=proto --go-vtproto_opt=features=marshal+unmarshal+size oneof.proto
func TestMarshal(t *testing.T) {
test := &proto.TestMsg{Test: getA()}
_, err := grpc.Codec{}.Marshal(test)
if err != nil {
panic(err)
}
}
func getA() *proto.TestMsg_A {
return nil
}
I'm seeing the following error:
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1239f1d]
goroutine 4 [running]:
testing.tRunner.func1.2({0x126e880, 0x14b3f20})
/usr/local/opt/go/libexec/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
/usr/local/opt/go/libexec/src/testing/testing.go:1399 +0x39f
panic({0x126e880, 0x14b3f20})
/usr/local/opt/go/libexec/src/runtime/panic.go:884 +0x212
vttest/proto/github.com/pfouilloux/vttest/proto.(*TestMsg_A).MarshalToSizedBufferVT(0x126e580?, {0x14efc18?, 0xc000057601?, 0x0?})
/Users/pfouilloux/code/vttest/proto/github.com/pfouilloux/vttest/proto/oneof_vtproto.pb.go:72 +0x1d
vttest/proto/github.com/pfouilloux/vttest/proto.(*TestMsg_A).MarshalToVT(0x1240e01?, {0x14efc18?, 0x0?, 0x123a5a4?})
/Users/pfouilloux/code/vttest/proto/github.com/pfouilloux/vttest/proto/oneof_vtproto.pb.go:68 +0x6a
vttest/proto/github.com/pfouilloux/vttest/proto.(*TestMsg).MarshalToSizedBufferVT(0xc0001049c0?, {0x14efc18, 0x0, 0x0})
/Users/pfouilloux/code/vttest/proto/github.com/pfouilloux/vttest/proto/oneof_vtproto.pb.go:58 +0x133
vttest/proto/github.com/pfouilloux/vttest/proto.(*TestMsg).MarshalVT(0xc0001049c0)
/Users/pfouilloux/code/vttest/proto/github.com/pfouilloux/vttest/proto/oneof_vtproto.pb.go:27 +0x58
github.com/planetscale/vtprotobuf/codec/grpc.Codec.Marshal({}, {0x12a20c0, 0xc0001049c0})
/Users/pfouilloux/go/pkg/mod/github.com/planetscale/vtprotobuf@<version>/codec/grpc/grpc_codec.go:20 +0x42
vttest_test.TestMarshal(0x0?)
/Users/pfouilloux/code/vttest/oneof_test.go:15 +0x47
testing.tRunner(0xc0000076c0, 0x12c8b20)
/usr/local/opt/go/libexec/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
/usr/local/opt/go/libexec/src/testing/testing.go:1493 +0x35f
Process finished with the exit code 1
It looks like there is a nil check missing in the implementation of MarshalToVT for *TestMsg_A:
func (m *TestMsg_A) MarshalToVT(dAtA []byte) (int, error) {
	size := m.SizeVT()
	return m.MarshalToSizedBufferVT(dAtA[:size])
}
I'm more than happy to raise a PR to address this if you could give me some guidance on where to add the appropriate tests.
Kind regards & thanks for sharing your work with the community!
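The underlying Go behaviour can be reproduced without protobuf at all. In the sketch below, isTestMsgTest and TestMsgA are hypothetical stand-ins for the generated oneof interface and wrapper type, and sizeVT is an illustrative method, not the generated one. It shows why a nil wrapper pointer slips past an interface nil check, and how a receiver guard avoids the dereference:

```go
package main

import "fmt"

// isTestMsgTest and TestMsgA are hypothetical stand-ins for the generated
// oneof interface and its wrapper type.
type isTestMsgTest interface{ sizeVT() int }

type TestMsgA struct{ A string }

// sizeVT tolerates a nil receiver instead of dereferencing it.
func (m *TestMsgA) sizeVT() int {
	if m == nil {
		return 0
	}
	return len(m.A)
}

func getA() *TestMsgA { return nil }

func main() {
	var field isTestMsgTest = getA()
	// The interface holds a (*TestMsgA)(nil): it carries a type, so it is not == nil.
	fmt.Println(field == nil)   // false
	fmt.Println(field.sizeVT()) // 0, thanks to the receiver guard
}
```

A check of the form "if m.Test != nil" in the parent message therefore does not protect the wrapper's methods; the guard has to live in the wrapper method itself, or the parent has to inspect the concrete pointer.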
👋
Are there any plans to support adding features per-msg or per-field?
For example, for some of our string or []byte fields, we prefer to unmarshal them as "unsafe" so as to avoid an allocation if we don't plan to keep the data in memory past the lifetime of the original message.
I was thinking that type of behavior could be added as an annotation in the proto, similar to how the extensions in gogo currently work.
Is that something that is 1.) feasible with this project and 2.) something that you would be interested in?
If so, I could try to come up with a POC.
Thanks
This works fine:
--go-vtproto_opt=features=marshal+unmarshal+size+pool \
But, after making sure to get the latest version with go install github.com/planetscale/vtprotobuf/cmd/protoc-gen-go-vtproto@latest, this:
--go-vtproto_opt=features=marshal+unmarshal+size+pool+clone \
gives --go-vtproto_out: unknown feature: "clone"
https://pkg.go.dev/google.golang.org/protobuf/proto#UnmarshalOptions
Currently UnmarshalVT tracks unknownFields:
default:
	iNdEx = preIndex
	skippy, err := skip(dAtA[iNdEx:])
	if err != nil {
		return err
	}
	if (skippy < 0) || (iNdEx+skippy) < 0 {
		return ErrInvalidLength
	}
	if (iNdEx + skippy) > l {
		return io.ErrUnexpectedEOF
	}
	m.unknownFields = append(m.unknownFields, dAtA[iNdEx:iNdEx+skippy]...) // <----
	iNdEx += skippy
It would be helpful to have an option to discard these.
Documentation says:
MarshalToVT() ... This function is useful e.g. when using memory pooling to re-use serialization buffers.
Could you please provide clarification on how to use it? I'm asking because protobufs are typically used with gRPC, and grpc-go's SendMsg() returns before the buffer gets put on the wire. Hence, you cannot reuse it. Here is a relevant issue: grpc/grpc-go#2159. Someone has even attempted this http://www.golangdevops.com/2019/12/31/autopool/ but it's not a solution and the finalizers have poor performance. You could find even more details here thanos-io/thanos#4609.
Are there any examples of the usage of this function? I couldn't find anything.
If I am correct then the recommendation in the README seems dangerous.
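To illustrate the concern, here is a sketch of the only pattern that seems safe: the pooled buffer is reused only after a synchronous consumer has finished with it. The vtMarshaler interface mirrors the SizeVT/MarshalToVT pair quoted in this thread; fakeMsg, bufPool and writeMsg are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// vtMarshaler mirrors the SizeVT/MarshalToVT pair exposed by generated code.
type vtMarshaler interface {
	SizeVT() int
	MarshalToVT(dAtA []byte) (int, error)
}

// fakeMsg is a hypothetical message that keeps the sketch self-contained.
type fakeMsg struct{ payload []byte }

func (m *fakeMsg) SizeVT() int { return len(m.payload) }
func (m *fakeMsg) MarshalToVT(dAtA []byte) (int, error) {
	return copy(dAtA, m.payload), nil
}

var bufPool = sync.Pool{New: func() any { b := make([]byte, 0, 1024); return &b }}

// writeMsg serializes m into a pooled buffer and passes it to sink. The buffer
// goes back to the pool only after sink returns, so sink must be synchronous
// and must not retain the slice.
func writeMsg(m vtMarshaler, sink func([]byte) error) error {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)

	size := m.SizeVT()
	if cap(*bp) < size {
		*bp = make([]byte, size)
	}
	buf := (*bp)[:size]
	n, err := m.MarshalToVT(buf)
	if err != nil {
		return err
	}
	return sink(buf[:n])
}

func main() {
	msg := &fakeMsg{payload: []byte("hello")}
	err := writeMsg(msg, func(b []byte) error {
		fmt.Printf("%q\n", b)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

This works for sinks such as a blocking file or socket write. It does not address the grpc-go case described above, where the transport may still hold the buffer after SendMsg returns.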
So I've got a slightly different method of generating protobufs. I used the buf cli to generate protos and my config file looks like the following:
version: v1
plugins:
  - name: go
    out: ./generated/
    opt: paths=source_relative
  - plugin: buf.build/grpc/go:v1.3.0
    out: ./
    opt:
      - paths=source_relative
  - plugin: go-vtproto
    out: ./
    opt:
      - paths=source_relative
      - features=marshal+unmarshal+size+equal+pool+clone
I have tried both with and without features, which ends up having the same outcome. None of the pool methods, such as ReturnToVTPool or ResetVT, are generated on the types. All other methods are generated.
I'm using the latest version of the plugin.
I noticed that in a bug, someone was doing:
message Parent {
  option (vtproto.mempool) = true;
  repeated Child children = 1;
  Child one = 2;
}
message Child {
  option (vtproto.mempool) = true;
  uint32 field = 1;
}
with option (vtproto.mempool) = true;
This didn't work for me, giving an error about an unsupported option. Also, the only reference I found was in the bug, I never saw it in regular documentation.
I'm sure I'm missing something simple, but not sure what it is. Any help would be appreciated. And thanks for the hard work on this project. We certainly needed something to take up the slack with gogoproto being deprecated.
To support using a protoc plugin in Bazel rules, the plugin must always generate a file for each input file.
Would it be fine to add a feature to force file generation? E.g. a force feature which would just unconditionally return true as the result of GenerateFile(...).
First, just want to say thank you for working on and releasing vtprotobuf. We're working on transitioning out of gogo/protobuf and read your great blog post announcing this alternative.
We've found a slight issue with our use case. We have a few proto packages that have multiple files in them. The generated _vtproto.pb.go files redeclare some utility functions/variables (e.g. sov, skip, ErrInvalidLength, etc.). As an example:
hellopb/service.proto:
syntax = "proto3";
package hellopb;
...
message HelloRequest {
string q = 1;
}
message HelloResponse {
string response = 2;
}
service HelloService {
rpc Hello(HelloRequest) returns (HelloResponse) {}
}
hellopb/db.proto:
syntax = "proto3";
package hellopb;
...
message MessageEntry {
string q = 1;
...
}
We then run:
protoc --proto_path=. --proto_path=../../../ \
--go_out=../../../ --plugin protoc-gen-go=/go/bin/protoc-gen-go \
--go-grpc_out=../../../ --plugin protoc-gen-go-grpc=/go/bin/protoc-gen-go-grpc \
--grpc-gateway_out=../../../ \
--go-vtproto_out=../../../ --plugin protoc-gen-go-vtproto=/go/bin/protoc-gen-go-vtproto \
--go-vtproto_opt=features=marshal+unmarshal+size hellopb/service.proto
and
protoc --proto_path=. --proto_path=../../../ \
--go_out=../../../ --plugin protoc-gen-go=/go/bin/protoc-gen-go \
--go-grpc_out=../../../ --plugin protoc-gen-go-grpc=/go/bin/protoc-gen-go-grpc \
--grpc-gateway_out=../../../ \
--go-vtproto_out=../../../ --plugin protoc-gen-go-vtproto=/go/bin/protoc-gen-go-vtproto \
--go-vtproto_opt=features=marshal+unmarshal+size hellopb/db.proto
If we run go vet we get:
service_vtproto.pb.go:214:6: encodeVarint redeclared in this block
db_vtproto.pb.go:125:54: previous declaration
service_vtproto.pb.go:308:6: sov redeclared in this block
db_vtproto.pb.go:188:23: previous declaration
service_vtproto.pb.go:311:6: soz redeclared in this block
db_vtproto.pb.go:191:23: previous declaration
service_vtproto.pb.go:694:6: skip redeclared in this block
db_vtproto.pb.go:545:36: previous declaration
service_vtproto.pb.go:774:2: ErrInvalidLength redeclared in this block
db_vtproto.pb.go:625:2: previous declaration
service_vtproto.pb.go:775:2: ErrIntOverflow redeclared in this block
db_vtproto.pb.go:626:2: previous declaration
service_vtproto.pb.go:776:2: ErrUnexpectedEndOfGroup redeclared in this block
db_vtproto.pb.go:627:2: previous declaration
Is there any way to avoid this? Post-protoc cleanup on this is pretty tough. Is there any way those functions/variables could just be imported from vtprotobuf?
Hello 👋
We have a large protobuf repo, using golang codegen, most of which is based on the V1 go message format. We are now starting to move to using the V2 message format, and are using vtprotobuf for fast (de)serialization. However, we cannot do the migration all at once. Due to this, we have V2 generated code that can depend on V1 format generated code during the migration. So, we need the vtproto-generated code to be compatible with the older code.
To handle such scenarios, it would be very helpful if there were an ability to substitute google.golang.org/protobuf/proto with github.com/golang/protobuf/proto. This would make our migration a lot smoother.
This could be exposed via an option/flag at codegen time.
cc @euroelessar
Hello,
While testing this library a bit in our code base, I noticed that map support with pooling is not perfect.
Making these changes manually in the .pb.go greatly reduces allocations and speeds up unmarshalling.
I can provide a sample proto file and modifications made if needed.
Pre changes:
BenchmarkUnmarshalStdProto-12 202058 4971 ns/op 3840 B/op 54 allocs/op
BenchmarkUnmarshalVTProto-12 228591 5238 ns/op 3429 B/op 47 allocs/op
BenchmarkUnmarshalVTProtoWithPool-12 238689 4967 ns/op 2605 B/op 44 allocs/op
Post changes:
BenchmarkUnmarshalStdProto-12 203602 5240 ns/op 3840 B/op 54 allocs/op
BenchmarkUnmarshalVTProto-12 199917 5864 ns/op 3433 B/op 47 allocs/op
BenchmarkUnmarshalVTProtoWithPool-12 601562 2009 ns/op 302 B/op 5 allocs/op
In response to "I actually have no idea of how to switch encoders in Twirp. Maybe it's not even possible.", this can't be done out of the box with the code generation, but can be achieved (server-side) with a simple find and replace.
The changes are below:
proto.Marshal(respContent) >> respContent.MarshalVT()
proto.Unmarshal(buf, reqContent) >> reqContent.UnmarshalVT(buf)
I'm using make to control all code gen, so I added the below as a final step:
for twirp in $${dir}/*.twirp.go; \
do \
echo 'Updating' $${twirp}; \
sed -i '' -e 's/respBytes, err := proto.Marshal(respContent)/respBytes, err := respContent.MarshalVT()/g' $${twirp}; \
sed -i '' -e 's/if err = proto.Unmarshal(buf, reqContent); err != nil {/if err = reqContent.UnmarshalVT(buf); err != nil {/g' $${twirp}; \
done; \
For messages that contain many string fields (e.g. repeated string with many elements coming in), UnmarshalVT can spend a lot of CPU time in runtime.slicebytetostring. Indeed, when decoding the []byte data, it does a string(bytes) conversion (e.g. m.Foo1 = string(dAtA[iNdEx:postIndex])). Since []byte is mutable and string is not, this conversion requires an allocation and a copy for safety. This allocation, repeated many times, sometimes turns out to be expensive.
We could avoid this by using the unsafe package, which allows us to perform this conversion without an allocation:
func unsafeBytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}
Of course, the user has to be careful because if they overwrite the []byte data they received from the wire, then the string is mutated. So, this feature should be opt-in in my opinion.
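As a side note, on Go 1.20 and later the same zero-copy conversion can be written with unsafe.String and unsafe.SliceData instead of a pointer cast. A minimal sketch follows; bytesToStringNoCopy is a hypothetical name, and the aliasing caveat above applies unchanged:

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToStringNoCopy returns a string that shares b's backing array.
// The caller must not modify b while the string is in use.
func bytesToStringNoCopy(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	wire := []byte("hello")
	s := bytesToStringNoCopy(wire)
	fmt.Println(s) // hello
	// Aliasing check: the string's data pointer is the slice's data pointer.
	fmt.Println(unsafe.StringData(s) == unsafe.SliceData(wire)) // true
}
```

Because the string and the wire buffer share memory, the buffer cannot be returned to a pool or reused for the next read while any such string is still reachable.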
This feature is mentioned in another issue: Per Msg/Field Features. I think having per-message / per-field features is not mandatory to implement unsafe casting, though.
Note that this feature applies to bytes fields as well where we could reference data instead of copying it.
I think this feature is worth implementing and I'm open to ideas. In my opinion, a simple and pragmatic approach to implement it would be to add unsafe functions, such as UnmarshalVTUnsafe, which perform such operations for all applicable fields. I've actually started such an implementation on my personal fork and it seems to work well (diff for anyone curious, although it doesn't work yet for bytes fields). Now, before spending more time on it, I would love to gauge interest and hear considerations from others!
A few considerations I had on my side:
UnmarshalVT is mandatory because several applications can use the same generated code for a message and we cannot ask them to all be careful not overwriting data received from the wire.
--unsafe flag (different from features) to trigger the generation of unsafe functions.
Let me know what you folks think about all this!
wrong pool unmarshal size.
my proto file:
// protoc --go_out=. --plugin protoc-gen-go="/Users/jie.yang05/go/bin/protoc-gen-go" --go-vtproto_out=. --plugin protoc-gen-go-vtproto="/Users/jie.yang05/go/bin/protoc-gen-go-vtproto" --go-vtproto_opt=features=marshal+unmarshal+size+pool ./lineentry.proto
syntax = "proto3";
package index;
option go_package="./proto";
import "github.com/planetscale/vtprotobuf/vtproto/ext.proto";
message lineEntries {
  option (vtproto.mempool) = true; // Enable memory pooling
  repeated lineEntry lineEntries = 1;
}
message lineEntry {
  uint64 address = 1;
  uint32 line = 2;
  uint32 file = 3;
}
Command:
yangjie05-mac:index jie.yang05$ protoc --go_out=. --plugin protoc-gen-go="/Users/jie.yang05/go/bin/protoc-gen-go" --go-vtproto_out=. --plugin protoc-gen-go-vtproto="/Users/jie.yang05/go/bin/protoc-gen-go-vtproto" -I /Users/jie.yang05/go/pkg/mod/github.com/planetscale/vtprotobuf\@v0.4.0/include -I ./ ./lineentry.proto
Generated code:
func (m *LineEntries) ResetVT() {
	for _, mm := range m.LineEntries {
		mm.ResetVT()
	}
	m.Reset()
}
func (m *LineEntries) ReturnToVTPool() {
	if m != nil {
		m.ResetVT()
		vtprotoPool_LineEntries.Put(m)
	}
}
The generated code has the wrong ResetVT.
Proposal: a new extension called scrub, which adds a function Scrub() to messages.
Similar to Reset(), except it recursively overwrites all buffers & fields in the message with zeros.
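As a rough illustration of what the proposal might generate (Scrub is only the proposed method, and Inner, Secret and zero are hypothetical names for this sketch), the recursion could look like this:

```go
package main

import "fmt"

// Inner and Secret are hypothetical message types used for illustration.
type Inner struct{ Token []byte }

type Secret struct {
	ID    uint64
	Key   []byte
	Inner *Inner
}

// zero overwrites b in place so the old contents do not linger in memory.
func zero(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

// Scrub zeroes the buffers held by Inner; a nil receiver is a no-op.
func (m *Inner) Scrub() {
	if m == nil {
		return
	}
	zero(m.Token)
}

// Scrub zeroes scalar fields and buffers, then recurses into nested messages.
func (m *Secret) Scrub() {
	if m == nil {
		return
	}
	m.ID = 0
	zero(m.Key)
	m.Inner.Scrub()
}

func main() {
	key := []byte{1, 2, 3}
	m := &Secret{ID: 9, Key: key, Inner: &Inner{Token: []byte{4}}}
	m.Scrub()
	fmt.Println(m.ID, key, m.Inner.Token) // 0 [0 0 0] [0]
}
```

One open design question for such a feature is string fields: Go strings are immutable, so they can be dropped but not overwritten in place without unsafe code.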
First off, wanted to say thanks for such a great project. While the issue title may sound like a feature request, the code generated by vtprotobuf already works with TinyGo. You can see an example of that here: https://github.com/kyleconroy/go-wasm-plugins
What I'd like to ask about is making TinyGo support explicit via a CI job. Is this something you'd be interested in supporting? If so I can take a first pass.
Upstream proto has a "deterministic" mode: https://github.com/protocolbuffers/protobuf-go/blob/f221882bfb484564f1714ae05f197dea2c76898d/proto/encode.go#L50
I think this is stricter than marshal_strict (but I could be wrong) in that it also sorts maps.
Interestingly, there is a "Stable" field here:
vtprotobuf/features/marshal/marshalto.go
Line 43 in 96ede25
It might be nice to add this option or, if it's already possible, documentation around it.
I'm using protobuf and trying this plugin, but seems like it's lacking the support for extensionFields 🤔? Am I right, or misunderstanding something here?
Unfortunately, when using buf.build, #19 pops up again, except that since buf is doing the work there's no easy way to just run a single protoc.
I'm not sure if this is something vtprotobuf can work around, or if this requires a feature in buf but it's a shame that these two tools don't play nicely together.
Generated pool code currently uses sync.Pool which is nice because there are no external dependencies.
However, there is a small dependency we use instead that avoids a known allocation issue with sync.Pool (and introduces type safety, but it doesn't really matter since this is generated code).
When func (p *YourProto) ReturnToVTPool() is called, children of YourProto that implement method ReturnToVTPool() should also be returned to the pool.
Because the comparison logic for oneof fields relies on the getters for the individual fields, it cannot differentiate between a field not being set, and a field being set to the zero value. While the nil checks allow distinguishing protos where one has a oneof field set to a zero value, while the other doesn't have any field in the oneof set, the code fails to distinguish protos where different fields in a oneof are set to the respective zero value.
Test case:
func TestEqualVT_Oneof_AbsenceVsZeroValue(t *testing.T) {
	a := &TestAllTypesProto3{
		OneofField: &TestAllTypesProto3_OneofUint32{
			OneofUint32: 0,
		},
	}
	b := &TestAllTypesProto3{
		OneofField: &TestAllTypesProto3_OneofString{
			OneofString: "",
		},
	}
	aJson, err := protojson.Marshal(a)
	require.NoError(t, err)
	bJson, err := protojson.Marshal(b)
	require.NoError(t, err)
	if a.EqualVT(b) {
		assert.JSONEq(t, string(aJson), string(bJson))
		err := fmt.Errorf("these %T should not be equal:\nmsg = %+v\noriginal = %+v", a, a, b)
		require.NoError(t, err)
	}
}
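One possible shape for a fix, sketched with hypothetical stand-in types rather than the real generated ones, is to compare which oneof member is set (via a type switch) before comparing values, instead of going through the getters:

```go
package main

import "fmt"

// Hypothetical stand-ins for a generated oneof: an interface plus one wrapper
// type per member field.
type isOneofField interface{ isOneofField() }

type OneofUint32 struct{ V uint32 }
type OneofString struct{ V string }

func (*OneofUint32) isOneofField() {}
func (*OneofString) isOneofField() {}

// equalOneof compares which member is set before comparing values, so two
// different members holding zero values are not treated as equal.
func equalOneof(a, b isOneofField) bool {
	switch av := a.(type) {
	case nil:
		return b == nil
	case *OneofUint32:
		bv, ok := b.(*OneofUint32)
		if !ok || av == nil || bv == nil {
			return ok && av == bv // nil wrappers are equal only to each other
		}
		return av.V == bv.V
	case *OneofString:
		bv, ok := b.(*OneofString)
		if !ok || av == nil || bv == nil {
			return ok && av == bv
		}
		return av.V == bv.V
	}
	return false
}

func main() {
	fmt.Println(equalOneof(&OneofUint32{V: 0}, &OneofString{V: ""})) // false
	fmt.Println(equalOneof(&OneofUint32{V: 0}, &OneofUint32{V: 0}))  // true
}
```

The failed type assertion on b is what catches the case in the test above: both members hold zero values, but they are different members.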
This is similar to #48, but applies to oneofs and exercises different paths in the code generation.
Thanks for putting this library together!
Would it be possible to add support for non-nullable objects, similar to the gogo annotation gogoproto.nullable?
Does vtprotobuf provide custom tag support like gogoprotobuf? I can't seem to find any information about this online.
In a .proto file, with gogoprotobuf I can attach custom tags like:
import "github.com/gogo/protobuf/gogoproto/gogo.proto";
option go_package = "events";
option (gogoproto.unmarshaler_all) = true;
option (gogoproto.sizer_all) = true;
option (gogoproto.marshaler_all) = true;
message Event {
string AuctionID = 1 [(gogoproto.moretags) = 'gorm:"column:auction_id;type:VARCHAR(64);primary_key"'];
int64 CampaignID = 2 [(gogoproto.moretags) = 'gorm:"column:campaign_id;type:BIGINT;index"'];
int64 ImpIndex = 3 [(gogoproto.moretags) = 'gorm:"column:imp_index;type:INT"'];
string DomainKey = 5 [(gogoproto.moretags) = 'gorm:"column:domain_key;type:VARCHAR(1024)"'];
Does vtprotobuf provide any support for anything like gogoproto.moretags?
One of the features (which is on by default if features are not specified) is grpc. From a quick look at the code, I can't find any difference between the output of the regular grpc plugin and the code generated by vtprotobuf. Is there a plan to extend this, say, to use a pool for object creation in the future?
All fields in proto2 and fields explicitly marked optional in proto3 have a presence property, i.e., the field not being set is always different from the field being set, even when set to the zero value. The generated code for equal correctly checks for equality of this presence property for optional scalar and message fields (i.e., fields with a pointer Go type), but not for bytes (which, in the proto world, is a scalar type, but in Go maps to a nilable reference type).
It is correct not to differentiate between []byte(nil) and []byte{} for fields without presence (i.e., fields in a oneof, repeated fields, and fields with neither optional nor repeated in proto3). For all other cases, however, this difference should be taken into account, as it indicates presence or absence.
Example test case:
func TestEqualVT_Proto2_BytesPresence(t *testing.T) {
a := &TestAllTypesProto2{
OptionalBytes: nil,
}
b := &TestAllTypesProto2{
OptionalBytes: []byte{},
}
require.False(t, proto.Equal(a, b))
aJson, err := protojson.Marshal(a)
require.NoError(t, err)
bJson, err := protojson.Marshal(b)
require.NoError(t, err)
if a.EqualVT(b) {
assert.JSONEq(t, string(aJson), string(bJson))
err := fmt.Errorf("these %T should not be equal:\nmsg = %+v\noriginal = %+v", a, a, b)
require.NoError(t, err)
}
}
PR coming
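A minimal sketch of the presence-aware comparison described above, using only the standard library. The helper names `equalOptionalBytes` and `equalPlainBytes` are made up for illustration and are not part of vtprotobuf's generated code.

```go
package main

import (
	"bytes"
	"fmt"
)

// equalOptionalBytes is for bytes fields that carry presence: nil means
// "unset" and an empty non-nil slice means "set to empty", so the nil-ness
// of both sides has to match before the contents are compared.
func equalOptionalBytes(a, b []byte) bool {
	if (a == nil) != (b == nil) {
		return false
	}
	return bytes.Equal(a, b)
}

// equalPlainBytes is for bytes fields without presence, where nil and
// empty are interchangeable; bytes.Equal already treats them as equal.
func equalPlainBytes(a, b []byte) bool {
	return bytes.Equal(a, b)
}

func main() {
	fmt.Println(equalOptionalBytes(nil, []byte{})) // false
	fmt.Println(equalPlainBytes(nil, []byte{}))    // true
}
```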
Using this proto:
message Parent {
option (vtproto.mempool) = true;
repeated Child children = 1;
Child one = 2;
}
message Child {
option (vtproto.mempool) = true;
uint32 field = 1;
}
When calling ReturnToVTPool() on Parent, it calls ResetVT on all children and then calls m.Reset():
func (m *Parent) ResetVT() {
for _, mm := range m.Children {
mm.ResetVT()
}
m.One.ReturnToVTPool()
m.Reset()
}
However, m.Reset() allocates a new object and overwrites the existing object entirely:
func (x *Parent) Reset() {
*x = Parent{}
This nils out all fields on the parent, throwing away the slice for the GC to handle. Am I missing something? Is there some way to put the object back into the pool and call ResetVT(), but not call Reset()?
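To make the concern concrete, here is a sketch of a reset that keeps the slice's backing array instead of overwriting the whole struct. It is an assumption-laden illustration, not vtprotobuf's generated code: the `Parent` and `Child` types are hand-written stand-ins, and `resetKeepingCapacity`, `releaseParent` and `parentPool` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Hand-written stand-ins for the generated message types.
type Child struct{ Field uint32 }

type Parent struct {
	Children []*Child
	One      *Child
}

// parentPool is a hypothetical pool for Parent values.
var parentPool = sync.Pool{New: func() interface{} { return new(Parent) }}

// resetKeepingCapacity clears the message but truncates the slice to length
// zero rather than dropping it, so the backing array can be reused by the
// next user of the pooled object.
func (p *Parent) resetKeepingCapacity() {
	for i := range p.Children {
		p.Children[i] = nil // drop references so the old children are not kept alive
	}
	p.Children = p.Children[:0]
	p.One = nil
}

// releaseParent resets the message and hands it back to the pool.
func releaseParent(p *Parent) {
	p.resetKeepingCapacity()
	parentPool.Put(p)
}

func main() {
	p := &Parent{Children: make([]*Child, 3, 8), One: &Child{Field: 1}}
	p.resetKeepingCapacity()
	fmt.Println(len(p.Children), cap(p.Children)) // 0 8
	releaseParent(p)
}
```

By contrast, assigning `*p = Parent{}` would set `Children` to nil and leave the old backing array to the garbage collector, which is the behaviour the question is about.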
I am building an application protocol with protobufs, and I'm using vtprotobuf exclusively to marshal and unmarshal the messages. Currently, I'm experiencing strange behavior that I don't understand and that I think is related to vtprotobuf.
Here are my message definitions:
message Header {
fixed32 Size = 1; // Size of the next message
fixed32 Checksum = 2; // Checksum of the serialized message
}
message RaftControlPayload {
oneof Types {
GetLeaderIDRequest GetLeaderIdRequest = 1;
GetLeaderIDResponse GetLeaderIdResponse = 2;
IdRequest IdRequest = 3;
IdResponse IdResponse = 4;
IndexState IndexState = 5;
ModifyNodeRequest ModifyNodeRequest = 6;
ReadIndexRequest ReadIndexRequest = 7;
ReadLocalNodeRequest ReadLocalNodeRequest = 8;
RequestLeaderTransferResponse RequestLeaderTransferResponse = 9;
RequestSnapshotRequest RequestSnapshotRequest = 10;
SnapshotOption SnapshotOption = 12;
StopNodeResponse StopNodeResponse = 13;
StopRequest StopRequest = 14;
StopResponse StopResponse = 15;
SysOpState SysOpState = 16;
DBError Error = 17;
}
enum MethodName {
ADD_NODE = 0;
ADD_OBSERVER = 1;
ADD_WITNESS = 2;
GET_ID = 3;
GET_LEADER_ID = 4;
READ_INDEX = 5;
READ_LOCAL_NODE = 6;
REQUEST_COMPACTION = 7;
REQUEST_DELETE_NODE = 8;
REQUEST_LEADER_TRANSFER = 9;
REQUEST_SNAPSHOT = 10;
STOP = 11;
STOP_NODE = 12;
}
MethodName Method = 18;
}
The Header message serializes to 10 bytes, which I send across a network stream as a header for whatever unknown message payload is coming next. This allows me to simply pass raw protobuf messages across a network stream without having to leverage gRPC or other RPC frameworks.
Sending a message across the network stream is pretty straightforward. I prepare a message, serialize the message, create a header with all of the appropriate values, serialize the header, send the header, then send the message.
idReqPayload := &database.RaftControlPayload{
Method: database.RaftControlPayload_GET_ID,
Types: &database.RaftControlPayload_IdRequest{
IdRequest: &database.IdRequest{},
},
}
payloadBuf, _ := idReqPayload.MarshalVT()
initialHeader := &transportv1.Header{
Size: uint32(len(payloadBuf)),
Checksum: crc32.ChecksumIEEE(payloadBuf),
}
headerBuf, _ := initialHeader.MarshalVT()
stream.Write(headerBuf)
stream.Write(payloadBuf)
Receiving a message on the network stream is also pretty straightforward. I read the header into a buffer, deserialize it, read the next N bytes from the stream based on the Size field in the header message, verify the checksum, and then deserialize the byte array into the equivalent message.
headerBuf := make([]byte, 10)
if _, err := io.ReadFull(stream, headerBuf); err != nil {
logger.Error().Err(err).Msg("cannot readAndHandle raft control header")
continue
}
// unmarshal the header
header := &transportv1.Header{}
if err := header.UnmarshalVT(headerBuf); err != nil {
logger.Error().Err(err).Msg("cannot unmarshal header")
return
}
// prep the message buffer
msgBuf := make([]byte, header.Size)
if _, err := io.ReadFull(stream, msgBuf); err != nil {
logger.Error().Err(err).Msg("cannot read message payload")
return
}
// verify the message is intact
checked := crc32.ChecksumIEEE(msgBuf)
if checked != header.GetChecksum() {
logger.Error().Msg("checksums do not match")
}
// unmarshal the payload
msg := &database.RaftControlPayload{}
if err := msg.UnmarshalVT(msgBuf); err != nil {
logger.Error().Err(err).Msg("cannot unmarshal payload")
}
Here's where things start to get confusing. When I serialize idReqPayload via MarshalVT() and run a checksum against it, I get uint32(1298345897); when I send the header as you see here, the Size field is uint32(5) and Checksum is uint32(1298345897). When the header message gets deserialized on the receiving end of a localhost connection, it looks very different.
The header message gets deserialized with the Size field being uint32(5) and the Checksum field being uint(1). That's the first strange thing.
When I run a checksum against the next 5 bytes of the serialized idReqPayload payload which followed, it checksums to uint32(737000948) even though there was no change to the byte array from the time it was serialized to the time it was received. That's the second strange thing.
When I compare the deserialized header's Checksum field against a local checksum of the received idReqPayload payload with checked := crc32.ChecksumIEEE(msgBuf); if checked != header.GetChecksum() { // ... }, the equality check passes, even though the deserialized header Checksum field's value is uint(1) whereas the calculated checksum of the received message is uint32(737000948). That's the third strange thing.
When I deserialize the serialized idReqPayload byte array, it deserializes without an error. However, the message contents are deserialized incorrectly. When I serialize a protobuf message with this configuration:
idReqPayload := &database.RaftControlPayload{
Method: database.RaftControlPayload_GET_ID,
Types: &database.RaftControlPayload_IdRequest{
IdRequest: &database.IdRequest{},
},
}
It deserializes into this equivalent:
msg := &database.RaftControlPayload{
Method: database.RaftControlPayload_ADD_NODE,
Types: nil,
}
The Method field is reset so the enum is defaulted to 0, and the Types field is nil.
I'm fairly positive this could partially be related to #51, but I updated my local protoc-gen-go-vtproto binary to 0ae748f and the problem still persists. I've also eliminated the network stream as a cause: it's a localhost network stream, so nothing is intercepting it or modifying it in transit.
Am I doing something wrong or is this a bug of some kind?
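For comparison, the header-then-payload framing described in this issue can be sketched with a fixed-width binary header. This deliberately swaps the protobuf Header message for an 8-byte `encoding/binary` header, which is not what the issue's code does; one reason to consider it is that proto3 omits zero-valued scalar fields on the wire, so a protobuf-encoded header is not guaranteed to have a constant length. The names `writeFrame`, `readFrame`, `roundTrip`, `headerLen` and `maxPayload` are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"io"
)

const (
	headerLen  = 8       // 4 bytes size + 4 bytes CRC32, always present
	maxPayload = 1 << 20 // arbitrary cap chosen for this sketch
)

var errChecksum = errors.New("checksum mismatch")

// writeFrame writes a fixed-size header followed by the payload.
func writeFrame(w io.Writer, payload []byte) error {
	var hdr [headerLen]byte
	binary.BigEndian.PutUint32(hdr[0:4], uint32(len(payload)))
	binary.BigEndian.PutUint32(hdr[4:8], crc32.ChecksumIEEE(payload))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads one header, then exactly Size bytes, and verifies the CRC.
// It returns an error on mismatch instead of continuing with bad data.
func readFrame(r io.Reader) ([]byte, error) {
	var hdr [headerLen]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	size := binary.BigEndian.Uint32(hdr[0:4])
	want := binary.BigEndian.Uint32(hdr[4:8])
	if size > maxPayload {
		return nil, errors.New("payload too large")
	}
	payload := make([]byte, size)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	if crc32.ChecksumIEEE(payload) != want {
		return nil, errChecksum
	}
	return payload, nil
}

// roundTrip frames a payload into an in-memory buffer and reads it back.
func roundTrip(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	if err := writeFrame(&buf, payload); err != nil {
		return nil, err
	}
	return readFrame(&buf)
}

func main() {
	got, err := roundTrip([]byte("hello"))
	fmt.Println(string(got), err)
}
```

The payload bytes would still come from MarshalVT() and go to UnmarshalVT(); only the header encoding differs in this sketch.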
"clone" would generate a VTClone() function to duplicate a message (like proto.Clone)
"copy" would generate "VTCopy(dst)" to copy the contents of a message to a target message.
I have a couple of questions regarding pooling and gRPC that I could not fully understand from existing issues or the readme. (happy to do a PR to clarify in the readme after understanding for the next person)
In #16 (comment) it was mentioned, that memory-pooled objects are automatically unmarshaled using objects from a pool, which I could get to work. How is this intended to work for the other way around though? Does it happen automatically somewhere like during marshaling (if so I can't find the code that does this)? If not, is there a recommended or suggested place where this could be done (maybe in the codec?)
My use case is I have a lot of objects that I read from a key-value store that eventually end up in a gRPC response, but after marshaling the response I would like to return the objects to the pool.
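One possible shape for the "return after marshaling" step is sketched below. Whether vtprotobuf or gRPC does this automatically is exactly what the question asks, and the sketch does not answer it. It only relies on the MarshalVT() and ReturnToVTPool() method names mentioned in these issues; the `pooledMessage` interface, the `marshalAndRelease` helper and the `fakeMsg` type are hypothetical, and it assumes the marshaled bytes do not alias memory owned by the message.

```go
package main

import "fmt"

// pooledMessage lists the two methods this sketch depends on (hypothetical interface).
type pooledMessage interface {
	MarshalVT() ([]byte, error)
	ReturnToVTPool()
}

// marshalAndRelease serializes the message and then returns it to its pool.
// The caller must not touch m after this call.
func marshalAndRelease(m pooledMessage) ([]byte, error) {
	buf, err := m.MarshalVT()
	m.ReturnToVTPool()
	return buf, err
}

// fakeMsg is a stand-in for a generated message so the sketch runs on its own.
type fakeMsg struct {
	data     string
	returned bool
}

func (f *fakeMsg) MarshalVT() ([]byte, error) { return []byte(f.data), nil }
func (f *fakeMsg) ReturnToVTPool()            { f.returned = true }

func main() {
	m := &fakeMsg{data: "payload"}
	buf, err := marshalAndRelease(m)
	fmt.Println(string(buf), err, m.returned) // payload <nil> true
}
```

A custom codec's marshal path would be one place to call such a helper, since that is the last point at which the response object is needed.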
Hey all, any chance of cutting a new release?
The latest version is 0.3.0 and a bunch of stuff has been added since then such as some of the documented options (pool and so on). Is the current main stable? Can a new release be made?