
ttrpc's Introduction

ttrpc


GRPC for low-memory environments.

The existing grpc-go project requires a lot of memory overhead for importing packages and at runtime. While this is great for many services with low density requirements, this can be a problem when running a large number of services on a single machine or on a machine with a small amount of memory.

Using the same GRPC definitions, this project reduces the binary size and protocol overhead required. We do this by eliding the net/http, net/http2, and grpc packages used by grpc, replacing them with a lightweight framing protocol. The result is smaller binaries that use less resident memory, with the same ease of use as GRPC.

Please note that while this project supports generating either end of the protocol, the generated service definitions will be incompatible with regular GRPC services, as they do not speak the same protocol.

Protocol

See the protocol specification.

Usage

Create a gogo vanity binary (see cmd/protoc-gen-gogottrpc/main.go for an example with the ttrpc plugin enabled).

It's recommended to use protobuild to build the protobufs for this project, but this will work with protoc directly, if required.
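For orientation, a rough sketch of wiring a server and client over a unix socket is shown below. The socket path is a placeholder, the registration call for your generated service is elided, and the exact ttrpc API surface may differ between releases.

package main

import (
	"context"
	"log"
	"net"

	"github.com/containerd/ttrpc"
)

func main() {
	// Server side: listen on a unix socket and serve ttrpc connections.
	l, err := net.Listen("unix", "/tmp/example-ttrpc.sock") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	server, err := ttrpc.NewServer()
	if err != nil {
		log.Fatal(err)
	}
	// The generated Register<YourService> function would be called here to
	// attach handlers before serving.
	go server.Serve(context.Background(), l)

	// Client side: dial the socket and wrap the connection in a ttrpc client.
	conn, err := net.Dial("unix", "/tmp/example-ttrpc.sock")
	if err != nil {
		log.Fatal(err)
	}
	client := ttrpc.NewClient(conn)
	defer client.Close()
	// The generated New<YourService>Client constructor would wrap this client.
}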

Differences from GRPC

  • The protocol stack has been replaced with a lighter protocol that doesn't require http, http2 and tls.
  • The client and server interfaces are identical, whereas GRPC has separate client and server interfaces.
  • The Go stdlib context package is used instead.

Status

TODO:

  • Add testing under concurrent load to ensure correct behavior
  • Verify connection error handling

Project details

ttrpc is a containerd sub-project, licensed under the Apache 2.0 license. As a containerd sub-project, you will find the:

  • Project governance,
  • Maintainers,
  • and Contributing guidelines

information in our containerd/project repository.

ttrpc's People

Contributors

akihirosuda, austinvazquez, chenrui333, cpuguy83, crosbymichael, dependabot[bot], dmcgowan, elboulangero, estesp, ethan-lowman-dd, fuweid, georgethebeatle, iceber, justincormack, kevpar, klihub, kzys, lifupan, liggitt, mikebrow, mxpv, pacoxu, random-liu, samuelkarp, saschagrunert, shsjshentao, stevvooe, thajeztah, tklauser, vbatts


ttrpc's Issues

server_test.go breaks with go 1.10

These tests are run in Debian Sid.

Let's fetch ttrpc

go get github.com/stevvooe/ttrpc

Then run the tests.

With Go 1.9, everything is fine:

# /usr/lib/go-1.9/bin/go test github.com/stevvooe/ttrpc -v -run ''
=== RUN   TestReadWriteMessage
--- PASS: TestReadWriteMessage (0.00s)
=== RUN   TestMessageOversize
--- PASS: TestMessageOversize (0.01s)
=== RUN   TestServer
--- PASS: TestServer (0.00s)
=== RUN   TestServerNotFound
--- PASS: TestServerNotFound (0.00s)
=== RUN   TestServerListenerClosed
--- PASS: TestServerListenerClosed (0.00s)
=== RUN   TestServerShutdown
--- PASS: TestServerShutdown (0.20s)
=== RUN   TestServerClose
--- PASS: TestServerClose (0.00s)
=== RUN   TestOversizeCall
--- PASS: TestOversizeCall (0.01s)
=== RUN   TestClientEOF
--- PASS: TestClientEOF (0.00s)
=== RUN   TestUnixSocketHandshake
--- PASS: TestUnixSocketHandshake (0.00s)
PASS
ok  	github.com/stevvooe/ttrpc	0.234s

With Go 1.10, things break:

# /usr/lib/go-1.10/bin/go test github.com/stevvooe/ttrpc -v -run ''
=== RUN   TestReadWriteMessage
--- PASS: TestReadWriteMessage (0.00s)
=== RUN   TestMessageOversize
--- PASS: TestMessageOversize (0.01s)
=== RUN   TestServer
--- PASS: TestServer (0.00s)
=== RUN   TestServerNotFound
--- FAIL: TestServerNotFound (0.00s)
	server_test.go:139: accept unix @TestServerNotFound: use of closed network connection
=== RUN   TestServerListenerClosed
--- PASS: TestServerListenerClosed (0.00s)
=== RUN   TestServerShutdown
--- FAIL: TestServerShutdown (0.20s)
	server_test.go:236: accept unix @TestServerShutdown: use of closed network connection
=== RUN   TestServerClose
--- PASS: TestServerClose (0.00s)
=== RUN   TestOversizeCall
--- FAIL: TestOversizeCall (0.01s)
	server_test.go:298: accept unix @TestOversizeCall: use of closed network connection
=== RUN   TestClientEOF
--- FAIL: TestClientEOF (0.00s)
	server_test.go:329: accept unix @TestClientEOF: use of closed network connection
=== RUN   TestUnixSocketHandshake
--- PASS: TestUnixSocketHandshake (0.00s)
FAIL
exit status 1
FAIL	github.com/stevvooe/ttrpc	0.232s

Deadlock with multiple simultaneous requests from the same client

There is a possible deadlock in the TTRPC server/client interactions when there are multiple simultaneous requests from the same client connection. This causes both the server and client handler goroutines to deadlock.

I've repro'd this on both Linux (with unix sockets as the transport) and Windows (with both unix sockets and named pipes as the transport). It repros more easily when the transport has less buffering, and when there are more goroutines sending requests concurrently from the client.

I intend to look into how this can be fixed, but filing an issue for awareness and in case someone else wants to tackle it in the meantime. :)

Stacks

Server

goroutine 138 [IO wait]:
internal/poll.runtime_pollWait(0x7f63e0790f08, 0x77, 0xffffffffffffffff)
	/go1.14.3/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc00018c198, 0x77, 0x31000, 0x31012, 0xffffffffffffffff)
	/go1.14.3/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitWrite(...)
	/go1.14.3/src/internal/poll/fd_poll_runtime.go:96
internal/poll.(*FD).Write(0xc00018c180, 0xc0003e2ff6, 0x31012, 0x31012, 0x0, 0x0, 0x0)
	/go1.14.3/src/internal/poll/fd_unix.go:276 +0x290
net.(*netFD).Write(0xc00018c180, 0xc0003e2ff6, 0x31012, 0x31012, 0x1000, 0x0, 0x0)
	/go1.14.3/src/net/fd_unix.go:220 +0x4f
net.(*conn).Write(0xc000010018, 0xc0003e2ff6, 0x31012, 0x31012, 0x0, 0x0, 0x0)
	/go1.14.3/src/net/net.go:196 +0x8e
bufio.(*Writer).Write(0xc000090000, 0xc0003e2ff6, 0x31012, 0x31012, 0xa, 0xf00032008, 0x2)
	/go1.14.3/src/bufio/bufio.go:623 +0x13b
github.com/containerd/ttrpc.(*channel).send(0xc000090100, 0x20000000f, 0xc0003e2000, 0x32008, 0x32008, 0x0, 0x0)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/channel.go:127 +0xbc
github.com/containerd/ttrpc.(*serverConn).run(0xc000094230, 0x6bf120, 0xc0000a2010)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/server.go:459 +0x64b
created by github.com/containerd/ttrpc.(*Server).Serve
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/server.go:127 +0x288

goroutine 139 [select]:
github.com/containerd/ttrpc.(*serverConn).run.func1(0xc0001000c0, 0xc000094230, 0xc000100180, 0xc000090100, 0xc000100120, 0xc000076540)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/server.go:404 +0x69b
created by github.com/containerd/ttrpc.(*serverConn).run
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/server.go:332 +0x2c0

Client

goroutine 19 [IO wait]:
internal/poll.runtime_pollWait(0x7fd97990df18, 0x77, 0xffffffffffffffff)
	/go1.14.3/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc0000f6098, 0x77, 0x31000, 0x31021, 0xffffffffffffffff)
	/go1.14.3/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitWrite(...)
	/go1.14.3/src/internal/poll/fd_poll_runtime.go:96
internal/poll.(*FD).Write(0xc0000f6080, 0xc0009ecff6, 0x31021, 0x31021, 0x0, 0x0, 0x0)
	/go1.14.3/src/internal/poll/fd_unix.go:276 +0x290
net.(*netFD).Write(0xc0000f6080, 0xc0009ecff6, 0x31021, 0x31021, 0x1000, 0x0, 0x0)
	/go1.14.3/src/net/fd_unix.go:220 +0x4f
net.(*conn).Write(0xc0000a8028, 0xc0009ecff6, 0x31021, 0x31021, 0x0, 0x0, 0x0)
	/go1.14.3/src/net/net.go:196 +0x8e
bufio.(*Writer).Write(0xc0000f8040, 0xc0009ecff6, 0x31021, 0x31021, 0xa, 0x1500032017, 0x1)
	/go1.14.3/src/bufio/bufio.go:623 +0x13b
github.com/containerd/ttrpc.(*channel).send(0xc0000f8080, 0x100000015, 0xc0009ec000, 0x32017, 0x32017, 0x0, 0x0)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/channel.go:127 +0xbc
github.com/containerd/ttrpc.(*Client).send(0xc0000f6280, 0x100000015, 0x63b500, 0xc000022480, 0x1, 0x0)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:324 +0x86
github.com/containerd/ttrpc.(*Client).run(0xc0000f6280)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:273 +0x5ab
created by github.com/containerd/ttrpc.NewClient
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:94 +0x1bd

goroutine 6 [select]:
github.com/containerd/ttrpc.(*receiver).run(0xc00000e080, 0x6bf0e0, 0xc0000f8000, 0xc0000f8080)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:222 +0x241
created by github.com/containerd/ttrpc.(*Client).run
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:262 +0x1fa

Analysis

Basically, the server has a "receiver" goroutine that receives new requests from the transport, and sends a message via channel to the "worker" goroutine. The "worker" goroutine has a loop and a select to handle either a new request message, or a response that needs to be sent to the client. When the deadlock occurs, the server is stuck blocking on a response write to the transport from the "worker" goroutine, while the "receiver" goroutine is stuck trying to send a request message to the "worker" goroutine.

The client side is basically the inverse of this, where the "receiver" goroutine is stuck trying to send a response message received on the transport to the "worker" goroutine via channel. The "worker" goroutine is likewise stuck trying to send a new request to the server via the transport.

This looks like it should only occur when the connection is busy enough that the transport buffer is filled, as otherwise the server and client writes to the transport would simply be fulfilled by the buffer, and would not block waiting for a reader on the other end.

The interesting places in the code where the 4 goroutines are stuck are linked below:
Server receiver sending message to worker: https://github.com/containerd/ttrpc/blob/v1.0.2/server.go#L404
Server worker writing response to transport: https://github.com/containerd/ttrpc/blob/v1.0.2/server.go#L459
Client receiver sending message to worker: https://github.com/containerd/ttrpc/blob/v1.0.2/client.go#L222
Client worker writing request to transport: https://github.com/containerd/ttrpc/blob/v1.0.2/client.go#L273
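
To make the pattern concrete, here is a minimal standalone sketch (not the ttrpc code itself) of the same structure: each peer has a receiver goroutine forwarding transport reads to its worker over an unbuffered channel, while the worker also writes to the transport. Once both workers block in Write and both receivers block on the channel send, nothing can make progress.

package main

import (
	"fmt"
	"net"
	"time"
)

func peer(conn net.Conn, rounds int) {
	incoming := make(chan []byte) // unbuffered, like the per-connection channels

	// "receiver": transport -> worker
	go func() {
		buf := make([]byte, 4096)
		for {
			n, err := conn.Read(buf)
			if err != nil {
				return
			}
			msg := append([]byte(nil), buf[:n]...)
			incoming <- msg // blocks while the worker below is stuck in Write
		}
	}()

	// "worker": writes to the transport and drains incoming messages
	payload := make([]byte, 64*1024) // large enough to defeat any transport buffering
	for i := 0; i < rounds; i++ {
		if _, err := conn.Write(payload); err != nil { // blocks once the peer stops reading
			return
		}
		<-incoming
	}
}

func main() {
	a, b := net.Pipe() // zero-buffer transport; a saturated socket behaves similarly
	done := make(chan struct{})
	go func() { peer(a, 10); close(done) }()
	go peer(b, 10)

	select {
	case <-done:
		fmt.Println("completed without deadlock")
	case <-time.After(3 * time.Second):
		fmt.Println("deadlocked: workers stuck in Write, receivers stuck on channel send")
	}
}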

Sample

I have a repro program here. This program can be run as either a server (go run . server) or a client (go run . client). The server implements a very simple TTRPC server that listens for connections. The client spawns multiple goroutines that constantly send requests to the server and print their ID each time they get a response. Each request/response has a bunch of junk data added to the message to try to avoid the effects of buffering on the underlying transport. When run, you will generally see a little bit of output from the client, but then it will stop when the deadlock occurs. You can also hit enter on either the server or client to cause them to dump their current goroutine stacks to a file.

Shouldn't ttrpc use UNIMPLEMENTED for methods/services which haven't been implemented?

While gRPC has the UNIMPLEMENTED status code

https://grpc.github.io/grpc/core/md_doc_statuscodes.html

The operation is not implemented or is not supported/enabled in this service.

ttrpc uses NOT_FOUND to indicate that requested services/methods are not found.

ttrpc/services.go

Lines 116 to 128 in bfba540

func (s *serviceSet) resolve(service, method string) (Method, error) {
	srv, ok := s.services[service]
	if !ok {
		return nil, status.Errorf(codes.NotFound, "service %v", service)
	}
	mthd, ok := srv.Methods[method]
	if !ok {
		return nil, status.Errorf(codes.NotFound, "method %v", method)
	}
	return mthd, nil
}

I think it is better to use UNIMPLEMENTED since it will be more compatible with gRPC and it allows clients to distinguish errors like "the requested entity (e.g. namespace) is not available" from "the requested method is not implemented".
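
A minimal sketch of what that suggestion would look like in resolve (illustrative only, not an actual patch):

func (s *serviceSet) resolve(service, method string) (Method, error) {
	srv, ok := s.services[service]
	if !ok {
		// Report unknown services as Unimplemented, matching gRPC semantics.
		return nil, status.Errorf(codes.Unimplemented, "service %v", service)
	}
	mthd, ok := srv.Methods[method]
	if !ok {
		return nil, status.Errorf(codes.Unimplemented, "method %v", method)
	}
	return mthd, nil
}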

ttrpc client

Are there any plugins that support other programming languages, e.g. C?

New branch for removing gogo/protobuf

I've been working on upgrading the protobuf packages in containerd. Protobuild now supports the new protoc-gen-go, and #96 will add a new protoc-gen-go-ttrpc which should be able to generate ttrpc code that works with the new protoc-gen-go.

However, the next few PRs will make some backward-incompatible changes, namely:

  • Removing gogo/protobuf from ttrpc itself
  • Updating example/

Since ttrpc is used by containerd, which has a relatively long support policy, we may still want to make enhancements such as #75 and #94. So I'm reluctant to make ttrpc's main branch incompatible with gogo/protobuf right now.

Instead, could we have a branch to remove gogo/protobuf? If so, do we want to do Go-style branching (e.g. v2) instead of containerd-style (release/1.0)?

ttrpc: end to end timeout support

Support for timing out method calls and other operations on the connection isn't really present. We'll need to add this to be more production ready.

This includes the following:

  • Standard read timeouts on channel operations.
  • Connection timeouts (maybe at a higher level).
  • Context timeout propagation across network boundary.
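
For context, a caller can already bound an individual call on the client side with the stdlib context package, but per the last bullet the deadline does not yet cross the network boundary to the server. A rough sketch (the socket path, service, and method names are placeholders, and the Client.Call signature may vary between releases):

package main

import (
	"context"
	"log"
	"net"
	"time"

	"github.com/containerd/ttrpc"
	"google.golang.org/protobuf/types/known/emptypb"
)

func main() {
	conn, err := net.Dial("unix", "/tmp/example.sock") // placeholder socket path
	if err != nil {
		log.Fatal(err)
	}
	client := ttrpc.NewClient(conn)
	defer client.Close()

	// Give this call one second; today this only cancels the client side.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	req, resp := &emptypb.Empty{}, &emptypb.Empty{} // placeholder message types
	if err := client.Call(ctx, "example.v1.Example", "Ping", req, resp); err != nil {
		log.Fatal(err)
	}
}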

How much memory usage does ttrpc reduce compared to grpc?

I don't know much about grpc. I'm just looking for an RPC framework for a memory-constrained environment (about 50 MiB). Is ttrpc right for me, or does grpc work well under these conditions? (The stream API looks good.)

Thanks in advance!

Handle forced connection closed by remote host error on Windows in release/1.1 branch

Problem

When enabling CI on Windows for the release/1.1 branch (#163), an edge case was found via unit tests when running on the latest Windows platform with a Go 1.20+ toolchain.

=== RUN   TestOversizeCall
--- PASS: TestOversizeCall (0.03s)
=== RUN   TestClientEOF
    server_test.go:365: expected to have a cause of ErrClosed, got read unix @->TestClientEOF: wsarecv: An existing connection was forcibly closed by the remote host.
--- FAIL: TestClientEOF (0.00s)
=== RUN   TestServerRequestTimeout
--- PASS: TestServerRequestTimeout (0.01s)
=== RUN   TestServerConnectionsLeak
--- PASS: TestServerConnectionsLeak (0.10s)
=== RUN   Test_MethodFullNameGeneration
--- PASS: Test_MethodFullNameGeneration (0.00s)
FAIL
	github.com/containerd/ttrpc	coverage: 76.3% of statements
FAIL	github.com/containerd/ttrpc	1.696s
?   	github.com/containerd/ttrpc/cmd/protoc-gen-gogottrpc	[no test files]
?   	github.com/containerd/ttrpc/example	[no test files]
?   	github.com/containerd/ttrpc/example/cmd	[no test files]
?   	github.com/containerd/ttrpc/plugin	[no test files]
FAIL

The root cause is that the client is not able to filter the connection-closed-by-server error on the Windows platform with the updated toolchain.
Ref:

ttrpc/client.go

Line 385 in 20c493e

func filterCloseErr(err error) error {

The impact is that the client error does not get set to ErrClosed as expected.
Ref:

var ErrClosed = errors.New("ttrpc: closed")

Recommended Solution

The recommended solution would be to filter the error based on the platform-specific error using the golang.org/x/sys package.

The solution should enable testing on the Windows platform, and the existing TestClientEOF test should be sufficient to determine whether the issue is resolved.
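
A rough sketch of the direction this points in, assuming the x/sys/windows errno constants unwrap cleanly via errors.Is (illustrative only, not the actual patch):

//go:build windows

package main

import (
	"errors"
	"fmt"

	"golang.org/x/sys/windows"
)

// isRemoteConnClosed is a hypothetical helper reporting whether err is the
// Windows "connection was forcibly closed by the remote host" error
// (WSAECONNRESET); filterCloseErr could map such errors to ErrClosed.
func isRemoteConnClosed(err error) bool {
	return errors.Is(err, windows.WSAECONNRESET)
}

func main() {
	fmt.Println(isRemoteConnClosed(windows.WSAECONNRESET)) // true
}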

Alternative solutions considered

Filter io.ErrClosedPipe (not viable)

This solution was to treat io.ErrClosedPipe as the closed-connection error.
Ref: https://github.com/containerd/ttrpc/blame/4a2816be9b4843f2cf40f1e533e7830d23f4489b/client.go#L450

Opened #165 and found this approach did not resolve the issue.

Filter based on error string

This solution would detect the closed connection by matching on the error string.

panic: protobuf tag not enough fields in Status.state

Go 1.15 beta 1, with github.com/gogo/protobuf 1.3.1.
Not sure if I am doing something wrong.

Testing    in: /builddir/build/BUILD/ttrpc-1.0.1/_build/src
         PATH: /builddir/build/BUILD/ttrpc-1.0.1/_build/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin
       GOPATH: /builddir/build/BUILD/ttrpc-1.0.1/_build:/usr/share/gocode
  GO111MODULE: off
      command: go test -buildmode pie -compiler gc -ldflags "-X github.com/containerd/ttrpc/version=1.0.1 -extldflags '-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld '"
      testing: github.com/containerd/ttrpc
github.com/containerd/ttrpc
panic: protobuf tag not enough fields in Status.state: 
goroutine 54 [running]:
github.com/gogo/protobuf/proto.(*unmarshalInfo).computeUnmarshalInfo(0x4000120a00)
	/usr/share/gocode/src/github.com/gogo/protobuf/proto/table_unmarshal.go:341 +0x1238
github.com/gogo/protobuf/proto.(*unmarshalInfo).unmarshal(0x4000120a00, 0x4000f1a000, 0x4000f16002, 0xf, 0xf, 0xaaaacb74a848, 0x4000026000)
	/usr/share/gocode/src/github.com/gogo/protobuf/proto/table_unmarshal.go:138 +0xb50
github.com/gogo/protobuf/proto.makeUnmarshalMessagePtr.func1(0x4000f16002, 0x10, 0x10, 0x4000158420, 0x2, 0x7, 0x4000f87ae8, 0xaaaacb6ebb08, 0x4000f87bc0, 0x756ea1e68af79e)
	/usr/share/gocode/src/github.com/gogo/protobuf/proto/table_unmarshal.go:1826 +0xd0
github.com/gogo/protobuf/proto.(*unmarshalInfo).unmarshal(0x4000120960, 0x4000158420, 0x4000f16000, 0x11, 0x11, 0xc, 0x4000f87c48)
	/usr/share/gocode/src/github.com/gogo/protobuf/proto/table_unmarshal.go:175 +0x5f4
github.com/gogo/protobuf/proto.(*InternalMessageInfo).Unmarshal(0x4000f0e040, 0xaaaacbbfdde0, 0x4000158420, 0x4000f16000, 0x11, 0x11, 0x4000f87c01, 0x0)
	/usr/share/gocode/src/github.com/gogo/protobuf/proto/table_unmarshal.go:63 +0x58
github.com/gogo/protobuf/proto.(*Buffer).Unmarshal(0x4000f23ce0, 0xaaaacbbfdde0, 0x4000158420, 0x0, 0x0)
	/usr/share/gocode/src/github.com/gogo/protobuf/proto/decode.go:424 +0x1c4
github.com/gogo/protobuf/proto.Unmarshal(0x4000f16000, 0x11, 0x11, 0xaaaacbbfdde0, 0x4000158420, 0x4000f022b8, 0x4000f87f68)
	/usr/share/gocode/src/github.com/gogo/protobuf/proto/decode.go:342 +0x110
github.com/containerd/ttrpc.(*Client).recv(0x4001008580, 0x4000158420, 0x4000f18000, 0x0, 0x0)
	/builddir/build/BUILD/ttrpc-1.0.1/_build/src/github.com/containerd/ttrpc/client.go:323 +0x13c
github.com/containerd/ttrpc.(*Client).run(0x4001008580)
	/builddir/build/BUILD/ttrpc-1.0.1/_build/src/github.com/containerd/ttrpc/client.go:273 +0x354
created by github.com/containerd/ttrpc.NewClient
	/builddir/build/BUILD/ttrpc-1.0.1/_build/src/github.com/containerd/ttrpc/client.go:92 +0x160
exit status 2
FAIL	github.com/containerd/ttrpc	0.514s

Non-Linux support

I am working on non-Linux support; it requires upstreaming changes to x/sys/unix for peercred support, which turns out to be very tedious as it is all generated code.

Support socket control messages for rpc calls

I'd like to be able to have RPCs that can pass file descriptors over transports which support it.

Today there is some support for an interceptor with a custom handshaker to pass along unix rights messages on connection initiation.
The method of passing file descriptors is similar (same mechanism, different message).

The difference here is it would need to be supported per request instead of at connection time (I think), so I'm not sure we'd have access to a handshaker.
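
For reference, the socket-level primitive this would build on (a unix rights control message sent alongside normal data) looks roughly like the sketch below; this is not ttrpc API, just the underlying mechanism, and the socket path is a placeholder.

package main

import (
	"log"
	"net"
	"os"

	"golang.org/x/sys/unix"
)

// sendFD writes payload on a unix socket together with an SCM_RIGHTS control
// message carrying f's file descriptor, so the receiving process ends up with
// a duplicate of the open file.
func sendFD(conn *net.UnixConn, f *os.File, payload []byte) error {
	oob := unix.UnixRights(int(f.Fd()))
	_, _, err := conn.WriteMsgUnix(payload, oob, nil)
	return err
}

func main() {
	raddr := &net.UnixAddr{Name: "/tmp/example.sock", Net: "unix"} // placeholder
	conn, err := net.DialUnix("unix", nil, raddr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	if err := sendFD(conn, os.Stdout, []byte("hello")); err != nil {
		log.Fatal(err)
	}
}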

ttrpc: received message for unknown channel

time="2022-05-10T01:54:22.789465877+08:00" level=error msg="RemoveContainer for "7f4387d5fe055630e7d36e5a5b264943abab5231e63105d19217ad94009fb5c9" failed" error="rpc error: code = NotFound desc = get container info: container "7f4387d5fe055630e7d36e5a5b264943abab5231e63105d19217ad94009fb5c9" in namespace "k8s.io": not found"
time="2022-05-10T01:55:22.792246352+08:00" level=info msg="RemoveContainer for "7f4387d5fe055630e7d36e5a5b264943abab5231e63105d19217ad94009fb5c9""
time="2022-05-10T01:55:22.792408430+08:00" level=error msg="RemoveContainer for "7f4387d5fe055630e7d36e5a5b264943abab5231e63105d19217ad94009fb5c9" failed" error="rpc error: code = NotFound desc = get container info: container "7f4387d5fe055630e7d36e5a5b264943abab5231e63105d19217ad94009fb5c9" in namespace "k8s.io": not found"
time="2022-05-10T01:56:03.806857280+08:00" level=error msg="ttrpc: received message for unknown channel 615"
time="2022-05-10T01:56:05.807363109+08:00" level=error msg="get state for 0b6b1632e84960dd2c65507db05bffa8fdec4f47a285d5682ca1f0add3806dbb" error="context deadline exceeded: unknown"
time="2022-05-10T01:56:05.807539853+08:00" level=warning msg="unknown status" status=0

The kubelet keeps deleting the container, and the error "ttrpc: received message for unknown channel" occurs during the process. I want to know under what circumstances this error branch is reached.

TestServerShutdown is flaky on Windows

TestServerShutdown has been intermittently failing in the pipeline due to leftover connections not being closed.

The current shutdown logic issues a close on idle connections and pauses for 200 ms before reevaluating. Shutdown should wait for each connection to be closed, but that does not appear to be the case.

TestServerRequestTimeout is flaky on Windows

TestServerRequestTimeout has been intermittently failing in PRs due to differences in deadline comparisons.

Example output from a PR run

🇩 coverage
--- FAIL: TestServerRequestTimeout (0.01s)
    server_test.go:406: expected deadline 2023-02-25 00:20:34.5471163 +0000 GMT m=+601.181318201, actual: 1677284434547116900
FAIL
coverage: 64.8% of statements
FAIL	github.com/containerd/ttrpc	1.385s
FAIL
mingw32-make: *** [makefile:155: coverage] Error 1
Error: Process completed with exit code 1.

Server-side goroutine leak on receive message error

Hey folks, while tracking down a production error I inadvertently found a rare error in the receive message logic for server connections.

To set some context, the receive message logic relies on two goroutines working together.
goroutine A: https://github.com/containerd/ttrpc/blob/main/server.go#L136
goroutine B: https://github.com/containerd/ttrpc/blob/main/server.go#L369-L487

A [built-in Go] error channel is used for communication of errors between goroutines A and B.
https://github.com/containerd/ttrpc/blob/main/server.go#L339

The first thing to note is that, on a message header read error, the channel returns the underlying error to goroutine B. Based on the current implementation, these errors can only come from io.ReadFull, which can return io.EOF or io.ErrUnexpectedEOF. Reference: https://pkg.go.dev/io#ReadFull
https://github.com/containerd/ttrpc/blob/main/channel.go#L73-L76

This is fine because goroutine A has error handling to check for these errors and stop execution if they occur.
https://github.com/containerd/ttrpc/blob/main/server.go#L545-L555

The issue begins when considering errors on the message read by goroutine B. In this branch, goroutine B has successfully read a message header for a message with some length > 0; however, when reading the message body, io.ReadFull has returned an error, either io.EOF or io.ErrUnexpectedEOF.

What I have found is that for these errors we wrap the error with more information, to clarify that the error occurred on the message read (as opposed to the message header read).
https://github.com/containerd/ttrpc/blob/main/channel.go#L138

However, goroutine A's error handling logic is not currently set up to check for wrapped errors.
https://github.com/containerd/ttrpc/blob/main/server.go#L550-L554

So the effect is that goroutine B is stopped due to a [non-gRPC] receive error, but goroutine A is never stopped, resulting in a leaked goroutine and server connection.
Stop goroutine B on receive error: https://github.com/containerd/ttrpc/blob/main/server.go#L380-L385
Again, goroutine A is (currently) only stopped on some receive errors: https://github.com/containerd/ttrpc/blob/main/server.go#L550-L554
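
The gap can be illustrated in a few lines: a direct equality check misses a wrapped io.EOF, while errors.Is (which unwraps) still matches it, which is what goroutine A's handling would need for the wrapped message-read errors.

package main

import (
	"errors"
	"fmt"
	"io"
)

func main() {
	// Wrapped with extra context, as on the message-read path.
	err := fmt.Errorf("failed to read message data: %w", io.EOF)

	fmt.Println(err == io.EOF)          // false: direct comparison no longer matches
	fmt.Println(errors.Is(err, io.EOF)) // true: unwrapping comparison still matches
}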

Is ttrpc compatible with Grpc?

Description

I have two programs, A and B, that communicate with each other. They are both implemented in Rust, and communication is handled by ttrpc.

But now I have a third program, implemented in C++, that is meant to replace program B.

Can I achieve this without modifying program A's code?

IMO, if ttrpc were compatible with gRPC, then I could implement this C++ program with gRPC and would not need to recode program A.

Empty Proto Hangs Server Streaming RPC

This is the golang version of this bug in the rust implementation: containerd/ttrpc-rust#169

If a client makes a request to a server streaming RPC (e.g. rpc DivideStream(Sum) returns (stream Part);) using a default request (e.g. streaming.Sum{}), the call hangs indefinitely. This can easily be observed by modifying the streaming_test to send an empty Sum.

The root cause is twofold:

  1. Empty / Default proto messages serialize to a length of zero
  2. The services implementation does not attempt to deliver an empty payload (I think because it assumes it will be receiving DATA messages soon, but the client sends only the request).

The protocol doesn't clearly specify the behaviour of empty payloads on requests. However, based on the Data semantics:

Since ttrpc normally transmits a single object per message, a zero length data message may be interpreted as an empty object. For example, transmitting the number zero as a protobuf message ends up with a data length of zero, but the message is still considered data and should be processed.

It seems to me, for consistency's sake, the same behaviour should apply to an empty payload on a Request as well.
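
Point (1) above is easy to confirm: marshaling a default message yields zero bytes, so the request frame that reaches the server carries an empty payload. A quick check, using emptypb as a stand-in for streaming.Sum:

package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/emptypb"
)

func main() {
	b, err := proto.Marshal(&emptypb.Empty{}) // a default/empty message
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(b)) // 0: the wire payload is zero-length
}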

Context canceled error returned instead of ErrClosed

A flaky test was observed in containerd in the 1.1 branch after updating to 1.1.1

=== Failed
=== FAIL: . TestClientTTRPC_Close (0.00s)
    client_ttrpc_test.go:85: assertion failed: context canceled (err *errors.errorString) != ttrpc: closed (ttrpc.ErrClosed *errors.errorString)

godoc: fill out the docs

A number of missing items:

  • Basic type documentation
  • Package level description and usage example
  • Move request/response types out of main package or unexport them

Streaming support

I started spending some time thinking about this and wanted to track it. Have you thought about a design for how you want this to work? I was thinking of looking at GRPC and molding its approach into something that is memory efficient for TTRPC. Is there anything I should look out for?

I also noticed there isn't any use of a transport, which seems necessary for stream handling. What is the plan to handle multiplexing on a given connection? net.Conn is used to handle reading and writing data; in GRPC the connection is eventually used to create a transport. However, I do not see any logic surrounding a transport in the package. Was this intentionally left out, or were there plans to add it later?

Is there any other feature that needs to be implemented to support this?

Implement a new code generator that doesn't use gogo/protobuf

Background

We've been using github.com/gogo/protobuf for serializing protobuf messages. While it has been working for years, the folks behind gogo/protobuf have recently announced that they are looking for new maintainers and that they cannot keep up with the changes Google is making to google.golang.org/protobuf.

The situation is causing issues like #62. While we can pin some of the packages from Google to keep them compatible with gogo/protobuf, this workaround may not be sustainable in the future.

Options

A. Implement a new code generator that uses mostly Google's official packages

Pros

  • Reducing dependencies may reduce the chance of breakage in future.

Cons

B. Implement a new code generator that utilizes Vitess's vtprotobuf

Pros

  • We wouldn't lose gogo/protobuf's fast serialize/deserialize performance.

Cons

  • We would be replacing gogo/protobuf with planetscale/vtprotobuf. While vtprotobuf should be more maintainable than gogo/protobuf due to the way it extends protobuf, it might still break in the future.

C. Contribute and possibly own gogo/protobuf

Pros

  • We may be able to keep what we have as is.
  • We can help others who use gogo/protobuf.

Cons

  • We wouldn't be able to keep up with the requests from the protobuf community.
  • containerd's usage would not be representative in the community.

D. Looking for non-protobuf solutions

Pros

  • We might be able to simplify the wire format.

Cons

  • We would be making a wire-format change just because we don't like the client library, which may not make much sense.
  • Packages like ttrpc-rust would have to follow the change.

client: add synchronization between userCloseFunc and rpc call

It is found that sometimes (especially when userCloseFunc takes a long time) restarting a container will fail because the older container's bundle path still exists.

The root cause is analyzed below:

When the containerd runtime plugin exits abnormally, the ttrpc connection is closed and userCloseFunc is called to clean up the resources created by the containerd shim. The current RPC call will also return an error. But these two steps are asynchronous.

After the RPC call returns an error, an upper application such as Kubernetes may restart the container, but the start may fail because cleanup has not finished and some resources have not been released. These leaked resources then cause the in-place update of the pod to fail again.

One way to fix this is to ensure synchronization between userCloseFunc and the RPC call in ttrpc.

#87
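
A minimal sketch of the proposed synchronization (names such as cleanupDone are illustrative, not the actual ttrpc internals): the failing call does not return until the user-provided close callback has finished.

package main

import (
	"errors"
	"fmt"
)

var errClosed = errors.New("ttrpc: closed") // stand-in for ttrpc.ErrClosed

// call simulates an in-flight RPC failing because the connection closed; it
// waits for cleanup to complete before surfacing the error to the caller.
func call(cleanupDone <-chan struct{}) error {
	<-cleanupDone
	return errClosed
}

func main() {
	cleanupDone := make(chan struct{})
	userCloseFunc := func() { fmt.Println("cleaning up shim resources") }

	// Connection teardown: run the user callback, then signal completion.
	go func() {
		userCloseFunc()
		close(cleanupDone)
	}()

	// The caller only sees the error after cleanup has finished, so a restart
	// attempted immediately afterwards no longer races with resource release.
	fmt.Println(call(cleanupDone))
}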
