
CFP: Enable Proxyless gRPC Connections to xDS (cilium, closed)

DerekTBrown commented on June 15, 2024


Comments (6)

youngnick commented on June 15, 2024

Thanks for bringing this up @DerekTBrown!

I've got a couple of bits of information for you, and a couple of questions.

Firstly, Envoy and the Cilium agent currently communicate over a Unix domain socket by design; it helps to manage the risk that we accept in having a per-node proxy (we definitely think it's worthwhile, but it is something we need to manage). Not having the xDS server listen on a network socket means that we can more easily control where that information is surfaced, and can more safely leave the xDS traffic itself unencrypted (since it only passes over the domain socket).
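To make the domain-socket point concrete, here is a minimal sketch (Python standard library only) of a node-local control channel. The socket path, the `0o600` permission choice and the toy request/reply bytes are assumptions for illustration, not Cilium's actual paths or wire format; the real agent speaks gRPC xDS over its socket. What it shows is that the only way to reach the server is through a filesystem path, so access is governed by file permissions and what gets mounted into which container, whereas a TCP listener would be reachable by anything that can route to its address.

```python
import os
import socket
import stat
import tempfile
import threading


def serve_once(path: str, ready: threading.Event) -> None:
    """Accept one connection on a Unix domain socket and answer one request."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        # Restrict the socket file to its owner: only processes that can see
        # this path (and have permission) are able to connect.
        os.chmod(path, 0o600)
        srv.listen(1)
        ready.set()
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024)
            # Hypothetical toy reply; real xDS is gRPC with protobuf messages.
            conn.sendall(b"ACK " + request)


def query(path: str, payload: bytes) -> bytes:
    """Connect to the socket file, send one request, return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(path)
        cli.sendall(payload)
        return cli.recv(1024)


def run_demo() -> tuple[bytes, int]:
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "xds.sock")  # hypothetical socket name
    ready = threading.Event()
    server = threading.Thread(target=serve_once, args=(path, ready))
    server.start()
    ready.wait(timeout=5)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    reply = query(path, b"cds")
    server.join(timeout=5)
    os.unlink(path)
    os.rmdir(tmpdir)
    return reply, mode
```

Calling `run_demo()` returns the reply together with the socket file's permission bits, which is where access control lives in this model.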

Exposing the xDS control plane on a network address instead has a few complications:

Because the relationship between control plane and proxy is 1:1 (there's exactly one agent per proxy, and both run on the same node), adding networking means adding a bunch of address management: we would need to ensure that every node has an address available that means "node localhost for Envoy purposes". Doable, but a lot of work to get back to the same place.
Once you start passing xDS across the network, it's imperative that the xDS traffic be encrypted, since you're sending sensitive (potentially very sensitive, in the case of SDS) information over the wire. This also means that you now need to manage keypairs for agents and Envoy, which, again, is doable but a lot of work.
Lastly, because Envoy clusters are derived from Services and Endpoints (or EndpointSlices) in the common case, it's possible to craft an ExternalName Service or a manually managed Endpoint or EndpointSlice that tricks the proxy into talking to whatever address you like. We generally mitigate these risks by not supporting ExternalName Services, and by recommending that cluster admins restrict access to Endpoints and EndpointSlices so that you can't do this. But if xDS were reachable over the network from within the proxy container, this technique could be used to reach the control plane from outside the proxy. It's relatively unlikely and a lot of work, but it is possible, and it's another reason why we've kept things on domain sockets.
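The ExternalName/EndpointSlice concern can be illustrated with the kind of admission check a control plane might run before turning Service and Endpoint data into proxy clusters. This is a hypothetical sketch, not Cilium's code: the `Backend` type, the `admit_backend` function and the `CONTROL_PLANE_ADDRS` set (standing in for a network-exposed xDS address) are all assumptions. It shows why a network-reachable xDS server widens the attack surface: every endpoint address that could point at the control plane has to be caught by a check like this, whereas an IP endpoint cannot target a domain socket at all.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    service_type: str  # e.g. "ClusterIP" or "ExternalName"
    address: str       # endpoint address taken from Endpoints/EndpointSlices


# Hypothetical address the agent would listen on if xDS were exposed on the
# network (192.0.2.0/24 is a documentation range, used here as a placeholder).
CONTROL_PLANE_ADDRS = {ipaddress.ip_address("192.0.2.10")}


def admit_backend(b: Backend) -> bool:
    """Return True if this backend may be programmed into a proxy cluster."""
    if b.service_type == "ExternalName":
        # ExternalName Services can point anywhere; refuse them outright.
        return False
    try:
        ip = ipaddress.ip_address(b.address)
    except ValueError:
        return False  # not an IP literal (for example, a hostname)
    if ip.is_loopback or ip.is_link_local or ip in CONTROL_PLANE_ADDRS:
        # Hand-crafted endpoints aimed at node-local addresses could let a
        # workload use the proxy to reach the control plane.
        return False
    return True
```

Note that this only narrows the problem; restricting who can write Endpoints and EndpointSlices, as described above, remains the primary control.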

Okay, so that's all to explain why things are currently in "use a socket for xDS" mode. Again, this is not a hard requirement, but the trade-offs of switching mean that the resulting feature would need to be pretty valuable to a lot of folks for us to focus on it.

Next, the questions.

In general, for Cilium Service Mesh, we've been working on making many Service Mesh functions be transparently handled by Cilium. It seems to me like the gRPC xDS support is trading off client complexity for lower latency (by having the client manage the load balancing to avoid a proxy hop). Do you think that's a fair characterization?
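The trade-off described above comes from the client doing the load balancing itself. A minimal sketch of what that means, assuming a simple round-robin policy (gRPC's xDS support also implements other policies, such as ring hash) and a hypothetical `update_endpoints` hook that a real client would invoke whenever the control plane pushes a new endpoint assignment:

```python
import itertools
import threading
from typing import Iterable, Iterator, Optional


class RoundRobinPicker:
    """Client-side picker fed by EDS-style endpoint updates (illustrative)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cycle: Optional[Iterator[str]] = None

    def update_endpoints(self, endpoints: Iterable[str]) -> None:
        # Called when the control plane pushes a new endpoint list.
        endpoints = list(endpoints)
        with self._lock:
            self._cycle = itertools.cycle(endpoints) if endpoints else None

    def pick(self) -> str:
        # Each RPC chooses its own backend: no proxy hop, but the client
        # now owns this logic instead of a shared proxy.
        with self._lock:
            if self._cycle is None:
                raise RuntimeError("no endpoints available")
            return next(self._cycle)
```

Every client language would need an equivalent of this (and of everything beyond it), which is the client-complexity side of the trade-off.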

I ask because I currently don't see that the tradeoff there is worth it. That's not to say I'm right, or that I am not interested in discussing this more, but my two biggest concerns are:

As I said, pushing routing code into the client means that every client ends up implementing a subset of Envoy functionality, and will likely lag behind Envoy in supported features (the list of features gRPC supports is pretty short compared to Envoy's full feature set).
Secondly, many folks use the Service Mesh features to augment the lower-layer NetworkPolicy features of Cilium, and use the proxy hop as a lightweight WAF or similar, only allowing certain layer 7 requests to go through. This approach delegates a lot of that responsibility back to the gRPC client code in the service.
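For context on the "lightweight WAF" use: a proxy hop can allow or deny individual gRPC calls because each call's method is visible as the HTTP/2 `:path` (`/package.Service/Method`). The sketch below shows the kind of default-deny method/path allowlist such a proxy evaluates. It is loosely modelled on L7 rules that match an HTTP method and a path regular expression, but the `L7Rule` type, the `allowed` function and the example service name are hypothetical, not Cilium's policy schema. In a proxyless setup, the equivalent check would have to live in, and be trusted from, the service's own code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class L7Rule:
    method: str  # regex for the HTTP method (gRPC calls are always POST)
    path: str    # regex matched against the full request path

    def matches(self, method: str, path: str) -> bool:
        return (re.fullmatch(self.method, method) is not None
                and re.fullmatch(self.path, path) is not None)


def allowed(rules: list[L7Rule], method: str, path: str) -> bool:
    """Default-deny: a request passes only if at least one rule matches it."""
    return any(r.matches(method, path) for r in rules)


# Hypothetical policy: clients may only call read-only methods.
RULES = [L7Rule("POST", r"/inventory\.v1\.Inventory/(Get|List)\w*")]
```

With `RULES`, a call to `/inventory.v1.Inventory/GetItem` is allowed while `/inventory.v1.Inventory/DeleteItem` is rejected before it reaches the server.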

To be clear, it's a really neat idea, and I can see that low-latency use cases would be willing to take the tradeoffs. But it seems like a lot of things to support for something that I haven't heard anyone ask for yet.

I'd be very interested to hear more about your thoughts, please feel free to reach out here or on Slack, or come to a community meeting (and ping me beforehand so I can ensure I'm awake).


howardjohn commented on June 15, 2024

Just curious why you don't go "gRPC Client pod" to "xDS management Server" directly?


DerekTBrown commented on June 15, 2024

Thanks @youngnick for your thorough and thoughtful response! I am planning to attend the Cilium community meeting tomorrow morning, so that hopefully we can bounce some ideas around; I am still early in the process of thinking about how Cilium, xDS and gRPC could fit together.

To perhaps rewind a bit, there are a few reasons that I have been exploring the xDS gRPC client's integration with Cilium:

Latency/Overhead/Complexity - As you mentioned, the idea of being able to skip the Envoy hop is appealing. It is really hard for me to get a sense of how significant this improvement would be: if the Envoy literature is to be believed, the improvement would be under 1 ms; if the Linkerd literature is accurate, we could be talking about more than 50 ms. I would be especially interested to hear from you (since I am guessing you have a bunch of cool experience with different Cilium/Envoy installs) whether this latency is significant enough to make investment in proxyless xDS functionality worthwhile, or whether this is an over-optimization.

Beyond the latency/cost benefit, I think there could be a complexity benefit if people elected to use xDS-enabled clients instead of Envoy.
Client-to-server Encryption - As I understand it, Cilium/Envoy can only provide full L7 routing capabilities if traffic reaches the proxy unencrypted. The current paradigm in our clusters is to perform client-to-server encryption in the gRPC client itself, meaning that Cilium/Envoy can't introspect requests to provide L7 capabilities. gRPC's xDS capabilities were one option we were considering to gain L7 functionality without having to disable TLS in gRPC.
Observability/Debuggability/Extensibility - One interesting aspect of client-side xDS functionality is that the client is aware of the L7 decisions it's making. This potentially makes it easier for a service owner to collect, observe and debug service routing behavior. I think it also becomes easier for folks to extend xDS functionality on the client side, because such changes can be made in a fork, wrapper, or different implementation of a particular client, rather than having to be made and merged into Cilium + Envoy.
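To illustrate the observability point: when routing happens in the client, the client can record exactly which route and cluster it chose for each call. A hypothetical sketch, assuming a route table reduced to path-prefix matches with weighted clusters (a small slice of what an RDS route configuration can express); the `Route` and `ClientRouter` names, the decision log and the injectable random source are illustrative only.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    prefix: str
    clusters: tuple[tuple[str, int], ...]  # (cluster name, weight)


@dataclass
class ClientRouter:
    routes: list[Route]
    rng: random.Random = field(default_factory=random.Random)
    decisions: list[dict] = field(default_factory=list)

    def route(self, path: str) -> str:
        # Routes are checked in order and the first match wins.
        for r in self.routes:
            if path.startswith(r.prefix):
                names = [name for name, _ in r.clusters]
                weights = [weight for _, weight in r.clusters]
                cluster = self.rng.choices(names, weights=weights, k=1)[0]
                # The decision is visible to the service owner (logs, metrics).
                self.decisions.append({"path": path, "prefix": r.prefix, "cluster": cluster})
                return cluster
        self.decisions.append({"path": path, "prefix": None, "cluster": None})
        raise LookupError(f"no route for {path}")
```

Because `decisions` lives in the client process, a service owner could export it directly instead of correlating proxy access logs after the fact.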

Beyond the "proxyless" xDS mode, I am also interested in learning more generally about Cilium's plans for xDS. The way Cilium leverages xDS is very cool, but also very unorthodox (by having agents expose the xDS API, as opposed to a central control plane). I want to understand how this fits in with the general xDS ecosystem (users bring their own xDS APIs to provide features). For instance: if I want to implement my own xDS service, do I pipe that through Cilium Agents, do I inject that into the Envoy config, or is it breaking the Cilium interface to touch xDS configuration at all?
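For readers less familiar with the plumbing behind this question: every xDS resource type is identified by a protobuf type URL, and a control plane essentially serves versioned sets of resources per type, so "who serves which type URL" is the crux of where a custom xDS service would plug in. The type URLs below are the standard Envoy v3 ones; the `Snapshot` class is a deliberately simplified, hypothetical model that omits ACK/NACK handling, nonces and incremental xDS.

```python
from dataclasses import dataclass, field
from typing import Optional

TYPE_URLS = {
    "LDS": "type.googleapis.com/envoy.config.listener.v3.Listener",
    "RDS": "type.googleapis.com/envoy.config.route.v3.RouteConfiguration",
    "CDS": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
    "EDS": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
    "SDS": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
}


@dataclass
class Snapshot:
    versions: dict[str, int] = field(default_factory=dict)
    resources: dict[str, dict[str, object]] = field(default_factory=dict)

    def set(self, type_url: str, name: str, resource: object) -> None:
        # Any change to a type bumps that type's version (state-of-the-world style).
        self.resources.setdefault(type_url, {})[name] = resource
        self.versions[type_url] = self.versions.get(type_url, 0) + 1

    def respond(self, type_url: str, known_version: Optional[int]):
        """Return (version, resources) if the client is behind, else None."""
        current = self.versions.get(type_url, 0)
        if known_version == current:
            return None  # up to date; a real server would keep the stream open
        return current, dict(self.resources.get(type_url, {}))
```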


DerekTBrown commented on June 15, 2024

> Just curious why you don't go "gRPC Client pod" to "xDS management Server" directly?

In my vision of how this might work, the Cilium Agent would implement a subset of the xDS APIs (i.e. EDS, CDS, RDS) itself, and then proxy, cache and multiplex requests to upstream services (e.g. a rate-limiting service).
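The proposal can be sketched as a small dispatcher: resource types the agent owns are answered from its own state, and everything else is fetched once from an upstream service, cached, and shared across all local clients asking for the same thing. Everything here is hypothetical (the `AgentMultiplexer` class, the `upstream_fetch` callable, the type names and the never-expiring cache); it ignores streaming, invalidation and authentication, which a real design would have to handle.

```python
import threading
from typing import Callable, Dict, Tuple


class AgentMultiplexer:
    def __init__(self,
                 local: Dict[str, Callable[[str], object]],
                 upstream_fetch: Callable[[str, str], object]) -> None:
        self._local = local              # type -> handler for types the agent serves itself
        self._upstream = upstream_fetch  # hypothetical call to an upstream service
        self._cache: Dict[Tuple[str, str], object] = {}
        self._lock = threading.Lock()

    def get(self, type_name: str, resource: str) -> object:
        handler = self._local.get(type_name)
        if handler is not None:
            return handler(resource)  # served from the agent's own state
        key = (type_name, resource)
        # Simplification: the lock is held during the upstream call, which
        # serializes fetches but guarantees each key is fetched only once.
        with self._lock:
            if key not in self._cache:
                self._cache[key] = self._upstream(type_name, resource)
            return self._cache[key]
```

Holding one lock across the upstream call keeps the sketch short; a production version would more likely coalesce requests per key so unrelated fetches don't block each other.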


github-actions commented on June 15, 2024

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.


github-actions commented on June 15, 2024

This issue has not seen any activity since it was marked stale.
Closing.

