Comments (6)

youngnick commented on June 15, 2024

Thanks for bringing this up @DerekTBrown!

I've got a couple of bits of information for you, and a couple of questions.

Firstly, the way that Envoy and the Cilium agent currently use sockets for communication is by design; it helps to manage the risk that we accept in having a per-node proxy (we definitely think it's worthwhile, but it is something we need to manage). Not having the xDS server listening on a network socket means that we can more easily control where that information is surfaced, and can more safely leave the xDS traffic itself in the clear (since it only passes over the domain socket).
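For readers less familiar with the distinction, here is a minimal sketch of what "only passes over the domain socket" means in practice. It is not Cilium's or Envoy's actual code; the socket file name `xds.sock` and the payload are assumptions chosen for illustration. The point is that the listener has no IP address or port, so reaching it requires access to the socket file on the same node.

```python
import os
import socket
import tempfile
import threading


def serve_once(server: socket.socket) -> None:
    """Accept one local connection, read until EOF, and acknowledge it."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:
                break
            chunks.append(data)
        conn.sendall(b"ack:" + b"".join(chunks))


def request_over_socket(sock_path: str, payload: bytes) -> bytes:
    """Connect to the domain socket as a local client and send one message."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # signal the end of the request
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "xds.sock")  # hypothetical file name
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(sock_path)
            # File permissions decide which local processes may connect;
            # nothing is exposed on a network interface.
            os.chmod(sock_path, 0o600)
            server.listen(1)
            worker = threading.Thread(target=serve_once, args=(server,))
            worker.start()
            reply = request_over_socket(sock_path, b"discovery-request")
            worker.join()
        return reply
```

Calling `demo()` round-trips one message between two threads on the same host. A process on another node has no address to dial at all, which is the property the paragraph above relies on.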

Exposing the xDS control plane on a network address instead has a few complications:

  • Because the relationship between control plane and proxy is 1:1 (there's exactly one agent per proxy, and both run on the same node), adding networking means adding a bunch of address management: we would need to ensure there's an address available on every node that means "node localhost for Envoy purposes". Doable, but a bunch of work to get to the same place.
  • Once you start passing xDS across the network, it's imperative that the xDS traffic be encrypted, since you're sending sensitive (potentially very sensitive in the case of SDS) information over the wire. This also means that now you need to manage keypairs for agents and Envoy, which again, is doable but a lot of work.
  • Lastly, because Envoy clusters are derived from Services and Endpoints (or EndpointSlices) in the common case, it's possible to craft an ExternalName Service or a manually-managed Endpoint or EndpointSlice to confuse the proxy into talking to whatever address you like. We mitigate these risks generally by not supporting ExternalName Services, and by recommending that cluster admins restrict access to Endpoints and EndpointSlices so that you can't do this. But if the xDS server is reachable over the network from within the proxy container, you can use this trick to connect to the control plane from outside the proxy. It's relatively unlikely and a lot of work, but it is possible. That's another reason why we've left things using domain sockets.
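As a rough illustration of the "not supporting ExternalName Services" mitigation in the last bullet, the sketch below filters Service objects before anything would be derived from them. This is an assumption about the shape of such a check, not Cilium's implementation; `routable_services` is a hypothetical helper, and the dictionaries mimic only the `spec.type` field of a Kubernetes Service.

```python
from typing import Iterable, List


def routable_services(services: Iterable[dict]) -> List[dict]:
    """Keep only Services that are not of type ExternalName.

    A Service with no explicit spec.type is treated as ClusterIP,
    which is the Kubernetes default.
    """
    kept = []
    for service in services:
        service_type = service.get("spec", {}).get("type", "ClusterIP")
        if service_type == "ExternalName":
            # Skipped: an ExternalName Service can point at an arbitrary DNS name.
            continue
        kept.append(service)
    return kept
```

This covers only the first half of the mitigation. Restricting who can write Endpoints and EndpointSlices is an RBAC matter and is not something a filter like this can express.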

Okay, so that's all to explain why things are currently in "use a socket for xDS" mode. Again, this is not a hard requirement, but the tradeoffs of swapping mean that the resultant extra feature would need to be pretty valuable for a lot of folks for us to focus on it.

Next, the questions.

In general, for Cilium Service Mesh, we've been working on making many Service Mesh functions be transparently handled by Cilium. It seems to me like the gRPC xDS support is trading off client complexity for lower latency (by having the client manage the load balancing to avoid a proxy hop). Do you think that's a fair characterization?
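To make "having the client manage the load balancing" concrete, here is a minimal sketch of the kind of logic that moves into the client in a proxyless setup. It is not gRPC's actual xDS implementation; `RoundRobinPicker` and the endpoint addresses are hypothetical, and a real client would also handle health, weights, and locality.

```python
from typing import List, Sequence


class RoundRobinPicker:
    """Pick backends in rotation from a list the control plane supplies."""

    def __init__(self, endpoints: Sequence[str]) -> None:
        self._endpoints: List[str] = []
        self._next = 0
        self.update(endpoints)

    def update(self, endpoints: Sequence[str]) -> None:
        """Replace the endpoint list, e.g. after an endpoint-discovery update."""
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._next = 0

    def pick(self) -> str:
        """Return the next endpoint; the caller dials it directly, with no proxy hop."""
        endpoint = self._endpoints[self._next % len(self._endpoints)]
        self._next += 1
        return endpoint
```

Even this toy version shows the tradeoff: every client language needs its own copy of this logic, plus everything else a proxy would otherwise do on its behalf.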

I ask because I currently don't see that the tradeoff there is worth it. That's not to say I'm right, or that I am not interested in discussing this more, but my two biggest concerns are:

  • As I said, pushing routing code into the client seems to mean that every client ends up implementing a subset of Envoy functionality, and will likely lag behind Envoy in terms of supported features (the list of supported features is pretty short compared to Envoy's full feature set).
  • Secondly, many folks use the Service Mesh features to augment the lower-layer NetworkPolicy features of Cilium, and use the proxy hop as a lightweight WAF or similar, only allowing certain layer 7 requests to go through. This approach delegates a lot of that responsibility back to the gRPC client code in the service.
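The "only allowing certain layer 7 requests" idea in the second bullet can be sketched as a simple allowlist check. This is an illustrative assumption, not how Cilium or Envoy express L7 policy; `allow_request` and the service paths are hypothetical. It relies only on the fact that a gRPC call travels as an HTTP/2 POST whose path has the form `/package.Service/Method`.

```python
from typing import AbstractSet

# Hypothetical allowlist of gRPC methods a workload is permitted to call.
ALLOWED_PATHS = frozenset({
    "/shop.Cart/GetCart",
    "/shop.Cart/AddItem",
})


def allow_request(method: str, path: str,
                  allowed_paths: AbstractSet[str] = ALLOWED_PATHS) -> bool:
    """Return True only for POST requests to an explicitly allowed gRPC path."""
    return method == "POST" and path in allowed_paths
```

With a proxy hop, a check like this runs outside the workload. In the proxyless model the equivalent decision would have to be made, and trusted, inside the gRPC client code itself.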

To be clear, it's a really neat idea, and I can see that low-latency use cases would be willing to take the tradeoffs. But it seems like a lot of things to support for something that I haven't heard anyone ask for yet.

I'd be very interested to hear more about your thoughts, please feel free to reach out here or on Slack, or come to a community meeting (and ping me beforehand so I can ensure I'm awake).

from cilium.

howardjohn commented on June 15, 2024

Just curious why you don't go "gRPC Client pod" to "xDS management Server" directly?

DerekTBrown commented on June 15, 2024

Thanks @youngnick for your thorough and thoughtful response! I am planning to attend the Cilium community meeting tomorrow morning, so hopefully we can bounce some ideas around. I am still early in the process of thinking about how Cilium, xDS and gRPC could fit together.

To perhaps rewind a bit, there are a few reasons that I have been exploring the xDS gRPC client's integration with Cilium:

  • Latency/Overhead/Complexity - As you mentioned, the idea of being able to skip the Envoy hop is appealing. It is really hard for me to get a sense of how significant this improvement would be: if the Envoy literature is to be believed, the improvement would be <1ms; if the Linkerd literature is accurate, we could be talking about >50ms. I would be especially interested to hear from you (since I am guessing you have a bunch of cool experiences with different Cilium/Envoy installs) whether this latency is significant enough that investment in proxyless xDS functionality is worthwhile, or whether this is an over-optimization.

    Beyond the latency/cost benefit, I think there could be a complexity benefit if people elected to use xDS-enabled clients instead of Envoy.

  • Client-to-server Encryption - As I understand it, Cilium/Envoy can only provide full L7 routing capabilities if traffic is unencrypted to the proxy. The current paradigm in our clusters is to perform client-to-server encryption in the gRPC client itself, meaning that Cilium/Envoy can't introspect requests to be able to provide L7 capabilities. gRPC's xDS capabilities were one option we were considering to gain L7 functionality without having to disable TLS in gRPC.

  • Observability/Debuggability/Extensibility - One interesting aspect of client-side xDS functionality is that the client is aware of the L7 decisions it's making. This makes it potentially easier for a service owner to collect, observe and debug service routing behavior. I think it also becomes easier for folks to extend xDS functionality on the client side, because such changes can be made in a fork/wrapper/different implementation of a particular client, as opposed to having to be made and merged into Cilium + Envoy.

Beyond the "proxyless" xDS mode, I am also interested in learning more generally about Cilium's plans for xDS. The way Cilium leverages xDS is very cool, but also very unorthodox (by having agents expose the xDS API, as opposed to a central control plane). I want to understand how this fits in with the general xDS ecosystem (users bring their own xDS APIs to provide features). For instance: if I want to implement my own xDS service, do I pipe that through Cilium Agents, do I inject that into the Envoy config, or is it breaking the Cilium interface to touch xDS configuration at all?

DerekTBrown commented on June 15, 2024

Quoting @howardjohn: Just curious why you don't go "gRPC Client pod" to "xDS management Server" directly?

In my vision of how this might work, Cilium Agent would implement a subset of xDS APIs (i.e. EDS, CDS, RDS) itself, and then proxy, cache and multiplex requests to upstream xDS APIs (e.g. a rate limiting service).
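A very rough sketch of that shape, under stated assumptions: `AgentFrontend`, the API labels, and the handler signatures are all hypothetical and do not correspond to any existing Cilium interface. It shows only the dispatch idea, where some APIs are answered locally and the rest are forwarded upstream with a cache in front.

```python
from typing import Callable, Dict, Tuple

LocalHandler = Callable[[str], str]
UpstreamFetch = Callable[[str, str], str]


class AgentFrontend:
    """Answer some APIs locally; forward and cache everything else."""

    def __init__(self, local: Dict[str, LocalHandler],
                 upstream: UpstreamFetch) -> None:
        self._local = dict(local)
        self._upstream = upstream
        self._cache: Dict[Tuple[str, str], str] = {}
        self.upstream_calls = 0  # exposed only to make the caching visible

    def fetch(self, api: str, resource: str) -> str:
        if api in self._local:
            # Served by the agent itself (e.g. EDS, CDS, RDS in the idea above).
            return self._local[api](resource)
        key = (api, resource)
        if key not in self._cache:
            # One upstream request serves all later callers for the same key.
            self.upstream_calls += 1
            self._cache[key] = self._upstream(api, resource)
        return self._cache[key]
```

A real version would need streaming, cache invalidation, and versioning of responses, none of which this sketch attempts.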

github-actions commented on June 15, 2024

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions commented on June 15, 2024

This issue has not seen any activity since it was marked stale.
Closing.
