Comments (16)

yuvipanda commented on June 12, 2024

We've done this, so am going to close this for now. Great job, @minrk!

from mybinder.org-deploy.

yuvipanda commented on June 12, 2024

I agree! We can do some limiting at the VPC level (disallow all outbound connections on port 25, for example) and some at the Kubernetes NetworkPolicy level (pods aren't allowed to talk to each other).

choldgraf commented on June 12, 2024

minrk commented on June 12, 2024

The network policy I would like to start with:

  1. whitelist ports 80, 443 (all other ports blocked until someone has a legitimate need for other ports)
  2. throttle outbound bandwidth (not sure what is a reasonable value). Ideally, I would like to throttle this excluding the notebook port, so we aren't throttling the user experience, only sending data elsewhere. I'm not sure if that's feasible.
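As a rough sketch of item 1, an egress NetworkPolicy along these lines would allow only ports 80 and 443, assuming the cluster's network plugin actually enforces egress rules. The policy name and the `component: singleuser-server` label are assumptions for illustration. The DNS rule is an addition that is not in the list above; without it, name lookups would most likely fail. Item 2 (throttling) is not something NetworkPolicy can express.

```yaml
# Sketch only: the name and label selector are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: user-pod-egress
spec:
  # Apply to user pods only (assumed label).
  podSelector:
    matchLabels:
      component: singleuser-server
  policyTypes:
    - Egress
  egress:
    # Allow outbound HTTP and HTTPS to any destination.
    - ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443
    # Assumption: DNS must stay reachable for hostnames to resolve.
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```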

Possible stretch goals:

  1. apply a network quota per container, so that if/when it exceeds the limit it gets killed
  2. track usage (for a short period, only for abuse detection) of client ip addresses, so that we can ban ips if need be

I'd also like to track pod creations by IP in general, so we can see if an IP address/range is responsible for traffic spikes.

I'll try to do some reading to see what our options are here. istio appears to be a candidate.

yuvipanda commented on June 12, 2024

More info at #36.

Also https://kubernetes.io/docs/concepts/services-networking/network-policies/ should be useful!

choldgraf commented on June 12, 2024

I think another thing we can do is track non-user-specific network data etc. (via prometheus?). It'll be pretty easy to run analytics on that to see if there are outlier users in general, and to decide on limits etc. that cover most use cases just fine.
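One way this might look, as a sketch rather than a tested setup: a Prometheus rules file that records per-pod outbound throughput and flags outliers. It assumes the cAdvisor metric `container_network_transmit_bytes_total` is being scraped and that the pod label is named `pod` (older Kubernetes versions used `pod_name`). The rule names and the threshold are made up for illustration.

```yaml
# Sketch only: rule names, threshold, and label name are assumptions.
groups:
  - name: pod-network-usage
    rules:
      # Per-pod outbound bytes per second, averaged over 5 minutes.
      - record: pod:network_transmit_bytes:rate5m
        expr: sum by (pod) (rate(container_network_transmit_bytes_total[5m]))
      # Flag pods that sustain more than ~10 MB/s outbound for 10 minutes.
      - alert: HighPodEgress
        expr: pod:network_transmit_bytes:rate5m > 10e6
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is sending a lot of outbound traffic"
```

Looking at the recorded series over a few weeks would give a distribution to pick limits from, rather than guessing a threshold up front.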

minrk commented on June 12, 2024

I tested out a network policy at https://github.com/minrk/binderhub/blob/network-policy/helm-chart/binderhub/templates/network.yaml and it seems simple enough and works pretty well. I think locking things down with network policy is a good idea, but we'll need to be thoughtful about making things sufficiently configurable for jupyterhub -> binderhub -> mybinder.org. Should NetworkPolicy be something we have in base zero-to-jupyterhub, binderhub, or only here at the specific deployment level (or more than one of the above)?

Unfortunately, until GKE supports egress rules in network policies, that part of it has no effect, and egress policies do not appear to support bandwidth throttling or total caps. I'm going to investigate implementing the egress monitoring with VPC and istio. So far, my current understanding of the landscape:

Highlights about using NetworkPolicy alone:

  • simplest, most official API
  • fits in our helm setup very well
  • doesn't support throttling
  • egress not available on GKE yet, so simply isn't available for now

VPC:

  • works now, supports egress
  • outside kubernetes, use gcloud API or console
  • can apply rules at node-level, but not to pods (we could only apply this to user pods if we ensured that 'our' pods ran on a different node or node pool, but this seems undesirable).
  • doesn't need additional alpha-level kubernetes features
  • doesn't support throttling

istio:

  • works at the pod level, not node level, so we can apply policies directly to user pods and nothing else
  • works in kubernetes, so would primarily be part of our helm deployment, rather than a side-channel talking to gcloud
  • could get us monitoring/reports in prometheus
  • supports throttling

So what I think we should do right now is:

  1. use the gcloud API to set firewall rules (block everything but 80,443 outbound)
  2. investigate network policies in general to lock down internal communication (ingress only, for now)

and a stretch goal to investigate istio for traffic monitoring and throttling.
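For the second item, an ingress-only NetworkPolicy could look roughly like the following. This is a sketch, not the policy from the linked branch: the `component` labels and port 8888 are assumptions about how the user, hub, and proxy pods are labelled and where the notebook listens.

```yaml
# Sketch only: labels and port are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: user-pod-ingress
spec:
  # Apply to user pods only (assumed label).
  podSelector:
    matchLabels:
      component: singleuser-server
  policyTypes:
    - Ingress
  ingress:
    # Only the proxy and hub pods may reach the notebook port;
    # traffic from any other pod is dropped.
    - from:
        - podSelector:
            matchLabels:
              component: proxy
        - podSelector:
            matchLabels:
              component: hub
      ports:
        - protocol: TCP
          port: 8888
```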

yuvipanda commented on June 12, 2024

@minrk that's a great overview! Thank you for writing it up :)

My personal preference for egress rules is to just wait for that to be available on GKE. Does it already not work with 1.8 clusters? (rather than 1.7). My understanding of GKE's networkpolicy is that they're simply running Calico, and that already supports egress. So if it isn't there yet, it should be there soon. This also keeps us as independent of the underlying cloud provider as possible, and also allows users outside of Google Cloud to re-use it.

I'm wary of Istio in our setup, since it's a fairly complex full-service mesh that injects a sidecar container into every single pod that hijacks the network. A much simpler way would be to run an initcontainer with our pods that runs a bunch of tc commands to limit the network output from that particular network namespace. This also allows us to run iptables rules there if we want more complex control than networkpolicy provides us. What do you think?

minrk commented on June 12, 2024

I'm pretty concerned about abuse, since we are advertising free, anonymous, unrestricted compute resources. Plus, egress abuse can eat up our budget pretty easily, which is why I think it's more important than other cases. I think we should have VPC firewall rules immediately as a simple starting point, and then move to egress via networkpolicy as soon as it is available. My biggest concern right now is that if a pod starts doing something problematic egress-wise, I think we currently have no way of even identifying the offending pod (while active or even post-mortem), let alone mitigating the issue. Is that accurate?

> Does it already not work with 1.8 clusters? (rather than 1.7).

I set up a 1.8 cluster and adding egress rules had no effect, unless I misunderstood how they work. Kubernetes' failure mode seems to be "silently ignore this rule" instead of raising an error somewhere. Or maybe the error was somewhere in the calico pod logs. The ingress rules were definitely working in the same networkpolicy, though, so I think that on GKE egress rules are still unsupported even with kubernetes 1.8. Maybe it's waiting for a calico update? I have no idea how to verify that more rigorously, other than making an egress rule and seeing if it does what I expect. Is there a way to directly query "will egress rules work?" I'll try again tomorrow, though.

> I'm wary of Istio in our setup

I, too, am wary of istio. It looks pretty complicated and seems like too much for what we need right now. It's just the only link I've found for kubernetes + bandwidth throttling, so that's where I started reading. What I really want is for network-policy rules to have bandwidth features.

> What do you think?

Whatever is the simplest path to bandwidth throttling (and ideally monitoring) is my goal. If a tc initcontainer is a good way to accomplish that, it sounds great to me. I'm not sure how to do that, though. If you can point me in the right direction, I'll give it a go. Does KubeSpawner need any modifications to allow initcontainers?

minrk commented on June 12, 2024

I'm not sure how I didn't find this before, but kubernetes has bandwidth limits via annotations:

  annotations:
    kubernetes.io/ingress-bandwidth: 1G
    kubernetes.io/egress-bandwidth: 1M

I'll test if these work. It would be awesome if it's that simple, but initial testing seems to suggest that egress-bandwidth annotation has no effect (ingress is fine). I'm not sure on what window it's measured, though.
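For context, these annotations go in the pod's own metadata. A minimal test pod might look like the sketch below; the pod name, image, and command are placeholders, and whether the limits are enforced depends on the cluster's network plugin.

```yaml
# Sketch only: name, image, and command are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: bandwidth-test
  annotations:
    # Limits are applied per pod, if the network plugin supports them.
    kubernetes.io/ingress-bandwidth: 1G
    kubernetes.io/egress-bandwidth: 1M
spec:
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
```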

minrk commented on June 12, 2024

From the calico docs, egress network policy requires calico 2.6.1, but inspecting the calico-node pods on v1.8.3-gke.0, they are running calico 2.5.1, which is likely the reason. So we can watch for gke updates to see when they gain egress policy support. I assume we shouldn't be upgrading calico ourselves, right?

minrk commented on June 12, 2024

More reading today!

  1. calico does not appear to support traffic shaping, which means that the bandwidth annotations have no effect when calico is in use (this bears out in testing). That's disappointing. Doubly so when, again, Kubernetes' behavior for unsupported features is to silently ignore input rather than communicate anything.
  2. As a result, I've been exploring tc init containers and it appears to work for outbound, if not anything else. But that's the most important.

So I think we can start with:

  • tc-init initContainers for throttling egress (on both build and user pods)
  • VPC rules for blocking all but http[s] to the world

with a goal of:

  • replacing VPC with network policy when gke supports egress network policies
  • replacing tc-init if we can figure out how to run Kubernetes with a network implementation that supports the official bandwidth API
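A minimal sketch of what a tc-init initContainer could look like, not necessarily the one tested here. It assumes the pod's interface is named `eth0`, that the init container runs as root with the `NET_ADMIN` capability, and that some image shipping `tc` (iproute2) is available; the image name, pod name, and rate values are all hypothetical. Because init containers share the pod's network namespace, the qdisc should still be in place when the main container starts.

```yaml
# Sketch only: image names, pod name, and rate values are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: throttled-user-pod
spec:
  initContainers:
    - name: tc-init
      image: example/tc-init:latest  # hypothetical image that ships iproute2
      # Attach a token bucket filter to eth0 to cap outbound traffic.
      command:
        - tc
        - qdisc
        - add
        - dev
        - eth0
        - root
        - tbf
        - rate
        - 5mbit
        - burst
        - 32kbit
        - latency
        - 400ms
      securityContext:
        capabilities:
          # Needed to modify qdiscs in the pod's network namespace.
          add: ["NET_ADMIN"]
  containers:
    - name: notebook
      image: jupyter/base-notebook
```

Only the init container gets `NET_ADMIN`, so the user's container cannot remove the limit afterwards.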

choldgraf commented on June 12, 2024

@minrk thanks for doing all of this research! FWIW, I think it is super important that we protect ourselves against some fairly-likely abuse scenarios before we regret it, so TYVM :-)

willingc commented on June 12, 2024

What are the specific cases where we would want egress for mybinder.org? In other words, what are the cases that we want to allow? All others should be locked down, and all egress activity logged.

betatim commented on June 12, 2024

Use cases on my mind:

  • fetching data via HTTP;
  • installing the occasional extra package; and
  • git push to save changes made.

yuvipanda commented on June 12, 2024

@minrk I love the initcontainer stuff you've done! We already support initcontainers in kubespawner and the chart, so you're good to go there.
