pod-director's Introduction

Pod Director

A simple Kubernetes utility to make pods in specific namespaces run on specific nodes

pod-director's Issues

Custom errors for Axum

Right now, a lot of the handler code for /mutate ends up following the pattern:

let something = match some_func() {
    None => {
        return (SomeHttpStatus, AdmissionResponse::something())
    }
    Some(v) => v
};

Or worse, the Some(v) arm has even further nested blocks. The same applies to Result types and similar.

We could make use of Axum's impl IntoResponse and implement custom error types that implement the trait, allowing us to write much more of the code like this:

let something = some_func()?;
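
A minimal sketch of how that could look (the error type and its variants here are hypothetical, invented for illustration; the real errors would wrap whatever AdmissionResponse we want to return):

use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use axum::Json;
use serde_json::json;

// Hypothetical error type for the webhook handlers.
enum PdError {
    MissingGroupLabel,
    UnknownGroup(String),
}

impl IntoResponse for PdError {
    fn into_response(self) -> Response {
        let (status, message) = match self {
            PdError::MissingGroupLabel => (StatusCode::BAD_REQUEST, "namespace has no group label".to_string()),
            PdError::UnknownGroup(group) => (StatusCode::BAD_REQUEST, format!("unknown group: {group}")),
        };
        // The real implementation would likely wrap an AdmissionResponse instead of a plain JSON body.
        (status, Json(json!({ "error": message }))).into_response()
    }
}

// Handlers can then return Result and use `?` freely:
// async fn mutate(/* ... */) -> Result<impl IntoResponse, PdError> { ... }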

Tracking project design

WIP

This issue is under construction and will be updated as we go


This issue serves to track the project's design.

Problem

The problem we are trying to solve is as follows:

Projects with similar solutions

Both of these try to solve it using a Mutating Webhook Controller:

While the second appears abandoned, the first seems more active. It, however, has a few issues that we'd like to tackle better. See the requirements section.

Kyverno also allows doing this through its policies. Kyverno, however, is a massive solution with a lot of complexity, and we aim for something simpler.

Requirements

  • This is a simple application, with a simple purpose. We expand features around the idea of controlling where pods run based on their namespace, but nothing else.
  • We should minimize configuration whenever possible. For example, if we use a label, it should carry as much information as possible.
  • Ideally, the configuration for the namespace should be in the namespace. This follows Kubernetes' design, where most resources are configured exactly where they live. For example: you don't need to change the Ingress Controller to configure an Ingress object. This might be hard for this case, but we should still strive to keep it as concentrated on the namespace as possible.
  • Since this is a Kubernetes project, we should package it in a way that is easy to install in Kubernetes, and we should not worry about other cases as much.
  • We should strive to affect as little of the cluster as possible.
  • We should strive to test as much as possible. What we are doing is "dangerous" in the sense that it can literally break pods. We can't afford to push updates willy-nilly.

Proposal

Kubernetes

Similarly to other projects, we can work using Kubernetes' Mutating Webhook Controllers to intercept pod creation requests and mutate them.

We should be able to use the namespaceSelector to limit operation to only namespaces we want to match. If we use it with the Exists operator, we should even be able to use it to configure "what nodes" the namespace belongs to (more on this in the Configuration section).

If we use that approach, we should be careful to follow all the best practices:

  • Idempotence
  • Handling all pods
  • Low Latency
  • Validating the result (maybe a toggle?)
  • Excluding our own namespace (maybe a toggle?)
  • No side effects

Configuration

Inspired by some of the similar projects, I propose a configuration based on "groups". One creates groups in Pod Director's namespace and then uses a label on each namespace to select which group it belongs to.

For example, a namespace manifest might look like this:

apiVersion: v1
kind: Namespace
metadata:
  name: foo
  labels:
    # label name and format up for debate
    # here we select the "bar" configuration
    pod-director/group: bar

This allows us to use a namespaceSelector like so on the Admission Controllers:

namespaceSelector:
  matchExpressions:
    - key: pod-director/group
      operator: Exists

To configure group "bar", I propose a simple YAML file. As much as I dislike YAML, it is familiar to the Kubernetes community and meshes well with the existing tooling, such as Helm. The configuration for the "bar" and "bazz" groups might look like this:

groups:
  bar:
    nodeSelector:  # node selector labels
    affinity:  # pod affinity
    tolerations:   # to handle taints
  bazz:
    nodeSelector:  # node selector labels
    affinity:  # pod affinity
    tolerations:   # to handle taints

An alternative is to use a list with names:

groups:
  - name: bar
    nodeSelector: ...
    affinity:  ...
    tolerations: ...

But, in our case, it may be simpler to keep this as a map of configurations.

As a side note, and I'm not yet sure how to do this, we can consider a toggle to allow "exclusions": after all, some pods in a namespace might not need to (or must not) run on the nodes the group configuration dictates.

Application

A simple HTTP REST server using:

Should be more than sufficient. We'll handle things like configuring rustfmt, toolchain, editorconfig as we go.

The kube-rs repository has a great example of how to implement an Admission Controller, which is exactly what we need. While they use warp, my general internet searches have pointed towards Axum being a more modern/better HTTP server. Either should fulfill our 1 ~ 2 endpoint needs.
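
For reference, a rough sketch of how an Axum handler could plug into kube's admission types, loosely following the kube-rs admission controller example mentioned above (exact paths and signatures depend on the kube version, so treat this as an outline rather than a final design):

use axum::Json;
use kube::core::admission::{AdmissionRequest, AdmissionResponse, AdmissionReview};
use kube::core::DynamicObject;

async fn mutate(
    Json(review): Json<AdmissionReview<DynamicObject>>,
) -> Json<AdmissionReview<DynamicObject>> {
    // Pull the AdmissionRequest out of the incoming AdmissionReview.
    let req: AdmissionRequest<DynamicObject> = match review.try_into() {
        Ok(req) => req,
        Err(err) => return Json(AdmissionResponse::invalid(err.to_string()).into_review()),
    };

    // Start from an "allow" response tied to this request; the JSON patch with the
    // nodeSelector/tolerations/affinity changes would be attached here later.
    let response = AdmissionResponse::from(&req);
    Json(response.into_review())
}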

Container Image

Since we are running on Kubernetes, a container is required. I propose two images:

  • A release image, with debug symbols stripped and aiming for minimal size. If we can do a distroless image, wonderful, but it's not necessary.
  • A debug image, with full debug information included. This can have the same tag as above, only with a -debug suffix.

Helm

A Helm Chart should be the main way to deploy the application. If a user wants to use their own chart, manifests, or wants to run this off a lambda somewhere, that is their prerogative.

The Chart could follow the application's versioning, even if there are no chart changes. That way, every time we release the application, we release the chart. I see no issue with this since our utility is closely tied to k8s.

The Chart should also contain helpers for generating the certificates necessary to deploy an admission controller. I propose two:

  • Simple, helm or init container generated certificates, for quick and dirty testing
  • A better configuration, to be used with cert-manager

Testing

Testing the application is the most important part. The axum ecosystem has a few testing utilities we can use, and the tests should be pretty simple to run.
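
For example, assuming the router is exposed through a hypothetical app() constructor, a dummy test using tower's oneshot could look roughly like this:

#[cfg(test)]
mod tests {
    use axum::body::Body;
    use axum::http::{Request, StatusCode};
    use tower::ServiceExt; // for `oneshot`

    #[tokio::test]
    async fn health_returns_200() {
        // `app()` is a placeholder for whatever function builds the Router.
        let app = crate::app();
        let response = app
            .oneshot(Request::builder().uri("/health").body(Body::empty()).unwrap())
            .await
            .unwrap();
        assert_eq!(response.status(), StatusCode::OK);
    }
}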

The Helm Chart can include some unit tests to help.

Ideally, when we reach a 1.0.0, we should have some testing done against an actual cluster. I believe GHA has a "kind" action we can use to actually spin up a cluster, maybe even with cert-manager.

Ideally, we should also host the chart somewhere. GitHub Pages is probably sufficient.

Nice to Haves

The list of nice to haves is massive. I'd like to list some, in an order in which I think they are most important:

  • An "enforce" mode and a "log" mode. The idea is that a cluster administrator can first enable only the logging mode to see what the application would do.
  • Structured logging
  • Configuration hot-reloading
  • Being able to exclude some pods in a namespace, but only if the global configuration allows
  • A "look at all namespaces" mode, which should include a "default" setting
  • Automatically testing against newer k8s versions as they come out
  • Prometheus metrics
  • Distroless image
  • Benchmark tests
  • Renovate integration, so we keep our dependencies up to date

Support for affinity

Add support for affinity injection.

This will include the configuration format for the affinity rules.

Out of the three main things we will modify, affinity is probably the hardest to implement as we have nodeAffinity,
podAffinity and podAntiAffinity, all of which our user could configure. Consult the Kubernetes Reference for
more detail.

All the affinity types use node selector terms, either in pure form or array form. As such, we can probably reuse the
validation code from the nodeSelector and tolerations features, which should be implemented first.

TODO: List all the possible cases we need to test for.

As a suggestion, we can probably ignore podAffinity and podAntiAffinity for the time being, as the use cases for "making pods in this namespace run in the same/different nodes than these other pods" are waaay less frequent/likely than our simple initial case of "making pods in this namespace run in these nodes".
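
If we go that route, the group configuration could initially expose only the node affinity part. A hypothetical serde shape, reusing the k8s-openapi types:

use k8s_openapi::api::core::v1::NodeAffinity;
use serde::Deserialize;

// Hypothetical config shape: nodeAffinity only for now;
// podAffinity and podAntiAffinity can be added later if needed.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct GroupAffinity {
    node_affinity: Option<NodeAffinity>,
}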

Support for node selectors

Implement the /mutate and /validate endpoints and add support for nodeSelector injection.

This will include the configuration format for the node selectors.

This must be idempotent! A Pod mutated by the /mutate endpoint twice should come out exactly
the same! As such, I can think of these scenarios, which we should translate into tests (a sketch of the merge logic follows the list):

  • Pod without any node selector labels
  • Pod with existing node selector labels, but none match pod-director's config
  • Pod with existing node selector labels, and some match pod-director's config
  • Pod with existing node selector labels, and all match pod-director's config
  • Pod with existing node selector labels, all of pod-director's labels are present and extra ones are also present.
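
A sketch of the merge logic with plain maps (no Kubernetes types, the function name is illustrative). Because it only ever adds entries that are missing, applying it twice yields the same result, which is the idempotence property the scenarios above should verify:

use std::collections::BTreeMap;

/// Returns the node selector entries that still need to be added to the pod.
/// Existing keys are left untouched here; conflicting values are a separate
/// concern (see the conflict behaviour issue).
fn missing_node_selectors(
    pod: &BTreeMap<String, String>,
    group: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    group
        .iter()
        .filter(|(key, _)| !pod.contains_key(key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}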

Initial Configuration Module

We'll need a way to configure the application.

I have some experience with Figment combined with Serde. I think it should
suffice.

The initial requirements are:

  • Read a YAML file
  • It must support the groups structure we established:
groups:
  bar:
    nodeSelector:  # node selector labels
    affinity:  # pod affinity
    tolerations:   # to handle taints
  bazz:
    nodeSelector:  # node selector labels
    affinity:  # pod affinity
    tolerations:   # to handle taints
  • Until we define the actual format of the nodeSelector, affinity and tolerations blocks (and their final names),
    the value can be a placeholder list of strings
  • The default configuration file path should be either an established well known path or something easy to remember,
    like a specifically named file in the current working directory.
  • One should be able to change which file is read using an environment variable
  • All configurations should also be configurable by environment variables (Figment handles this), which have precedence
    over the config file
  • Return a struct that holds all of the configuration for the entire application

We'll create further settings, like log levels and mode (enforce or log only, if we implement this), as we go.

In a future task we can also consider hot reloading.
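
A minimal sketch of what this could look like with Figment and Serde (the file name, environment variable names and field names below are placeholders, not decisions):

use std::collections::HashMap;

use figment::providers::{Env, Format, Yaml};
use figment::Figment;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct GroupConfig {
    // Placeholder lists of strings until the real formats are defined.
    node_selector: Option<Vec<String>>,
    affinity: Option<Vec<String>>,
    tolerations: Option<Vec<String>>,
}

#[derive(Debug, Deserialize)]
struct Config {
    groups: HashMap<String, GroupConfig>,
}

fn load_config() -> Result<Config, figment::Error> {
    // Hypothetical defaults: a file in the working directory, overridable via
    // PD_CONFIG, with PD_-prefixed environment variables taking precedence.
    let path = std::env::var("PD_CONFIG").unwrap_or_else(|_| "pod-director.yaml".into());
    Figment::new()
        .merge(Yaml::file(path))
        .merge(Env::prefixed("PD_").split("__"))
        .extract()
}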

Auto reload TLS configs

However unlikely, it's possible that our certificates are renewed while the server is up. We could restart the entire server, but it seems like reloading the TLS certificates is not that hard.

As such, to avoid downtime with expiring certificates in long running pods (or just unlucky pods), we should make sure the server hot-reloads this.

As a side note: if using something like inotify, we need to check which events are triggered when Kubernetes replaces the secret in the filesystem, as I am 99% sure it uses symlinks.
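
axum-server's RustlsConfig can be swapped in place, so even a naive periodic re-read of the PEM files would work; a file-watcher based trigger can replace the interval later. A rough sketch (the paths and interval are placeholders):

use std::time::Duration;

use axum_server::tls_rustls::RustlsConfig;

async fn reload_certs_periodically(config: RustlsConfig) {
    // Hypothetical mount path for the TLS secret.
    let (cert, key) = ("/certs/tls.crt", "/certs/tls.key");
    let mut interval = tokio::time::interval(Duration::from_secs(6 * 60 * 60));
    loop {
        interval.tick().await;
        // Re-read the PEM files; new TLS handshakes pick up the renewed certificate.
        if let Err(err) = config.reload_from_pem_file(cert, key).await {
            eprintln!("failed to reload TLS certificates: {err}");
        }
    }
}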

Helm Chart

Create a Helm Chart for the application.

This Helm Chart has special requirements, which may be broken into extra issues for ease of review:

  • The Chart must be capable of being deployed "by itself". What this means is: when we deploy the
    MutatingWebhookConfiguration and the ValidatingWebhookConfiguration, they must have their certificates configured.
  • The namespaceSelector in the chart's webhook configurations must match all namespaces with pod-director/group labels.
  • The Chart must offer support for cert-manager in place of the self-created CA and certificates.
  • In the first version, since config hot-reloading is not implemented, some kind of mechanism (like checksums)
    must force a rollout of the pods when the configuration file is updated.
  • The Chart must include unit tests using helm-unittest whenever there is templating logic.
  • The main workflow, or secondary workflow, should include helm templating and testing. Consider running it only if the Helm Chart code is changed.

Container Image

Set up a simple Dockerfile (or Containerfile, if we are feeling petty) for building the Rust project and running
the application. If we can find a way to perform a smoke test, excellent!

Bonus points for:

  • Tini as the entrypoint
  • Minimal size
  • Alpine based images

Extra bonus points for:

  • Debug image (compiled with debug instead of release)

Extra extra bonus points for:

  • Distroless

The building of the image should be included in the main workflow.

GitHub Actions

The GitHub Actions setup requires a pipeline that manages all of the following, and thus these features are prerequisites:

  • Rust build (a minimal working project)
  • Rust test (at least one test)
  • Image build (a containerfile)
  • Image push (requires a repository)
  • Helm Chart template and test
  • Helm Chart build

Extra fluff:

  • Auto release creation
  • Auto changelog generation (semantic pull requests, maybe?)

We can split this issue into multiple issues and implement the pipeline as we implement the other features.

Basic HTTP Server

Set up a basic HTTP server using the tools described in #1.

For now, a simple setup with a GET /health route that always returns 200, plus the code structure for handling the
other two routes (mutate and validate), is enough. Since Kubernetes enforces the use of HTTPS for webhooks, we should
also do that by default. Disabling HTTPS should be opt-in.

In addition, investigating a simple library to help with unit tests and the implementation of at least one dummy
unit test would be ideal.
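
A rough sketch of that skeleton, assuming Axum plus axum-server for the TLS part (the certificate paths and port are placeholders):

use std::net::SocketAddr;

use axum::http::StatusCode;
use axum::routing::{get, post};
use axum::Router;
use axum_server::tls_rustls::RustlsConfig;

async fn health() -> StatusCode {
    StatusCode::OK
}

// Placeholder handlers; the real ones will take an AdmissionReview body.
async fn mutate() -> StatusCode {
    StatusCode::NOT_IMPLEMENTED
}

async fn validate() -> StatusCode {
    StatusCode::NOT_IMPLEMENTED
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/health", get(health))
        .route("/mutate", post(mutate))
        .route("/validate", post(validate));

    // HTTPS by default, as Kubernetes requires for webhooks.
    let tls = RustlsConfig::from_pem_file("/certs/tls.crt", "/certs/tls.key")
        .await
        .expect("failed to load TLS certificates");
    let addr = SocketAddr::from(([0, 0, 0, 0], 8443));
    axum_server::bind_rustls(addr, tls)
        .serve(app.into_make_service())
        .await
        .expect("server error");
}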

Configure conflict behaviour

Configure what happens when pod-director tries to assign a selector to a pod that already has a value for a key pod-director wants to set.

For example, let's say a pod has a "foo=bar" selector, but the config says it should have "foo=xyzzy". What do we do then?
Well, let's make this behaviour configurable!

The config should have a "conflict" or "onConflict" field, or something similar.
Some ideas for the conflict resolution behaviour (a configuration sketch follows the list):

  • Just override the value
  • Ignore the value and keep going
  • Crash and burn when encountering a conflict and reject any changes
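
A sketch of what that could look like in the configuration structs (the names and the default are up for debate):

use serde::Deserialize;

// Hypothetical conflict resolution setting, per group or global.
#[derive(Debug, Clone, Copy, Default, Deserialize)]
#[serde(rename_all = "camelCase")]
enum OnConflict {
    /// Replace the existing value with pod-director's value.
    Override,
    /// Leave the existing value alone and keep going.
    #[default]
    Ignore,
    /// Reject the pod entirely when a conflict is found.
    Reject,
}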

Support arbitrary labels for matching namespaces and groups

  • Currently, we only look at the label pod-director/group
  • To support multiple installations in the same cluster, we should support looking at different labels.
  • To do this, we must:
    • Make it configurable in the application (see the sketch after this list)
    • Make it configurable in the Helm Chart
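
On the application side this is likely just one more setting with a default; a sketch (the field name is a placeholder):

use serde::Deserialize;

fn default_group_label() -> String {
    "pod-director/group".to_string()
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Settings {
    /// Label used to map a namespace to a group; overridable so that several
    /// pod-director installations can coexist in one cluster.
    #[serde(default = "default_group_label")]
    group_label: String,
}

The Helm Chart then needs to template the same label into the webhook's namespaceSelector so both sides agree.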

Support for tolerations

Add support for tolerations injection.

This will include the configuration format for the tolerations.

This must be idempotent! A Pod mutated by the /mutate endpoint twice should come out exactly
the same! As such, I can think of these scenarios, which we should translate into tests:

  • Pod without any tolerations
  • Pod with existing tolerations, but none match pod-director's config
  • Pod with existing tolerations, and some match pod-director's config
  • Pod with existing tolerations, and all match pod-director's config
  • Pod with existing tolerations, all of pod-director's tolerations are present and extra ones are also present.

This issue looks nearly identical to #8, but one must remember that tolerations are a list and not
a key-value map, so they must be handled a little differently.
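
A sketch of the list merge using the k8s-openapi Toleration type (the function name is illustrative). Appending only the entries that are missing keeps the operation idempotent:

use k8s_openapi::api::core::v1::Toleration;

/// Returns the group tolerations that the pod does not already carry.
/// Tolerations are compared by full equality; partial matches (same key,
/// different effect) are a conflict-handling question.
fn missing_tolerations(pod: &[Toleration], group: &[Toleration]) -> Vec<Toleration> {
    group
        .iter()
        .filter(|toleration| !pod.contains(*toleration))
        .cloned()
        .collect()
}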

Configuration hot reload

In Kubernetes, ConfigMaps, Secrets and similar are "hot swapped" when updated (as long as they are mounted properly).

Since our application is mostly stateless, we could make use of this to hot reload the configuration, or at least the parts where it makes sense (such as the group configs), without restarting the entire pod. This would allow zero-downtime configuration updates even with a single pod instance.

One may find inspiration in the ArcSwap implementation in the RustlsConfig struct used for reloading the certificates.
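
A rough sketch of the idea with arc-swap (the reload trigger, be it a file watcher or a signal, is left out):

use std::sync::Arc;

use arc_swap::ArcSwap;

// `Config` stands in for the struct produced by the configuration module.
struct Config {
    // groups, settings, ...
}

struct AppState {
    config: ArcSwap<Config>,
}

impl AppState {
    fn new(initial: Config) -> Self {
        Self { config: ArcSwap::from_pointee(initial) }
    }

    /// Handlers call this to get a consistent snapshot for the whole request.
    fn config(&self) -> Arc<Config> {
        self.config.load_full()
    }

    /// Called by whatever detects that the mounted ConfigMap changed.
    fn reload(&self, new: Config) {
        self.config.store(Arc::new(new));
    }
}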

Cache Kubernetes API requests

Currently, the program queries the Kubernetes API for the namespace's group on every request.

We should cache these results and only query when necessary. Kube-rs offers a few solutions to this, like Watchers and Reflectors, which should keep track of Kubernetes' state for us through events.
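
A rough sketch of a namespace reflector with kube-rs, based on its reflector examples (the exact signatures have changed between kube versions, so treat this as an outline):

use futures::StreamExt;
use k8s_openapi::api::core::v1::Namespace;
use kube::runtime::reflector;
use kube::runtime::{watcher, WatchStreamExt};
use kube::{Api, Client};

async fn namespace_store(client: Client) -> reflector::Store<Namespace> {
    let namespaces: Api<Namespace> = Api::all(client);
    let (reader, writer) = reflector::store::<Namespace>();

    // Drive the watch in the background; the store is kept up to date from watch
    // events, so group lookups never have to hit the API server directly.
    let stream = reflector::reflector(writer, watcher(namespaces, watcher::Config::default()))
        .applied_objects();
    tokio::spawn(async move {
        futures::pin_mut!(stream);
        while let Some(event) = stream.next().await {
            if let Err(err) = event {
                eprintln!("namespace watch error: {err}");
            }
        }
    });

    reader
}

// Later, in a handler:
// let namespace = store.get(&reflector::ObjectRef::new("foo"));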

Dependency auto updates

Setting up integration with Dependabot or Renovate would be interesting; it might allow us to keep up to date with
newer k8s versions automatically.

One requirement, though: no opening 100 PRs for each dependency, that's just spammy.

Note: this issue depends on us having a half functional application with proper tests so we can validate that the
dependency updates are working.

Support for cert-manager in the Helm Chart

  • Add support for cert-manager in the Helm Chart
  • Two modes:
    • Use an existing Issuer or ClusterIssuer
    • Create a local Issuer automatically
  • Using cert-manager should be the recommended mode of operation

Create validate endpoint

According to Kubernetes documentation, pods can be mutated by more than one Mutating Webhook Controller, so if we are to enforce configurations, we must also have a Validating Webhook Controller. To do so, we'll need a validate endpoint. It must enforce the same rules as the mutate endpoint for node selectors, tolerations and affinity.
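
The validation itself can mostly reuse the mutation building blocks; for node selectors it boils down to a check like this (a sketch with plain maps, names are illustrative):

use std::collections::BTreeMap;

/// Returns an error message if the pod does not carry all of the group's
/// node selector entries with the expected values.
fn validate_node_selector(
    pod: &BTreeMap<String, String>,
    group: &BTreeMap<String, String>,
) -> Result<(), String> {
    for (key, expected) in group {
        match pod.get(key) {
            Some(value) if value == expected => {}
            Some(value) => {
                return Err(format!("node selector {key}={value}, expected {key}={expected}"))
            }
            None => return Err(format!("missing node selector {key}={expected}")),
        }
    }
    Ok(())
}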
