
nixos-ha-kubernetes

Toy highly-available Kubernetes cluster on NixOS

About

A recipe for a cluster of virtual machines managed by Terraform, running a highly-available Kubernetes cluster, deployed on NixOS using Colmena.

Motivation

NixOS provides a Kubernetes module that can run a master or a worker node. The module even provides basic PKI, which makes simple clusters easy to run. However, HA support is limited (see, for example, this comment and the empty "N masters" section in the NixOS wiki).

This project serves as an example of using the NixOS Kubernetes module in an advanced way, setting up a cluster that is highly-available on all levels.

Architecture

The cluster implements the external etcd topology described in the Kubernetes docs. It consists of:

  • 3 etcd nodes
  • 3 controlplane nodes, running kube-apiserver, kube-controller-manager, and kube-scheduler.
  • 2 worker nodes, running kubelet, kube-proxy, coredns, and a CNI network (currently flannel).
  • 2 loadbalancer nodes, running keepalived and haproxy, which proxies to the Kubernetes API.

Goals

  • All infrastructure declaratively managed by Terraform and Nix (Colmena). Zero kubectl apply -f foo.yaml invocations required to get a functional cluster.
  • All the infrastructure-level services run directly on NixOS / systemd. Running k get pods -A after the cluster is spun up lists zero pods.
  • Functionality. The cluster should be able to run basic real-life deployments, although 100% parity with high-profile Kubernetes distributions is unlikely to be reached.
  • High-availability. A failure of a single service (of any kind) or a single machine (of any role) shall not leave the cluster in a non-functional state.

Non-goals

  • Production-readiness. I am not an expert in any of: Nix, Terraform, Kubernetes, HA, etc.
  • Perfect security (see the above point). Some basic measures are taken: the NixOS firewall is left turned on (although some overly permissive rules may be in place), Kubernetes uses ABAC and RBAC, and TLS auth is used between the services.

Trying it out

Prerequisites

  • Nix (only tested on NixOS, might work on other Linux distros).
  • Libvirtd running. For NixOS, put this in your config:
    {
      virtualisation.libvirtd.enable = true;
      users.users."yourname".extraGroups = [ "libvirtd" ];
    }
  • At least 6 GB of available RAM.
  • At least 15 GB of available disk space.
  • 10.240.0.0/24 IPv4 subnet available (as in, not used for your home network or similar). This is used by the "physical" network of the VMs.
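
One way to check the subnet prerequisite is to look for 10.240.0.x destinations in the host's IPv4 routing table. This is a minimal sketch, assuming iproute2's `ip` is installed; the function name `subnet_in_use` is made up for this example and is not part of the repository. It will not notice a wider route (say, 10.0.0.0/8) that happens to cover the subnet, and it will also match once the VMs' own network has been created.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): reads `ip -4 route show`
# output on stdin and succeeds if any route's destination is in 10.240.0.x.
subnet_in_use() {
  grep -Eq '^10\.240\.0\.[0-9]+(/[0-9]+)?( |$)'
}

if ! command -v ip >/dev/null 2>&1; then
  echo "ip not found; cannot check the routing table"
elif ip -4 route show | subnet_in_use; then
  echo "a route for 10.240.0.x exists; the subnet may already be in use"
else
  echo "no route for 10.240.0.x found"
fi
```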

Running

$ nix-shell
$ make-boot-image # Build the base NixOS image to boot VMs from
$ ter init        # Initialize terraform modules
$ ter apply       # Create the virtual machines
$ make-certs      # Generate TLS certificates for Kubernetes, etcd, and other daemons.
$ colmena apply   # Deploy to your cluster

Most steps can take several minutes each on the first run.

Verifying

$ ./check.sh                # Prints out diagnostic information about the cluster and tries to run a simple pod.
$ k run --image nginx nginx # Run a simple pod. `k` is an alias of `kubectl` that uses the generated admin credentials.
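
To confirm that the nginx pod actually becomes ready, rather than merely being accepted by the API server, kubectl's `wait` subcommand can help. The sketch below assumes it runs inside the project's nix-shell, where `k` is available, and that the pod is named `nginx` as in the command above. The function names and the 120-second timeout are arbitrary choices for this example. If `k` is not resolvable in your shell, substitute `kubectl` with the matching credentials.

```shell
# Hypothetical helpers built on the `k` wrapper described above.

# Block until the pod reports Ready (or the timeout expires),
# then show which worker node it was scheduled on.
verify_nginx_pod() {
  k wait --for=condition=Ready pod/nginx --timeout=120s &&
    k get pod nginx -o wide
}

# Remove the test pod again.
cleanup_nginx_pod() {
  k delete pod nginx
}
```

Running `verify_nginx_pod && cleanup_nginx_pod` leaves the cluster as it was before the test.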

Modifying

The number of servers of each role can be changed by editing terraform.tfvars and issuing the following commands afterwards:

$ ter apply     # Spin up or spin down machines
$ make-certs    # Regenerate the certs, as they are tied to machine IPs/hostnames
$ colmena apply # Redeploy

Destroying

$ ter destroy   # Destroy the virtual machines
$ rm boot/image # Destroy the base image

Tips and tricks

  • After creating and destroying the cluster many times, your .ssh/known_hosts will fill up with stale entries for the virtual machine IPs. Because of this, you are likely to run into "host key mismatch" errors while deploying. I use :g/^10.240.0./d in Vim to clean it up. You can probably do the same with sed or similar software of your choice.
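
A sed equivalent of the Vim command above might look like the following. It is a sketch that assumes unhashed known_hosts entries; with `HashKnownHosts yes` the IPs are not visible in the file, and running `ssh-keygen -R <ip>` per address is the usual route instead. The function name is invented for this example.

```shell
# Hypothetical helper: delete every known_hosts line that starts with 10.240.0.
# Usage: clean_known_hosts [path]   (defaults to ~/.ssh/known_hosts)
clean_known_hosts() {
  # -i.bak edits in place and keeps a backup copy next to the file.
  # The dots are escaped so that they match literal dots only.
  sed -i.bak '/^10\.240\.0\./d' "${1:-$HOME/.ssh/known_hosts}"
}
```

Calling `clean_known_hosts` with no argument cleans the default file and leaves the previous version in `~/.ssh/known_hosts.bak`.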

Contributing

Contributions are welcome, although I might reject any that conflict with the project goals. See TODOs in the repo for some rough edges you could work on.

Make sure the ci-lint script succeeds. Make sure the check.sh script succeeds after deploying a fresh cluster.

Acknowledgements

Both Kubernetes The Hard Way and Kubernetes The Hard Way on Bare Metal helped me immensely in this project.

Contributors

  • justinas
