lukechannings / kube-config

Luke's kubeconfig

Home Page: https://lukechannings.github.io/kube-config/

License: MIT License

Shell 70.26% JavaScript 29.74%
k3s kubernetes argocd sealed-secrets cert-manager kubernetes-homelab-configuration prometheus smart-home

All of these scripts and configurations are specific to my home cluster. Do not expect any configurations to "just work" if you plan on using them.

This repo contains the Argo app-of-apps configuration, which installs Argo projects and apps. See apps/apps.

Scripts

  • create-cluster.sh: Installs and configures k3s on all nodes.
  • destroy-cluster.sh: Uninstalls k3s from all nodes. (I have had to rebuild the cluster many, many times.)
  • install-metallb.sh: Installs metallb.
  • uninstall-metallb.sh: Uninstalls metallb.
  • install-argo.sh: Installs Argo CD.
  • uninstall-argo.sh: Uninstalls Argo CD.

k3s

k3s is installed with as few bundled components as possible. Neither Traefik (we use our own ingress controller) nor servicelb (we use metallb) is installed.
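
As a rough illustration of what such a minimal install can look like, the sketch below builds the install command without running it. It assumes the upstream get.k3s.io installer and the --disable server flag available in recent k3s releases. The function names are hypothetical, and create-cluster.sh may well do this differently.

```shell
# Hypothetical helper: the k3s server arguments that leave out the bundled
# Traefik ingress controller and the servicelb load balancer.
k3s_install_exec() {
  echo "server --disable traefik --disable servicelb"
}

# Print (do not run) the install command, so it can be reviewed first.
# Assumes the upstream installer script served at https://get.k3s.io.
k3s_install_command() {
  printf 'curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="%s" sh -\n' \
    "$(k3s_install_exec)"
}
```

Calling k3s_install_command prints a one-liner that can be pasted onto the main node once it looks right.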

metallb

I'm using metallb instead of servicelb because I use a LoadBalancer service with a loadBalancerIP, which is unsupported by servicelb. It's also very convenient to have a virtual IP address for external services.
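
To make the loadBalancerIP usage concrete, here is a minimal sketch that renders a Service manifest with a fixed address. The function name, the app label, the default port and the example IP are all assumptions for illustration; none of them come from this repo.

```shell
# Hypothetical helper: print a LoadBalancer Service that requests a fixed IP.
# $1 = service/app name, $2 = requested IP, $3 = port (defaults to 80).
# The function body runs in a subshell so its variables stay local.
render_lb_service() (
  name=$1
  ip=$2
  port=${3:-80}
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  type: LoadBalancer
  loadBalancerIP: ${ip}
  selector:
    app: ${name}
  ports:
    - port: ${port}
      targetPort: ${port}
EOF
)
```

For example, render_lb_service my-app 192.168.1.240 | kubectl apply -f - would create the service, provided the address falls inside a pool that metallb has been configured to hand out.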

Q&A

Upgrading k3s

Re-run create-cluster.sh.

Upgrading Argo

Re-run install-argo.sh.

Upgrading Helm charts

To check which charts are out-of-date, run ./scripts/helm-tools/compare-helm-versions.js.

The process:

  • Manually bump the chart dependency version
  • Run helm dependencies update && helm dependencies build to create an updated Chart.lock
  • Make one PR per updated dependency and roll out changes one by one
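
The first two steps above could be scripted roughly as follows. This is only a sketch: both function names are hypothetical, and the awk edit assumes each entry under dependencies starts with "- name:" and has its version on a later line of the same entry.

```shell
# Hypothetical helper: set the version of one dependency in a Chart.yaml.
# $1 = path to Chart.yaml, $2 = dependency name, $3 = new version.
bump_dependency_version() (
  chart_yaml=$1
  dep=$2
  new_version=$3
  tmp=$(mktemp) || exit 1
  awk -v dep="$dep" -v new="$new_version" '
    # Every new list item decides whether we are inside the wanted entry.
    $1 == "-" { in_dep = ($2 == "name:" && $3 == dep) }
    # Rewrite only the version line that belongs to that entry.
    in_dep && $1 == "version:" { sub(/version:.*/, "version: " new); in_dep = 0 }
    { print }
  ' "$chart_yaml" > "$tmp" && mv "$tmp" "$chart_yaml"
)

# Regenerate Chart.lock and the charts/ directory for a chart directory ($1).
# "helm dependency" is the canonical spelling of "helm dependencies".
refresh_chart_lock() {
  helm dependency update "$1" && helm dependency build "$1"
}
```

Usage would look like bump_dependency_version path/to/Chart.yaml <dependency> <version> followed by refresh_chart_lock path/to, with the resulting diff committed on its own branch for the PR.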

Troubleshooting

Inter-node flannel communication

As soon as you set up a cluster, do yourself a favour and test that inter-node communication works. The cluster can appear to be working initially, but big and confusing issues crop up later if this is broken and you don't know about it.

I test with:

  • krun nicolaka/netshoot -H snowkube
  • krun nicolaka/netshoot -H suplex
  • krun ubuntu -H sentinel (nicolaka/netshoot isn't available on arm64)

On one of the pods, run ip a and note its IP address, then run iperf -s. On every other pod, run iperf -c <IP>. All of them should communicate at roughly network speed.

Note: krun is a custom fish script.
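
Since krun is not included here, the sketch below shows one way to get a similar result with plain kubectl. It is an assumption about what krun does, not a copy of it: both function names are hypothetical, and pinning relies on the standard kubernetes.io/hostname node label.

```shell
# Hypothetical helper: JSON overrides that pin a pod to one node by hostname.
node_overrides() {
  printf '{"apiVersion":"v1","spec":{"nodeSelector":{"kubernetes.io/hostname":"%s"}}}\n' "$1"
}

# Start a throwaway interactive pod from image $1 on node $2.
# The pod is deleted again when the shell exits (--rm).
run_on_node() {
  kubectl run "netcheck-$2" --rm -it --restart=Never \
    --image="$1" --overrides="$(node_overrides "$2")" -- sh
}
```

With this, run_on_node nicolaka/netshoot snowkube would roughly correspond to the first krun line above.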

If you find there is no communication between nodes, try:

  1. Restarting k3s: sudo systemctl restart k3s on the main node
  2. Restarting all nodes
  3. Destroying the cluster and starting again
  4. Ensuring routes are correctly set up on the nodes, iptables is configured, etc. See also: k3s known issues

Exec format error

Caused by an x86 image running on ARM.

Unfortunately, ARM is a second-class citizen in the k8s world and many images do not support it. You can either build your own ARM image, or use the following to exclude ARM machines from scheduling:

nodeSelector:
  kubernetes.io/arch: amd64
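
One detail that is easy to trip over: the kubernetes.io/arch label uses Go-style architecture names (amd64, arm64), not the machine names reported by uname -m (x86_64, aarch64) that appear in the Nodes table. A small sketch of the mapping, with a hypothetical function name:

```shell
# Hypothetical helper: translate a `uname -m` machine name into the value
# used by the kubernetes.io/arch node label. Defaults to the local machine.
k8s_arch() {
  case "${1:-$(uname -m)}" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    armv7l)  echo arm ;;
    *) echo "unknown machine type: ${1:-$(uname -m)}" >&2; return 1 ;;
  esac
}
```

To see the label each node actually carries, kubectl get nodes -L kubernetes.io/arch adds it as a column.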

mlock error

The Argo Application Controller was stuck in a crash loop. Tailing the logs, I found:

runtime: mlock of signal stack failed: 12
runtime: increase the mlock limit (ulimit -l) or
runtime: update your kernel to 5.3.15+, 5.4.2+, or 5.5+
fatal error: mlock failed

Manually upgrading the kernel to 5.4.28 appears to have fixed the issue.

For Ubuntu, download the files listed under "Build for amd64 succeeded" at https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.4.28/ (skipping the packages labelled lowlatency) and install them with dpkg -i *.deb.
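
A quick way to check a node against the versions named in that error message is sketched below. The function name is hypothetical, and it only encodes the three thresholds printed by the Go runtime (5.3.15+, 5.4.2+, 5.5+). Distribution kernels may report a release string such as 5.4.0-26-generic that hides backported fixes, so treat the result as a hint.

```shell
# Hypothetical helper: succeed if a kernel release ($1, default `uname -r`)
# is at least one of the versions listed in the mlock error message.
# The function body runs in a subshell so IFS and variables stay local.
kernel_meets_mlock_hint() (
  release=${1:-$(uname -r)}
  version=${release%%[-+]*}   # "5.4.0-26-generic" -> "5.4.0"
  IFS=.
  set -- $version             # split into major, minor, patch
  major=${1:-0}
  minor=${2:-0}
  patch=${3:-0}
  if [ "$major" -ne 5 ]; then
    [ "$major" -gt 5 ]        # 6.x and later pass, 4.x and earlier fail
  elif [ "$minor" -ge 5 ]; then
    true
  elif [ "$minor" -eq 4 ]; then
    [ "$patch" -ge 2 ]
  elif [ "$minor" -eq 3 ]; then
    [ "$patch" -ge 15 ]
  else
    false
  fi
)
```

Run without arguments on each node, for example kernel_meets_mlock_hint && echo ok, or pass a release string to test a version before installing it.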

Nodes

Hostname   Arch      OS                        CPU                        RAM    Storage
Suplex     x86_64    Arch Linux                E3-1245 v3 @ 3.40GHz       32GB   458GB SSD, 30TB spinning rust (ZFS)
Snowkube   x86_64    Ubuntu Server 20.04 LTS   i7-8700B CPU @ 3.20GHz     22GB   200GB SSD
Sentinel   aarch64   Ubuntu Server 20.04 LTS   ARM Cortex-A72 @ 1.50GHz   2GB    59GB MicroSD

kube-config's Issues

HAProxy Ingress: CrashLoopBackoff (readiness probe fails) when more than one instance is running

I woke up to services being inaccessible and this:

NAME                                                              READY   STATUS             RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
haproxy-ingress-kubernetes-ingress-default-backend-5b895fb7mg8v   1/1     Running            0          2d13h   172.30.0.12    suplex     <none>           <none>
haproxy-ingress-kubernetes-ingress-659b898986-bpw22               1/1     Terminating        0          15h     172.30.1.166   snowkube   <none>           <none>
haproxy-ingress-kubernetes-ingress-659b898986-ldcmh               0/1     CrashLoopBackOff   37         110m    172.30.0.67    suplex     <none>           <none>

It seems that Snowkube had run out of storage and the container was being migrated to Suplex, but because two instances cannot run simultaneously this resulted in a failed migration.

Can be reproduced easily with:

k scale --replicas=2 deployment/haproxy-ingress-kubernetes-ingress -n haproxy-ingress

The second replica will never be ready, and doesn't start listening, resulting in failed health checks (quite rightly).

Argo CD: Apps (and aoas) stuck in "Progressing"

Despite being healthy, some resources are marked as "Progressing". These include:

  • Ingress: networking.k8s.io/v1beta1

Resource Health

Will need to implement a custom health check that does not check status.loadBalancer.ingress, or wait for haproxy ingress to implement it (if ever).
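
For reference, a health override of this kind might look like the sketch below, which prints a patch for the argocd-cm ConfigMap that marks every Ingress as healthy. This is an untested assumption based on the resource.customizations format Argo CD used at the time; newer releases use a different key layout. The function name and the argocd namespace are hypothetical.

```shell
# Hypothetical helper: print a ConfigMap patch that replaces Argo CD's
# built-in Ingress health check with one that always reports "Healthy",
# so status.loadBalancer.ingress is never consulted.
render_ingress_health_patch() {
  cat <<'EOF'
data:
  resource.customizations: |
    networking.k8s.io/Ingress:
      health.lua: |
        hs = {}
        hs.status = "Healthy"
        return hs
EOF
}
```

It could be applied with kubectl -n argocd patch configmap argocd-cm --patch "$(render_ingress_health_patch)". Note that this overwrites any existing resource.customizations value, so merge by hand if one is already set.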

Argo CD: Prometheus stuck in "Syncing" status

Despite all resources being synced and healthy, Prometheus is still in a "Syncing" status.

From the Argo App Controller logs:

"No operation updates necessary to 'prometheus'. Skipping patch

So Argo seems to know the app is synced, but still shows "Syncing", and the operation has to be terminated manually. Once terminated, the status changes to "Sync failed".

Possibly, fixing #5 will fix this?
