mattmattox / kubernetes-master-class
Kubernetes Master Class
License: Apache License 2.0
kubectl run r00t --restart=Never -it --rm --image alpine:latest --overrides '
{ "spec": { "hostPID": true, "containers": [{
  "name": "1", "image": "alpine", "command": [
    "nsenter", "--mount=/proc/1/ns/mnt", "--", "/bin/bash"
  ], "stdin": true, "tty": true, "securityContext": {
    "privileged": true } }]}}'
etcd members
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") etcd etcdctl endpoint status --write-out table
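The inner pipeline of the command above can be tried on its own without a cluster. This is a minimal sketch, assuming `etcdctl member list` prints comma-separated rows with the client URL in the fifth field; the function name `endpoints_from_member_list` and the sample rows are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Turn `etcdctl member list` output (read from stdin) into a
# comma-separated endpoint list suitable for ETCDCTL_ENDPOINTS.
# Assumption: the client URL is the 5th comma-separated field.
endpoints_from_member_list() {
  # take field 5, strip spaces, join all lines with commas
  cut -d, -f5 | sed -e 's/ //g' | paste -sd ',' -
}

# Illustrative input only: the IDs, names, and addresses are made up.
sample='8e9e05c52164694d, started, etcd-node1, https://10.0.0.1:2380, https://10.0.0.1:2379, false
91bc3c398fb3c146, started, etcd-node2, https://10.0.0.2:2380, https://10.0.0.2:2379, false'

printf '%s\n' "$sample" | endpoints_from_member_list
```

With the sample rows, this prints `https://10.0.0.1:2379,https://10.0.0.2:2379`, which is the value the outer `docker exec -e ETCDCTL_ENDPOINTS=...` passes to `etcdctl endpoint status`.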
nsenter --mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net /bin/bash
In your database class, you don't talk much about backups. How do I back up a database in Kubernetes?
rke etcd snapshot-restore failures
If an error is encountered while performing an etcd restoration from a snapshot, failures can persist even after the root cause has been remediated. To avoid this, it's recommended to clean up one or all of the pods that were created during the rke etcd snapshot-restore process before attempting another restore. The following pods are created during the etcd restore process:
Thank you for doing the class on troubleshooting Kubernetes.
Could you do an updated class on this subject?
rke etcd snapshot-restore
When performing an rke etcd snapshot-restore, it's strongly recommended that the node configuration for the cluster is maintained until the restore is complete. Modifying the nodes before the restore completes can cause etcd certificate issues or other unexpected, harmful results that will impede your ability to restore your etcd snapshot.
Extracting clusters.yaml & S3 creds from a local backup
It seems that there is a dead link to a script that recovers the cluster YAML and rkestate files.
It is the second script in this section of the rancher-k8s-upgrades guide: https://github.com/mattmattox/Kubernetes-Master-Class/tree/main/rancher-k8s-upgrades#resolution
The broken link is: https://raw.githubusercontent.com/rancherlabs/support-tools/master/how-to-retrieve-cluster-yaml-from-custom-cluster/cluster-yaml-recovery.sh
ACE is a good starting point, but what to do when that's not enabled and the user doesn't have another?
It's worth going over the different cluster types, deployment methods, etc.
Even though I had deleted the nodes I added from the Rancher UI, those nodes were still in the Longhorn configuration. As a result, longhorn-manager could not be scheduled again.
Here is the procedure I performed:
cloud@rebond-oks:~$ kubectl get nodes
NAME                            STATUS   ROLES               AGE    VERSION
winoks-rlk-eu-west-0a-master1   Ready    controlplane,etcd   240d   v1.18.10
winoks-rlk-eu-west-0a-worker1   Ready    worker              240d   v1.18.10
winoks-rlk-eu-west-0b-master1   Ready    controlplane,etcd   240d   v1.18.10
winoks-rlk-eu-west-0b-worker1   Ready    worker              240d   v1.18.10
winoks-rlk-eu-west-0c-master1   Ready    controlplane,etcd   240d   v1.18.10
winoks-rlk-eu-west-0c-worker1   Ready    worker              240d   v1.18.10
cloud@rebond-oks:~$ kubectl -n longhorn-system get nodes.longhorn.io
NAME                            READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
winoks-rlk-eu-west-0a-worker1   True    true              True          240d
winoks-rlk-eu-west-0a-worker2   True    false             True          6h38m
winoks-rlk-eu-west-0b-worker1   True    true              True          240d
winoks-rlk-eu-west-0b-worker2   True    false             True          6h33m
winoks-rlk-eu-west-0c-worker1   True    true              True          240d
winoks-rlk-eu-west-0c-worker2   True    false             True          6h46m
As you can see, I still have xxx-worker2 in longhorn nodes.
Then
cloud@rebond-oks:~$ kubectl -n longhorn-system delete nodes.longhorn.io winoks-rlk-eu-west-0a-worker2
cloud@rebond-oks:~$ kubectl -n longhorn-system delete nodes.longhorn.io winoks-rlk-eu-west-0b-worker2
cloud@rebond-oks:~$ kubectl -n longhorn-system delete nodes.longhorn.io winoks-rlk-eu-west-0c-worker2
Then I redeployed longhorn-manager, and everything came back up. Replicas are rebuilding.
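The manual comparison in this procedure (spotting Longhorn node objects that no longer exist in Kubernetes) can be sketched as a small helper. This is only an illustration, not part of the original report: the function name `stale_longhorn_nodes` and the temporary file names are hypothetical, and the node names are copied from the output above. On a live cluster the two input files could be produced with `kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name` and the same options on `kubectl -n longhorn-system get nodes.longhorn.io`.

```shell
#!/bin/sh
# Print Longhorn node names (file $2) that are absent from the
# Kubernetes node list (file $1). Each file holds one name per line.
stale_longhorn_nodes() {
  # -v invert match, -x whole line, -F fixed strings, -f patterns from file;
  # `|| true` keeps the exit status at 0 when nothing is stale.
  grep -vxF -f "$1" "$2" || true
}

# Demo data taken from the listings in this report.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' \
  winoks-rlk-eu-west-0a-master1 winoks-rlk-eu-west-0a-worker1 \
  winoks-rlk-eu-west-0b-master1 winoks-rlk-eu-west-0b-worker1 \
  winoks-rlk-eu-west-0c-master1 winoks-rlk-eu-west-0c-worker1 \
  > "$tmp/k8s-nodes.txt"

printf '%s\n' \
  winoks-rlk-eu-west-0a-worker1 winoks-rlk-eu-west-0a-worker2 \
  winoks-rlk-eu-west-0b-worker1 winoks-rlk-eu-west-0b-worker2 \
  winoks-rlk-eu-west-0c-worker1 winoks-rlk-eu-west-0c-worker2 \
  > "$tmp/longhorn-nodes.txt"

stale_longhorn_nodes "$tmp/k8s-nodes.txt" "$tmp/longhorn-nodes.txt"
```

With this data the helper prints the three `worker2` names, which are the objects removed with `kubectl -n longhorn-system delete nodes.longhorn.io`. Reviewing the list before deleting anything is the safer choice, since a node that is only temporarily NotReady would still appear in both lists, but a node that is being re-registered might not.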