
Comments (23)

aaronlevy commented on August 23, 2024

One low-hanging fruit is that we should be deploying multiple copies of the controller-manager/scheduler. In that case you would be doing a rolling-update of the component, verifying that the new functionality works before destroying all of the old copies.

However, there are still situations where we have a loss of all schedulers and/or controller-manager (e.g. maybe a flag change is subtly broken, but the pod is still running so the deployment manager rolls out all broken pods).

You could launch a new master as an option, but if you still have an api-server/etcd running you should be able to recover. Essentially you would need to inject a controller-manager pod into the cluster, then delete it as soon as your existing controller-manager deployment has been scheduled.

For example:

kubectl --namespace=kube-system get deployment kube-controller-manager -o yaml

Then take the podSpec section (the second indented spec, with a containers field right below it). Something like:

    spec:
      containers:
      - name: kube-controller-manager
        image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
        command:
            [...]

Then wrap that in a pod header, and specify the nodeName it should run on:

apiVersion: v1
kind: Pod
metadata:
  name: recovery-cm
spec:
  nodeName: <a node in your cluster>
  containers:
  - name: kube-controller-manager
    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
    command:
       [...]

Then inject it into the cluster:
kubectl create -f recovery-pod.yaml

This pod will act as the controller-manager and convert your existing controller-manager Deployment into pods, which will then be scheduled. After that you can just delete the recovery pod:

kubectl delete -f recovery-pod.yaml
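The extract-and-wrap step can also be sketched offline, with no cluster involved. The Deployment JSON below is a stub with the same field layout as the real object, and my-node is a hypothetical node name; kubectl create -f works the same on the resulting file:

```shell
# Stub Deployment manifest (same shape as `kubectl get deployment -o json`).
cat > deployment.json <<'EOF'
{"apiVersion": "apps/v1", "kind": "Deployment",
 "metadata": {"name": "kube-controller-manager", "namespace": "kube-system"},
 "spec": {"template": {"spec": {"containers": [
   {"name": "kube-controller-manager",
    "image": "quay.io/coreos/hyperkube:v1.4.0_coreos.0"}]}}}}
EOF

# Take spec.template.spec, wrap it in a Pod header, and pin it with nodeName.
python3 - <<'EOF'
import json

with open("deployment.json") as f:
    dep = json.load(f)

pod_spec = dict(dep["spec"]["template"]["spec"])
pod_spec["nodeName"] = "my-node"  # pre-assigned to a node: no scheduler needed

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "recovery-cm", "namespace": "kube-system"},
    "spec": pod_spec,
}

with open("recovery-pod.json", "w") as f:
    json.dump(pod, f, indent=2)
EOF
```

The resulting recovery-pod.json carries the Deployment's containers unchanged, plus the nodeName pre-assignment.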

from bootkube.

aaronlevy commented on August 23, 2024

Thinking longer term, rather than a separate kube-recover tool, this could be a command in kubectl which knows how to extract a pod from a higher-order object -- this could be useful in the case where we have a kubelet pod API (so we could use it to natively push a pod to the kubelet API from a deployment/daemonset object rather than needing the intermediate pod state). See #97 (comment)


mfburnett commented on August 23, 2024

Is this documented anywhere? Could help our users!


chancez commented on August 23, 2024

๐Ÿ‘ I think an interesting idea is to play with re-running bootkube for recovery. I found that generally if I just re-run bootkube nothing too crazy happens and the temporary control plane brings up a scheduler, allowing everything to correct itself. I really like the UX of this approach, even if its not a real solution.

In the static-manifest case this isn't really a problem, because the pods aren't scheduled: a scheduler failure in any scenario results in it being restarted and able to run (assuming it can grab its lease).

I wonder if there's a good way to do this. Perhaps checkpointing the scheduler to nodes with role=master would be a good start, as an alternative to your proposed kube-recover strategy? This would be mostly automated; the only issue is how to run the checkpointer in a way that still works when the scheduler hits this failure. It would probably be a static pod, and that means we can't easily manage it, which is problematic if the master nodes are changing.


chancez commented on August 23, 2024

Also, definitely agree that a kubelet-pod-api that skips scheduling makes this story way better.


Raffo commented on August 23, 2024

Hi guys, I was trying bootkube and hit a similar case. I noticed the controller had no --cloud-provider=aws flag set, so I tried editing the controller-manager with kubectl edit, and it was a bad idea. The result is that I can't recover the cluster because I am not able to schedule anything.
What I'm thinking is: in this case we have etcd running on the master itself, which is just wrong, but if we ran a reliable, separate etcd cluster, this problem would not exist at all. Just run a new master with bootkube, attach it to the etcd cluster, and kill the broken one. Would this make sense?


coresolve commented on August 23, 2024

Just hit this as well, as a result of a Container Linux reboot.


abourget commented on August 23, 2024

In addition, if your scheduler is down, recovering the controller-manager won't be enough. In that case, put a file like this in /etc/kubernetes/manifests/scheduler.yml on one of the nodes, so that your kube-scheduler Deployment's pods move from Pending to Running:

#
# Add this to a node in `/etc/kubernetes/manifests` to recover your scheduler, and
# schedule the pods needed to run your configured Deployments :)
#
kind: Pod
apiVersion: v1
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.5.3_coreos.0
    command:
    - ./hyperkube
    - scheduler
    - --leader-elect=true
    - --kubeconfig=/etc/kubernetes/kubeconfig
    volumeMounts:
    - name: etc-kubernetes
      mountPath: /etc/kubernetes
      readOnly: true
  volumes:
  - name: etc-kubernetes
    hostPath:
      path: /etc/kubernetes


aaronlevy commented on August 23, 2024

You can do the same steps I outlined above for the scheduler as well (and you don't need to actually ssh into a machine to create a static manifest).
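For reference, combining the injection steps above with the scheduler spec from the static-manifest example gives something like this (a sketch; the image, flags, and node name are taken from this thread and may need adjusting for your cluster):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: recovery-scheduler
  namespace: kube-system
spec:
  nodeName: <a node in your cluster>
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.5.3_coreos.0
    command:
    - ./hyperkube
    - scheduler
    - --leader-elect=true
    - --kubeconfig=/etc/kubernetes/kubeconfig
    volumeMounts:
    - name: etc-kubernetes
      mountPath: /etc/kubernetes
      readOnly: true
  volumes:
  - name: etc-kubernetes
    hostPath:
      path: /etc/kubernetes
```

Create it with kubectl create -f, and delete it once the real scheduler Deployment's pods are running.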


abourget commented on August 23, 2024

But who schedules the recovery scheduler if the scheduler is dead? :-)


aaronlevy commented on August 23, 2024

Per the steps outlined above, you would populate the pod's spec.nodeName so that the pod is pre-assigned to a node; no scheduler needed.


abourget commented on August 23, 2024

Oh right!! That's great. Thanks :-)


radhikapc commented on August 23, 2024

Working on it, @mfburnett, as I hit the same issue today while upgrading. Should I create a doc defect for this, or will you do it for me? cc @aaronlevy


mfburnett commented on August 23, 2024

Thanks @radhikapc!


aaronlevy commented on August 23, 2024

Just to track some internal discussions: another option might be to propose a sub-command to kubectl upstream. Not sure of the UX specifics, but maybe something like:

kubectl pod-from deployment/kube-scheduler --target=nodename
kubectl pod-from daemonset/foo --target=nodename
kubectl pod-from podtemplate/foo --target=nodename


abourget commented on August 23, 2024

Another simple way to hook back a Pod to a Node, when Scheduler + Controller-manager are dead:

kubectl create -f rescue-binding.yaml

with this content:

apiVersion: v1
kind: Binding
metadata:
  name: "kube-dns-2431531914-61pgv"
  namespace: "kube-system"
target:
  apiVersion: v1
  kind: Node
  name: "ip-10-22-4-152.us-west-2.compute.internal"

A Binding is the object the Scheduler injects into the cluster when it makes a scheduling decision. You can do the same manually.


klausenbusk commented on August 23, 2024

Another simple way to hook back a Pod to a Node, when Scheduler + Controller-manager are dead:

Thanks! This prevented me from restarting the whole cluster (again) with bootkube recover and bootkube start.

Someone should probably add this to: https://github.com/kubernetes-incubator/bootkube/blob/master/Documentation/disaster-recovery.md


redbaron commented on August 23, 2024

Been thinking about this: can't the checkpointer help here? Mark the controller-manager as a pod to be checkpointed; the checkpointer would then recover it whenever it can't find it on the node. There can be an edge case where it recovers too many pods, but given that they do leader election, it is fine to have a few extra running.

As for the scheduler, making it a DaemonSet would make it enough for just the controller-manager and apiserver to be alive, since DaemonSet pods are scheduled by the controller rather than the scheduler, at least in the current 1.8.x release.
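A sketch of that DaemonSet variant, reusing the scheduler container from the static manifest earlier in the thread; the apps/v1 API group and the master node label are assumptions that may differ by Kubernetes version:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-scheduler
  template:
    metadata:
      labels:
        k8s-app: kube-scheduler
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      containers:
      - name: kube-scheduler
        image: quay.io/coreos/hyperkube:v1.5.3_coreos.0
        command:
        - ./hyperkube
        - scheduler
        - --leader-elect=true
        - --kubeconfig=/etc/kubernetes/kubeconfig
        volumeMounts:
        - name: etc-kubernetes
          mountPath: /etc/kubernetes
          readOnly: true
      volumes:
      - name: etc-kubernetes
        hostPath:
          path: /etc/kubernetes
```

Because the DaemonSet controller places these pods itself, they come back after a scheduler outage as long as the apiserver and controller-manager are alive.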


aaronlevy commented on August 23, 2024

With the behavior changes introduced in #755, checkpointing the controller-manager / scheduler might be possible (before that change we might garbage-collect the checkpoints of those components before their replacements have been scheduled). It might still be a little bit racy, though.

cc @diegs


fejta-bot commented on August 23, 2024

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


fejta-bot commented on August 23, 2024

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten


fejta-bot commented on August 23, 2024

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


k8s-ci-robot commented on August 23, 2024

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

