
Comments (22)

xiang90 commented on August 23, 2024

@philips I have thought about this a bit. Here is the workflow I have in mind:

  • create a one-member etcd cluster in bootkube

... k8s is ready...

  • start the etcd controller
  • the etcd controller adds a new member into the one-member cluster created by bootkube
  • wait for the new member to sync with the seed member
  • remove the bootkube etcd member

Now the etcd controller fully controls the etcd cluster and can grow it to the desired size.
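
A minimal sketch of that add-then-remove pivot using the etcd v3 client is below; the endpoints, peer URL, and seed member name are placeholders rather than what bootkube or the etcd controller actually use:

```go
// Sketch of the pivot: add the self-hosted member to the seed cluster,
// then (after it has caught up) remove the bootkube seed member.
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// Client pointed at the one-member seed cluster started by bootkube.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://boot-etcd.example:2379"}, // placeholder
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Add the new (self-hosted) member to the seed cluster.
	addResp, err := cli.MemberAdd(ctx, []string{"http://self-hosted-etcd.example:2380"}) // placeholder peer URL
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("added member %x; cluster now has %d members", addResp.Member.ID, len(addResp.Members))

	// ... wait for the new member to start and sync with the seed ...

	// Find and remove the bootkube seed member.
	list, err := cli.MemberList(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range list.Members {
		if m.Name == "boot-etcd" { // placeholder seed member name
			if _, err := cli.MemberRemove(ctx, m.ID); err != nil {
				log.Fatal(err)
			}
		}
	}
}
```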

philips commented on August 23, 2024

As an update on the etcd and self-hosted plan: we have merged support behind an experimental flag in bootkube: https://github.com/kubernetes-incubator/bootkube/blob/master/cmd/bootkube/start.go#L37

This is self-hosted and self-healing etcd on top of Kubernetes.

aaronlevy commented on August 23, 2024

@jamiehannaford

why does the self-hosted etcd need to wait for certain pods to exist before the data migration happens?

It could likely work in this order as well - but there could be more coordination points (vs. just "everything is running - so let the etcd-operator take over"). For example, we would need to make sure to deploy kube-proxy & the etcd-operator, then do the etcd pivot, then create the rest of the cluster components. Whereas right now it's just "create all components that exist in the /manifest dir, wait for some of them, do the etcd pivot" - which is initially easier.

Are there any issues particular to the current order that you've found?

Doesn't this create some kind of crash loop since the API server has nothing to connect to?

Sort of. Really everything pivots around etcd / apiserver addressability. The "real" api-server doesn't immediately take over, because it is unable to bind on 8080/443 (the bootkube apiserver is still listening on those ports). The rest of the components don't know if they're talking to the bootkube-apiserver or the "real" apiserver; it's just an address they expect to reach. So when we're ready to pivot to the self-hosted control-plane, it's simply a matter of exiting the bootkube-apiserver so the ports free up.

You're right that there will be a moment where no api-server is bound to the ports - but it's actually fine in most cases for components to fail/retry - much of Kubernetes is designed this way (including its core components).

However, there is currently an issue where the bootkube-apiserver is still "active", but it expects to be talking only to the static/boot etcd node, and that node may have already been removed as a member of the etcd cluster. This puts us in a state where the "active" bootkube apiserver can no longer reach the data store and essentially becomes inactive.

See #372 for more info.

The fix for the above issue might be as simple as adding both the boot-etcd address and the service IP for self-hosted etcd to the bootkube api-server; I just haven't had a chance to test that assumption.
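
For illustration only, here is what that dual-endpoint idea looks like with the etcd v3 client, which fails over between endpoints in roughly the way the apiserver can when given multiple --etcd-servers; both addresses are placeholders:

```go
// Sketch: a client configured with both the boot etcd address and the
// self-hosted etcd service address, so it can keep working while the
// boot member is being removed. Addresses are placeholders.
package main

import (
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{
			"http://127.0.0.1:12379", // static/boot etcd (placeholder)
			"http://10.3.0.15:2379",  // self-hosted etcd service IP (placeholder)
		},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	// Requests are retried against the remaining endpoint if one becomes
	// unreachable.
	log.Println("connected with endpoints:", cli.Endpoints())
}
```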

aaronlevy commented on August 23, 2024

We need the data that ends up in etcd to persist with the cluster that is launched. If that data lives only in bootkube's etcd, then bootkube must continue to run for the lifecycle of the cluster.

Alternatively, we need a way to pivot the etcd data injected during the bootstrap process to the "long-lived" etcd cluster. The long-lived cluster would essentially be a self-hosted etcd cluster launched as k8s components just like the rest of the control plane.

What I'd probably like to see is something along the lines of:

  1. Bootkube runs etcd in-process
  2. k8s objects injected into the api-server end up in the local/in-process etcd
  3. One of those objects is an etcd pod definition, which is started as a self-hosted pod on a node
  4. The self-hosted etcd "joins" the existing bootkube etcd, making a two-member cluster
  5. etcd replication copies all state to the newly joined etcd member (sketched below)
  6. bootkube dies after the self-hosted control-plane is started, removing itself from etcd cluster membership
  7. The self-hosted etcd cluster is managed from that point forward as a k8s component
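
As a hedged sketch of the sync check in step 5 (not the actual bootkube or etcd-operator logic; endpoints are placeholders), one simple approach is to poll per-endpoint status until the new member's raft index catches up with the seed's:

```go
// Sketch: wait until the newly joined member has caught up with the seed
// member before the seed is removed. Endpoints are placeholders.
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func waitForCatchUp(cli *clientv3.Client, seedEP, newEP string) {
	for {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		seed, errSeed := cli.Status(ctx, seedEP)
		joined, errNew := cli.Status(ctx, newEP)
		cancel()
		// Simplistic check: the new member has applied at least as many raft
		// entries as the seed had at the time we asked.
		if errSeed == nil && errNew == nil && joined.RaftIndex >= seed.RaftIndex {
			return
		}
		time.Sleep(time.Second)
	}
}

func main() {
	seedEP := "http://boot-etcd.example:2379"       // placeholder
	newEP := "http://self-hosted-etcd.example:2379" // placeholder

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{seedEP, newEP},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	waitForCatchUp(cli, seedEP, newEP)
	log.Println("new member is in sync; safe to remove the bootkube member")
}
```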

Another option might be trying to copy the etcd keys from the in-process/local node to the self-hosted node, but this can get a little messy because we would be trying to manually copy (and mirror) data of a live cluster.

Some concerns with this approach:

  • Managing etcd membership in K8s is not currently a very good story. It's either waiting on petsets, trying to handle this with lifecycle hooks, or relying on external mechanisms for membership management.
  • It's pretty unproven and a bit risky from a production perspective to try and run etcd for the cluster, also "in" the cluster. But I can see the value in this from a "get started easily" perspective while we exercise this as a viable option.

aaronlevy commented on August 23, 2024

@philips what do you think about changing this issue to be "support self-hosted etcd", and dropping it from the 0.1.0 milestone?

aaronlevy commented on August 23, 2024

Adding notes from a side-discussion:

Another option that was mentioned is just copying keys from bootkube-etcd to cluster-etcd. This would require some coordination points in the bootkube process:

  1. bootkube-apiserver configured to use bootkube-etcd
  2. bootkube only injects objects for self-hosted etcd pods and waits for them to be started
  3. bootkube stops its internal api-server (no more changes to local state)
  4. Copy all etcd keys from the local to the remote (self-hosted) cluster
  5. start the bootkube-apiserver again but have it point to the self-hosted etcd
  6. create the rest of the self-hosted objects & finish the bootkube run as normal
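
A rough sketch of step 4, assuming all Kubernetes data lives under the default /registry prefix (endpoints are placeholders, and a real copy would still rely on step 3 having frozen writes to the source):

```go
// Sketch: copy every key under /registry from the local bootkube etcd to the
// self-hosted cluster. Endpoints and the /registry prefix are assumptions.
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	src, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:12379"}, // bootkube-etcd (placeholder)
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	dst, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://10.3.0.15:2379"}, // self-hosted etcd service (placeholder)
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	// Read everything under the default Kubernetes storage prefix...
	resp, err := src.Get(ctx, "/registry", clientv3.WithPrefix())
	if err != nil {
		log.Fatal(err)
	}
	// ...and write it key-by-key into the destination cluster.
	for _, kv := range resp.Kvs {
		if _, err := dst.Put(ctx, string(kv.Key), string(kv.Value)); err != nil {
			log.Fatal(err)
		}
	}
	log.Printf("copied %d keys", len(resp.Kvs))
}
```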

stuart-warren commented on August 23, 2024

How do you want the self-hosted apiserver to discover the location of self-hosted etcd?
I tried using an external load balancer listening on 2379 with a known address, but the apiserver throws a bunch of:

reflector.go:334] pkg/storage/cacher.go:163: watch of *api.LimitRange ended with: client: etcd cluster is unavailable or misconfigured

v1.3.5 talking to etcd v3.0.3

edit:
These errors turned out to be harmless messages in the log file, caused by a 10-second client timeout in the haproxy config constantly breaking watches.

kalbasit commented on August 23, 2024

I've managed to get this done using a separate etcd cluster, where each k8s node (master/minion) runs etcd in proxy mode. I'm using Terraform to configure both. The etcd module is available here and the k8s module is available here.

P.S.: The master is not volatile and cannot be scaled. If the master node reboots, it will not start any of the components again; I'm not sure why, but bootkube thinks they are running and quits. Possibly due to having /registry already in etcd.

P.P.S.: I had a few issues doing that, but mostly related to me adding --cloud-provider=aws to the kubelet, the controller, and the api-server, and to bootkube being started in a container without /etc/resolv.conf and /etc/ssl/certs/ca-certificates.crt. I'll file separate issues/PRs for those.

philips commented on August 23, 2024

@xiang90 and @hongchaodeng, can you put some thoughts together on this in relation to having an etcd controller?

I think there are essentially two paths:

  1. Copy the data from the bootkube etcd to the cluster etcd
  2. Add the bootkube etcd to the cluster etcd, then remove the bootkube etcd once everything is replicated

I think option 2 is better because it means we don't have to worry about cutting over and having split brain. But! How do we do 2 if the cluster is only intended to have one etcd member (say in AWS, where you have a single-machine cluster backed by EBS)?

I think we should try and prototype this out ASAP as this is the last remaining component that hasn't been proven to be self-hostable.

ethernetdan commented on August 23, 2024

Started some work on this - got a bootkube-hosted etcd cluster up, and am now working on migrating from the bootkube instance to the etcd-controller-managed instance.

pires commented on August 23, 2024

Add the bootkube etcd to the cluster etcd, then remove the bootkube etcd once everything is replicated

@philips what happens if the self-hosted etcd cluster (or the control plane behind it) dies? I believe this is why @aaronlevy mentioned it is:

(...) a bit risky from a production perspective to try and run etcd for the cluster, also "in" the cluster. But I can see the value in this from a "get started easily" while we exercise this as a viable option.

This is exactly the concern I shared in the design proposal.

Can this issue clarify whether this concept is simply meant for non-production use-cases?

philips commented on August 23, 2024

@pires if the self-hosted etcd cluster dies, you need to recover using bootkube from a backup. This is really no different than if a normal etcd deployment died: you would have to redeploy the cluster from a backup and restart the API servers again.

pires commented on August 23, 2024

@philips can you point me to the backup strategy you guys are designing or already implementing?

xiang90 commented on August 23, 2024

@pires

I believe the backup @philips mentioned is actually the etcd backup. For the etcd-controller, we do a backup:

  1. every X minutes, where X is defined by the user
  2. once we upgrade the cluster
  3. when the user hits the backup/now endpoint to force a backup before an expected important event, like upgrading the k8s master components
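
The periodic case could look roughly like the sketch below, using the etcd v3 client's snapshot stream and writing to a PV-backed path (endpoint, path, and interval are placeholders, not the etcd-controller's actual implementation):

```go
// Sketch: back up etcd every X minutes by streaming a snapshot to a
// PV-backed directory. All names and values here are placeholders.
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func backup(cli *clientv3.Client, dir string) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	rc, err := cli.Snapshot(ctx) // consistent snapshot of the etcd backend
	if err != nil {
		return err
	}
	defer rc.Close()

	f, err := os.Create(fmt.Sprintf("%s/backup-%d.db", dir, time.Now().Unix()))
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, rc)
	return err
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd-cluster.kube-system.svc:2379"}, // placeholder service address
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	const interval = 30 * time.Minute // "X minutes", user-defined
	for range time.Tick(interval) {
		if err := backup(cli, "/var/etcd-backup"); err != nil { // PV mount path (placeholder)
			log.Printf("backup failed: %v", err)
		}
	}
}
```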

pires commented on August 23, 2024

I understand the concept and it should work as you say; I'm just looking for more details on:

  • Where is each etcd member data stored?
  • Where is the backup data stored?
  • How is bootkube leveraging the stored data?

Don't get me wrong, I find this really cool and I'm trying to grasp it as much as possible as sig-cluster-lifecycle looks into HA.

xiang90 commented on August 23, 2024

Where is each etcd member's data stored?

The data is stored on local storage. etcd has a built-in recovery mechanism, and when you have a 3-member etcd cluster you already have 3 local copies.

Where is the backup data stored?

The backup is for extra safety; it helps with rollback and disaster recovery.
It is stored on a PV, like EBS, Ceph, GlusterFS, etc.

How is bootkube leveraging the stored data?

If there is a disaster or a bad upgrade, we recover the cluster from the backup.
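
For example, such a recovery could be driven by etcdctl's snapshot restore against one of the stored backups; the sketch below shells out to etcdctl with placeholder paths and is only meant to illustrate the flow, not the actual recovery code:

```go
// Sketch: restore an etcd data directory from a stored snapshot by invoking
// etcdctl. Snapshot path and data dir are placeholders; restoring a
// multi-member cluster also needs --name/--initial-cluster flags per member.
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("etcdctl", "snapshot", "restore",
		"/var/etcd-backup/backup-1472688000.db", // placeholder snapshot file
		"--data-dir", "/var/lib/etcd-restored", // placeholder output data dir
	)
	cmd.Env = append(os.Environ(), "ETCDCTL_API=3") // select the v3 API on older etcdctl releases
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
	// The restored data dir is then handed to a fresh etcd member brought up
	// by bootkube / the operator during recovery.
}
```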

orbatschow commented on August 23, 2024

@philips
What about using the new etcd operator to run etcd fully managed on top of Kubernetes? I think this will simplify maintenance, updates, etc. a lot.

xiang90 commented on August 23, 2024

@gitoverflow That is the plan.

aaronlevy commented on August 23, 2024

I am going to close this as initial self-hosted etcd support has been merged. There are follow-up issues open for specific tasks:

(documentation): #240
(adding support to all hack/* examples): #337
(iptables checkpointing): #284

jamiehannaford commented on August 23, 2024

Although bootkube now supports a self-hosted etcd pod for bootstrapping, I can't find any documentation which explains:

  1. How a follow-up etcd controller syncs with the bootkube etcd pod
  2. Whether it's possible for an etcd-operator to manage the lifecycle of the cluster etcd controller itself (as opposed to a user-defined etcd cluster)

aaronlevy commented on August 23, 2024

@jamiehannaford You're right - and we do need to catch up on Documentation. Some tracking issues:

#240
#311
#302

Regarding your questions:

  1. We need to add a "how it works" section to this repo - but the closest so far might be the YouTube link in my comment here: #302 - it briefly goes into how the seed etcd pod is pivoted into the self-hosted etcd cluster (by the etcd-operator).

  2. Yes, the plan is for the etcd-operator to manage the cluster-etcd - so things like re-sizing, backups, updates, etc. (of your cluster etcd) could be managed by the etcd-operator.

jamiehannaford commented on August 23, 2024

@aaronlevy Thanks for the links. I'm still wrapping my head around the boot-up procedure. It seems the chronology for a self-hosted etcd cluster is:

  1. A static pod for etcd is created
  2. The temp control plane is created
  3. The self-hosted control plane components are created against the temp one
  4. When all the components in step 3 are ready, the etcd-operator creates the new self-hosted etcd cluster, migrating all the data from step 1

My question is, why does the self-hosted etcd need to wait for certain pods to exist before the data migration happens? I thought the data migration would happen first, then all the final control plane elements would be created.

I looked at the init args for kube-apiserver, and it has the eventual IPv4 address of the real etcd (10.3.0.15). This means there's a gap of time between the api-server being created and the real etcd existing. Doesn't this create some kind of crash loop, since the API server has nothing to connect to? Or is this gap negligible?
