kube-aws / kube-spot-termination-notice-handler

A Kubernetes DaemonSet to gracefully delete pods 2 minutes before an EC2 Spot Instance gets terminated

License: Apache License 2.0

Shell 78.68% Dockerfile 6.06% Python 15.26%

kube-spot-termination-notice-handler's Introduction

A Kubernetes DaemonSet that runs one container per node to periodically poll the EC2 Spot Instance Termination Notices endpoint. Once a termination notice is received, it tries to gracefully stop all the pods running on the Kubernetes node, up to 2 minutes before the EC2 Spot Instance backing the node is terminated.

Installation

Helm

A Helm chart has been created for this tool; at the time of writing it was in the stable repository.

$ helm install stable/k8s-spot-termination-handler

Available docker images/tags

Tags denote Kubernetes/kubectl versions. Using the same version for your Kubernetes cluster and spot-termination-notice-handler is recommended. Note that the trailing -1 (or similar) is the revision of this tool for that kubectl version, in case we need to release an updated image.

  • kubeaws/kube-spot-termination-notice-handler:1.8.5-1
  • kubeaws/kube-spot-termination-notice-handler:1.9.0-1
  • kubeaws/kube-spot-termination-notice-handler:1.10.11-2
  • kubeaws/kube-spot-termination-notice-handler:1.11.3-1
  • kubeaws/kube-spot-termination-notice-handler:1.12.0-2
  • kubeaws/kube-spot-termination-notice-handler:1.13.7-1

Why use it

  • So that your Kubernetes jobs backed by spot instances can keep running on other instances (typically on-demand instances)

How it works

Each spot-termination-notice-handler pod polls the notice endpoint until it returns HTTP status 200. That status means a termination is scheduled for the EC2 spot instance running the handler pod.
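The core loop can be sketched in shell as follows. This is a minimal illustration, not the actual entrypoint.sh: the function names are invented, and NODE_NAME is assumed to be injected into the pod's environment (e.g. via the downward API).

```shell
#!/bin/sh
# Minimal sketch of the handler's polling loop (illustrative only).

NOTICE_URL="http://169.254.169.254/latest/meta-data/spot/termination-time"
POLL_INTERVAL="${POLL_INTERVAL:-5}"

# HTTP status of the termination-notice endpoint (404 = no notice yet).
notice_status() {
  curl -s -o /dev/null -w '%{http_code}' "$NOTICE_URL"
}

# Poll until the endpoint returns 200, then drain this node so its pods
# are gracefully evicted and rescheduled onto other nodes.
poll_until_notice() {
  while true; do
    status=$(notice_status)
    echo "$(date -u): $status"
    if [ "$status" = "200" ]; then
      kubectl drain "$NODE_NAME" --force --ignore-daemonsets --delete-local-data
      return 0
    fi
    sleep "$POLL_INTERVAL"
  done
}
```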

Run kubectl logs against the handler pod to watch how it works.

$ kubectl logs --namespace kube-system spot-termination-notice-handler-ibyo6
This script polls the "EC2 Spot Instance Termination Notices" endpoint to gracefully stop and then reschedule all the pods running on this Kubernetes node, up to 2 minutes before the EC2 Spot Instance backing the node is terminated.
See https://aws.amazon.com/jp/blogs/aws/new-ec2-spot-instance-termination-notices/ for more information.
`kubectl drain minikubevm` will be executed once a termination notice is made.
Polling http://169.254.169.254/latest/meta-data/spot/termination-time every 5 second(s)
Fri Jul 29 07:38:59 UTC 2016: 404
Fri Jul 29 07:39:04 UTC 2016: 404
Fri Jul 29 07:39:09 UTC 2016: 404
Fri Jul 29 07:39:14 UTC 2016: 404
...
Fri Jul 29 hh:mm:ss UTC 2016: 200

Building against a specific version of Kubernetes

Run KUBE_VERSION=<your desired k8s version> make build to specify the version number of k8s/kubectl.

Slack Notifications

Since version 0.9.2 of this application (the @mumoshu version), you can set up a Slack incoming webhook to send notifications to a channel, notifying users that an instance has received a termination notice.

Incoming WebHooks require that you set the SLACK_URL environment variable as part of your PodSpec.

You can also set SLACK_CHANNEL to send the message to a different Slack channel instead of the webhook URL's default channel.

The URL should look something like: https://hooks.slack.com/services/T67UBFNHQ/B4Q7WQM52/1ctEoFjkjdjwsa22934

Slack Setup:

Show where things are happening by setting the CLUSTER environment variable to whatever you call your cluster. Very handy if you have several clusters that report to the same Slack channel.

Example Pod Spec:

        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: SLACK_URL
            value: "https://hooks.slack.com/services/T67UBFNHQ/B4Q7WQM52/1ctEoFjkjdjwsa22934"
          - name: SLACK_CHANNEL
            value: "#devops"
          - name: CLUSTER
            value: development
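Under the hood the notification amounts to an HTTP POST to the webhook. A minimal sketch, assuming Slack's standard incoming-webhook JSON payload; the function name and message text are illustrative, not the script's actual implementation:

```shell
#!/bin/sh
# Hypothetical sketch of the Slack notification step.
notify_slack() {
  text="[${CLUSTER}] Spot termination notice received on node ${NODE_NAME}"
  # SLACK_CHANNEL overrides the webhook's default channel when set.
  payload="{\"channel\": \"${SLACK_CHANNEL}\", \"text\": \"${text}\"}"
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$payload" "$SLACK_URL"
}
```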

Sematext Cloud Event Notifications

The Sematext Cloud event URL is different for Europe and USA and includes the application token for your monitored App.

Sematext Setup:

Show where things are happening by setting the CLUSTER environment variable to whatever you call your cluster. Very handy if you have several clusters that report to the same place.

Example Pod Spec:

        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: SEMATEXT_URL
            value: "https://event-receiver.sematext.com/APPLICATION_TOKEN/event"
          - name: CLUSTER
            value: development
          - name: DETACH_ASG
            value: "true"

Wechat Notifications

Incoming WebHooks require that you set the WECHAT_URL and WECHAT_KEY environment variables as part of your PodSpec.

The URL should look something like: https://pushbear.ftqq.com/sub?key=3488-876437815599e06514b2bbc3864bc96a&text=SpotTermination&desp=SpotInstanceDetainInfo

Wechat Setup:

Example Pod Spec:

        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: WECHAT_URL
            value: "https://pushbear.ftqq.com/sub"
          - name: WECHAT_KEY
            value: "3488-876437815599e06514b2bbc3864bc96a"
          - name: CLUSTER
            value: development
          - name: DETACH_ASG
            value: "true"
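Given the URL format shown above, the WeChat (PushBear) notification boils down to a single GET request. A sketch, with the function name and message text assumed for illustration:

```shell
#!/bin/sh
# Hypothetical sketch of the WeChat notification step, built from
# WECHAT_URL and WECHAT_KEY as in the URL format above.
notify_wechat() {
  curl -s "${WECHAT_URL}?key=${WECHAT_KEY}&text=SpotTermination&desp=Spot+termination+notice+on+${NODE_NAME}"
}
```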

AutoScaling Detachment

This feature currently only supports simple autoscaling - no spot fleet or similar.

If you set the environment variable DETACH_ASG to any value other than false, the handler will detach the instance from the ASG, which may bring a replacement instance up sooner.

The autoscaling group name is automatically detected by the handler.
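The detach step boils down to two AWS CLI calls: look up the instance's ASG, then detach. A sketch, assuming the AWS CLI is available in the image and the node's IAM role permits autoscaling:DescribeAutoScalingInstances and autoscaling:DetachInstances (the function name is illustrative):

```shell
#!/bin/sh
# Hypothetical sketch of the DETACH_ASG step.
detach_from_asg() {
  instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  # Discover which ASG owns this instance.
  asg_name=$(aws autoscaling describe-auto-scaling-instances \
    --instance-ids "$instance_id" \
    --query 'AutoScalingInstances[0].AutoScalingGroupName' --output text)
  # Detach without shrinking desired capacity, so the ASG launches a
  # replacement instance sooner.
  aws autoscaling detach-instances \
    --instance-ids "$instance_id" \
    --auto-scaling-group-name "$asg_name" \
    --no-should-decrement-desired-capacity
}
```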

Credits

kube-spot-termination-notice-handler is a collaborative project to unify @mumoshu and @kylegato's initial work and @egeland's fork with various enhancements and simplifications.

The project is currently maintained by:

  • @egeland
  • @kylegato
  • @mumoshu

kube-spot-termination-notice-handler's People

Contributors

drewhemm · dzoeteman · egeland · jbrehm · kforsthoevel · kimxogus · kylegato · m0rganic · matthope · max-rocket-internet · megastef · mpfgomes · mumoshu · mvisonneau · omerxx · savar


kube-spot-termination-notice-handler's Issues

Discussion: how many kubernetes versions should we support directly?

As we generate images based both on the kubectl version (directly tied to the Kubernetes release version) and the version of this tool, we need to decide how many versions of Kubernetes we should support.
I mean the MINOR version numbers - so, for example, LATEST is currently 1.13.4, LATEST-1 is 1.12.6 - at time of writing.

My suggestion is to support LATEST, LATEST-1, LATEST-2 at their highest PATCH version level.

This would involve creating version branches for the MINOR k8s, and applying tags in these branches, after backporting new features.
We would create branches 1.13, 1.12, and 1.11 and figure out the easiest way to port changes to them.

Thoughts?

[Question] upgrading aws-cli? Supporting serviceAccountAnnotations?

Hi,

We would like to use the detaching feature.

  1. We use kubectl annotate serviceaccount to provide a service account for the spot-termination-handler pod.
  2. It provides the AWS_ROLE_ARN & AWS_WEB_IDENTITY_TOKEN_FILE environment variables.
$ env | grep AWS
AWS_ROLE_ARN=arn:aws:iam::XXXXXXXXXX:role/XXXXXXXXXXXX-eu-west-1-kube-system-spot-termination-handler
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
  3. AWS CLI version 1.16.199, installed in the Docker image kubeaws/kube-spot-termination-notice-handler:1.13.7-1, does not support resolving credentials via STS AssumeRoleWithWebIdentity.
    There is an error:
An error occurred (AccessDenied) when calling the DescribeAutoScalingInstances operation: User: arn:aws:sts::XXXXXXXXXXX:assumed-role/XXXXXX-eks-worker-eu-west-1/i-xxxxxxxx is not authorized to perform: autoscaling:DescribeAutoScalingInstances

The AWS CLI ignores AWS_ROLE_ARN & AWS_WEB_IDENTITY_TOKEN_FILE.
This feature was introduced only in version 1.16.210 - https://github.com/aws/aws-cli/blob/develop/CHANGELOG.rst#116210

Could you please upgrade the AWS CLI (to version >= 1.16.210) and build a new Docker image?


Additionally, it would be great to have an option to add rbac.serviceAccountAnnotations to the Helm Chart as was done for cluster-autoscaler, for example:
https://github.com/helm/charts/blob/master/stable/cluster-autoscaler/templates/serviceaccount.yaml#L10

In this case we can replace running kubectl annotate serviceaccount and restarting pods manually with setting annotations as Helm values:

--set rbac.serviceAccountAnnotations."eks\.amazonaws\.com/role-arn"=${TF_STATE[cluster_autoscaler_iam_role]} \

Thank you.

Best regards,
Mikalai

General question, why no need to authenticate `kubectl`?

This might be a stupid but short question. In the script and Dockerfile I don't see anywhere that the user enters the credentials of the cluster, yet kubectl is still able to drain a node. What kind of authentication is behind this? Is it using some intrinsic k8s feature? Please educate me.

Thanks.

Helm Chart repo deprecated

The Helm Chart repo stable/k8s-spot-termination-handler is deprecated now.
Is there any new location to fetch the Chart from?

[Question] Support IAM roles for service account ? What are required IAM policies ?

Hi,

We removed most of the IAM policy from the instance role of our worker nodes, since EKS supports IAM roles for service accounts and some apps, like cluster-autoscaler, already support it.

But we got the error log below:

[k8s-spot-termination-handler] An error occurred (AccessDenied) when calling the DescribeAutoScalingInstances operation: User: arn:aws:sts::xxxxx:assumed-role/sxxxx/i-xxxx is not authorized to perform: autoscaling:DescribeAutoScalingInstances

I think this is because we turned on the ASG detach feature, so I'm wondering if spot-termination-handler supports IAM roles for service accounts? If not, what are the minimum required IAM policies, so I can add them back to the instance role? It would be appreciated if you could document this in the README.

Thanks.

Is this project still maintained?

Is this project active? If yes, I would like to contribute - I saw the version is quite old.

Let me know if help is needed here, or whether I should use another tool.

thanks!

How to remove --force from drain command ?

Hi,

Due to the --force drain, the application closes connections in Nginx containers (we use all spot instances for our Kubernetes cluster).

I would like to remove --force from the following command in entrypoint.sh:

kubectl drain "${NODE_NAME}" --force --ignore-daemonsets --delete-local-data --grace-period="${GRACE_PERIOD}"

Please let me know a possible solution.

Thanks

ARM Support

Is your feature request related to a problem? Please describe.
ARM support for AWS Graviton2 processors

Describe the solution you'd like
This should be achievable within the build pipeline with the following:

docker buildx install
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 . -t kubeaws/kube-spot-termination-notice-handler:latest --push

We need to auto-generate Docker images

Acceptance Criteria

  • A latest image is generated from the master branch, on commit.
  • A version-tagged image is generated when a tag is made to the git repo.
  • The README is updated to indicate where the image repository is.

Why is the ASG detach only for regular ASGs?

I'm currently running spot fleets, and testing out the commands that detach the instance from the ASG seems to work. I'm curious whether there is something behind the scenes that I'm unaware of that would cause problems?

Won't drain pods with local storage

Many of our pods use local storage but can still be drained. We should be able to (optionally?) specify the --delete-local-data option on drain.
