kube-aws / kube-spot-termination-notice-handler

A Kubernetes DaemonSet to gracefully delete pods 2 minutes before an EC2 Spot Instance gets terminated

License: Apache License 2.0

Shell 78.68% Dockerfile 6.06% Python 15.26%

kube-spot-termination-notice-handler's Introduction

A Kubernetes DaemonSet that runs one container per node to periodically poll the EC2 Spot Instance Termination Notices endpoint. Once a termination notice is received, it tries to gracefully stop all the pods running on the Kubernetes node, up to 2 minutes before the EC2 Spot Instance backing the node is terminated.

Installation

Helm

A Helm chart has been created for this tool; at the time of writing it was in the stable repository.

$ helm install stable/k8s-spot-termination-handler

Available docker images/tags

Tags denote Kubernetes/kubectl versions. Using the same version for your Kubernetes cluster and spot-termination-notice-handler is recommended. Note that the trailing -1 (or similar) is the revision of this tool for that kubectl version, in case we need to release an updated image.

  • kubeaws/kube-spot-termination-notice-handler:1.8.5-1
  • kubeaws/kube-spot-termination-notice-handler:1.9.0-1
  • kubeaws/kube-spot-termination-notice-handler:1.10.11-2
  • kubeaws/kube-spot-termination-notice-handler:1.11.3-1
  • kubeaws/kube-spot-termination-notice-handler:1.12.0-2
  • kubeaws/kube-spot-termination-notice-handler:1.13.7-1

Why use it

  • So that your Kubernetes jobs backed by spot instances can keep running on other instances (typically on-demand instances)

How it works

Each spot-termination-notice-handler pod polls the notice endpoint until it returns HTTP status 200. That status means a termination is scheduled for the EC2 spot instance running the handler pod.
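The core loop can be sketched in shell as follows. This is a minimal illustration, not the actual entrypoint.sh: the function names are invented, and NODE_NAME is assumed to be injected into the pod's environment (e.g. via the downward API).

```shell
#!/bin/sh
# Minimal sketch of the handler's polling loop (illustrative only).

NOTICE_URL="http://169.254.169.254/latest/meta-data/spot/termination-time"
POLL_INTERVAL="${POLL_INTERVAL:-5}"

# HTTP status of the termination-notice endpoint (404 = no notice yet).
notice_status() {
  curl -s -o /dev/null -w '%{http_code}' "$NOTICE_URL"
}

# Poll until the endpoint returns 200, then drain this node so its pods
# are gracefully evicted and rescheduled onto other nodes.
poll_until_notice() {
  while true; do
    status=$(notice_status)
    echo "$(date -u): $status"
    if [ "$status" = "200" ]; then
      kubectl drain "$NODE_NAME" --force --ignore-daemonsets --delete-local-data
      return 0
    fi
    sleep "$POLL_INTERVAL"
  done
}
```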

Run kubectl logs against the handler pod to watch how it works.

$ kubectl logs --namespace kube-system spot-termination-notice-handler-ibyo6
This script polls the "EC2 Spot Instance Termination Notices" endpoint to gracefully stop and then reschedule all the pods running on this Kubernetes node, up to 2 minutes before the EC2 Spot Instance backing the node is terminated.
See https://aws.amazon.com/jp/blogs/aws/new-ec2-spot-instance-termination-notices/ for more information.
`kubectl drain minikubevm` will be executed once a termination notice is made.
Polling http://169.254.169.254/latest/meta-data/spot/termination-time every 5 second(s)
Fri Jul 29 07:38:59 UTC 2016: 404
Fri Jul 29 07:39:04 UTC 2016: 404
Fri Jul 29 07:39:09 UTC 2016: 404
Fri Jul 29 07:39:14 UTC 2016: 404
...
Fri Jul 29 hh:mm:ss UTC 2016: 200

Building against a specific version of Kubernetes

Run KUBE_VERSION=<your desired k8s version> make build to specify the version number of k8s/kubectl.

Slack Notifications

Since version 0.9.2 of this application (the @mumoshu version), you can set up a Slack incoming webhook to send notifications to a channel, notifying users that an instance has received a termination notice.

Incoming WebHooks require that you set the SLACK_URL environment variable as part of your PodSpec.

You can also set SLACK_CHANNEL to send the message to a different Slack channel instead of the webhook URL's default channel.

The URL should look something like: https://hooks.slack.com/services/T67UBFNHQ/B4Q7WQM52/1ctEoFjkjdjwsa22934

Slack Setup:

Show where things are happening by setting the CLUSTER environment variable to whatever you call your cluster. Very handy if you have several clusters that report to the same Slack channel.

Example Pod Spec:

        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: SLACK_URL
            value: "https://hooks.slack.com/services/T67UBFNHQ/B4Q7WQM52/1ctEoFjkjdjwsa22934"
          - name: SLACK_CHANNEL
            value: "#devops"
          - name: CLUSTER
            value: development
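Under the hood the notification amounts to an HTTP POST to the webhook. A minimal sketch, assuming Slack's standard incoming-webhook JSON payload; the function name and message text are illustrative, not the script's actual implementation:

```shell
#!/bin/sh
# Hypothetical sketch of the Slack notification step.
notify_slack() {
  text="[${CLUSTER}] Spot termination notice received on node ${NODE_NAME}"
  # SLACK_CHANNEL overrides the webhook's default channel when set.
  payload="{\"channel\": \"${SLACK_CHANNEL}\", \"text\": \"${text}\"}"
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$payload" "$SLACK_URL"
}
```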

Sematext Cloud Event Notifications

The Sematext Cloud event URL is different for Europe and USA and includes the application token for your monitored App.

Sematext Setup:

Show where things are happening by setting the CLUSTER environment variable to whatever you call your cluster. Very handy if you have several clusters that report to the same place.

Example Pod Spec:

        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: SEMATEXT_URL
            value: "https://event-receiver.sematext.com/APPLICATION_TOKEN/event"
          - name: CLUSTER
            value: development
          - name: DETACH_ASG
            value: "true"

Wechat Notifications

Incoming WebHooks require that you set the WECHAT_URL and WECHAT_KEY environment variables as part of your PodSpec.

The URL should look something like: https://pushbear.ftqq.com/sub?key=3488-876437815599e06514b2bbc3864bc96a&text=SpotTermination&desp=SpotInstanceDetainInfo

Wechat Setup:

Example Pod Spec:

        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: WECHAT_URL
            value: "https://pushbear.ftqq.com/sub"
          - name: WECHAT_KEY
            value: "3488-876437815599e06514b2bbc3864bc96a"
          - name: CLUSTER
            value: development
          - name: DETACH_ASG
            value: "true"
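Given the URL format shown above, the WeChat (PushBear) notification boils down to a single GET request. A sketch, with the function name and message text assumed for illustration:

```shell
#!/bin/sh
# Hypothetical sketch of the WeChat notification step, built from
# WECHAT_URL and WECHAT_KEY as in the URL format above.
notify_wechat() {
  curl -s "${WECHAT_URL}?key=${WECHAT_KEY}&text=SpotTermination&desp=Spot+termination+notice+on+${NODE_NAME}"
}
```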

AutoScaling Detachment

This feature currently only supports simple autoscaling - no spot fleet or similar.

If you set the environment variable DETACH_ASG to any value other than false, the handler will detach the instance from the ASG, which may bring a replacement instance up sooner.

The autoscaling group name is automatically detected by the handler.
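The detach step boils down to two AWS CLI calls: look up the instance's ASG, then detach. A sketch, assuming the AWS CLI is available in the image and the node's IAM role permits autoscaling:DescribeAutoScalingInstances and autoscaling:DetachInstances (the function name is illustrative):

```shell
#!/bin/sh
# Hypothetical sketch of the DETACH_ASG step.
detach_from_asg() {
  instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  # Discover which ASG owns this instance.
  asg_name=$(aws autoscaling describe-auto-scaling-instances \
    --instance-ids "$instance_id" \
    --query 'AutoScalingInstances[0].AutoScalingGroupName' --output text)
  # Detach without shrinking desired capacity, so the ASG launches a
  # replacement instance sooner.
  aws autoscaling detach-instances \
    --instance-ids "$instance_id" \
    --auto-scaling-group-name "$asg_name" \
    --no-should-decrement-desired-capacity
}
```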

Credits

kube-spot-termination-notice-handler is a collaborative project to unify @mumoshu and @kylegato's initial work and @egeland's fork with various enhancements and simplifications.

The project is currently maintained by:

  • @egeland
  • @kylegato
  • @mumoshu

kube-spot-termination-notice-handler's People

Contributors

drewhemm · dzoeteman · egeland · jbrehm · kforsthoevel · kimxogus · kylegato · m0rganic · matthope · max-rocket-internet · megastef · mpfgomes · mumoshu · mvisonneau · omerxx · savar


kube-spot-termination-notice-handler's Issues

Discussion: how many kubernetes versions should we support directly?

As we generate images based both on the kubectl version (directly tied to the Kubernetes release version) and the version of this tool, we need to decide how many versions of Kubernetes we should support.
I mean the MINOR version numbers - so, for example, LATEST is currently 1.13.4, LATEST-1 is 1.12.6 - at time of writing.

My suggestion is to support LATEST, LATEST-1, LATEST-2 at their highest PATCH version level.

This would involve creating version branches for the MINOR k8s, and applying tags in these branches, after backporting new features.
We would create branches 1.13, 1.12, and 1.11 and figure out the easiest way to port changes to them.

Thoughts?

[Question] upgrading aws-cli? Supporting serviceAccountAnnotations?

Hi,

We would like to use the detaching feature.

  1. We use kubectl annotate serviceaccount to provide a service account for the spot-termination-handler pod.
  2. It provides the AWS_ROLE_ARN & AWS_WEB_IDENTITY_TOKEN_FILE environment variables.
$ env | grep AWS
AWS_ROLE_ARN=arn:aws:iam::XXXXXXXXXX:role/XXXXXXXXXXXX-eu-west-1-kube-system-spot-termination-handler
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
  3. AWS CLI version 1.16.199, installed in the Docker image kubeaws/kube-spot-termination-notice-handler:1.13.7-1, does not support resolving credentials via STS AssumeRoleWithWebIdentity.
    There is an error:
An error occurred (AccessDenied) when calling the DescribeAutoScalingInstances operation: User: arn:aws:sts::XXXXXXXXXXX:assumed-role/XXXXXX-eks-worker-eu-west-1/i-xxxxxxxx is not authorized to perform: autoscaling:DescribeAutoScalingInstances

The AWS CLI ignores AWS_ROLE_ARN & AWS_WEB_IDENTITY_TOKEN_FILE.
This feature was introduced only in version 1.16.210 - https://github.com/aws/aws-cli/blob/develop/CHANGELOG.rst#116210

Could you please upgrade the AWS CLI (to version >= 1.16.210) and build a new Docker image?


Additionally, it would be great to have an option to add rbac.serviceAccountAnnotations to the Helm Chart as was done for cluster-autoscaler, for example:
https://github.com/helm/charts/blob/master/stable/cluster-autoscaler/templates/serviceaccount.yaml#L10

In this case we can replace running kubectl annotate serviceaccount and restarting pods manually with setting annotations as Helm values:

--set rbac.serviceAccountAnnotations."eks\.amazonaws\.com/role-arn"=${TF_STATE[cluster_autoscaler_iam_role]} \

Thank you.

Best regards,
Mikalai

General question, why no need to authenticate `kubectl`?

This might be a stupid but short question. In the script and Dockerfile I don't see anywhere that the user enters the credentials of the cluster, yet kubectl is still able to drain a node. What kind of authentication is behind this? Is it using some intrinsic k8s feature? Please educate me.

Thanks.

Helm Chart repo deprecated

The Helm Chart repo stable/k8s-spot-termination-handler is deprecated now.
Is there any new location to fetch the Chart from?

[Question] Support IAM roles for service account ? What are required IAM policies ?

Hi,

We removed most of the IAM policy from the instance role of our worker nodes, since EKS supports IAM roles for service accounts and some apps, like cluster-autoscaler, already support it.

But we got the error log below:

[k8s-spot-termination-handler] An error occurred (AccessDenied) when calling the DescribeAutoScalingInstances operation: User: arn:aws:sts::xxxxx:assumed-role/sxxxx/i-xxxx is not authorized to perform: autoscaling:DescribeAutoScalingInstances

I think this is because we turned on the ASG detach feature, so I'm wondering if spot-termination-handler supports IAM roles for service accounts? If not, what are the minimum required IAM policies, so I can add them back to the instance role? It would be appreciated if you could document this in the README.

Thanks.

Is this project still maintained?

Is this project active? If yes, I would like to contribute - I saw the version is quite old.

Let me know if help is needed here, or whether I should use another tool.

thanks!

How to remove --force from drain command ?

Hi,

Due to the --force drain, the application closes connections in Nginx containers (we use all spot instances for our Kubernetes cluster).

I would like to remove --force from the following command in entrypoint.sh:

kubectl drain "${NODE_NAME}" --force --ignore-daemonsets --delete-local-data --grace-period="${GRACE_PERIOD}"

Please let me know a possible solution.

Thanks

ARM Support

Is your feature request related to a problem? Please describe.
ARM support for AWS Graviton2 processors

Describe the solution you'd like
This should be achievable within the build pipeline with the following:

docker buildx install
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 . -t kubeaws/kube-spot-termination-notice-handler:latest --push

We need to auto-generate Docker images

Acceptance Criteria

  • A latest image is generated from the master branch, on commit.
  • A version-tagged image is generated when a tag is made to the git repo.
  • The README is updated to indicate where the image repository is.

Why is the ASG detach only for regular ASGs?

I'm currently running spot fleets, and testing out the commands that detach the instance from the ASG seems to work. I'm curious whether there is something behind the scenes that I'm unaware of that would cause problems?

Won't drain pods with local storage

Many of our pods use local storage but can still be drained. We should be able to (optionally?) specify the --delete-local-data option on drain.
