clowdhaus / eksup
EKS cluster upgrade guidance
Home Page: https://clowdhaus.github.io/eksup/
License: Apache License 2.0
Ensure either .spec.affinity.podAntiAffinity
or .spec.topologySpreadConstraints
is set to avoid multiple pods being scheduled on the same node. https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
Prefer topology spread constraints over affinity
Inter-pod affinity and anti-affinity require a substantial amount of processing, which can slow down scheduling in large clusters significantly. We do not recommend using them in clusters larger than several hundred nodes.
Report on any workload constructs that do not have either .spec.affinity.podAntiAffinity
or .spec.topologySpreadConstraints
specified
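As a sketch of what the check would look for, a Deployment satisfying it via topology spread constraints might look like the following (all names and the image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      # Spread replicas across nodes so a single node drain during the
      # upgrade's rolling replacement cannot take out every replica
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: example
      containers:
        - name: app
          image: example:latest
```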
None
podSecurityPolicy
was deprecated in v1.21
, and removed in v1.25
- users need to ensure they have removed their use of podSecurityPolicy
and migrated to a suitable replacement prior to upgrading to v1.25
Report on the use of podSecurityPolicy
and advise users to switch to Pod Security Admission https://kubernetes.io/docs/concepts/security/pod-security-admission/
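For reference, Pod Security Admission is configured per namespace via labels; a minimal example of what a migrated namespace could look like (namespace name hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example            # hypothetical name
  labels:
    # Enforce the "restricted" Pod Security Standard in this namespace
    pod-security.kubernetes.io/enforce: restricted
    # Optionally also warn on violations during the migration period
    pod-security.kubernetes.io/warn: restricted
```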
None
As a user, I want to understand if I am at risk of encountering EBS volume service limits that could affect a cluster upgrade (i.e. unable to launch new instances during the surge, rolling-update process due to an EBS volume service limit breach)
# GP2
aws support describe-trusted-advisor-check-result --check-id dH7RR0l6J9
# GP3
aws support describe-trusted-advisor-check-result --check-id dH7RR0l6J3
Report on EBS volume service limit and provide feedback on whether changes are recommended or required prior to starting the upgrade process
None
Given a manifest, convert the manifest to the next stable API version. Some resources only need the API version changed; others will require the schema to be modified to match the new API version
Users should be able to provide a command, either on a per-file basis, or across a directory of files (recursive), searching for the deprecated API version and updating to the next stable version including any schema changes required (where applicable if a mapping is possible)
Possible command(s):
eksup migrate apiextensions.k8s.io/v1beta1 --dir . --recursive
eksup migrate apiextensions.k8s.io/v1beta1 --file manifest.yaml
eksup migrate apiextensions.k8s.io/v1beta1 --dir manifests --recursive --dry-run
None
When running numerous clusters, it is challenging to run eksup
from a CLI on each cluster to track and report on upgrade worthiness. Instead, I would like eksup
to run on the cluster periodically and have the results sent to a central location for tracking and reporting.
Run eksup periodically on the cluster
Running the CLI per cluster is not really scalable for more than 30+ clusters
I want to know the number of available IPs in my data plane subnets both as a whole (the entire data plane) as well as individually (per nodegroup/Fargate profile) to better understand if I may face any restrictions or issues when upgrading data plane components
None
How to use AWS config credentials; how to switch between multiple AWS accounts with --profile
eksup analyze -r us-west-2 -c cluster-name
ERROR eksup::eks::resources: Cluster k8s-devops not found
No response
How to use AWS config credentials; how to switch between multiple AWS accounts with --profile
latest
macOS x86_64
No response
There are many applications that use leader election (e.g. controllers) to achieve redundancy where there's not much practical benefit to running 3 pods over 2. This is of course different from quorum-based HA where a minimum of 3 pods is appropriate
When running eksup, pods with 2 replicas are erroneously reported as violating K8S002 and therefore flagged as not highly available
Some kind of configurable way to override or edit rules for specific workloads would be great, but I appreciate this means introducing a config file, which adds a lot of complication
An ideal solution would allow me to say "workload X should have minimum Y replicas" so that we don't accidentally green light a workload with a single replica
The override could also be a cli flag, but I can see this getting very large for large clusters
Detecting and reporting on deprecated/removed Kubernetes API versions is one of the largest concerns of upgrading Kubernetes clusters. While users may be aware of what APIs are deprecated or removed, identifying if any of those APIs are in use in the cluster is a much more challenging task.
Use the apiserver_requested_deprecated_apis
metric to detect usage of deprecated APIs
- https://kubernetes.io/blog/2020/09/03/warnings/
- https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1693-warnings
- kube-rs/kube#492 for implementation
pluto
or kubent
are recommended to check for deprecated APIs
Currently eksup takes in cluster name and region information when performing the analysis; it should also accept a profile (from AWS config) when performing the analysis.
eksup analyze --cluster clustername --region regionname --profile awsprofilename
When performing eksup analyze, if no profile information is provided, it should take the information from the KUBECONFIG variable or ~/.kube/config file.
The practice of setting a pod.Spec.TerminationGracePeriodSeconds
of 0 seconds is unsafe and strongly discouraged for StatefulSet Pods. Graceful deletion is safe and will ensure that the Pod shuts down gracefully before the kubelet deletes the name from the apiserver.
Report on StatefulSets
where pod.Spec.TerminationGracePeriodSeconds
== 0
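A minimal sketch of the pattern this check would flag (names and image hypothetical):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example            # hypothetical name
spec:
  serviceName: example
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      # This is what the check flags: 0 disables graceful shutdown and is
      # unsafe for StatefulSet pods; omit the field (default 30) or set a
      # value appropriate for the workload instead
      terminationGracePeriodSeconds: 0
      containers:
        - name: app
          image: example:latest
```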
None
Checking to see if there is a workaround for it, as analyze stops going further.
Please skip the ASG check
No response
I'm testing the tool in a non-production cluster, and I'm experiencing some timeouts.
I wonder if the solution would be to add some timeout configurations, or if the tool is not intended to run on a cluster with a certain number of resources.
k get replicasets | wc -l
+ exec kubectl get replicasets --context xxx --namespace yyy
4944
k get pods | wc -l
+ exec kubectl get pods --context xxx --namespace yyy
972
Listing the replica sets via kubectl takes around 17s.
The tool fails with a timeout error.
N/A
No response
eksup analyze --cluster xxx --region us-east-1
latest
macOS x86_64
DEBUG hyper::proto::h1::conn: incoming body decode error: timed out
at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.28/src/proto/h1/conn.rs:321
TRACE hyper::proto::h1::conn: State::close()
at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.28/src/proto/h1/conn.rs:948
TRACE hyper::proto::h1::conn: flushed({role=client}): State { reading: Closed, writing: Closed, keep_alive: Disabled }
at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.28/src/proto/h1/conn.rs:731
TRACE hyper::proto::h1::conn: shut down IO complete
at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.28/src/proto/h1/conn.rs:738
TRACE tower::buffer::worker: worker polling for next message
at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:108
TRACE tower::buffer::worker: buffer already closed
at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:62
...
TRACE hyper::client::pool: pool closed, canceling idle interval
at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.28/src/client/pool.rs:759
Error: Failed to list ReplicaSets
Caused by:
0: HyperError: error reading a body from connection: error reading a body from connection: timed out
1: error reading a body from connection: error reading a body from connection: timed out
2: error reading a body from connection: timed out
3: timed out
No response
Users may be interested in different levels of output. For example, some users may want to see only the required checks that failed and ignore the recommended checks. Others may want to see all reported results, even if there are no required/recommended changes.
Configure output levels
--quiet - suppress all output
(default, no flags) - show failed checks on hard requirements
--warn - in addition to failed, show warnings (low number of IPs available for nodes/pods, addon version older than current default, etc.)
--info - in addition to failed and warnings, show informational notices (number of IPs available for nodes/pods, addon version relative to current default and latest, etc.)
No response
eksup should analyze the cluster
Unable to connect to cluster. Ensure kubeconfig file is present and updated to connect to the cluster.
➜ foo AWS_REGION=eu-central-1 aws eks update-kubeconfig --name foo-bar-baz
Updated context arn:aws:eks:eu-central-1:1234567890:cluster/foo-bar-baz in /Users/lorem/.kube/config
➜ foo eksup analyze --cluster foo-bar-baz --region eu-central-1 -v
Error: Unable to connect to cluster. Ensure kubeconfig file is present and updated to connect to the cluster.
Try: aws eks update-kubeconfig --name foo-bar-baz
read AWS creds from environment variables if provided
aws eks update-kubeconfig --name <cluster-name>
eksup analyze --cluster <cluster name> --region <aws region>
0.2.0-alpha3
macOS x86_64
Error: Unable to connect to cluster. Ensure kubeconfig file is present and updated to connect to the cluster.
Try: aws eks update-kubeconfig --name foo-bar-baz
Supports SSO profiles using sso-session for authentication.
Credential chain fails
[sso-session SESSION_NAME]
sso_start_url = https://REDACTED.awsapps.com/start
sso_region = eu-west-1
sso_registration_scopes = sso:account:access
[profile PROFILE_NAME]
sso_account_id = 123456789000
sso_role_name = Administrator
region = eu-west-1
sso_session = SESSION_NAME
Update aws-sdk-rust 😄
eksup analyze -c cluster-v2 -r eu-west-1
latest
Linux x86_64
eksup analyze -c cluster-v2 -r eu-west-1 -v
WARN aws_config::profile::parser::normalize: profile `sso-session SESSION_NAME` ignored because `sso-session SESSION_NAME` was not a valid identifier
at /cargo/registry/src/index.crates.io-6f17d22bba15001f/aws-config-1.1.3/src/profile/parser/normalize.rs:87
WARN aws_config::meta::credentials::chain: provider failed to provide credentials, provider: Profile, error: the credentials provider was not properly configured: ProfileFile provider could not be built: profile `PROFILE_NAME` was not defined: `sso_region` was missing (InvalidConfiguration(InvalidConfiguration { source: "ProfileFile provider could not be built: profile `PROFILE_NAME` was not defined: `sso_region` was missing" }))
at /cargo/registry/src/index.crates.io-6f17d22bba15001f/aws-config-1.1.3/src/meta/credentials/chain.rs:90
eksup is supposed to display the analysis results for the EKS cluster
It's showing the below error although I have set up the kube context correctly.
Error: ApiError: the server could not find the requested resource: NotFound (ErrorResponse { status: "Failure", message: "the server could not find the requested resource", reason: "NotFound", code: 404 })
Caused by:
the server could not find the requested resource: NotFound
Command being used:
eksup analyze -c eks-cluster-test -r us-east-1
No response
Run command below for any eks cluster
eksup analyze -c eks-cluster-test -r us-east-1
latest
macOS x86_64
Error: ApiError: the server could not find the requested resource: NotFound (ErrorResponse { status: "Failure", message: "the server could not find the requested resource", reason: "NotFound", code: 404 })
Caused by:
the server could not find the requested resource: NotFound
EKS requires nodes created by managed nodegroups and Fargate profiles to align with the control plane version (minor versions to be the same) before it will allow the control plane to upgrade
Currently, eksup
reports only on results as they relate to the Kubernetes version skew support policy. This means that nodes created by a managed nodegroup or Fargate profile that are 1 minor version behind the control plane version are shown in the results as a recommended remediation (upgrade to align versions), not a required one. Per EKS requirements, this should be shown as a required remediation since users will not be able to upgrade until they align the node versions to the control plane
N/A
Any nodes created by managed nodegroups or Fargate profiles should report as required remediation if they do not match the control plane version
N/A
latest
macOS x86_64
No response
Add support for ReplicaSet
resources that were not created by a Deployment
(i.e. do not have an ownerReferences
entry)
Report on standalone ReplicaSet
resources that were not created by a higher-order resource
None
K8S001 uses the indicative mood to specify a degraded state:
The version skew between the control plane (API Server) and the data plane (kubelet) violates the Kubernetes version skew policy [...]
There is a version skew between the control plane (API Server) and the data plane (kubelet).
But K8S002 uses it to specify a desired state:
There are at least 3 replicas specified for the resource.
This led to some confusion when I first read about the checks.
https://clowdhaus.github.io/eksup/info/checks/
Use the keywords specified in Best Current Practice 14 and incorporate the phrase specified by RFC 8174 near the top of the page.
I would rewrite K8S001 as follows:
The version skew between the control plane (API Server) and the data plane (kubelet) MUST NOT violate the Kubernetes version skew policy, either currently or after the control plane is upgraded. [Suggestions welcome on the "will violate after upgrade" part, which was difficult to recast.]
And K8S002 as follows:
There MUST be at least 3 replicas specified for the resource.
And version-dependent checks like K8S008 as follows:
With target version < v1.24, Pod volumes SHOULD NOT mount the docker.sock
file.
With target version >= v1.24, Pod volumes MUST NOT mount the docker.sock
file.
And all other checks accordingly.
A user may want to provide a custom kubeconfig
path as opposed to strictly targeting: ~/.kube/config
This could subsequently be used by any end user:
eksup analyze [OPTIONS] --cluster <CLUSTER> --kubeconfig /tmp/123abc-generated-config
The default workflow of:
eksup analyze [OPTIONS] --cluster <CLUSTER>
would result in automatic use of: --kubeconfig ~/.kube/config
(or rather, default to ~/.kube/config
)
Additionally, the user could set the KUBECONFIG
environment variable, resulting in the equivalent of --kubeconfig
:
KUBECONFIG=/tmp/123abc-generated-config eksup analyze [OPTIONS] --cluster <CLUSTER>
No response
Successfully generate output for below analyze command.
eksup analyze --cluster $CLUSTER_NAME --region $AWS_REGION --output analysis.txt
Failed to list PodSecurityPolicies
$ eksup analyze --cluster $CLUSTER_NAME --region $AWS_REGION --output analysis.txt
Error: Failed to list PodSecurityPolicies
Caused by:
0: ApiError: the server could not find the requested resource: NotFound (ErrorResponse { status: "Failure", message: "the server could not find the requested resource", reason: "NotFound", code: 404 })
1: the server could not find the requested resource: NotFound
eksup analyze --cluster $CLUSTER_NAME --region $AWS_REGION --output analysis.txt
No response
Prepare an EKS v1.26 cluster;
Run below command:
eksup analyze --cluster $CLUSTER_NAME --region $AWS_REGION --output analysis.txt
latest
Linux x86_64
Error: Failed to list PodSecurityPolicies
Caused by:
0: ApiError: the server could not find the requested resource: NotFound (ErrorResponse { status: "Failure", message: "the server could not find the requested resource", reason: "NotFound", code: 404 })
1: the server could not find the requested resource: NotFound
Currently, it's required to have at least 5 free IPs for the control plane cross-account ENIs to facilitate an upgrade. However, this isn't the full story, since the cross-account ENIs will be created in at least 2 different availability zones.
Change the EKS001
check to ensure there are at least two subnets in different AZs with at least 4 available IPs each for the control plane cross-account ENIs - awsdocs/amazon-eks-user-guide#688
Use the current guidance of 5 free IPs, but this is misleading (if you only have 5 free IPs, all in one subnet, the upgrade would fail)
Add support to analyze EKS clusters that use a MixedInstancePolicy.
eksup analyze --cluster foo --region us-west-2
Error: Launch template not found, launch configuration is not supported
Note: The error message provided above is misleading or limiting. The cluster being scanned does not use launch configurations.
Enable eksup
to autodetect a mixed instance policy.
Add support to disable ASG checks.
With eksctl
being the official CLI for EKS, and terraform-aws-eks
being a popular method for deploying clusters, it would be helpful to show relevant code snippets for the commands/changes required with these tools to facilitate an upgrade
Provide relevant code snippets for eksctl
and terraform-aws-eks
for performing the upgrade
No response
When running the CLI against clusters of various configurations, the CLI should not panic nor show panic output to users
Using unwrap()
is throwing panic errors back to users instead of handling them gracefully within the flow of execution or returning a more useful feedback message to the user
N/A - remove this from the template
No response
N/A
latest
macOS x86_64
No response
When running eksup ...
and the CLI session is unable to successfully connect to the cluster, typically due to expired credentials or a missing kubeconfig, the error message returned should notify the user of the specific error and how to remediate it (aws eks update-kubeconfig ...
, get AWS credentials, etc.)
eksup
fails and returns a vague error that exposes the low-level internals of eksup
and is not helpful to users
eksup analyze -c <cluster> -r <region> (without a kubeconfig or AWS credentials)
No response
See above
latest
all
Error: HyperError: error trying to connect: dns error: failed to lookup address information: Name or service not known
Caused by:
0: error trying to connect: dns error: failed to lookup address information: Name or service not known
1: dns error: failed to lookup address information: Name or service not known
2: failed to lookup address information: Name or service not known
The Dockershim has been removed starting with Kubernetes v1.24
, and users who are mounting the docker.sock
in their pods will be impacted if they upgrade to v1.24
Report on the use of workloads that mount the docker.sock
, which requires users to remediate prior to upgrading to v1.24
- https://github.com/aws-containers/kubectl-detector-for-docker-socket
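The pattern this check would look for is a hostPath volume pointing at the Docker socket, e.g. (pod name and image hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example            # hypothetical name
spec:
  containers:
    - name: app
      image: example:latest
      volumeMounts:
        - name: dockersock
          mountPath: /var/run/docker.sock
  volumes:
    # Breaks on v1.24+ nodes, where dockershim (and its socket) no longer exists
    - name: dockersock
      hostPath:
        path: /var/run/docker.sock
```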
None
When interrogating resources for upgrade readiness, the Job definitions created by CronJobs should be excluded from the results, since their definition is already covered by the CronJob spec
Currently, the Job specs created by CronJobs are reported in the findings in addition to the CronJob that defines them
N/A
Filter out Jobs that have an ownerReferences
apiVersion: batch/v1
kind: Job
metadata:
creationTimestamp: "2023-02-24T17:15:00Z"
generation: 1
labels:
controller-uid: 079d519d-72ba-4a77-a0d7-bb134ea425b0
job-name: bad-cron-27954315
name: bad-cron-27954315
namespace: cronjob
ownerReferences:
- apiVersion: batch/v1
blockOwnerDeletion: true
controller: true
kind: CronJob
name: bad-cron
uid: 1f1e67a0-198b-4cc1-961b-e2baff5f21b0
N/A
latest
macOS x86_64
N/A
From the EKS documentation for the kube-proxy
addon:
Ensure eksup
is validating against these requirements and reporting the necessary information back to users
None
As a user, I want to understand if I am at risk of encountering EC2 instance service limits that could affect a cluster upgrade (i.e. unable to launch new instances during the surge, rolling-update process due to an EC2 instance service limit breach)
aws support describe-trusted-advisor-check-result --check-id 0Xc6LMYG8P
Report on EC2 instance service limit and provide feedback on whether changes are recommended or required prior to starting the upgrade process
None
As a user who uses the CLI to analyze their cluster for upgrade readiness, I want to have the results formatted so that I can quickly and easily understand what checks have passed or failed
A tabular format is commonly used in this scenario, provided the number of columns is kept to a minimum to fit within a 120-character-wide window
JSON format - this will also be supported but is not as readable as a table and is intended more for machines rather than users
Expected to generate a report for cluster upgrade using
eksup analyze --cluster <cluster-name> --region <region>
getting error
Error: Launch template not found, launch configuration is not supported
aws eks update-kubeconfig --name <cluster> --region <region>
eksup analyze --cluster <cluster-name> --region <region>
No response
aws eks update-kubeconfig --name --region
eksup analyze --cluster --region
latest
macOS arm64
Error: Launch template not found, launch configuration is not supported
Ensure that .spec.containers[*].readinessProbe
is set to provide the appropriate feedback data to the control plane when performing rolling upgrades to minimize the potential for service disruption
Ensure that .spec.containers[*].readinessProbe is set
.spec.containers[*].livenessProbe, if set, is NOT the same as .spec.containers[*].readinessProbe
.spec.containers[*].startupProbe is set if .spec.containers[*].livenessProbe is set
None
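As an illustration of those conditions, a container spec satisfying the check might look like the following (container name, image, ports, and endpoint paths are all hypothetical):

```yaml
containers:
  - name: app
    image: example:latest
    ports:
      - containerPort: 8080
    # Gates traffic during rolling updates; distinct endpoint from liveness
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    # Restarts the container when unhealthy; must differ from readiness
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    # Set whenever a liveness probe is set; protects slow-starting
    # containers from premature liveness failures
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 2
```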
Starting in Kubernetes v1.17
, the in-tree storage plugin was marked as deprecated, and it will be removed in EKS v1.23
. Users need to install the EBS CSI driver prior to upgrading to EKS v1.23
The in-tree Amazon EBS storage provisioner is deprecated. If you are upgrading your cluster to version 1.23, then you must first install the Amazon EBS driver before updating your cluster. For more information, see Amazon EBS CSI migration frequently asked questions. If you have pods running on a version 1.22 or earlier cluster, then you must install the Amazon EBS driver before updating your cluster to version 1.23 to avoid service interruption. https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi-migration-faq.html
Report on the kubernetes.io/aws-ebs StorageClass
and recommend users update to the new API resource groups provided by the CSI
None
To ensure services are configured for high availability to reduce the chance of disruption or downtime during an upgrade, users should have a podDisruptionBudget
set, with at least one of minAvailable
or maxUnavailable
provided, for each workload construct (Deployment
, ReplicaSet
, ReplicationController
, StatefulSet
)
Report on any workload constructs that do not have an associated podDisruptionBudget
, or whose associated podDisruptionBudget
does not have minAvailable
or maxUnavailable
configured
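A minimal sketch of an associated PodDisruptionBudget (names hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example            # hypothetical name
spec:
  # Set exactly one of minAvailable or maxUnavailable; here, eviction
  # during a node drain is blocked if it would leave fewer than 2 pods
  minAvailable: 2
  selector:
    matchLabels:
      app: example
```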
None
When running eksup
from the CLI, users lack context as to what is happening or how much is left until the results are returned. It's common practice to provide some sort of indication of the progress of the execution
Add progress indicator https://github.com/console-rs/indicatif for a quality of life improvement
No progress indicator