
oslokommune / okctl

Opinionated and effortless infrastructure and application management

Home Page: https://okctl.io

License: Other

Makefile 0.31% Go 98.40% CSS 0.08% HTML 0.14% Dockerfile 0.05% Python 0.68% Shell 0.33%
golang kubernetes aws eksctl kubectl okctl helm

okctl's Introduction


okctl - Opinionated and effortless infrastructure and application management


Installation

To download the latest release, run the command matching your operating system:

# Linux
curl --silent --location "https://github.com/oslokommune/okctl/releases/latest/download/okctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/okctl /usr/local/bin

# macOS
brew tap oslokommune/tap
brew install oslokommune/tap/okctl

Getting started

The following guide shows how to create an environment that contains the elements described in Compare and contrast.

1. Create a new GitHub repository

Go to github.com/oslokommune and create a new private or internal git repository. No credentials are stored in this repository; we keep it private as a precaution until we are certain it is safe to make it public.

ℹ️ This repository will be used by okctl to store infrastructure-as-code, i.e. files containing various configuration for your upcoming cluster.

Now, run:

# Clone the repository you just made
$ git clone git@github.com:oslokommune/<the new repository>.git
$ cd <the new repository>

2. Create a cluster

A "cluster" is a Kubernetes cluster with many addons and integrations, creating a production grade environment as described in Functionality.

You will soon be running okctl apply cluster, which will ask you for the following information:

  • Username and password: This is your Oslo Kommune AD organization username (e.g., oooXXXXX) and its password.
  • Multi-factor token (MFA): The same one you use to log in to AWS. If you haven't set up MFA yet, you can do that here.
  • AWS account ID: This identifies which account you want to use. You can see which accounts you have access to just after logging in to AWS:

(Screenshot: the list of AWS accounts shown after logging in.)

# Scaffold a cluster configuration file
okctl scaffold cluster -f cluster.yaml
# <edit cluster.yaml>
okctl apply cluster -f cluster.yaml

Follow the instructions.

When done, verify that you have a working cluster by running

$ okctl venv -c cluster.yaml
$ kubectl get service

The last command should show something like

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   1h

Common commands

# Get help for any command
$ okctl --help

# Run a sub shell with environment variables from the above command and a custom command prompt (PS1)
$ okctl venv -c cluster.yaml

# Delete the cluster
$ okctl delete cluster -c cluster.yaml

Functionality

The core cluster is up and running, and we are currently working on building a seamless experience on top of this cluster with integrations for common functionality:

Core cluster

Application lifecycle

  • Postgres for creating and integrating a postgres database with your application
  • Amazon Elastic Container Registry for creating and assisting with the publication of container images for use in the cluster
  • Reference application that demonstrates how to use the cluster and its integrations

Compare and contrast

The intended purpose of okctl is to be an opinionated solver of infrastructure problems; this includes setting up CI/CD pipelines, among other things.

The following comparison is not meant to determine which tool is better or worse, but rather to show how these tools relate to okctl and the problems we are focused on solving.

okctl is compared against eksctl, kubectl and serverless.tf on the following points:

  • Defines a CI/CD scheme
  • Creates a Kubernetes cluster
  • Facilitates application creation
  • Integrates with GitHub (actions, packages, oauth)
  • Sets up monitoring

Inspiration

We have begged, borrowed and stolen various ideas from the following CLIs:

okctl's People

Contributors

bsek, deifyed, dependabot[bot], eide, endremm, frankorigo, fredriv, haavardeide, ivaruf, kielo87, kmoberg, olovholm, paulbes, yngvark


okctl's Issues

[BUG] okctl create application: overwrites existing application.yaml

Describe the bug
If I have an existing application.yaml file in ./ and run okctl create application dev, then the checked-in application.yaml file is overwritten with boilerplate application.yaml content.

To Reproduce
See description

Expected behavior
If an application.yaml file exists, I would expect either a prompt asking me to confirm that okctl can overwrite the content of the file, or that okctl bails out with an error message.

This could also be fixed with an additional output parameter: okctl create application dev --out application-api.yaml

Additional context
If I have many applications in a git repo, what is the best practice for storing application.yaml? In the root folder of the git repo, or within infrastructure/base/application/my-application-name?

upgrade.sh: If the Fargate pods run one version behind the one you are upgrading from, the upgrade fails

Example: The Fargate pods are running 1.19, while EKS is to be upgraded from 1.20 to 1.21.

$ ./upgrade.sh ... 1.21 ...
...
cb-91f7801fe5fe, InvalidRequestException: Kubelet version of Fargate pods must be updated to match cluster version 1.20 before updating cluster version; Please recycle all offending pod replicas

Command failed with error code 1: /tmp/eks-upgrade/1-21/eksctl upgrade cluster --name julius --version 1.21 --approve
Aborting.

$ k get node -o wide

...
fargate-ip-192-168-4-216.eu-west-1.compute.internal   Ready    <none>   6d5h   v1.19.16-eks-6ae7ca2   192.168.4.216   <none>        Amazon Linux 2   4.14.287-215.504.amzn2.x86_64   containerd://1.4.13
...

Workaround
Delete the pods running 1.19:

k delete pod -A -l 'eks.amazonaws.com/fargate-profile=fp-default'

Investigate the OOM issue for barnehagepris

Barnehagepris experienced one of their applications responding with 503. After some investigation we found that one of the nodes had been killed due to failing health checks, and the health checks failed because the node had run out of memory. After far too long the problem resolved itself: the number of nodes had been scaled up and the sick node had been replaced.

Slack thread
Timeline
OOM reproduction experiments


We agreed on the following:

  1. Wait for the k8s upgrade to get the correct configuration of the autoscaling groups
  2. Adopt overprovisioning so that Loki comes back up faster
  3. Once Loki comes back up quickly, experiment with the config

chunk_retain_period
http read/write timeout // parallelism
Overprovisioning
Cluster overprovisioning in Kubernetes
Kubernetes Cluster Over-Provisioning: Proactive App Scaling


Checklist

  • Provide a recommendation after the reproduction experiments
  • Research and evaluate adjusting the thresholds for scaling up
  • Investigate the OOM problem with Loki
  • Inform the team about the decision in the meeting

[FEATURE] Tag all resources with the okctl version

Is your feature request related to a problem? Please describe.
When looking at resources in AWS, I don't know which version of okctl was running when the resource was created. When debugging or trying to recreate an issue, it would be beneficial to have a tag with the okctl version that created the CloudFormation template.

Describe the solution you'd like
All resources that can be tagged should carry the okctl version that executed the command. This can then also be used for possible upgrade paths, or to stop execution based on very old tags.

Container image repository lifecycle management

This issue follows from a discussion with @yngvark and @deifyed, where we were trying to agree on what approach to take. Some details are probably missing towards the end.

With okctl, the aim is to be opinionated and assist the user whenever possible. However, we want to do so without overloading the user cognitively, i.e., provide them with as few and manageable options as possible. The question we are debating is to what degree this statement is true:

As a user, I want okctl to assist with the lifecycle management of container repositories

Container image repository lifecycle management

Creating an AWS Elastic Container Registry repository and making it available for use includes the set of tasks listed below. As always, getting these settings right may be non-trivial, depending on the experience of the users.

Amazon Elastic Container Registry (Amazon ECR) is an AWS-managed container image registry service that is secure, scalable, and reliable. Amazon ECR supports private container image repositories with resource-based permissions using AWS IAM. This is so that specified users or Amazon EC2 instances can access your container repositories and images. You can use your preferred CLI to push, pull, and manage Docker images, Open Container Initiative (OCI) images, and OCI compatible artifacts.

  1. Create a private repository

When creating a repository, the recommended configuration is as follows:

AppRepository:
  Type: AWS::ECR::Repository
  Properties:
    RepositoryName: "app/repository"
    ImageScanningConfiguration:
      ScanOnPush: "true"
    ImageTagMutability: "IMMUTABLE"

Image tags should be immutable, so it is easier to roll back in case of a bad deployment. Mutable tags are also a security risk. Finally, mutable tags do not work with declarative deployments, which is what we do with ArgoCD.

We also want to enable scanning the images for vulnerabilities on each push. It is an easy way to ensure that we do not deploy container images with known security vulnerabilities.

We probably want to add a repository lifecycle policy as well, where we keep the last N images (sorted by age, keep the newest).
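For illustration, such a lifecycle policy could be attached with the AWS CLI. This is only a sketch: the repository name and the count of 10 are example values, and okctl would presumably express the policy in CloudFormation instead.

# Sketch: expire everything except the 10 newest images (example values only)
aws ecr put-lifecycle-policy \
  --repository-name app/repository \
  --lifecycle-policy-text '{
    "rules": [{
      "rulePriority": 1,
      "description": "Keep only the 10 newest images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }]
  }'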

  2. Update the IAM policy of a service user

We take ownership of a service user; currently, we have requested the creation of an okctl-service-user in all AWS accounts. This service user is reserved for use with okctl. We try to follow the principle of least privilege, particularly when we export AWS access credentials to an external service; GitHub in this particular case. Therefore this user should be limited to pushing images only to the ECR repositories we have created. This means that when we add an ECR repository to an AWS account, we also need to update the service user's IAM policy to reflect this change.
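A minimal sketch of what such a scoped policy could look like, applied with the AWS CLI. The account ID, region, repository name and policy name are placeholders, and the real implementation would likely manage this through CloudFormation.

# Sketch: limit okctl-service-user to pushing images to a single repository
aws iam put-user-policy \
  --user-name okctl-service-user \
  --policy-name ecr-push-app-repository \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "ecr:GetAuthorizationToken",
        "Resource": "*"
      },
      {
        "Effect": "Allow",
        "Action": [
          "ecr:BatchCheckLayerAvailability",
          "ecr:InitiateLayerUpload",
          "ecr:UploadLayerPart",
          "ecr:CompleteLayerUpload",
          "ecr:PutImage"
        ],
        "Resource": "arn:aws:ecr:eu-west-1:123456789012:repository/app/repository"
      }
    ]
  }'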

  3. Push an image to the ECR repository

At some point in time, we will build our first container image. We need to push this image to ECR, which means we need to fetch docker login credentials compatible with ECR, tag the image with the ECR URL, and upload the built image.
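The manual steps okctl would automate roughly correspond to the following; the registry URL, image name and tag are placeholders.

# Fetch docker login credentials compatible with ECR
aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com

# Tag the locally built image with the ECR URL and upload it
docker tag myapp/backend:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp/backend:1.0.0
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp/backend:1.0.0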

  4. Retrieve the image details

After the image has been pushed to ECR, we need to retrieve the image details, though we should probably return an error if the security scanning failed. If the scan didn't fail, we could return the complete image URL with the sha256 digest.
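A sketch of how this could be queried with the AWS CLI; the repository name and tag are placeholders.

# Check the result of the vulnerability scan
aws ecr describe-image-scan-findings \
  --repository-name myapp/backend \
  --image-id imageTag=1.0.0 \
  --query 'imageScanStatus.status'

# Fetch the sha256 digest needed to build the complete image URL
aws ecr describe-images \
  --repository-name myapp/backend \
  --image-ids imageTag=1.0.0 \
  --query 'imageDetails[0].imageDigest'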

  5. Update the deployment with the repository/app@digest

Finally, we can update our deployment, probably with a kustomize patch or similar.

  6. Deploy our new image

We do this by pushing the updated deployment manifest to git and then letting ArgoCD deploy the new image.

Integrating lifecycle management with okctl

Should we assist with the lifecycle management of container image repositories? If the task is sufficiently complicated for the end-user, then yes, we probably should. It follows the mantra of difficult for us, easy for them. Given some non-obvious pitfalls, e.g., image tag mutability, image scan result, IAM access policy, etc., we should take it upon ourselves to create mechanisms that alleviate the load on the end-user.

How should we assist?

Here comes the difficult part, though we have a mechanism that we believe in: the declarative approach.

Extending the cluster.yaml

We start by declaring the repositories in our cluster file:

metadata:
  name: myCluster
repositories:
- myapp/frontend
- myapp/backend
- myapp/backend-migrations

Each repository listed above will expand to the following pattern when referencing a particular image:

aws_account_id.dkr.ecr.region.amazonaws.com/myapp/backend@sha256:eda364...FAKE

In some way, we want to reference these image repositories in our Deployments, ReplicaSets or Pods; it could be as easy as using the name directly:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-deployment
  labels:
    app: backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: myapp/backend # <- just like so
        ports:
        - containerPort: 80

We can then expand these to their full URL, with the digest, by parsing the YAML files, searching for the repositories in question, and building a kustomize patch that replaces the image.
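As a sketch of what the generated patch would be equivalent to, the expansion could be expressed with kustomize's image transformer; the path, registry URL and digest below are placeholders (the digest reuses the fake value from the pattern above).

# Hypothetical example: pin the image by digest in the application's kustomization
cd infrastructure/applications/backend
kustomize edit set image \
  myapp/backend=aws_account_id.dkr.ecr.region.amazonaws.com/myapp/backend@sha256:eda364...FAKE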

We could do this with a command:

$ okctl deploy ...

Alternatively, we could return some basic information about the image location:

$ okctl show repository ENV REPOSITORY_NAME

delete cluster fails because the RDS CloudFormation stack won't delete

It looks like nightly fails because it cannot delete the RDS CFN stack. The stack complains that the RDSOutgoing security group cannot be deleted because it must be disassociated from a network interface first. I can't quite see that this is something we have caused.

Yngvar 7 Jul 2022

This probably also causes nightly to fail.


Håvard 2 Sep 2022

CloudFormation stack fails: https://eu-west-1.console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/stackinfo?filteringStatus=active&filteringText=&viewNested=true&hideStacks=false&stackId=arn%3Aaws%3Acloudformation%3Aeu-west-1%3A853850742759%3Astack%2Fokctl-rdspostgres-okctl-nightly-nightlydb%2Fcae17850-277e-11ed-9f5d-02d05c00a183

resource sg-052cf453a28849f96 has a dependent object

In pkg/cfn/components/securitygroup/securitygroup.go, func NewPostgresOutgoing() creates the outgoing group, but the dependent object looks like the incoming group: NewPostgresIncoming.
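For debugging cases like this, the network interfaces that still reference the security group (and thereby block the delete) can be listed with the AWS CLI; a generic sketch using the security group ID from the error above:

aws ec2 describe-network-interfaces \
  --filters Name=group-id,Values=sg-052cf453a28849f96 \
  --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Description:Description,Status:Status}'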

Adopt the XDG Base Directory Specification

Background

The XDG base directory specification defines a set of environment variables for controlling where configuration and cache files should be written to.

This makes it possible for end users to better control where the okctl artefacts are written. This is useful for ensuring that important configuration files are backed up automatically, while cached binaries are not.

Details

$XDG_DATA_HOME

Defines the base directory relative to which user specific data files should be stored. If $XDG_DATA_HOME is either not set or empty, a default equal to $HOME/.local/share should be used.

$XDG_CONFIG_HOME

Defines the base directory relative to which user specific configuration files should be stored. If $XDG_CONFIG_HOME is either not set or empty, a default equal to $HOME/.config should be used.
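In shell terms, the fallback logic the specification describes amounts to something like the following; the okctl subdirectory name is an assumption for illustration.

# Resolve config/data directories per the XDG spec, with the documented defaults
OKCTL_CONFIG_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/okctl"
OKCTL_DATA_DIR="${XDG_DATA_HOME:-$HOME/.local/share}/okctl"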

Remove TLS 1.0 and 1.1 from Grafana and ArgoCD

Removing TLS 1.0/1.1 from new clusters is a quick change, see #1004.

To save time we will not create an upgrade for this. It can be solved by reinstalling Grafana and ArgoCD (set them to false in the cluster manifest, run okctl apply cluster, then set them back to true).

Add support for saving secrets in a local keyring

Background

Afraid you won't have anything fun to do this weekend? Well, here is a neat issue to keep you occupied.

We want to be able to store secrets in a local keyring, such as macOS's keychain, Pass, etc. The primary purpose is to store a user's AWS password in that keyring so that they don't have to enter it for every new session.

Details

The most viable library for implementing this feature appears to be: https://github.com/99designs/keyring.

  • Determine if the user has a compatible keyring on their system
  • Make it possible to store their choice in the application config
  • Expand the interactive application configurator and ask if they wish to save their password in the keyring

[FEATURE] Add support for additional prometheus scrape config

Due to our use of Envoy proxies in our cluster, we need some custom scrape config in Prometheus, as the proxies don't get scraped by default.

We have currently worked around this issue by applying the changes described here:
https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/additional-scrape-config.md to the CRD provided by okctl.

But it would be nice if the block:

additionalScrapeConfigs:
  name: additional-scrape-configs
  key: prometheus-additional.yaml

was already included in the prometheus CRD when we configure a new cluster.
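For reference, the workaround from the linked prometheus-operator documentation boils down to putting the extra scrape config in a Secret that the Prometheus resource then references; a sketch, assuming the monitoring namespace:

kubectl --namespace monitoring create secret generic additional-scrape-configs \
  --from-file=prometheus-additional.yaml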

[BUG] Bad feedback to user when trying to apply application to non-existing environment

Describe the bug
When trying to apply an application to a non-existing environment, you get a cryptic error message: Error: failed to authenticate with aws: no valid credentials: authenticator[0]: failed to populate required fields: AwsAccountID: cannot be blank

To Reproduce
Have only the dev environment set up and try to apply to prod: okctl apply application prod -f application.yaml

Expected behavior
An error message with clearer text: Cannot apply application to non-existing environment

[BUG] Better output message if gpg fails

Describe the bug
Running okctl results in the output gpg: dekryptering mislyktes: No secret key, without describing what this is or how to fix it.

To Reproduce
Running any okctl command will result in this output if nothing is set up (and as far as I can see, there is no documentation on how I should set up pass for this to work).

Switch Ingress from networking v1beta1 to v1

It is not possible to create Ingresses with networking v1beta1 from 1.22 onwards.

https://kubernetes.io/docs/reference/using-api/deprecation-guide/

This feature must go in when Okctl is to support EKS 1.22, not before, because it also requires bumping the ALB controller, which carries some risk of something failing. And since the AWS end-of-support deadline for 1.21 is approaching, we cannot risk anything failing; we need an easy and error-free upgrade to EKS 1.21.

Yngvar 18 Aug 2022

There is possibly a dependency on Bump AWS Load Balancer controller, so that must be done together with this task, due to issues with pathType:

Branch: 22Q2-44-use_ingress_v1

Research upgrade failing due to SecurityGroup used by old node

Description

When attempting to upgrade EKS to 1.21 recently, the following happened:

% ./upgrade.sh cluster-dev.yaml eu-west-1 1.21 | tee "logs/eks-upgrade-1-21-$(date +"%Y-%m-%dx%H-%M-%S").log"


------------------------------------------------------------------------------------------------------------------------
Verify AWS account
------------------------------------------------------------------------------------------------------------------------

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:25]: aws sts get-caller-identity
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

------------------------------------------------------------------------------------------------------------------------
Download dependencies to /tmp/eks-upgrade/1-21
------------------------------------------------------------------------------------------------------------------------
Running: curl --location  https://github.com/weaveworks/eksctl/releases/download/v0.104.0/eksctl_Darwin_amd64.tar.gz | tar xz -C  /tmp/eks-upgrade/1-21
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 29.0M  100 29.0M    0     0  15.0M      0  0:00:01  0:00:01 --:--:-- 18.0M
Running: curl --location  https://dl.k8s.io/release/v1.21.14/bin/darwin/amd64/kubectl  -o  /tmp/eks-upgrade/1-21/kubectl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    775      0 --:--:-- --:--:-- --:--:--   797
100 50.4M  100 50.4M    0     0  14.6M      0  0:00:03  0:00:03 --:--:-- 17.1M

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:31]: chmod +x /tmp/eks-upgrade/1-21/eksctl
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~


~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:31]: chmod +x /tmp/eks-upgrade/1-21/kubectl
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~


~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:31]: /tmp/eks-upgrade/1-21/eksctl version -o json
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:32]: /tmp/eks-upgrade/1-21/kubectl version --client=true --output=yaml
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

------------------------------------------------------------------------------------------------------------------------
Verify cluster name
------------------------------------------------------------------------------------------------------------------------

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:32]: /tmp/eks-upgrade/1-21/eksctl get cluster xxxxxxxxxxxxxx-dev
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
NAME        VERSION STATUS  CREATED         VPC         SUBNETS                                         SECURITYGROUPS      PROVIDER
xxxxxxxxxxxxxx-dev 1.20    ACTIVE  2021-05-27T07:10:47Z    vpc-0ba89374891c80fb8   subnet-00b77c17380b708e2,subnet-042d70465122a5da6,subnet-04b69ed7310e36114,subnet-06359b07b96a5ba3a,subnet-06a4cf61b0f9c9daa,subnet-06a9a2365a2566328   sg-0d45570581c5492c9    EKS

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~


------------------------------------------------------------------------------------------------------------------------
Do these variables look okay?
------------------------------------------------------------------------------------------------------------------------
Upgrading EKS to version: 1.21
Cluster manifest: cluster-dev.yaml
Cluster name: xxxxxxxxx-dev
AWS account: xxxxxxxxxxxxxxxx
AWS region: eu-west-1
Dry run: true

Do these variables look okay? (Y/n) YY


------------------------------------------------------------------------------------------------------------------------
Run upgrade of EKS control plane. Estimated time: 10-15 min.
------------------------------------------------------------------------------------------------------------------------
💡 Tip: You can go to EKS in AWS console to see the status is set to 'Updating'.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:35]: /tmp/eks-upgrade/1-21/eksctl upgrade cluster --name xxxxxxxxxxxxxx-dev --version 1.21
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
2022-09-15 10:01:38 [ℹ]  (plan) would upgrade cluster "xxxxxxxxxxxxxx-dev" control plane from current version "1.20" to "1.21"
2022-09-15 10:01:41 [ℹ]  re-building cluster stack "eksctl-xxxxxxxxxxxxxx-dev-cluster"
2022-09-15 10:01:41 [✔]  all resources in cluster stack "eksctl-xxxxxxxxxxxxxx-dev-cluster" are up-to-date
2022-09-15 10:01:44 [!]  stack's status of nodegroup named eksctl-xxxxxxxxxxxxxx-dev-nodegroup-ng-generic is DELETE_FAILED
2022-09-15 10:01:44 [ℹ]  checking security group configuration for all nodegroups
2022-09-15 10:01:44 [ℹ]  all nodegroups have up-to-date cloudformation templates
2022-09-15 10:01:44 [!]  no changes were applied, run again with '--approve' to apply the changes

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~


------------------------------------------------------------------------------------------------------------------------
Replacing node groups, step 1 of 4: Create configuration for new node groups.
------------------------------------------------------------------------------------------------------------------------
2022-09-15 10:01:47 [!]  stack's status of nodegroup named eksctl-xxxxxxxxxxxxxx-dev-nodegroup-ng-generic is DELETE_FAILED

Cause

eksctl was unable to delete the stack used to create the nodegroups for EKS 1.20. The reason is that the stack
refers to security groups that are being used. The security groups in question are RDSPostgresIncoming and
RDSPostgresOutgoing.

Comments

My first thought was that either we must detach the security groups before attempting to upgrade and reattach
them afterwards, or we should research whether we should be attaching the Postgres security groups to something
in EKS other than the nodegroups, something more stable that does not need detaching when upgrading.

However, there are yet some things I don't understand with this issue:

  • Why did the execution of the upgrade script exit with an error when running it for this team, but succeed
    with the same message for another team?
    • It probably doesn't matter. The other team also still has the same CloudFormation stack in state DELETE_FAILED, but everything seems to work fine.
  • Why was this not a problem when upgrading from 1.19 to 1.20? The nodegroups should have been deleted at that time as well.

To do

  • Research what to do about this.
    • Do we need to change how security groups are set up? In that case it means adjusting the okctl code and creating an okctl-upgrade to fix existing setups - or perhaps cleaning up manually is faster and just as safe.

Run eksctl create iamidentitymapping on cluster.yaml:users to ensure access to the cluster

Need: As a user, you want to be able to create a cluster declaratively without having to run lots of extra commands. It is the same type of need as https://trello.com/c/4qGHeGZs/3-sette-disabletcpearlydemux-i-apply-cluster.

This task is about making it so that you don't have to run the steps described under https://www.okctl.io/authenticating-to-aws/#aws-single-sign-on-sso -> "Allow SSO logins to cluster".

An alternative to using the users list in cluster.yaml is to grant access to the role as the documentation above describes. Then everyone on the team gets full access to EKS. That might be good enough.
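As a sketch, what okctl would run for each entry in the users list looks roughly like this; the cluster name, account ID, role and username are placeholders.

eksctl create iamidentitymapping \
  --cluster my-team-dev \
  --region eu-west-1 \
  --arn arn:aws:iam::123456789012:role/my-sso-role \
  --username my-user \
  --group system:masters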

Support EKS 1.23 in Okctl

Okctl must support the given version of EKS.

This is an overview task; it can be delegated to new, smaller tasks as needed.

Background

Amazon EKS Kubernetes versions - Amazon EKS
Updating an Amazon EKS cluster Kubernetes version - Amazon EKS
Amazon EKS platform versions - Amazon EKS
Deprecated API Migration Guide

Bump the EKS version

1.23 specifics

  • Message in the EKS console: The Container Storage Interface (CSI) migration feature offloads management operations of persistent volumes provisioned with the in-tree EBS storage plugin to the Amazon EBS CSI driver. This feature is enabled by default in Amazon EKS version 1.23 and later. If you are using EBS volumes in your cluster, then you must install the Amazon EBS CSI driver before updating your cluster to version 1.23 to avoid interruptions to your workloads. - https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

Miscellaneous

Support new region: eu-north-1

Is your feature request related to a problem? Please describe.
Our team needs to be able to deploy our application to eu-north-1 (Stockholm); currently only Ireland and Frankfurt are supported.

Additional context
We rely on a service that is connected to eu-north-1, and as such we need to be able to deploy our cluster there.

[FEATURE] Consider replacing AWSALBIngressController with AWSLoadBalancerController

We would like to use aws-load-balancer-type: "nlb-ip" which enables us to create network load balancers for use with pods running on Fargate nodes. This functionality is available in AWSLoadBalancerController but not in AWSALBIngressController.

As the new AWSLoadBalancerController is backwards compatible with the existing AWSALBIngressController, this should in theory not create any problems with the existing infrastructure and should work as a drop-in replacement.
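For context, the nlb-ip mode is selected with an annotation on a Service of type LoadBalancer; a sketch, where the service name and namespace are placeholders.

kubectl --namespace my-namespace annotate service backend \
  service.beta.kubernetes.io/aws-load-balancer-type=nlb-ip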

[FEATURE] Documentation: how to delete everything if things go really wrong

Is your feature request related to a problem? Please describe.
If okctl delete cluster dev fails, there are no guidelines on how to get unstuck from the various CloudFormation states.

Describe the solution you'd like
Documentation on what to delete manually, in which order, and how to get unstuck from possible problems that might arise when using okctl to set up a cluster or an application. I got a recipe on Slack from Julius, but this should be readily available to everyone.

Support EKS 1.22 in Okctl

Okctl must support the given version of EKS.

This is an overview task; it can be delegated to new, smaller tasks as needed.

Background

Amazon EKS Kubernetes versions - Amazon EKS
Updating an Amazon EKS cluster Kubernetes version - Amazon EKS
Amazon EKS platform versions - Amazon EKS
Deprecated API Migration Guide


Bump the EKS version


1.22 specifics


Resources removed in 1.22

  • Ingress (used in apply application. Other places?) - Covered by bytte-fra-ingress-ingress-networkingv1beta1-til-v1
  • PodSecurityPolicy? Check whether we use it anywhere.
  • Check the links in the description for any others.

Create upgrades

  • AWS Load Balancer Controller 2.4.1 or newer
  • External secrets controller
  • CSIDriver / storage driver
  • aws-iam-authenticator (if we need it, see above)

Miscellaneous


Comment from Yngvar, 15 Aug 2022

Some deprecated resource types:

. applying cluster: reconciling persistent storage

warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver


Some deprecated resource types, from apply cluster. Bumping the load balancer controller will probably fix these.

.. applying cluster: reconciling AWS Load Balancer controller

warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition

warnings.go:70] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration

warnings.go:70] admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1


Some deprecated resource types, from delete cluster:

deleting cluster: reconciling secrets controller

warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding

warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding

deleting cluster: reconciling secrets controller

warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole


Yngvar Kristiansen, 15 Aug at 14:59
Check when the podsecuritypolicy used by Loki (among others) becomes deprecated and when it stops working.

[FEATURE] Documentation: argocd & docker images

Is your feature request related to a problem? Please describe.
For someone coming into okctl for the first time who wants to get up and running quickly with ArgoCD, there should be a best practice/introduction for setting up Docker images and secrets.

Describe the solution you'd like
A step-by-step guide describing how to use GitHub or AWS to store Docker images, and how to set up keys and include them in the application YAML file so that ArgoCD can access them.

[FEATURE] Better documentation - visualize the cluster & application setup

Is your feature request related to a problem? Please describe.
After running okctl and setting up a cluster & application, I have no easy overview of what okctl is doing behind the scenes when setting up clusters and applications.

Describe the solution you'd like
A visual overview of what is being deployed: network, groups, connections, clusters, etc., so that it is easy for new people to understand the infrastructure being set up with okctl. Each version or major upgrade should have a corresponding visual representation of the infrastructure.

Add encryption to nodegroup node volumes

Find out whether we actually need to do this. Is it important enough to spend time on?

Theory: it can perhaps be solved by finding the right CF template and setting an encryption: true or something like that.

Research needed:

What is the consequence of doing this?
What is the consequence of not doing this?

How much work is it to upgrade?
Can the teams do the upgrade themselves?

If the fix is as simple as we think, it can be implemented in okctl.
