
oslokommune / okctl

Opinionated and effortless infrastructure and application management

Home Page: https://okctl.io

License: Other

Makefile 0.31% Go 98.40% CSS 0.08% HTML 0.14% Dockerfile 0.05% Python 0.68% Shell 0.33%
golang kubernetes aws eksctl kubectl okctl helm

okctl's Introduction


okctl - Opinionated and effortless infrastructure and application management


Installation

To download the latest release, run the command matching your operating system:

# Linux
curl --silent --location "https://github.com/oslokommune/okctl/releases/latest/download/okctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/okctl /usr/local/bin

# macOS
brew tap oslokommune/tap
brew install oslokommune/tap/okctl

Getting started

The following guide shows how to create an environment that contains the elements described in Compare and contrast.

1. Create a new GitHub repository

Go to github.com/oslokommune and create a new private or internal git repository. No credentials are stored in this repository; we keep it private as a precaution until we are certain it is safe to make it public.

ℹ️ This repository will be used by okctl to store infrastructure-as-code, i.e. files containing various configuration for your upcoming cluster.

Now, run:

# Clone the repository you just made
$ git clone git@github.com:oslokommune/<the new repository>.git
$ cd <the new repository>

2. Create a cluster

A "cluster" is a Kubernetes cluster with many addons and integrations, creating a production grade environment as described in Functionality.

You will soon be running okctl apply cluster, which will ask you for the following information:

  • Username and password: This is your Oslo Kommune AD organization username (e.g., oooXXXXX) and its password.
  • Multi-factor token (MFA): The same one you use to log in to AWS. If you haven't set up MFA yet, you can do that here.
  • AWS account ID: This identifies which account you want to use. You can see which accounts you have access to just after logging in to AWS:

(Screenshot: the list of AWS accounts shown after logging in.)

# Scaffold a cluster configuration file
okctl scaffold cluster -f cluster.yaml
# <edit cluster.yaml>
okctl apply cluster -f cluster.yaml

Follow the instructions.

When done, verify that you have a working cluster by running

$ okctl venv -c cluster.yaml
$ kubectl get service

The last command should show something like

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   1h

Common commands

# Get help for any command
$ okctl --help

# Run a sub shell with environment variables from the above command and a custom command prompt (PS1)
$ okctl venv -c cluster.yaml

# Delete the cluster
$ okctl delete cluster -c cluster.yaml

Functionality

The core cluster is up and running, and we are currently working on building a seamless experience on top of this cluster with integrations for common functionality:

Core cluster

Application lifecycle

  • Postgres for creating and integrating a postgres database with your application
  • Amazon Elastic Container Registry for creating and assisting with the publication of container images for use in the cluster
  • Reference application that demonstrates how to use the cluster and its integrations

Compare and contrast

The intended purpose of okctl is to be an opinionated solver of infrastructure problems; this includes setting up CI/CD pipelines, among other things.

The following comparison is not meant to determine which tool is better or worse, but rather to show how these tools relate to okctl and the problems we are focused on solving.

okctl is compared against eksctl, kubectl and serverless.tf on the following points:

  • Defines a CI/CD scheme
  • Creates a Kubernetes cluster
  • Facilitates application creation
  • Integrates with GitHub (actions, packages, oauth)
  • Sets up monitoring

Inspiration

We have begged, borrowed and stolen various ideas from the following CLIs:

okctl's People

Contributors

bsek, deifyed, dependabot[bot], eide, endremm, frankorigo, fredriv, haavardeide, ivaruf, kielo87, kmoberg, olovholm, paulbes, yngvark


okctl's Issues

[BUG] okctl create application: overwrites existing application.yaml

Describe the bug
If I have an existing application.yaml file in ./ and run okctl create application dev, then the checked-in application.yaml file is overwritten with boilerplate application.yaml content.

To Reproduce
See description

Expected behavior
If an application.yaml file exists, I would expect either a prompt asking me to confirm that okctl can overwrite the content of the file, or that okctl bails out with an error message.

This could also be fixed with an additional output parameter: okctl create application dev --out application-api.yaml

Additional context
If I have many applications in a git repo, what is the best practice for storing application.yaml? In the root folder of the git repo, or within infrastructure/base/application/my-application-name?

upgrade.sh: If the Fargate pods run one version behind the one you are upgrading from, the upgrade fails

Example: The Fargate pods are running 1.19, while EKS is to be upgraded from 1.20 to 1.21.

$ ./upgrade.sh ... 1.21 ...
...
cb-91f7801fe5fe, InvalidRequestException: Kubelet version of Fargate pods must be updated to match cluster version 1.20 before updating cluster version; Please recycle all offending pod replicas

Command failed with error code 1: /tmp/eks-upgrade/1-21/eksctl upgrade cluster --name julius --version 1.21 --approve
Aborting.

$ k get node -o wide

...
fargate-ip-192-168-4-216.eu-west-1.compute.internal   Ready    <none>   6d5h   v1.19.16-eks-6ae7ca2   192.168.4.216   <none>        Amazon Linux 2   4.14.287-215.504.amzn2.x86_64   containerd://1.4.13
...

Workaround
Delete the pods running 1.19:

k delete pod -A -l 'eks.amazonaws.com/fargate-profile=fp-default'

Investigate the OOM issue for barnehagepris

Barnehagepris experienced one of their applications responding with 503. After some investigation we found that one of the nodes had been killed due to failing health checks, and the health checks failed because the node had run out of memory. After far too long the problem resolved itself: the number of nodes had been scaled up and the sick node had been replaced.

Slack thread
Timeline
OOM reproduction experiments


We agreed on the following:

  1. Wait for the k8s upgrade to get the correct configuration of the autoscaling groups
  2. Adopt overprovisioning so that Loki comes back up faster
  3. Once Loki comes back up quickly, experiment with the config

chunk_retain_period
http read/write timeout // parallelism
Overprovisioning
Cluster overprovisioning in Kubernetes
Kubernetes Cluster Over-Provisioning: Proactive App Scaling


Checklist

  • Provide a recommendation after the reproduction experiments
  • Research and evaluate adjusting the thresholds for scaling up
  • Investigate the OOM problem with Loki
  • Inform the team about the decision in the meeting

[FEATURE] Tag all resources with the okctl version

Is your feature request related to a problem? Please describe.
When looking at resources in AWS, I don't know which version of okctl was running when the resource was created. When debugging or trying to recreate an issue, it would be beneficial to have a tag with the okctl version that created the CloudFormation template.

Describe the solution you'd like
All resources that can be tagged should carry the okctl version that executed the command. This can then also be used for possible upgrade paths, or to stop execution based on very old tags.

Container image repository lifecycle management

This issue follows from a discussion with @yngvark and @deifyed, where we were trying to agree on what approach to take. Some details are probably missing towards the end.

With okctl, the aim is to be opinionated and assist the user whenever possible. However, we want to do so without overloading the user cognitively, i.e., provide them with as few and manageable options as possible. The question we are debating is to what degree this statement is true:

As a user, I want okctl to assist with the lifecycle management of container repositories

Container image repository lifecycle management

Creating an AWS Elastic Container Registry repository and making it available for use includes the set of tasks listed below. As always, getting these settings right may be non-trivial, depending on the experience of the users.

Amazon Elastic Container Registry (Amazon ECR) is an AWS-managed container image registry service that is secure, scalable, and reliable. Amazon ECR supports private container image repositories with resource-based permissions using AWS IAM. This is so that specified users or Amazon EC2 instances can access your container repositories and images. You can use your preferred CLI to push, pull, and manage Docker images, Open Container Initiative (OCI) images, and OCI compatible artifacts.

  1. Create a private repository

When creating a repository, the recommended configuration is as follows:

AppRepository:
  Type: AWS::ECR::Repository
  Properties:
    RepositoryName: "app/repository"
    ImageScanningConfiguration:
      ScanOnPush: "true"
    ImageTagMutability: "IMMUTABLE"

Image tags should be immutable, so it is easier to roll back in case of a bad deployment. Mutable tags are also a security risk. Finally, mutable tags do not work with declarative deployments, which is what we do with ArgoCD.

We also want to enable scanning the images for vulnerabilities on each push. It is an easy way to ensure that we do not deploy container images with known security vulnerabilities.

We probably want to add a repository lifecycle policy as well, where we keep the last N images (sorted by age, keep the newest).
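For illustration, such a lifecycle policy could be attached with the AWS CLI. This is only a sketch: the repository name and the count of 10 are example values, and okctl would presumably express the policy in CloudFormation instead.

# Sketch: expire everything except the 10 newest images (example values only)
aws ecr put-lifecycle-policy \
  --repository-name app/repository \
  --lifecycle-policy-text '{
    "rules": [{
      "rulePriority": 1,
      "description": "Keep only the 10 newest images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }]
  }'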

  2. Update the IAM policy of a service user

We take ownership of a service user; currently, we have requested the creation of an okctl-service-user in all AWS accounts. This service user is reserved for use with okctl. We try to follow the principle of least privilege, particularly when we export AWS access credentials to an external service; GitHub in this particular case. Therefore this user should be limited to pushing images only to the ECR repositories we have created. This means that when we add an ECR repository to an AWS account, we also need to update the service user's IAM policy to reflect this change.
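A minimal sketch of what such a scoped policy could look like, applied with the AWS CLI. The account ID, region, repository name and policy name are placeholders, and the real implementation would likely manage this through CloudFormation.

# Sketch: limit okctl-service-user to pushing images to a single repository
aws iam put-user-policy \
  --user-name okctl-service-user \
  --policy-name ecr-push-app-repository \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "ecr:GetAuthorizationToken",
        "Resource": "*"
      },
      {
        "Effect": "Allow",
        "Action": [
          "ecr:BatchCheckLayerAvailability",
          "ecr:InitiateLayerUpload",
          "ecr:UploadLayerPart",
          "ecr:CompleteLayerUpload",
          "ecr:PutImage"
        ],
        "Resource": "arn:aws:ecr:eu-west-1:123456789012:repository/app/repository"
      }
    ]
  }'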

  3. Push an image to the ECR repository

At some point in time, we will build our first container image. We need to push this image to ECR, which means we need to fetch docker login credentials compatible with ECR, tag the image with the ECR URL, and upload the built image.
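The manual steps okctl would automate roughly correspond to the following; the registry URL, image name and tag are placeholders.

# Fetch docker login credentials compatible with ECR
aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com

# Tag the locally built image with the ECR URL and upload it
docker tag myapp/backend:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp/backend:1.0.0
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp/backend:1.0.0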

  4. Retrieve the image details

After the image has been pushed to ECR, we need to retrieve the image details, though we should probably return an error if the security scanning failed. If the scan didn't fail, we could return the complete image URL with the sha256 digest.
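A sketch of how this could be queried with the AWS CLI; the repository name and tag are placeholders.

# Check the result of the vulnerability scan
aws ecr describe-image-scan-findings \
  --repository-name myapp/backend \
  --image-id imageTag=1.0.0 \
  --query 'imageScanStatus.status'

# Fetch the sha256 digest needed to build the complete image URL
aws ecr describe-images \
  --repository-name myapp/backend \
  --image-ids imageTag=1.0.0 \
  --query 'imageDetails[0].imageDigest'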

  5. Update the deployment with the repository/app@digest

Finally, we can update our deployment, probably with a kustomize patch or similar.

  6. Deploy our new image

We do this by pushing the updated deployment manifest to git and then letting ArgoCD deploy the new image.

Integrating lifecycle management with okctl

Should we assist with the lifecycle management of container image repositories? If the task is sufficiently complicated for the end-user, then yes, we probably should. It follows the mantra of difficult for us, easy for them. Given some non-obvious pitfalls, e.g., image tag mutability, image scan result, IAM access policy, etc., we should take it upon ourselves to create mechanisms that alleviate the load on the end-user.

How should we assist?

Here comes the difficult part, though we have a mechanism that we believe in: the declarative approach.

Extending the cluster.yaml

We start by declaring the repositories in our cluster file:

metadata:
  name: myCluster
repositories:
- myapp/frontend
- myapp/backend
- myapp/backend-migrations

Each repository listed above will expand to the following pattern when referencing a particular image:

aws_account_id.dkr.ecr.region.amazonaws.com/myapp/backend@sha256:eda364...FAKE

In some way, we want to reference these image repositories in our Deployments, ReplicaSets or Pods; it could be as easy as using the name directly:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-deployment
  labels:
    app: backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: myapp/backend # <- just like so
        ports:
        - containerPort: 80

We can then expand these to their full URL, with the digest, by parsing the YAML files, searching for the repositories in question, and building a kustomize patch that replaces the image.
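As a sketch of what the generated patch would be equivalent to, the expansion could be expressed with kustomize's image transformer; the path, registry URL and digest below are placeholders (the digest reuses the fake value from the pattern above).

# Hypothetical example: pin the image by digest in the application's kustomization
cd infrastructure/applications/backend
kustomize edit set image \
  myapp/backend=aws_account_id.dkr.ecr.region.amazonaws.com/myapp/backend@sha256:eda364...FAKE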

We could do this with a command:

$ okctl deploy ...

Alternatively, we could return some basic information about the image location:

$ okctl show repository ENV REPOSITORY_NAME

delete cluster fails because the RDS CloudFormation stack won't delete

It looks like nightly fails because it cannot delete the RDS CFN stack. The stack complains that the RDSOutgoing security group cannot be deleted because it must be disassociated from a network interface first. I can't quite see that this is something we have caused.

Yngvar 7 Jul 2022

This probably also causes nightly to fail.


Håvard 2 Sep 2022

CloudFormation stack fails: https://eu-west-1.console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/stackinfo?filteringStatus=active&filteringText=&viewNested=true&hideStacks=false&stackId=arn%3Aaws%3Acloudformation%3Aeu-west-1%3A853850742759%3Astack%2Fokctl-rdspostgres-okctl-nightly-nightlydb%2Fcae17850-277e-11ed-9f5d-02d05c00a183

resource sg-052cf453a28849f96 has a dependent object

In pkg/cfn/components/securitygroup/securitygroup.go, func NewPostgresOutgoing() creates the outgoing group, but the dependent object looks like the incoming group: NewPostgresIncoming.
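For debugging cases like this, the network interfaces that still reference the security group (and thereby block the delete) can be listed with the AWS CLI; a generic sketch using the security group ID from the error above:

aws ec2 describe-network-interfaces \
  --filters Name=group-id,Values=sg-052cf453a28849f96 \
  --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Description:Description,Status:Status}'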

Adopt the XDG Base Directory Specification

Background

The XDG base directory specification defines a set of environment variables for controlling where configuration and cache files should be written to.

This makes it possible for end users to better control where the okctl artefacts are written. This is useful for ensuring that important configuration files are backed up automatically, while cached binaries are not.

Details

$XDG_DATA_HOME

Defines the base directory relative to which user specific data files should be stored. If $XDG_DATA_HOME is either not set or empty, a default equal to $HOME/.local/share should be used.

$XDG_CONFIG_HOME

Defines the base directory relative to which user specific configuration files should be stored. If $XDG_CONFIG_HOME is either not set or empty, a default equal to $HOME/.config should be used.
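In shell terms, the fallback logic the specification describes amounts to something like the following; the okctl subdirectory name is an assumption for illustration.

# Resolve config/data directories per the XDG spec, with the documented defaults
OKCTL_CONFIG_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/okctl"
OKCTL_DATA_DIR="${XDG_DATA_HOME:-$HOME/.local/share}/okctl"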

Remove TLS 1.0 and 1.1 from Grafana and ArgoCD

Removing TLS 1.0/1.1 from new clusters is a quick change, see #1004.

To save time we will not create an upgrade for this. It can be solved by reinstalling Grafana and ArgoCD (set them to false in the cluster manifest, run okctl apply cluster, then set them back to true).

Add support for saving secrets in a local keyring

Background

Afraid you won't have anything fun to do this weekend? Well, here is a neat issue to keep you occupied.

We want to be able to store secrets in a local keyring, such as macOS's keychain, Pass, etc. The primary purpose is to store a user's AWS password in that keyring so that they don't have to enter it for every new session.

Details

The most viable library for implementing this feature appears to be: https://github.com/99designs/keyring.

  • Determine if the user has a compatible keyring on their system
  • Make it possible to store their choice in the application config
  • Expand the interactive application configurator and ask if they wish to save their password in the keyring

[FEATURE] Add support for additional prometheus scrape config

Due to our use of Envoy proxies in our cluster, we need some custom scrape config in Prometheus, as the proxies don't get scraped by default.

We have currently worked around this issue by applying the changes described here:
https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/additional-scrape-config.md to the CRD provided by okctl.

But it would be nice if the block:

additionalScrapeConfigs:
  name: additional-scrape-configs
  key: prometheus-additional.yaml

was already included in the prometheus CRD when we configure a new cluster.
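For reference, the workaround from the linked prometheus-operator documentation boils down to putting the extra scrape config in a Secret that the Prometheus resource then references; a sketch, assuming the monitoring namespace:

kubectl --namespace monitoring create secret generic additional-scrape-configs \
  --from-file=prometheus-additional.yaml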

[BUG] Bad feedback to user when trying to apply application to non-existing environment

Describe the bug
When trying to apply an application to a non-existing environment, you get a cryptic error message: Error: failed to authenticate with aws: no valid credentials: authenticator[0]: failed to populate required fields: AwsAccountID: cannot be blank

To Reproduce
Have only the dev environment set up and try to apply to prod: okctl apply application prod -f application.yaml

Expected behavior
An error message with clearer text: Cannot apply application to non-existing environment

[BUG] Better output message if gpg fails

Describe the bug
Running okctl results in the output gpg: dekryptering mislyktes: No secret key, without describing what this is or how to fix it.

To Reproduce
Running any okctl command will result in this output if nothing is set up (and as far as I can see, there is no documentation on how I should set up pass for this to work).

Switch Ingress from networking v1beta1 to v1

It is not possible to create Ingresses with networking v1beta1 from 1.22 onwards.

https://kubernetes.io/docs/reference/using-api/deprecation-guide/

This feature must go in when Okctl is to support EKS 1.22, not before, because it also requires bumping the ALB controller, which carries some risk of something failing. And since the AWS end-of-support deadline for 1.21 is approaching, we cannot risk anything failing; we need an easy and error-free upgrade to EKS 1.21.

Yngvar 18 Aug 2022

There is possibly a dependency on Bump AWS Load Balancer controller, so that must be done together with this task, due to issues with pathType:

Branch: 22Q2-44-use_ingress_v1

Research upgrade failing due to SecurityGroup used by old node

Description

When attempting to upgrade EKS to 1.21 recently, the following happened:

% ./upgrade.sh cluster-dev.yaml eu-west-1 1.21 | tee "logs/eks-upgrade-1-21-$(date +"%Y-%m-%dx%H-%M-%S").log"


------------------------------------------------------------------------------------------------------------------------
Verify AWS account
------------------------------------------------------------------------------------------------------------------------

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:25]: aws sts get-caller-identity
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

------------------------------------------------------------------------------------------------------------------------
Download dependencies to /tmp/eks-upgrade/1-21
------------------------------------------------------------------------------------------------------------------------
Running: curl --location  https://github.com/weaveworks/eksctl/releases/download/v0.104.0/eksctl_Darwin_amd64.tar.gz | tar xz -C  /tmp/eks-upgrade/1-21
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 29.0M  100 29.0M    0     0  15.0M      0  0:00:01  0:00:01 --:--:-- 18.0M
Running: curl --location  https://dl.k8s.io/release/v1.21.14/bin/darwin/amd64/kubectl  -o  /tmp/eks-upgrade/1-21/kubectl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    775      0 --:--:-- --:--:-- --:--:--   797
100 50.4M  100 50.4M    0     0  14.6M      0  0:00:03  0:00:03 --:--:-- 17.1M

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:31]: chmod +x /tmp/eks-upgrade/1-21/eksctl
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~


~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:31]: chmod +x /tmp/eks-upgrade/1-21/kubectl
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~


~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:31]: /tmp/eks-upgrade/1-21/eksctl version -o json
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:32]: /tmp/eks-upgrade/1-21/kubectl version --client=true --output=yaml
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

------------------------------------------------------------------------------------------------------------------------
Verify cluster name
------------------------------------------------------------------------------------------------------------------------

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:32]: /tmp/eks-upgrade/1-21/eksctl get cluster xxxxxxxxxxxxxx-dev
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
NAME        VERSION STATUS  CREATED         VPC         SUBNETS                                         SECURITYGROUPS      PROVIDER
xxxxxxxxxxxxxx-dev 1.20    ACTIVE  2021-05-27T07:10:47Z    vpc-0ba89374891c80fb8   subnet-00b77c17380b708e2,subnet-042d70465122a5da6,subnet-04b69ed7310e36114,subnet-06359b07b96a5ba3a,subnet-06a4cf61b0f9c9daa,subnet-06a9a2365a2566328   sg-0d45570581c5492c9    EKS

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~


------------------------------------------------------------------------------------------------------------------------
Do these variables look okay?
------------------------------------------------------------------------------------------------------------------------
Upgrading EKS to version: 1.21
Cluster manifest: cluster-dev.yaml
Cluster name: xxxxxxxxx-dev
AWS account: xxxxxxxxxxxxxxxx
AWS region: eu-west-1
Dry run: true

Do these variables look okay? (Y/n) YY


------------------------------------------------------------------------------------------------------------------------
Run upgrade of EKS control plane. Estimated time: 10-15 min.
------------------------------------------------------------------------------------------------------------------------
💡 Tip: You can go to EKS in AWS console to see the status is set to 'Updating'.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Running command [2022-09-15 10:01:35]: /tmp/eks-upgrade/1-21/eksctl upgrade cluster --name xxxxxxxxxxxxxx-dev --version 1.21
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
2022-09-15 10:01:38 [ℹ]  (plan) would upgrade cluster "xxxxxxxxxxxxxx-dev" control plane from current version "1.20" to "1.21"
2022-09-15 10:01:41 [ℹ]  re-building cluster stack "eksctl-xxxxxxxxxxxxxx-dev-cluster"
2022-09-15 10:01:41 [✔]  all resources in cluster stack "eksctl-xxxxxxxxxxxxxx-dev-cluster" are up-to-date
2022-09-15 10:01:44 [!]  stack's status of nodegroup named eksctl-xxxxxxxxxxxxxx-dev-nodegroup-ng-generic is DELETE_FAILED
2022-09-15 10:01:44 [ℹ]  checking security group configuration for all nodegroups
2022-09-15 10:01:44 [ℹ]  all nodegroups have up-to-date cloudformation templates
2022-09-15 10:01:44 [!]  no changes were applied, run again with '--approve' to apply the changes

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~


------------------------------------------------------------------------------------------------------------------------
Replacing node groups, step 1 of 4: Create configuration for new node groups.
------------------------------------------------------------------------------------------------------------------------
2022-09-15 10:01:47 [!]  stack's status of nodegroup named eksctl-xxxxxxxxxxxxxx-dev-nodegroup-ng-generic is DELETE_FAILED

Cause

eksctl was unable to delete the stack used to create the nodegroups for EKS 1.20. The reason is that the stack
refers to security groups that are being used. The security groups in question are RDSPostgresIncoming and
RDSPostgresOutgoing.

Comments

My first thought was that either we must detach the security groups before attempting to upgrade and reattach
them afterwards, or we should research whether we should be attaching the Postgres security groups to something
in EKS other than the nodegroups, something more stable that does not need detaching when upgrading.

However, there are yet some things I don't understand with this issue:

  • Why did the execution of the upgrade script exit with an error when running it for this team, but succeed
    with the same message for another team?
    • It probably doesn't matter. The other team also still has the same CloudFormation stack in state DELETE_FAILED, but everything seems to work fine.
  • Why was this not a problem when upgrading from 1.19 to 1.20? The nodegroups should have been deleted at that time as well.

To do

  • Research what to do about this.
    • Do we need to change how security groups are set up? In that case it means adjusting the okctl code and creating an okctl-upgrade to fix existing setups - or perhaps cleaning up manually is faster and just as safe.

Run eksctl create iamidentitymapping on cluster.yaml:users to ensure access to the cluster

Need: As a user, you want to be able to create a cluster declaratively without having to run lots of extra commands. It is the same type of need as https://trello.com/c/4qGHeGZs/3-sette-disabletcpearlydemux-i-apply-cluster.

This task is about making it so that you don't have to run the steps described under https://www.okctl.io/authenticating-to-aws/#aws-single-sign-on-sso -> "Allow SSO logins to cluster".

An alternative to using the users list in cluster.yaml is to grant access to the role as the documentation above describes. Then everyone on the team gets full access to EKS. That might be good enough.
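As a sketch, what okctl would run for each entry in the users list looks roughly like this; the cluster name, account ID, role and username are placeholders.

eksctl create iamidentitymapping \
  --cluster my-team-dev \
  --region eu-west-1 \
  --arn arn:aws:iam::123456789012:role/my-sso-role \
  --username my-user \
  --group system:masters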

Support EKS 1.23 in Okctl

Okctl must support the given version of EKS.

This is an overview task; it can be delegated to new, smaller tasks as needed.

Background

Amazon EKS Kubernetes versions - Amazon EKS
Updating an Amazon EKS cluster Kubernetes version - Amazon EKS
Amazon EKS platform versions - Amazon EKS
Deprecated API Migration Guide

Bump the EKS version

1.23 specifics

  • Message in the EKS console: The Container Storage Interface (CSI) migration feature offloads management operations of persistent volumes provisioned with the in-tree EBS storage plugin to the Amazon EBS CSI driver. This feature is enabled by default in Amazon EKS version 1.23 and later. If you are using EBS volumes in your cluster, then you must install the Amazon EBS CSI driver before updating your cluster to version 1.23 to avoid interruptions to your workloads. - https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

Miscellaneous

Support new region: eu-north-1

Is your feature request related to a problem? Please describe.
Our team needs to be able to deploy our application to eu-north-1 (Stockholm); currently only Ireland and Frankfurt are supported.

Additional context
We rely on a service that is connected to eu-north-1, and as such we need to be able to deploy our cluster there.

[FEATURE] Consider replacing AWSALBIngressController with AWSLoadBalancerController

We would like to use aws-load-balancer-type: "nlb-ip" which enables us to create network load balancers for use with pods running on Fargate nodes. This functionality is available in AWSLoadBalancerController but not in AWSALBIngressController.

As the new AWSLoadBalancerController is backwards compatible with the existing AWSALBIngressController, this should in theory not create any problems with the existing infrastructure and should work as a drop-in replacement.
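For context, the nlb-ip mode is selected with an annotation on a Service of type LoadBalancer; a sketch, where the service name and namespace are placeholders.

kubectl --namespace my-namespace annotate service backend \
  service.beta.kubernetes.io/aws-load-balancer-type=nlb-ip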

[FEATURE] Documentation: how to delete everything if things go really wrong

Is your feature request related to a problem? Please describe.
If okctl delete cluster dev fails, there are no guidelines on how to get unstuck from the various CloudFormation states.

Describe the solution you'd like
Documentation on what to delete manually, in which order, and how to get unstuck from possible problems that might arise when using okctl to set up a cluster or an application. I got a recipe on Slack from Julius, but this should be readily available to everyone.

Support EKS 1.22 in Okctl

Okctl must support the given version of EKS.

This is an overview task; it can be delegated to new, smaller tasks as needed.

Background

Amazon EKS Kubernetes versions - Amazon EKS
Updating an Amazon EKS cluster Kubernetes version - Amazon EKS
Amazon EKS platform versions - Amazon EKS
Deprecated API Migration Guide


Bump the EKS version


1.22 specifics


Resources removed in 1.22

  • Ingress (used in apply application. Other places?) - Covered by bytte-fra-ingress-ingress-networkingv1beta1-til-v1
  • PodSecurityPolicy? Check whether we use it anywhere.
  • Check the links in the description for any others.

Create upgrades

  • AWS Load Balancer Controller 2.4.1 or newer
  • External secrets controller
  • CSIDriver / storage driver
  • aws-iam-authenticator (if we need it, see above)

Miscellaneous


Comment from Yngvar, 15 Aug 2022

Some deprecated resource types:

. applying cluster: reconciling persistent storage

warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver


Some deprecated resource types, from apply cluster. Bumping the load balancer controller will probably fix these.

.. applying cluster: reconciling AWS Load Balancer controller

warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition

warnings.go:70] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration

warnings.go:70] admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1


Some deprecated resource types, from delete cluster:

deleting cluster: reconciling secrets controller

warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding

warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding

deleting cluster: reconciling secrets controller

warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole


Yngvar Kristiansen, 15 Aug at 14:59
Check when the podsecuritypolicy used by Loki (among others) becomes deprecated and when it stops working.

[FEATURE] Documentation: argocd & docker images

Is your feature request related to a problem? Please describe.
For someone coming into okctl for the first time who wants to get up and running quickly with ArgoCD, there should be a best practice/introduction for setting up Docker images and secrets.

Describe the solution you'd like
A step-by-step guide describing how to use GitHub or AWS to store Docker images, and how to set up keys and include them in the application YAML file so that ArgoCD can access them.

[FEATURE] Better documentation - visualize the cluster & application setup

Is your feature request related to a problem? Please describe.
After running okctl and setting up a cluster & application, I have no easy overview of what okctl is doing behind the scenes when setting up clusters and applications.

Describe the solution you'd like
A visual overview of what is being deployed: network, groups, connections, clusters, etc., so that it is easy for new people to understand the infrastructure being set up with okctl. Each version or major upgrade should have a corresponding visual representation of the infrastructure.

Add encryption to nodegroup node volumes

Find out whether we actually need to do this. Is it important enough to spend time on?

Theory: it can perhaps be solved by finding the right CF template and setting an encryption: true or something like that.

Research needed:

What is the consequence of doing this?
What is the consequence of not doing this?

How much work is it to upgrade?
Can the teams do the upgrade themselves?

If the fix is as simple as we think, it can be implemented in okctl.
