aws / eks-distro Goto Github PK

Amazon EKS Distro (EKS-D) is a Kubernetes distribution based on and used by Amazon Elastic Kubernetes Service (EKS) to create reliable and secure Kubernetes clusters.

Home Page: https://distro.eks.amazonaws.com/

License: Apache License 2.0

Makefile 32.31% Shell 43.69% Dockerfile 4.11% Smarty 2.84% Go 17.05%

aws kubernetes eks

eks-distro's People

Contributors

Stargazers

Watchers

Forkers

shaneutt sftim chanwit rishabh96b marceloboeira diegopacheco micahhausler abhinavmpandey08 hutchins jonathan-yin2020 ganasagar fdegir vivek-koppuru gwonsoolee aylei kschumy letsfigureout literalice cristiklein samhays devopstoday11 clix-dev-llc rothgar laghao hakkisagdic sanemguzeller jargelo stevekim-git heegsi jaxesn suhaibaffan gdm klapcsik eedygreen vignesh-goutham rachlenko bnrjee macduff23 terryhowe denis256 hyakuhei njc-gov thhsfdc devpokhariya fangcai120 adovbiy ccamacho lgs nairb774 wowzoo plorent musavi mrajashree aws-paris-bot akshaya-venkatesh8 tuapuikia creatone jyotimahapatra hongjunan pokearu wongma7 roachmj taneyland eks-distro-pr-bot jiayiwang7 benrabaa abhay-krishna devops-edx chrisnegus sudhir-a2000 bryanasdev000 ccannell67 xlogin tharun-330 ahreehong soodabhinav95 stevensu1977 qpc-database ekmixon satyamsingh877 gugas1nwork dsasikumar08 jonathan-conder-sm ni14forgit devops-nigeria ambermehra apertus-dev g-gaston mansfield6 bsjung gugafer sk-telemed nathanawmk kabassociates violethaze74 eternalerrors rubycommunists xuanrgn vanducvo cctvbtx

eks-distro's Issues

Upgrade Prow cluster k8s version to v1.22

Current version is 1.21. Do this for all Prow clusters.

https://code.amazon.com/packages/AwsParisProwInfrastructureCDK/trees/mainline

Update Prow version

What would you like to be added:

Why is this needed:

How to install EKS-D on-premise on ubuntu 20.04

following install procedure is succeeded on ubuntu 18.04.6, but failed on ubuntu 20.04.4
this is based on https://distro.eks.amazonaws.com/users/install/kubeadm-onsite/
what is wrong for ubuntu 20.04?

focal@ubuntu:$ sudo apt-get update
[sudo] password for focal:
Hit:1 https://download.docker.com/linux/ubuntu focal InRelease
Hit:2 http://jp.archive.ubuntu.com/ubuntu focal InRelease
Hit:3 http://jp.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://jp.archive.ubuntu.com/ubuntu focal-backports InRelease
Hit:5 http://jp.archive.ubuntu.com/ubuntu focal-security InRelease
Reading package lists... Done
focal@ubuntu:$ sudo apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
fwupd
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
focal@ubuntu:~$ docker version
Client: Docker Engine - Community
Version: 20.10.13
API version: 1.41
Go version: go1.16.15
Git commit: a224086
Built: Thu Mar 10 14:07:51 2022
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.13
API version: 1.41 (minimum version 1.12)
Go version: go1.16.15
Git commit: 906f57f
Built: Thu Mar 10 14:05:44 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.5.10
GitCommit: 2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc
runc:
Version: 1.0.3
GitCommit: v1.0.3-0-gf46b6ba
docker-init:
Version: 0.19.0
GitCommit: de40ad0
focal@ubuntu:$ sudo nano /etc/docker/daemon.json
focal@ubuntu:$ focal@ubuntu:$ sudo systemctl enable docker
Synchronizing state of docker.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable docker
focal@ubuntu:$ sudo systemctl daemon-reload
focal@ubuntu:$ sudo systemctl restart docker
focal@ubuntu:$ sudo apt-get install apt-transport-https
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
apt-transport-https
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 4,680 B of archives.
After this operation, 162 kB of additional disk space will be used.
Get:1 http://jp.archive.ubuntu.com/ubuntu focal-updates/universe amd64 apt-transport-https all 2.0.6 [4,680 B]
Fetched 4,680 B in 0s (33.3 kB/s)
Selecting previously unselected package apt-transport-https.
(Reading database ... 72098 files and directories currently installed.)
Preparing to unpack .../apt-transport-https_2.0.6_all.deb ...
Unpacking apt-transport-https (2.0.6) ...
Setting up apt-transport-https (2.0.6) ...
focal@ubuntu:$ sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
focal@ubuntu:$ echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main
focal@ubuntu:$ sudo apt-get update
Hit:1 https://download.docker.com/linux/ubuntu focal InRelease
Hit:2 http://jp.archive.ubuntu.com/ubuntu focal InRelease
Hit:3 http://jp.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://jp.archive.ubuntu.com/ubuntu focal-backports InRelease
Hit:5 http://jp.archive.ubuntu.com/ubuntu focal-security InRelease
Get:6 https://packages.cloud.google.com/apt kubernetes-xenial InRelease [9,383 B]
Get:7 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 Packages [54.7 kB]
Fetched 64.1 kB in 1s (55.9 kB/s)
Reading package lists... Done
focal@ubuntu:$ sudo apt-get install kubelet=1.21.9-00 kubeadm=1.21.9-00 kubectl=1.21.9-00
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
conntrack cri-tools ebtables kubernetes-cni socat
Suggested packages:
nftables
The following NEW packages will be installed:
conntrack cri-tools ebtables kubeadm kubectl kubelet kubernetes-cni socat
0 upgraded, 8 newly installed, 0 to remove and 1 not upgraded.
Need to get 77.2 MB of archives.
After this operation, 328 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://jp.archive.ubuntu.com/ubuntu focal/main amd64 conntrack amd64 1:1.4.5-2 [30.3 kB]
Get:3 http://jp.archive.ubuntu.com/ubuntu focal/main amd64 ebtables amd64 2.0.11-3build1 [80.3 kB]
Get:4 http://jp.archive.ubuntu.com/ubuntu focal/main amd64 socat amd64 1.7.3.3-2 [323 kB]
Get:2 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 cri-tools amd64 1.23.0-00 [15.3 MB]
Get:5 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubernetes-cni amd64 0.8.7-00 [25.0 MB]
Get:6 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.21.9-00 [18.9 MB]
Get:7 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubectl amd64 1.21.9-00 [9,013 kB]
Get:8 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.21.9-00 [8,590 kB]
Fetched 77.2 MB in 14s (5,467 kB/s)
Selecting previously unselected package conntrack.
(Reading database ... 72102 files and directories currently installed.)
Preparing to unpack .../0-conntrack_1%3a1.4.5-2_amd64.deb ...
Unpacking conntrack (1:1.4.5-2) ...
Selecting previously unselected package cri-tools.
Preparing to unpack .../1-cri-tools_1.23.0-00_amd64.deb ...
Unpacking cri-tools (1.23.0-00) ...
Selecting previously unselected package ebtables.
Preparing to unpack .../2-ebtables_2.0.11-3build1_amd64.deb ...
Unpacking ebtables (2.0.11-3build1) ...
Selecting previously unselected package kubernetes-cni.
Preparing to unpack .../3-kubernetes-cni_0.8.7-00_amd64.deb ...
Unpacking kubernetes-cni (0.8.7-00) ...
Selecting previously unselected package socat.
Preparing to unpack .../4-socat_1.7.3.3-2_amd64.deb ...
Unpacking socat (1.7.3.3-2) ...
Selecting previously unselected package kubelet.
Preparing to unpack .../5-kubelet_1.21.9-00_amd64.deb ...
Unpacking kubelet (1.21.9-00) ...
Selecting previously unselected package kubectl.
Preparing to unpack .../6-kubectl_1.21.9-00_amd64.deb ...
Unpacking kubectl (1.21.9-00) ...
Selecting previously unselected package kubeadm.
Preparing to unpack .../7-kubeadm_1.21.9-00_amd64.deb ...
Unpacking kubeadm (1.21.9-00) ...
Setting up conntrack (1:1.4.5-2) ...
Setting up kubectl (1.21.9-00) ...
Setting up ebtables (2.0.11-3build1) ...
Setting up socat (1.7.3.3-2) ...
Setting up cri-tools (1.23.0-00) ...
Setting up kubernetes-cni (0.8.7-00) ...
Setting up kubelet (1.21.9-00) ...
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /lib/systemd/system/kubelet.service.
Setting up kubeadm (1.21.9-00) ...
Processing triggers for man-db (2.9.1-1) ...
focal@ubuntu:$ sudo apt-mark hold kubelet kubeadm kubectl
kubelet set on hold.
kubeadm set on hold.
kubectl set on hold.
focal@ubuntu:$ cd /usr/bin
focal@ubuntu:/usr/bin$ sudo rm kubelet kubeadm kubectl
focal@ubuntu:/usr/bin$ sudo wget https://distro.eks.amazonaws.com/kubernetes-1-21/releases/10/artifacts/kubernetes/v1.21.9/bin/linux/amd64/kubelet
--2022-03-21 23:36:40-- https://distro.eks.amazonaws.com/kubernetes-1-21/releases/10/artifacts/kubernetes/v1.21.9/bin/linux/amd64/kubelet
Resolving distro.eks.amazonaws.com (distro.eks.amazonaws.com)... 13.32.54.56, 13.32.54.60, 13.32.54.25, ...
Connecting to distro.eks.amazonaws.com (distro.eks.amazonaws.com)|13.32.54.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 118112256 (113M) [binary/octet-stream]
Saving to: ‘kubelet’

kubelet 100%[=================================================>] 112.64M 6.25MB/s in 19s

2022-03-21 23:36:59 (6.05 MB/s) - ‘kubelet’ saved [118112256/118112256]

focal@ubuntu:/usr/bin$ sudo wget https://distro.eks.amazonaws.com/kubernetes-1-21/releases/10/artifacts/kubernetes/v1.21.9/bin/linux/amd64/kubeadm
--2022-03-21 23:37:05-- https://distro.eks.amazonaws.com/kubernetes-1-21/releases/10/artifacts/kubernetes/v1.21.9/bin/linux/amd64/kubeadm
Resolving distro.eks.amazonaws.com (distro.eks.amazonaws.com)... 13.32.54.56, 13.32.54.60, 13.32.54.25, ...
Connecting to distro.eks.amazonaws.com (distro.eks.amazonaws.com)|13.32.54.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44732416 (43M) [binary/octet-stream]
Saving to: ‘kubeadm’

kubeadm 100%[=================================================>] 42.66M 5.77MB/s in 7.3s

2022-03-21 23:37:14 (5.81 MB/s) - ‘kubeadm’ saved [44732416/44732416]

focal@ubuntu:/usr/bin$ sudo wget https://distro.eks.amazonaws.com/kubernetes-1-21/releases/10/artifacts/kubernetes/v1.21.9/bin/linux/amd64/kubectl
--2022-03-21 23:37:26-- https://distro.eks.amazonaws.com/kubernetes-1-21/releases/10/artifacts/kubernetes/v1.21.9/bin/linux/amd64/kubectl
Resolving distro.eks.amazonaws.com (distro.eks.amazonaws.com)... 13.32.54.56, 13.32.54.60, 13.32.54.25, ...
Connecting to distro.eks.amazonaws.com (distro.eks.amazonaws.com)|13.32.54.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 46542848 (44M) [binary/octet-stream]
Saving to: ‘kubectl’

kubectl 100%[=================================================>] 44.39M 5.00MB/s in 8.8s

2022-03-21 23:37:36 (5.05 MB/s) - ‘kubectl’ saved [46542848/46542848]

focal@ubuntu:/usr/bin$ sudo chmod +x kubeadm kubectl kubelet
focal@ubuntu:/usr/bin$ cd
focal@ubuntu:$ sudo nano /etc/default/kubelet
focal@ubuntu:$ focal@ubuntu:$ sudo swapoff -a
focal@ubuntu:$ sudo nano /etc/fstab
focal@ubuntu:~~$ focal@ubuntu:~~$ sudo systemctl enable kubelet
focal@ubuntu:$ sudo mkdir -p /var/lib/kubelet
focal@ubuntu:$ sudo nano /var/lib/kubelet/kubeadm-flags.env
focal@ubuntu:~~$ focal@ubuntu:~~$ sudo docker pull public.ecr.aws/eks-distro/etcd-io/etcd:v3.4.18-eks-1-21-10
v3.4.18-eks-1-21-10: Pulling from eks-distro/etcd-io/etcd
4dfd587572c7: Pull complete
5430a4b1aee0: Pull complete
69d160e00699: Pull complete
1854c3e0826f: Pull complete
96c094bf7e44: Pull complete
a875aa951018: Pull complete
Digest: sha256:7174fee9e550cba9b30d373006db3a8387bb9b06d14db132879358bf98993f71
Status: Downloaded newer image for public.ecr.aws/eks-distro/etcd-io/etcd:v3.4.18-eks-1-21-10
public.ecr.aws/eks-distro/etcd-io/etcd:v3.4.18-eks-1-21-10
focal@ubuntu:$ sudo docker pull public.ecr.aws/eks-distro/kubernetes/pause:v1.21.9-eks-1-21-10
v1.21.9-eks-1-21-10: Pulling from eks-distro/kubernetes/pause
40ea8c1b5979: Pull complete
Digest: sha256:646fa0ffa8bb584aa8fbda9551b91c7561eb25d725ddd879152531bc9c8febf4
Status: Downloaded newer image for public.ecr.aws/eks-distro/kubernetes/pause:v1.21.9-eks-1-21-10
public.ecr.aws/eks-distro/kubernetes/pause:v1.21.9-eks-1-21-10
focal@ubuntu:$ sudo docker pull public.ecr.aws/eks-distro/coredns/coredns:v1.8.4-eks-1-21-10
v1.8.4-eks-1-21-10: Pulling from eks-distro/coredns/coredns
c4e16a868f6d: Pull complete
afcf1694b62e: Pull complete
bfaa59083871: Pull complete
7103d0ccafd5: Pull complete
Digest: sha256:13164a59ef3419242e568d9b6fa8fe3c483b771a5684f74fbf9a25b9ac72a201
Status: Downloaded newer image for public.ecr.aws/eks-distro/coredns/coredns:v1.8.4-eks-1-21-10
public.ecr.aws/eks-distro/coredns/coredns:v1.8.4-eks-1-21-10
focal@ubuntu:$ sudo docker tag public.ecr.aws/eks-distro/etcd-io/etcd:v3.4.18-eks-1-21-10 public.ecr.aws/eks-distro/kubernetes/etcd:3.4.13-0
focal@ubuntu:$ sudo docker tag public.ecr.aws/eks-distro/kubernetes/pause:v1.21.9-eks-1-21-10 public.ecr.aws/eks-distro/kubernetes/pause:3.4.1
focal@ubuntu:$ sudo docker tag public.ecr.aws/eks-distro/coredns/coredns:v1.8.4-eks-1-21-10 public.ecr.aws/eks-distro/kubernetes/coredns:v1.8.0
focal@ubuntu:$ sudo kubeadm init --image-repository public.ecr.aws/eks-distro/kubernetes --kubernetes-version v1.21.9-eks-1-21-10
[init] Using Kubernetes version: v1.21.9-eks-1-21-10
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ubuntu] and IPs [10.96.0.1 192.168.66.132]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost ubuntu] and IPs [192.168.66.132 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost ubuntu] and IPs [192.168.66.132 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

    Unfortunately, an error has occurred:
            timed out waiting for the condition

    This error is likely caused by:
            - The kubelet is not running
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
            - 'systemctl status kubelet'
            - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.

    Here is one example how you may list all Kubernetes containers running in docker:
            - 'docker ps -a | grep kube | grep -v pause'
            Once you have found the failing container, you can inspect its logs with:
            - 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
focal@ubuntu:~$

Possibly add Include-IPv6-addresses-in-NodeAddresse.patch to other branches

What would you like to be added:

It was added to 1.22. It might need to be added to 1.21 or others

Why is this needed:

How to install EKS-D On-Premise on CentOS or Bare Metal machines

What would you like to be added:
I'd like to run EKS-D On-Premise either on CentOS or Bare Metal machines

Why is this needed:
EKS-D is publicised as the AWS Open Source K8s distribution which can be used by anyone to create and manage K8s clusters On-Premise. As part of our On-Prem workloads we want to use EKS-D to run our AI/ML jobs

I couldn't find any relevant instructions anywhere of how to install EKS-D by ourselves On-Premise. Please help to guide.

error when executing delete_cluster script

What happened:
After I executed delete_cluster.sh then the error message error removing cluster from state store: refusing to delete: unknown file found: s3://<KOPS_STATE_STORE>/<KOPS_CLUSTER_NAME>/config appeared on the console.

So I executed delete_store.sh, and S3 bucket is deleted.
Do I need to execute delete_store.sh after executing delete_cluster.sh?

If I need to execute delete_store.sh, I suggest to add delete_store.sh following page.
EKS Distro kOps Cluster:kOps Cluster Delete

Documentation regarding kubelet version.

Is there any specific version of kubelet we need to use or is it compatible with the upstream Kubernetes ?

There is no community docs - How to deploy in a Libvirt host

What happened:

There are no docs for the community https://distro.eks.amazonaws.com/community/overview/ is empty.

I will like to try eks-d on a libvirt host, (3 master and 2 worker nodes). Can you point me to any docs you might have for doing this? Do you have a slack channel?

Thanks.

63 node limit when using EKS-D from Canonical

What happened:

I recently wrote a blog post titled "63-Node EKS Cluster running on a Single Instance with Firecracker". The goal of this blog post is to create an EKS cluster on multiple nodes, where each of these nodes is in turn a VM powered by Firecracker. Although possibly insubstantial here, all VMs were running on a single physical host.

As the post mentions, my initial attempt was to create a 100 clusters node. Apparently, I was able to do so. But when listing the nodes (i.e. kubectl get nodes) only 63 would be listed. Given that 63=2^6-1, it might be some kind of "magic number".

The version of EKS-D used is Canonical's (EKS Distro snap). However this might be a particular issue with this distribution or an upstream issue, I'm filing the issue here for completeness.

What you expected to happen:

That all 100 nodes would be listed.

How to reproduce it (as minimally and precisely as possible):

All the source code that lead to the blog post is public and open source. It is fully scripted, and should be easy to reproduce. Review the blog post itself and the first part (linked from the blog post) for further information.

Basically, all 100 VMs are started via the provided scripts. And then Ansible is used to install EKS-D via snap package and configure the nodes. Two different (top-level) roles are used: one for the master, one for the workers. Each worker is pre-assigned a token, used to join the cluster. Please note that no errors were found on the eks join commands (99 of such commands where successful) yet only 63 nodes were listed.

Anything else we need to know?:

Environment:

EKS Distro Release Channel: EKS-D by Canonical (snap)
EKS Distro Release: latest

Ignore me

What would you like to be added:

Nothing. EKS-D is flawless

Why is this needed:

I'm just testing something out. Please ignore

Add karpenter autoscaler to eks-d/a prow infra

What would you like to be added:
Karpenter to the prow eks clusters.

Why is this needed:
We run out of resources from time to time in our prow clusters when doing releases, esp now that we have a number of eks-a jobs running in prow. We should add karpenter to support scaling up/down the cluster based on need.

CoreDNS/kubeadm incompatibility

What happened:
I was more or less following these instructions for EKS-D 1-21-4: https://distro.eks.amazonaws.com/users/install/kubeadm-onsite/
But CoreDNS fails to start with these errors in the logs:

pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kube-system:coredns" cannot list resource "endpointslices" in API group "discovery.k8s.io" at the cluster scope

What you expected to happen:
CoreDNS should be Running and Ready

How to reproduce it (as minimally and precisely as possible):
Host system is ubuntu 20.04.
Install kubeadm=1.21.2-00, kubectl=1.21.2-00 and kubelet=1.21.2-00 following https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
Then do this:

$ for binary in kubeadm kubectl kubelet; do curl -O "https://distro.eks.amazonaws.com/kubernetes-1-21/releases/4/artifacts/kubernetes/v1.21.2/bin/linux/amd64/${binary}"; done
$ chmod +x kubeadm kubectl kubelet
# cp kubeadm kubectl kubelet /usr/bin/
# docker pull public.ecr.aws/eks-distro/kubernetes/pause:v1.21.2-eks-1-21-4
# docker tag public.ecr.aws/eks-distro/kubernetes/pause:v1.21.2-eks-1-21-4 public.ecr.aws/eks-distro/kubernetes/pause:3.4.1

Create kubeadm-config.yaml with the following contents (replace podSubnet with something available on your network):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: public.ecr.aws/eks-distro/kubernetes
kubernetesVersion: v1.21.2-eks-1-21-4
dns:
  imageRepository: public.ecr.aws/eks-distro/coredns
  imageTag: v1.8.3-eks-1-21-4
etcd:
  local:
    imageRepository: public.ecr.aws/eks-distro/etcd-io
    imageTag: v3.4.16-eks-1-21-4
networking:
  podSubnet: 10.177.0.0/16
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd

Finally run:

# kubeadm init --config kubeadm-config.yaml
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

and check the logs of one of the coredns pods.

Anything else we need to know?:
The error is due to a change in CoreDNS 1.8.1: https://coredns.io/2021/01/20/coredns-1.8.1-release/ and has been fixed in upstream kubeadm but not backported to 1.21 (which uses 1.8.0 by default): kubernetes/kubernetes@74feb07#diff-80bea83c0faf0435d38773c725ba352bfd0e7e0aee6d0cdaa1d223ec5a4189b4
I suggest you cherry-pick that commit or downgrade your CoreDNS version accordingly.

Environment:
I used versions from here: https://github.com/aws/eks-distro/blob/v1-21-eks-4/development/pull-all.sh

docker pull failed with "missing signature key"

What happened:
I followed https://distro.eks.amazonaws.com/users/install/kubeadm-onsite/
Failed in Set up a control plane node
3 .Pull and retag the pause, coredns, and etcd containers (copy and paste as one line):

sudo docker pull public.ecr.aws/eks-distro/kubernetes/pause:v1.19.8-eks-1-19-4
Trying to pull repository public.ecr.aws/eks-distro/kubernetes/pause ...
v1.19.8-eks-1-19-4: Pulling from public.ecr.aws/eks-distro/kubernetes/pause
missing signature key

What you expected to happen:
I can pull these images.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment: centos 7

EKS Distro Release Channel:
EKS Distro Release:

proxy kubeconfig file missing

What happened:

ps aux | grep kube-proxy
root      5567  0.0  0.2 743612 43668 ?        Ssl  May05   5:24 kube-proxy --v=2 --config=/var/lib/kube-proxy-config/config

cat /var/lib/kube-proxy-config/config
cat: /var/lib/kube-proxy-config/config: No such file or directory

What you expected to happen:
Should be able to review the contents of the file /var/lib/kube-proxy-config/config

How to reproduce it (as minimally and precisely as possible):
Deploy eks cluster

kubectl version
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.8-eks-96780e", GitCommit:"96780e1b30acbf0a52c38b6030d7853e575bcdf3", GitTreeState:"clean", BuildDate:"2021-03-10T21:32:29Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

Anything else we need to know?:

Environment:

EKS Distro Release Channel:
EKS Distro Release:

ARM64 architecture support for RPi testing

What would you like to be added:
Add ARM64 support.

Why is this needed:
Since Microk8s already supports this for Raspberry Pi installs and testing, this will help a number of customers, given the EKS Anywhere announcement. https://microk8s.io/docs/install-alternatives#heading--arm

cluster_wait script can not proceed next step

After I executed create_cluster.sh, then I executed cluster_wait.sh (by the way, cluster_wait.sh has no permission to execute. is it intended?)

cluster_wait.sh has 3 steps

kops validate cluster --wait 10m
kubectl apply -f ./aws-iam-authenticator.yaml
kubectl delete pod -n kube-system -l k8s-app=aws-iam-authenticator

it should go to "kubectl apply" step after "kops validate cluster", but it doesn't.
I tried cluster_wait.sh several times, but the script stops after "kops validate cluster"

with this fail message

VALIDATION ERRORS
KIND	NAME					MESSAGE
Pod	kube-system/aws-iam-authenticator-mllms	system-node-critical pod "aws-iam-authenticator-mllms" is pending

Validation Failed
W0103 11:26:40.703155   33617 validate_cluster.go:221] (will retry): cluster not yet healthy

Validation failed: wait time exceeded during validation

when I execute the command one by one, I can create cluster without any problems.

Docs: Access Denied

What happened:

Getting "Access Denied" when accessing the documentation page. URL https://distro.eks.amazonaws.com/

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>E88503E37DCAB5C2</RequestId>
<HostId>wAa+as8c0S44Xn+SJlhMzVf0imIz9PDl3TUGESxF3Ugo3oAfCYy5F+sE7laeI9mavQWYchjDf+g=</HostId>
</Error>

What you expected to happen:

That I can access the documentation

How to reproduce it (as minimally and precisely as possible):

curl  https://distro.eks.amazonaws.com/ -i

the output should not have status 403

Anything else we need to know?:

Environment:

EKS Distro Release Channel:
EKS Distro Release:

Use kops default coredns config

The kops.sh script currently sets an external coredns config file. This file is more or less equivalent to kops 1.19 default.
I suspect this is because kops prior to 1.19 set upstream. But since 1.19 features (metrics server addon) is used anyway, it is probably better to just rely on kops doing the right thing.

Relying on kops to configure coredns correctly will also ensure users automatically receives future changes automatically rather than manually apply the changes in the cluster spec.

kind binary + node image builds

What would you like to be added:

EKS Distro should supply kind binary + node image builds. These may not need to be included in the EKS Distro release artifact YAML, but the binary and node-image container image could be supplied out-of band of our releases.

What needs to be done?:

Based on the discussion with @chanwit in #125 (comment)

Create a new Builder() that can reference EKS-Distro artifacts. I think there would be advantage in adding the OCI tars we provide, as kind could side-load the images and use those by default
- I prototyped a release builder in my fork of kind that can reference an upstream K8s release. This works well with upstream releases (trykind build node-image --type release --release-url https://dl.k8s.io/v1.20.0 -v 10), but fails with an EKS Distro release URL (https://distro.eks.amazonaws.com/kubernetes-1-18/releases/1/artifacts/kubernetes/v1.18.9) because our image tars are OCI tars and not docker tars. We'll need to scope how much effort it would be to have kind be able to load OCI image tars.
Add kind under projects/kubernetes-sigs/kind/
The current kind base image builds from Ubuntu and our preference would be to go with Amazon Linux 2. It doesn't look like it would be a heavy lift to move to an AL2 image and install containerd/runc from amazon-linux-extras. crictl might be the only thing in that base image that neither Amazon Linux nor EKS-distro (yet) builds today.
We'll have to figure out a solution for kind build node-image, because it uses the Docker API, and EKS-Distro build jobs don't have Docker available and instead use buildkit for container images. Either our postsubmit builder needs access to a Docker API, we need to make kind support buildctl/docker buildx, or maybe we can just do everything in a squashed Dockerfile and not even need the release builder in the first place

Update runbook re: prow jobs

What would you like to be added:

There have been a number of changes to how to make new prow jobs for releases. The runbook (https://github.com/aws/eks-distro/blob/main/docs/development/runbook.md) needs to be changed to reflect that

Why is this needed:

Runbook is out-of-date

healthCheck fail when externalTrafficPolicy is set to Local on a loadBalancer service with NLB

What happened:
I tried to set up a service with an NLB with the property externalTrafficPolicy set to Local in order to keep the sourceIP so I followed this blog post:
https://aws.amazon.com/fr/blogs/opensource/network-load-balancer-support-in-kubernetes-1-9/

All Instances are declared unhealthy by the NLB and it should not.

What you expected to happen:
According to the post, at least one node should be healthy and the traffic should be correctly routed to the pod on the healthy node.

How to reproduce it (as minimally and precisely as possible):

Create an EKS cluster using the version 1.19.eks3 with two node
Deploy the nginx pod:
kubectl run nginx --image=nginx --port=80 --labels app=nginx
Deploy the nginx service

kind: Service
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  externalTrafficPolicy: Local
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer

Anything else we need to know?:
We tested this on a native kubernetes cluster deployed on EC2 with the same version (1.19.8) and it works.

Environment:

EKS Distro Release: 1.19.8 (1.19.eks3)

Add Kubernetes version to table in README

What would you like to be added:

Add Kubernetes version to table in README and have doc make commands update. It's the versions under this section https://github.com/aws/eks-distro#releases

Why is this needed:

I hate having to check this all the time

Remove outdated user section in docs

What would you like to be added:

Remove all pages under https://distro.eks.amazonaws.com/users/build-prerequisites/
https://distro.eks.amazonaws.com/users/build/

Why is this needed:

Directions are in the repo. The ones on the website are redundant and out-of-date.

Update CSI READMEs to say where to get correct golang version

What would you like to be added:

See #784 (comment)

Why is this needed:

Add version tag to auto generated release PRs

What would you like to be added:

Why is this needed:

error when executing create_cluster.sh

What happened:
I was deploying eke-distro on my Mac Pro.

I set KOPS_STATE_STORE and KOPS_CLUSTER_NAME as guide said.
create_configuration.sh was successfully executed.
then I executed create_cluster.sh, but error happened like below.

Found multiple arguments which look like a cluster name
"xxx.xxx.com" (via flag)
"xxx.xxx.com" (as argument)

This often happens if you specify an argument to a boolean flag without using =
For example: use --bastion=true or --bastion, not --bastion true

I found below article. As I understand, it is because I already set KOPS_CLUSTER_NAME as an environment variable.
kubernetes/kops#5967

Because there are two ways you can pass this to kops -- one is using an environment variable (which I think is the way the devs do it - hence the command returned at the end of the run doesn't have it) and the other is using the --name switch (which I do)

In short, if you have the cluster name included on an environment variable, then kops doesn't need the --name switch, if you don't have the environment variable, then you need the --name switch.

so I modified create_cluster.sh like this. then it worked.

set -eo pipefail

BASEDIR=$(dirname "$0")
source ${BASEDIR}/set_k8s_versions.sh

kops update cluster --yes

Is it right approach? or is there any other way to execute the script without error?

Invalid prerelease Semantic Version string used by EKS

What happened:

I tried to install a Helm chart into EKS which had a version requirement like:

kubeVersion: ">= 1.19.0"

This returned an error:

  Error: chart requires kubeVersion: >= 1.19.0 which is incompatible with Kubernetes v1.20.7-eks-d88609

What you expected to happen:

I expected the chart to install since the kubernetes version is nominally supported.

How to reproduce it (as minimally and precisely as possible):

$ helm create foo
Creating foo
$ echo 'kubeVersion: ">= 1.19.0"' >> foo/Chart.yaml
$ helm --kube-version=1.20.0-eks-123 template foo
Error: chart requires kubeVersion: >= 1.19.0 which is incompatible with Kubernetes v1.20.0-eks-123

Use --debug flag to render out invalid YAML

Anything else we need to know?:

The problem is that the practice of appending -eks-* to the version returned by the kubernetes API is not correct according to Semantic Versioning. A hyphen-appended suffix indicates a prerelease version. For build metadata like the eks suffix on a production release version, a + should be used: 1.20.0+eks-123. See SemVer spec items 9 and 10.

Demonstrating that this works using the test case above:

$ helm --kube-version=1.20.0+eks-123 template foo
...

There is further discussion here: helm/helm#3810

Environment:

EKS Distro Release Channel: n/a
EKS Distro Release: v1.20.7-eks-d88609

PR title for `make release-docs` is wrong

What happened:
The PR title for make release-docs is wrong. Example of generated title: "1.18 15 docs 1645237576"

What you expected to happen:
The title should be something like this: "Added docs for 1.18-15 release"

Other CNI support

Currently eks only support aws-vpc-cni, does eks-d support other cnis ?

did not find Api endpoint for gossip hostname, may not be able to reach cluster in Aws Eks-d (solved)

followed the eks distro official documentation for the kops option.
on this website
https://distro.eks.amazonaws.com/users/install/kops/
git clone https://github.com/aws/eks-distro.git
cd eks-distro/development/kops

export KOPS_STATE_STORE=s3://my-temp-store
i don't know have a specific domain to create a cluster with that,
so that using the gossip dns as stated in the kops tutorial on
https://kops.sigs.k8s.io/getting_started/aws/
https://kops.sigs.k8s.io/gossip/
export KOPS_CLUSTER_NAME=my-test-cluster.k8s.local

setting the AWS region
export AWS_REGION=us-east-1

Creating a cluster

./install_requirements.sh
./create_values_yaml.sh
./create_configuration.sh 
./create_cluster.sh

the create cluster script gives the below message, which is the reason for the following errors.
did not find API endpoint for gossip hostname, may not be able to reach the cluster

the script creates a cluster on AWS with ec2 instances
can get the cluster-info successfully.
kops get

but when I run kubectl commands it gives, can't look up the Kubernetes API

Now the same process as above but with kops directly.
the major difference between the eks distro script mode and manual with kops
is that the kops cluster does things fast and with less data

Major Info, eks Distro scripts run the same kops commands that we run now.
so this is the preferred way mostly, EKS Distro builds upon these commands.

to create a cluster first export the following details, as per your need.
export KOPS_STATE_STORE=s3://my-temp-store
export KOPS_CLUSTER_NAME=my-test-cluster.k8s.local

now create a cluster with this command
kops create cluster --cloud=aws --zones=us-east-1a,us-east-1b,us-east-1c --yes

Info: zones and region are not same, with eks d, AWS region variable only uses one zone in the region,
with the zones in the kops command, it takes the specific zones specified

the above command creates a cluster and gives this output at the end of console
which means the process is successful.
kops has set your kubectl context to

Now for the problem and solution

the main reason the kops cluster process worked while the eks-d process failed
is that the kops process created a load balancer(amazon elb), while the eks-d didn't.

our cluster name is a gossip DNS hostname, so it doesn't have an IP to access or name resolute
this gossip cluster name only communicates inside a cluster, so if we want to access the k8s API
outside the cluster, which we definitely want to
we need a load balancer before that gossip DNS.

here is the key point, I found why kops successfully created a load balancer for our gossip DNS
while eks didn't.
kops get cluster -o yaml

that command gives the cluster config created by both the kops and eks-d
run that individually with the respective deployment process to get the actual config used.

kops cluster
kops get cluster -o yaml > kops.yaml

eks-d cluster
kops get cluster -o yaml > eks.yaml

now when comparing both the YAML files,
I noticed one thing, most of the YAML config is the same,
except with the following

spec:
  api:
    loadBalancer:
      class: Classic
      type: Public
# from this everything is same
    dns: {} # this is same,

so my idea is to add the above extra config that is in the kops config file to the eks-d config file.
the only way this works with eks-d is by editing the eks-d.tpl file

nano ./eks-d.tpl

add the above lines to the config
now save it and run the following scripts to deploy a cluster

./install_requirements.sh
./create_values_yaml.sh
./create_configuration.sh 
./create_cluster.sh

when the process is done, we get this output at the console end.
exporting kubecfg for cluster
which means the local system, can access the k8s API

run this
./cluster_wait.sh
this will take around 10 minutes
but once done, we can run kubectl and helm commands like a normal k8s cluster.

Now for the reason I created this issue, there needs to be an option with environmental variables, if we want a load balancer created with gossip DNS.

even if we are not using this setup in the production, we still need to know the way things work.
so now that you know have a good day.

Fix docs workflow

What would you like to be added:
It would be better if make release-docs did all the git stuff it does when the openPR flag is true BUT did not actually submit the PR. It'd be nice if submitting the PR was still an option though

Why is this needed:
Makes the release docs workflow less cumbersome. Right now, when running this command, it will open a PR and delete the local branch.

stat ./<cluster name>/values.yaml: no such file or directory

I use https://distro.eks.amazonaws.com/users/install/kops/ guide create eks-d , but get error "stat ./cluster01.example.com/values.yaml: no such file or directory"

What happened:
✔ AWS_REGION=us-east-1
✔ AWS CLI authenticated
✔ KOPS_STATE_STORE=s3://kops-eks-d
✔ KOPS_CLUSTER_NAME=cluster01.example.com
Using kOps state store: s3://kops-eks-d
Creating ./cluster01.example.com/aws-iam-authenticator.yaml
Creating cluster01.example.com.yaml

stat ./cluster01.example.com/values.yaml: no such file or directory

kops version
Version 1.20.2

Consider creating release branch specific SNS topics

What would you like to be added:

Why is this needed:

Improved README.md

Hello,
A link(s) on the main AWS site points here. I was hoping this would explain how to install/use EKS-DISTRO, but the README.md is about building docs. It is unclear what the 'built' docs are even for (for example, would it show you an example of installing EKS-D?).

For those who are new to EKS-D, I think the README.md should start with information about EKS-D, not how to build some documentation.

Maybe there is a page that does all the 'basics', but I have yet to find it. I know I will though as soon as I hit "Submit new issue" :)

Create some kubeadm init documentation for deploying eks-d

What would you like to be added:

Documentation on how to deploy eks-d with kubeadm init

Why is this needed:

Help people with deployment

Add codebuild pipeline for eks-d

What would you like to be added:
Currently we do all builds via prow, pre and postsubmit. We should consider adding a codebuild pipeline similar to eks-anywhere to support additional build options.

Why is this needed:
To support internal customers who require builds of EKS-D

Make it easier to find all EKS dependencies

What would you like to be added:
Documentation to understand all adjacent dependencies of an EKS cluster, such as (not comprehensive list):

EKS AMI Building (https://github.com/awslabs/amazon-eks-ami)
Amazon Linux 2 used in AMI building (for hardening)
AWS CNI
Etcd Version
CoreDns version

Honestly, I dont know how far down the rabbit hole goes, but thats the point.

Why is this needed:

As a EKS cluster creator, cloud-architect, security professional or curious enthusiast, I need to understand the choices made in EKS that could be material to my own decision-making.

Examples are:

Ability to experiment with new patches to AWS CNI or other components before officially adopted
Determine if a Linux, Etcd, or other component's CVE applies to my EKS cluster
Determine whether EKS AMI's meet CIS standards
Determine whether Linux Kernel features are available to Nodes and Containers within the cluster (kernel 5.4 is not supported by Amazon Linux 2 which is a dependency of EKS AMI's which is a dependency of EKS)

Some of these are very specific to needs I have encountered, but I can see benefits of good documentation of all dependencies and links to adjacent Github repos where further details could be found on secondary and tertiary dependencies.

Basically, don't make me go hunting through this repo and several others to understand the totality of all EKS dependencies.

CoreDNS - dial tcp i/o timeout

What happened:
Some coredns PODs randomly give errors like these on AWS EKS :

ERROR] plugin/errors: 2 xxxxxxxxxxxxxx.eu-west-3.rds.amazonaws.com. AAAA: dial tcp 10.28.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 xxxxxxxxxxxxxxx.eu-west-3.rds.amazonaws.com. AAAA: dial tcp 10.28.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 xxxxxxxxxxxxxxxxxxxxx.eu-west-3.rds.amazonaws.com. AAAA: dial tcp 10.28.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 xxxxxxxxxxxxxxxxxxxxx.eu-west-3.rds.amazonaws.com. AAAA: dial tcp 10.28.0.2:53: i/o timeout

What you expected to happen:
no timeouts; the interesting part is that specific IP 10.28.0.2 is nowhere to be found; the kube-dns service is always in the 172.XXX range.

The fix for us was to switch the coredns deployment to a daemonSet instead; we did noticed that 2 PODs were regularly assigned to the same node, totally ignoring their podAntiAffinity which specifically prevents 2 coredns PODs to be assigned to the same node.

How to reproduce it (as minimally and precisely as possible):
N/A, using vanilla EKS cluster 1.16

Anything else we need to know?:
we saw the issue while performing load tests, during increased traffic.
the version of CoreDNS: 1.6.6
Kubernetes version: 1.16.12

Corefile:
.:53 {
errors
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance

Environment:

EKS Distro Release Channel:
EKS Distro Release: amazon-eks-node-1.16-v20210519

Failed to pull image public.ecr.aws/eks-distro/kubernetes/etcd:3.4.13-0 when installing with kubernetes-version v1.21.2-eks-1-21-3

Hi Fellows,
I ran into issue that failed to pull image public.ecr.aws/eks-distro/kubernetes/etcd:3.4.13-0 when kubeadm init with kubernetes-version v1.21.5-eks-1-21-9

Background

I'm trying to install the EKSD on bare metal based on the guide: https://distro.eks.amazonaws.com/users/install/kubeadm-onsite/
I installed with current latest version 1-21-9
https://distro.eks.amazonaws.com/releases/1-21/9/

What has failed

The installation is successful.
But it failed when coming to execute kubeadm init command:

kubeadm -v 7 init --image-repository public.ecr.aws/eks-distro/kubernetes --kubernetes-version v1.21.5-eks-1-21-9

The command output is:

root@control-plane-1:~# kubeadm -v 7 init --image-repository public.ecr.aws/eks-distro/kubernetes --kubernetes-version v1.21.5-eks-1-21-9
I0228 04:05:42.920153 1179099 initconfiguration.go:115] detected and using CRI socket: /var/run/dockershim.sock
I0228 04:05:42.920723 1179099 interface.go:431] Looking for default routes with IPv4 addresses
I0228 04:05:42.920747 1179099 interface.go:436] Default route transits interface "eno1np0"
I0228 04:05:42.920983 1179099 interface.go:208] Interface eno1np0 is up
I0228 04:05:42.921080 1179099 interface.go:256] Interface "eno1np0" has 2 addresses :[20.11.64.117/24 fe80::b226:28ff:fe4b:1880/64].
I0228 04:05:42.921131 1179099 interface.go:223] Checking addr 20.11.64.117/24.
I0228 04:05:42.921152 1179099 interface.go:230] IP found 20.11.64.117
I0228 04:05:42.921174 1179099 interface.go:262] Found valid IPv4 address 20.11.64.117 for interface "eno1np0".
I0228 04:05:42.921194 1179099 interface.go:442] Found active IP 20.11.64.117
[init] Using Kubernetes version: v1.21.5-eks-1-21-9
[preflight] Running pre-flight checks
I0228 04:05:42.981821 1179099 checks.go:582] validating Kubernetes and kubeadm version
I0228 04:05:42.981865 1179099 checks.go:167] validating if the firewall is enabled and active
I0228 04:05:42.990740 1179099 checks.go:202] validating availability of port 6443
I0228 04:05:42.991756 1179099 checks.go:202] validating availability of port 10259
I0228 04:05:42.991832 1179099 checks.go:202] validating availability of port 10257
I0228 04:05:42.991868 1179099 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
I0228 04:05:42.991889 1179099 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
I0228 04:05:42.991903 1179099 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
I0228 04:05:42.991916 1179099 checks.go:287] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I0228 04:05:42.991939 1179099 checks.go:437] validating if the connectivity type is via proxy or direct
I0228 04:05:42.991982 1179099 checks.go:476] validating http connectivity to first IP address in the CIDR
I0228 04:05:42.992010 1179099 checks.go:476] validating http connectivity to first IP address in the CIDR
I0228 04:05:42.992022 1179099 checks.go:103] validating the container runtime
I0228 04:05:43.036464 1179099 checks.go:129] validating if the "docker" service is enabled and active
I0228 04:05:43.102682 1179099 checks.go:336] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0228 04:05:43.102831 1179099 checks.go:336] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0228 04:05:43.102887 1179099 checks.go:654] validating whether swap is enabled or not
I0228 04:05:43.102943 1179099 checks.go:377] validating the presence of executable conntrack
I0228 04:05:43.103000 1179099 checks.go:377] validating the presence of executable ip
I0228 04:05:43.103026 1179099 checks.go:377] validating the presence of executable iptables
I0228 04:05:43.103056 1179099 checks.go:377] validating the presence of executable mount
I0228 04:05:43.103068 1179099 checks.go:377] validating the presence of executable nsenter
I0228 04:05:43.103081 1179099 checks.go:377] validating the presence of executable ebtables
I0228 04:05:43.103100 1179099 checks.go:377] validating the presence of executable ethtool
I0228 04:05:43.103112 1179099 checks.go:377] validating the presence of executable socat
I0228 04:05:43.103124 1179099 checks.go:377] validating the presence of executable tc
I0228 04:05:43.103152 1179099 checks.go:377] validating the presence of executable touch
I0228 04:05:43.103167 1179099 checks.go:525] running all checks
I0228 04:05:43.163983 1179099 checks.go:408] checking whether the given node name is valid and reachable using net.LookupHost
I0228 04:05:43.164657 1179099 checks.go:623] validating kubelet version
I0228 04:05:43.226445 1179099 checks.go:129] validating if the "kubelet" service is enabled and active
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
I0228 04:05:43.235205 1179099 checks.go:202] validating availability of port 10250
I0228 04:05:43.235313 1179099 checks.go:202] validating availability of port 2379
I0228 04:05:43.235370 1179099 checks.go:202] validating availability of port 2380
I0228 04:05:43.235430 1179099 checks.go:250] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
I0228 04:05:43.282451 1179099 checks.go:850] pulling public.ecr.aws/eks-distro/kubernetes/kube-apiserver:v1.21.5-eks-1-21-9
I0228 04:07:54.154419 1179099 checks.go:850] pulling public.ecr.aws/eks-distro/kubernetes/kube-controller-manager:v1.21.5-eks-1-21-9
I0228 04:10:29.871136 1179099 checks.go:850] pulling public.ecr.aws/eks-distro/kubernetes/kube-scheduler:v1.21.5-eks-1-21-9
I0228 04:11:51.677215 1179099 checks.go:844] image exists: public.ecr.aws/eks-distro/kubernetes/kube-proxy:v1.21.5-eks-1-21-9
I0228 04:11:51.712994 1179099 checks.go:844] image exists: public.ecr.aws/eks-distro/kubernetes/pause:3.4.1
I0228 04:11:51.752692 1179099 checks.go:850] pulling public.ecr.aws/eks-distro/kubernetes/etcd:3.4.13-0
I0228 04:12:19.701682 1179099 checks.go:850] pulling public.ecr.aws/eks-distro/kubernetes/coredns:v1.8.0
[preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image public.ecr.aws/eks-distro/kubernetes/etcd:3.4.13-0: output: Error response from daemon: repository public.ecr.aws/eks-distro/kubernetes/etcd not found: name unknown: The repository with name 'kubernetes/etcd' does not exist in the registry with id 'eks-distro'
, error: exit status 1
[ERROR ImagePull]: failed to pull image public.ecr.aws/eks-distro/kubernetes/coredns:v1.8.0: output: Error response from daemon: repository public.ecr.aws/eks-distro/kubernetes/coredns not found: name unknown: The repository with name 'kubernetes/coredns' does not exist in the registry with id 'eks-distro'
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
runtime/proc.go:225
runtime.goexit
runtime/asm_amd64.s:1371

This is weird, it is attempting pulling etcd:3.4.13-0 which doesn't match what is defined in the eks-d release manifest.
In the release maniest, version of etcd says: public.ecr.aws/eks-distro/etcd-io/etcd:v3.4.16-eks-1-21-9

Can someone help on this?

Additional info that might help

OS: ubuntu 20.04 LTS
docker version: 20.10.7 (installed by apt install docker.io)
kubeadm version is v1.21.5-eks-1-21
kubectl version is v1.21.5-eks-1-21
kubelet version is v1.21.5-eks-1-21

failed to pull image "public.ecr.aws/eks-distro/kubernetes/pause:3.2"

What happened:
Hi Team,
I am trying to deploy an eks-d using Kubeadm.
Docs I referred to --> https://distro.eks.amazonaws.com/users/install/kubeadm/
I used the same kube.yaml file provided in the above documentation, edited the values containing {{ }}

Error While running command Kubeadm init --config kube.yaml

[preflight] Some fatal errors occurred: [ERROR ImagePull]: failed to pull image public.ecr.aws/eks-distro/kubernetes/pause:3.2: output: Error response from daemon: manifest for public.ecr.aws/eks-distro/kubernetes/pause:3.2 not found: manifest unknown: Requested image not found , error: exit status 1 [preflight] If you know what you are doing, you can make a check non-fatal with--ignore-preflight-errors=...error execution phase preflight k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1 /workspace/anago-v1.18.1-beta.0.38+49aac775931dd1/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235 k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll /workspace/anago-v1.18.1-beta.0.38+49aac775931dd1/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422 k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run /workspace/anago-v1.18.1-beta.0.38+49aac775931dd1/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207 k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1 /workspace/anago-v1.18.1-beta.0.38+49aac775931dd1/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:147 k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute /workspace/anago-v1.18.1-beta.0.38+49aac775931dd1/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826 k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC /workspace/anago-v1.18.1-beta.0.38+49aac775931dd1/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914 k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute /workspace/anago-v1.18.1-beta.0.38+49aac775931dd1/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864 k8s.io/kubernetes/cmd/kubeadm/app.Run /workspace/anago-v1.18.1-beta.0.38+49aac775931dd1/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50 main.main _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25 runtime.main /usr/local/go/src/runtime/proc.go:203 runtime.goexit /usr/local/go/src/runtime/asm_amd64.s:1357

I checked the ecr for the image versions of pause here https://gallery.ecr.aws/?searchTerm=eks-distro&verified=verified
ECR only contains public.ecr.aws/eks-distro/kubernetes/pause:v1.18.9-eks-1-18-1

public.ecr.aws/eks-distro/kubernetes/pause:3.2 image is not available in the ecr.

What you expected to happen:
Expected all the preflight requests to pass and launch the control plane node.

How to reproduce it (as minimally and precisely as possible):
docker pull public.ecr.aws/eks-distro/kubernetes/pause:3.2

Anything else we need to know?:
I think the error is related to this issue as well #108
Can you please add some documentation/example kube.yaml for Kubeadm join as well.

Switch prow instances to bottlerocket ami instead of al2

What would you like to be added:

We should consider switching to bottlerocket based nodes for our prow instances. There may be some challenges around multi-arch building support, but I think we can run bindfmt as a container?

Why is this needed:

how to build a HA etcd cluster using eks-distro ?

when i build k8s cluster using eks-distro, i find it only start one etcd pods(use kubeadm), am i miss something ?

and it's there any way to build a ha etcd cluster ?

Document how to run make release commands

What would you like to be added:

Document make commands for number and docs

Why is this needed:

kubelet has broken ECR auth configuration

What happened:

Setting AWS_PROFILE and AWS_CONFIG_FILE in an effort to change the role the kubelet uses to authenticate with AWS services didn't impact how ECR authentication was performed.

What you expected to happen:

In an effort to minimize the amount of implicit permissions available to pode on EKS, I moved the kubelet's permissions to a separate role, and instructed the kubelet to assume this new role by updating the relevant systemd unit. Most things seemed to work, but ECR auth didn't seem to be working like everything else. Confused with the odd behavior, as I expected https://github.com/kubernetes/kubernetes/blob/v1.17.12/pkg/credentialprovider/aws/aws_credentials.go#L215-L221 to be the relevant code, I took to decompiling the kubelet available in the provided EKS AMI. To my surprise, there was additional code added which doesn't seem to be documented anywhere:

eks-distro/projects/kubernetes/kubernetes/1-18/patches/0006-EKS-PATCH-using-regional-sts-endpoint-for-assume-ecr.patch

Lines 74 to 100 in 91374b7

    
           +	// TODO: Remove this once aws sdk is updated to latest version. 
        
           +	var provider credentials.Provider 
        
           +	ecrPullRoleArn := os.Getenv("ECR_PULL_ROLE_ARN") 
        
           +	assumeRoleRegion := os.Getenv("AWS_DEFAULT_REGION") 
        
           +	if ecrPullRoleArn != "" && assumeRoleRegion != "" { 
        
           +		stsEndpoint, err := k8saws.ConstructStsEndpoint(ecrPullRoleArn, assumeRoleRegion) 
        
           +		if err != nil { 
        
           +			return nil, err 
        
           +		} 
        
           +		klog.Infof("Using AWS assumed role, %v:%v:%v", ecrPullRoleArn, assumeRoleRegion, stsEndpoint) 
        
           +		provider = &stscreds.AssumeRoleProvider{ 
        
           +			Client:  sts.New(sess, aws.NewConfig().WithRegion(assumeRoleRegion).WithEndpoint(stsEndpoint)), 
        
           +			RoleARN: ecrPullRoleArn, 
        
           +		} 
        
           +	} else { 
        
           +		provider = &ec2rolecreds.EC2RoleProvider{ 
        
           +			Client: ec2metadata.New(sess), 
        
           +		} 
        
           +	} 
        
           + 
        
           +	creds := credentials.NewChainCredentials( 
        
           +		[]credentials.Provider{ 
        
           +			&credentials.EnvProvider{}, 
        
           +			provider, 
        
           +			&credentials.SharedCredentialsProvider{}, 
        
           +		}) 
        
           +	sess.Config.Credentials = creds

While I haven't yet updated to 1.18, the assembly I found in the Kubelet looks fairly similar to that code. The environment variable ECR_PULL_ROLE_ARN seems to be completely unknown to the greater internet (no results come up on Google, Bing or DuckDuckGo), and it would be really appreciated to either have this code properly documented, or ideally, just leave the default credential provider alone so all of the environment variable settings work as one might expect.

How to reproduce it (as minimally and precisely as possible):
Run the following on top of the current managed node group AMI (snippeted from our packer config):

cat > /etc/systemd/system/kubelet.service.d/aws-profile.conf <<"EOF"
[Service]
# The following config file needs to be created before attempting to start the
# kubelet:
EnvironmentFile=/etc/kubernetes/kubelet/aws.env
EOF
# Small kludge: We need to have the /usr/bin/aws-iam-authenticator (used for
# authenticating the kubelet with the control plane) use the default profile due
# to how the kube-system/aws-auth ConfigMap is managed. This makes a poor
# assumption about the format, but what can you do? We capture the leading
# indentation, and use that for forming the env-var block, hoping that the block
# doesn't exist.
! grep 'env:' /var/lib/kubelet/kubeconfig # Verify assumption
sed -i -e 's/^\( *\)command:.*$/&\
\1env:\
\1- name: AWS_PROFILE\
\1  value: default/' /var/lib/kubelet/kubeconfig
# Make sure the replacement happened:
grep -q 'env:' /var/lib/kubelet/kubeconfig

Set the contents of /etc/kubernetes/kubelet/aws.env to something like the following (easily done with cloud-init):

AWS_CONFIG_FILE=/etc/kubernetes/kubelet/aws.conf
AWS_PROFILE=kubelet

And set /etc/kubernetes/kubelet/aws.conf to the following with the following obvious substitution for the role_arn:

[profile kubelet]
credential_source = Ec2InstanceMetadata
role_arn = ${var.roles.kubelet_role_arn}

Make sure to setup the correct assume role permissions for the EC2 instance's role, and have the arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly and arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy hanging off of the Kubelet role instead.

Anything else we need to know?:

The goal of this is to minimize the amount of permissions easily accessible by pods running on the worker node. While this doesn't prevent the pods from assuming the same role as kubelet, it does prevent accidental use of the policies and helps to make sure that each pod has a correctly configured ServiceAccount/IAM Role setup.

Possible fix:

Looking at the code, it isn't clear why the setting of sess.Config.Credentials is unconditional at

eks-distro/projects/kubernetes/kubernetes/1-18/patches/0006-EKS-PATCH-using-regional-sts-endpoint-for-assume-ecr.patch

Line 100 in 91374b7

+ sess.Config.Credentials = creds

. It might make more sense to only change that value when the undocumented environment variables are set, thus explicitly choosing this path. Otherwise, the code should ideally behave as similar to what would normally be expected.

Environment:

EKS Distro Release Channel: AWS EKS
EKS Distro Release: AWS EKS 1.17 eks.4

Run EKS-D on microk8s on MacOS

What would you like to be added:
I'd like to run EKS-D on microk8s on MacOS.

Why is this needed:
This page: https://aws.amazon.com/blogs/opensource/introducing-amazon-eks-distro/, mentions being able to use microk8s to deploy an EKS-D compatible cluster.

Clicking through that link brings me here: https://snapcraft.io/eks

However, there is no 'snap' support on MacOS. On MacOS microk8s is installed via 'HomeBrew'.

I couldn't find equivalent instructions for microk8s on MacOs. If all I have to do is pass a channel name on microk8s launch, I didn't find it documented.

Add 1.22 support

What would you like to be added:

Why is this needed:

Discontinue Maintenence of 1-18

What would you like to be added:

Update...
- docs
- supported release branches
- Other things??

Migrate to CDK v2

EKS Distro uses CDK (cloud development kit) as an infrastructure-as-code modeling language. All of our infrastructure is defined as CDK.

We currently use CDK v1; we need to migrate to CDK v2 in order to support many new AWS features, constructors, and third party features (such as Prow features). Remaining on CDK v1 limits our ability to write modern, maintainable IaC and keep our infrastructure up-to-date.

The out come of this issue will be that all EKS Distro CDK is upgraded to CDK V2.

https://docs.aws.amazon.com/cdk/v2/guide/migrating-v2.html

Access denied for

Access denied for distro.eks.amazonaws.com

This XML file does not appear to have any style information associated with it. The document tree is shown below. <Error> <Code>AccessDenied</Code> <Message>Access Denied</Message> <RequestId>8C83AFB3556CA41A</RequestId> <HostId>FmQKJB6vhsCm10o40VolmtGPVJAqJBoRReJGHfXM5iKzuxgsEWg97Pe7HzxjrnDoYreJroxQesg=</HostId> </Error>

Hyperlink can be found on Readme

Tag changes likely break docs generation

These changes likely break docs generation:

	+ // TODO: Remove this once aws sdk is updated to latest version.
	+ var provider credentials.Provider
	+ ecrPullRoleArn := os.Getenv("ECR_PULL_ROLE_ARN")
	+ assumeRoleRegion := os.Getenv("AWS_DEFAULT_REGION")
	+ if ecrPullRoleArn != "" && assumeRoleRegion != "" {
	+ stsEndpoint, err := k8saws.ConstructStsEndpoint(ecrPullRoleArn, assumeRoleRegion)
	+ if err != nil {
	+ return nil, err
	+ }
	+ klog.Infof("Using AWS assumed role, %v:%v:%v", ecrPullRoleArn, assumeRoleRegion, stsEndpoint)
	+ provider = &stscreds.AssumeRoleProvider{
	+ Client: sts.New(sess, aws.NewConfig().WithRegion(assumeRoleRegion).WithEndpoint(stsEndpoint)),
	+ RoleARN: ecrPullRoleArn,
	+ }
	+ } else {
	+ provider = &ec2rolecreds.EC2RoleProvider{
	+ Client: ec2metadata.New(sess),
	+ }
	+ }
	+
	+ creds := credentials.NewChainCredentials(
	+ []credentials.Provider{
	+ &credentials.EnvProvider{},
	+ provider,
	+ &credentials.SharedCredentialsProvider{},
	+ })
	+ sess.Config.Credentials = creds

aws / eks-distro Goto Github PK

eks-distro's People

Contributors

Stargazers

Watchers

Forkers

eks-distro's Issues

following install procedure is succeeded on ubuntu 18.04.6, but failed on ubuntu 20.04.4 this is based on https://distro.eks.amazonaws.com/users/install/kubeadm-onsite/ what is wrong for ubuntu 20.04?

Background

What has failed

Additional info that might help

Recommend Projects

Recommend Topics

Recommend Org

following install procedure is succeeded on ubuntu 18.04.6, but failed on ubuntu 20.04.4
this is based on https://distro.eks.amazonaws.com/users/install/kubeadm-onsite/
what is wrong for ubuntu 20.04?