Comments (13)
@bryantbiggs no, I'm not missing them. All of those were added just fine; I left them out of my example because I think they are irrelevant here. The issue lies in the permissions of the EKS Node Group, the one used to run the Karpenter Controller pods.
The karpenter module itself exposes a config to attach additional policies, but those are applied to the role used by the nodes created by Karpenter, which is a different role.
from terraform-aws-eks.
You seem to be missing all of the Karpenter components
terraform-aws-eks/examples/karpenter/main.tf
Lines 111 to 160 in 098c6a8
from terraform-aws-eks.
I think you are misunderstanding a few things:
- A reproduction should include all of the relevant pieces. If you are talking about this module's Karpenter sub-module, I think that must be included in the reproduction - otherwise, how can I help without knowing what you are doing?
- The permissions are there and they match the Karpenter controller IAM policy in the Karpenter repository
terraform-aws-eks/modules/karpenter/main.tf
Line 251 in 098c6a8
- The Karpenter controller uses the IAM role created in the Karpenter sub-module to provision nodes. It has nothing to do with the EKS MNG IAM role or its permissions - the EKS MNG IAM role should have very few permissions, only enough to support the VPC CNI operations
from terraform-aws-eks.
@bryantbiggs the permission issues only got resolved after I added extra permissions to the EKS module for the Karpenter node group like this (see eks_managed_node_groups.iam_role_additional_policies below, and also the eks_karpenter_controller_policy):
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  version = "~> 20.13.1"
  cluster_name = var.cluster_name
  cluster_version = var.cluster_version
  cluster_endpoint_public_access = true
  enable_cluster_creator_admin_permissions = false
  kms_key_enable_default_policy = true
  eks_managed_node_groups = {
    karpenter_group = {
      instance_types = ["t3.small"]
      subnet_ids = module.vpc.private_subnets
      # These extra permissions are required by Karpenter Controller pods
      iam_role_additional_policies = {
        AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
        AmazonEC2FullAccess = "arn:aws:iam::aws:policy/AmazonEC2FullAccess"
        additional = aws_iam_policy.eks_karpenter_controller_policy.arn
      }
      min_size = 2
      max_size = 3
      desired_size = 2
      capacity_type = "SPOT"
      taints = {
        # This Taint aims to keep just EKS Addons and Karpenter running on this MNG
        # The pods that do not tolerate this taint should run on nodes created by Karpenter
        addons = {
          key = "CriticalAddonsOnly"
          value = "true"
          effect = "NO_SCHEDULE"
        },
      }
    }
  }
  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    eks-pod-identity-agent = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent = true
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
    vpc-cni = {
      most_recent = true
    }
  }
  vpc_id = module.vpc.vpc_id
  subnet_ids = concat(module.vpc.private_subnets, module.vpc.intra_subnets)
  control_plane_subnet_ids = concat(module.vpc.private_subnets, module.vpc.intra_subnets)
  tags = merge(local.common_tags, {
    # NOTE - if creating multiple security groups with this module, only tag the
    # security group that Karpenter should utilize with the following tag
    # (i.e. - at most, only one security group should have this tag in your account)
    "karpenter.sh/discovery" = var.cluster_name
  })
}
resource "aws_iam_policy" "eks_karpenter_controller_policy" {
  name = "Karpenter-controller-${var.cluster_name}-policy"
  path = "/"
  description = "Additional policies attached to the Karpenter Controller which runs on EKS Node Group."
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "pricing:*",
          "iam:*",
        ]
        Effect = "Allow"
        Resource = "*"
      },
    ]
  })
  tags = local.common_tags
}
I will also share the extra components you said I'm missing. I left them out only because I don't think the problem is related to them:
module "karpenter" {
  source = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.13.1"
  cluster_name = module.eks.cluster_name
  enable_pod_identity = true
  create_pod_identity_association = true
  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }
  tags = local.common_tags
}
# This will install karpenter on the EKS cluster
resource "helm_release" "karpenter" {
  namespace = "karpenter"
  create_namespace = true
  name = "karpenter-${var.cluster_name}"
  repository = "oci://public.ecr.aws/karpenter"
  chart = "karpenter"
  version = "0.37.0"
  values = [
    <<-EOT
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
    EOT
  ]
  depends_on = [
    module.eks.cluster_id
  ]
}
@bryantbiggs let me know if you also want me to share my NodePools and EC2NodeClasses.
from terraform-aws-eks.
@bryantbiggs you seem to be misunderstanding my report.
There are two IAM Roles created by the whole setup:
- The one attached to the EKS Node Group where Karpenter Controller pods run
- The one used by the nodes created by Karpenter to run other workload
The error logs I showed are from the Karpenter Controller, which is missing some permissions.
The only way I managed to fix this was by attaching the extra permissions required by the Karpenter Controller to the role created by the EKS module for the EKS Node Group, not through the Karpenter module.
from terraform-aws-eks.
> There are two IAM Roles created by the whole setup:
False - there are three roles in your setup.
- The IAM role used by nodes created by EKS MNG
- The Karpenter controller IAM role - used for creating/removing nodes that it launches
- The IAM role used by the nodes that Karpenter creates - these permissions will be very similar to the IAM role used by nodes created by EKS MNG
You are giving the node IAM role the permissions, which means anything that runs on the nodes will inherit those permissions - this is not correct.
Are you converting an existing Karpenter installation from IRSA to EKS Pod Identity?
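To make the three roles listed above easier to tell apart, here is a minimal sketch of Terraform outputs that surface each one. It assumes the module names (eks, karpenter) and the node group key (karpenter_group) used elsewhere in this thread; the output names are illustrative, and the referenced module outputs should be checked against the module version in use.

```hcl
# 1. Role assumed by the EC2 instances of the EKS managed node group.
#    It should stay minimal (worker node, ECR read-only, VPC CNI policies).
output "mng_node_iam_role_arn" {
  value = module.eks.eks_managed_node_groups["karpenter_group"].iam_role_arn
}

# 2. Role meant for the Karpenter controller pods; it carries the permissions
#    the controller needs to create and remove the nodes it launches.
output "karpenter_controller_iam_role_arn" {
  value = module.karpenter.iam_role_arn
}

# 3. Role assumed by the nodes that Karpenter itself launches.
output "karpenter_node_iam_role_arn" {
  value = module.karpenter.node_iam_role_arn
}
```

Comparing the policies attached to each ARN after an apply shows quickly which role a given permission actually landed on.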
from terraform-aws-eks.
@bryantbiggs the other role is unrelated to Karpenter, the one affecting my setup is the second one (Karpenter controller IAM role).
When I deployed everything, it originally came with the policies below:
- AmazonEC2ContainerRegistryReadOnly
- AmazonEKS_CNI_Policy
- AmazonEKSWorkerNodePolicy
As you can see in the logs I shared earlier, the Karpenter controller pods are failing because some permissions are missing:
- ssm:GetParameter
- iam:GetInstanceProfile
- iam:CreateInstanceProfile
- ec2:DescribeImages
- pricing:GetProducts
- ec2:DescribeSpotPriceHistory
I'm not exactly converting. My old cluster is based on IRSA, but I created a brand new VPC + EKS + IAM setup from scratch because I thought that would be easier than trying to migrate an existing cluster. So all the roles, the VPC and subnets, the EKS cluster, everything is brand new.
The old cluster runs Karpenter on Fargate, but since Fargate doesn't seem to support Pod Identity, we followed the new example, which uses an EKS Node Group to run the Karpenter Controller.
from terraform-aws-eks.
@bryantbiggs I also understand giving permissions to the nodes is not the best way, as that means all the other cluster addons that run on the same node will inherit those permissions.
I did this just to confirm this is what was missing so Karpenter Controller would work and be able to create nodes (which it did).
So now I would like to learn how to properly give permissions only to the Karpenter pods (if that's possible through Pod Identity), but that still doesn't change the fact that permissions were missing.
I'm wondering if those extra permissions are only required if using SPOT instances for the EKS Node Group? 🤔
Once the cluster looks good to go live, we shall switch back to "on-demand" with reserved instances.
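On the question of scoping permissions to the Karpenter pods only: EKS Pod Identity does this by binding an IAM role to a specific namespace and ServiceAccount pair. The sketch below shows that binding as a standalone resource purely for illustration. It assumes the controller runs in the "karpenter" namespace under a ServiceAccount named "karpenter", and that module.karpenter.iam_role_arn is the controller role. The karpenter sub-module can already create an equivalent association when create_pod_identity_association = true, so this block is not meant to be applied alongside it.

```hcl
# Illustrative only: binds the controller IAM role to one ServiceAccount.
# Only pods running under this namespace/ServiceAccount pair receive the
# role's credentials; other pods on the same node do not.
resource "aws_eks_pod_identity_association" "karpenter_controller" {
  cluster_name = module.eks.cluster_name

  # Assumption: the namespace the Helm chart is installed into.
  namespace = "karpenter"

  # Assumption: must equal the ServiceAccount name the chart actually creates.
  service_account = "karpenter"

  role_arn = module.karpenter.iam_role_arn
}
```

If the namespace or ServiceAccount name here differs from what the pods really use, the pods fall back to the node role's credentials, which would look like missing permissions on the controller.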
from terraform-aws-eks.
Can you try this pattern? I believe it's closest to what you are trying to do: https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/patterns/karpenter-mng
from terraform-aws-eks.
@bryantbiggs thanks, I just read the README.md and went through the setup example. It looks very similar to the example in this repo btw.
The Karpenter Controller IAM Role has been created with all the permissions that the Karpenter Controller pods were missing, so it seems like the Karpenter Controller pods can't assume this role, otherwise they shouldn't complain about those permissions.
I'm investigating what's missing, as my karpenter submodule looks exactly like both examples.
from terraform-aws-eks.
@bryantbiggs I found the issue in my setup.
By default, the karpenter submodule uses "karpenter" as the service account for the Pod Identity Association if we don't provide one (as in the example).
My helm_release for installing Karpenter on the cluster was named "karpenter-mycluster", and that name is used for creating the service account in the cluster, so the pods couldn't get the permissions because of the service account name mismatch. The example hardcodes the name as "karpenter" (which matches the Pod Identity Association).
This can be easily overlooked, as you wouldn't think the helm release name matters.
Since it can't be an arbitrary string and must match the exact service account name used by the karpenter submodule for creating the Pod Identity Association, I opened a PR to update the example to use module.karpenter.service_account. This "link" makes it clearer to users (like me, who tend to change names) that the name must match the service account name from the karpenter submodule. This could have saved me a few days of investigation.
Thanks for your help and patience explaining things to me about the roles 🙏
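The fix described above can be sketched as follows. This is a hedged illustration, not the exact change from the PR: it assumes the karpenter submodule version in use exposes the service_account output mentioned in this comment, and that the Karpenter chart's serviceAccount.name value controls the ServiceAccount name. The namespace is kept as in the earlier snippet; because a Pod Identity Association is scoped to a namespace as well as a ServiceAccount, the namespace should also be verified against what the submodule associates.

```hcl
resource "helm_release" "karpenter" {
  namespace        = "karpenter" # assumption: must match the association's namespace
  create_namespace = true
  name             = "karpenter" # a fixed name avoids deriving an unexpected ServiceAccount name
  repository       = "oci://public.ecr.aws/karpenter"
  chart            = "karpenter"
  version          = "0.37.0"

  values = [
    <<-EOT
    serviceAccount:
      # Take the name from the submodule so it cannot drift from the
      # Pod Identity Association.
      name: ${module.karpenter.service_account}
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
    EOT
  ]
}
```

Setting the ServiceAccount name explicitly means the release name can be changed freely without breaking the association.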
from terraform-aws-eks.
This issue has been resolved in version 20.14.0 🎉
from terraform-aws-eks.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
from terraform-aws-eks.