
Comments (13)

AlissonRS commented on August 16, 2024

@bryantbiggs no, I'm not missing them. All of those were added just fine; I left them out of my example because I think they are irrelevant here. The issue lies in the permissions of the EKS Node Group that runs the Karpenter Controller pods.

The karpenter module itself exposes a config to attach additional policies, but those apply to the role used by the nodes that Karpenter creates, which is a different role.

from terraform-aws-eks.

bryantbiggs commented on August 16, 2024

You seem to be missing all of the Karpenter components

################################################################################
# Karpenter
################################################################################

module "karpenter" {
  source = "../../modules/karpenter"

  cluster_name = module.eks.cluster_name

  enable_pod_identity             = true
  create_pod_identity_association = true

  # Used to attach additional IAM policies to the Karpenter node IAM role
  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }

  tags = local.tags
}

module "karpenter_disabled" {
  source = "../../modules/karpenter"

  create = false
}

################################################################################
# Karpenter Helm chart & manifests
# Not required; just to demonstrate functionality of the sub-module
################################################################################

resource "helm_release" "karpenter" {
  namespace           = "kube-system"
  name                = "karpenter"
  repository          = "oci://public.ecr.aws/karpenter"
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
  chart               = "karpenter"
  version             = "0.36.1"
  wait                = false

  values = [
    <<-EOT
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
      interruptionQueue: ${module.karpenter.queue_name}
    EOT
  ]
}

bryantbiggs commented on August 16, 2024

I think you are misunderstanding a few things:

  1. A reproduction should include all of the relevant pieces. If you are talking about this module's Karpenter sub-module, it must be included in the reproduction; otherwise, how can I help without knowing what you are doing?
  2. The permissions are there, and they match the Karpenter controller IAM policy in the Karpenter repository, for example:
       actions = ["pricing:GetProducts"]
  3. The Karpenter controller uses the IAM role created in the Karpenter sub-module to provision nodes. It has nothing to do with the EKS MNG IAM role or its permissions. The EKS MNG IAM role should have very few permissions, only enough to support the VPC CNI operations.

AlissonRS commented on August 16, 2024

@bryantbiggs the permission issues only got resolved after I added extra permissions to the EKS module for the Karpenter node group like this (see eks_managed_node_groups.iam_role_additional_policies below, and also the eks_karpenter_controller_policy):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.13.1"

  cluster_name                             = var.cluster_name
  cluster_version                          = var.cluster_version
  cluster_endpoint_public_access           = true
  enable_cluster_creator_admin_permissions = false

  kms_key_enable_default_policy = true

  eks_managed_node_groups = {
    karpenter_group = {
      instance_types  = ["t3.small"]

      subnet_ids      = module.vpc.private_subnets

      # These extra permissions are required by Karpenter Controller pods
      iam_role_additional_policies = {
        AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
        AmazonEC2FullAccess          = "arn:aws:iam::aws:policy/AmazonEC2FullAccess"
        additional                   = aws_iam_policy.eks_karpenter_controller_policy.arn
      }

      min_size     = 2
      max_size     = 3
      desired_size = 2

      capacity_type        = "SPOT"

      taints = {
        # This Taint aims to keep just EKS Addons and Karpenter running on this MNG
        # The pods that do not tolerate this taint should run on nodes created by Karpenter
        addons = {
          key    = "CriticalAddonsOnly"
          value  = "true"
          effect = "NO_SCHEDULE"
        },
      }
    }
  }

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    eks-pod-identity-agent = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent              = true
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
    vpc-cni = {
      most_recent = true
    }
  }

  vpc_id                   = module.vpc.vpc_id
  subnet_ids               = concat(module.vpc.private_subnets, module.vpc.intra_subnets)
  control_plane_subnet_ids = concat(module.vpc.private_subnets, module.vpc.intra_subnets)

  tags = merge(local.common_tags, {
    # NOTE - if creating multiple security groups with this module, only tag the
    # security group that Karpenter should utilize with the following tag
    # (i.e. - at most, only one security group should have this tag in your account)
    "karpenter.sh/discovery" = var.cluster_name
  })
}

resource "aws_iam_policy" "eks_karpenter_controller_policy" {
  name        = "Karpenter-controller-${var.cluster_name}-policy"
  path        = "/"
  description = "Additional policies attached to the Karpenter Controller which runs on EKS Node Group."

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "pricing:*",
          "iam:*",
        ]
        Effect   = "Allow"
        Resource = "*"
      },
    ]
  })

  tags = local.common_tags
}

I will also share the extra components you said I'm missing. I left them out only because I don't think the problem is related to them:

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.13.1"

  cluster_name = module.eks.cluster_name

  enable_pod_identity             = true
  create_pod_identity_association = true

  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }

  tags = local.common_tags
}


# This will install karpenter on the EKS cluster
resource "helm_release" "karpenter" {
  namespace        = "karpenter"
  create_namespace = true

  name       = "karpenter-${var.cluster_name}"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "0.37.0"

  values = [
    <<-EOT
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
    EOT
  ]

  depends_on = [
    module.eks.cluster_id
  ]
}

@bryantbiggs let me know if you also want me to share my NodePools and EC2NodeClasses.

AlissonRS commented on August 16, 2024

@bryantbiggs you seem to be misunderstanding my report.

There are two IAM Roles created by the whole setup:

  1. The one attached to the EKS Node Group where the Karpenter Controller pods run
  2. The one used by the nodes created by Karpenter to run other workloads

The error logs I showed are from the Karpenter Controller, which is missing some permissions.

The only way I managed to fix this was by attaching the extra permissions required by the Karpenter Controller in the EKS module, on the role attached to the EKS Node Group, and not in the Karpenter module.

bryantbiggs commented on August 16, 2024

"There are two IAM Roles created by the whole setup:"

False - there are three roles in your setup.

  1. The IAM role used by nodes created by EKS MNG
  2. The Karpenter controller IAM role - used for creating/removing nodes that it launches
  3. The IAM role used by the nodes that Karpenter creates - these permissions will be very similar to the IAM role used by nodes created by EKS MNG

You are giving the permissions to the node IAM role, which means anything that runs on those nodes will inherit them. This is not correct.

Are you converting an existing Karpenter installation from IRSA to EKS Pod Identity?

AlissonRS commented on August 16, 2024

@bryantbiggs the other role is unrelated to Karpenter. The one affecting my setup is the second one (the Karpenter controller IAM role).

When I deployed everything, it originally came with the policies below:

  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKS_CNI_Policy
  • AmazonEKSWorkerNodePolicy

As you can see in the logs I shared earlier, the Karpenter Controller pods are failing because some permissions are missing:

  • ssm:GetParameter
  • iam:GetInstanceProfile
  • iam:CreateInstanceProfile
  • ec2:DescribeImages
  • pricing:GetProducts
  • ec2:DescribeSpotPriceHistory

I'm not exactly converting. My old cluster is based on IRSA, but I created a brand new VPC + EKS + IAM setup from scratch, as I thought that would be easier than trying to migrate an existing cluster. So all the roles, VPC + subnets, EKS, everything is brand new.

The old cluster runs Karpenter on Fargate, but since Fargate doesn't seem to support Pod Identity, we followed the new example, which uses an EKS Node Group to run the Karpenter Controller.

AlissonRS commented on August 16, 2024

@bryantbiggs I also understand giving permissions to the nodes is not the best way, as that means all the other cluster addons that run on the same node will inherit those permissions.

I did this just to confirm this is what was missing so Karpenter Controller would work and be able to create nodes (which it did).

So now I would like to learn how to properly give permissions only to the Karpenter pods (if that's possible through Pod Identity), but that still doesn't change the fact that permissions were missing.
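For reference, here is a minimal sketch of how EKS Pod Identity scopes a role to specific pods rather than to the whole node. It assumes the "kube-system" namespace and "karpenter" service account from the example earlier in the thread, and it assumes module.karpenter.iam_role_arn is the controller role output. The resource name "karpenter_controller" is hypothetical. The karpenter submodule creates an equivalent association itself when create_pod_identity_association = true, so this standalone resource is only illustrative and should not be added alongside it.

```hcl
# Illustrative only: binds the controller IAM role to a single service account.
# Only pods that run as this service account in this namespace receive the
# role's credentials; other pods on the same node do not.
resource "aws_eks_pod_identity_association" "karpenter_controller" {
  cluster_name = module.eks.cluster_name

  # Both values must match what the Helm release actually deploys
  # (assumed here: "kube-system" and "karpenter").
  namespace       = "kube-system"
  service_account = "karpenter"

  # Assumed to be the Karpenter controller role, not the node role.
  role_arn = module.karpenter.iam_role_arn
}
```

If the namespace or service account name in the association differs from what the pods use, the pods fall back to the node role's credentials, which would be consistent with the missing-permission errors described above.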

I'm wondering if those extra permissions are only required if using SPOT instances for the EKS Node Group? 🤔

Once the cluster looks good to go live, we shall switch back to "on-demand" with reserved instances.

bryantbiggs commented on August 16, 2024

Can you try this pattern? I believe it's the closest to what you are trying to do: https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/patterns/karpenter-mng

AlissonRS commented on August 16, 2024

@bryantbiggs thanks, I just read the README.md and went through the setup example. It looks very similar to the example in this repo btw.

The Karpenter Controller IAM Role has been created with all the permissions that the Karpenter Controller pods were missing, so it seems like the Karpenter Controller pods can't assume this role, otherwise they shouldn't complain about those permissions.

I'm investigating what's missing, as my karpenter submodule looks exactly like both examples.

AlissonRS commented on August 16, 2024

@bryantbiggs I found the issue in my setup.

By default, the karpenter submodule uses "karpenter" as the service account name for the Pod Identity Association if we don't provide one (as in the example).

My helm_release for installing Karpenter on the cluster was named "karpenter-mycluster", and that release name is used to create the service account in the cluster, so the pods couldn't get the permissions because of the service account name mismatch. The example hardcodes the name as "karpenter" (which matches the Pod Identity Association).

This can be easily overlooked as you wouldn't think the helm release name matters.

Since it can't be a random string and must exactly match the service_account name used by the karpenter submodule for creating the Pod Identity Association, I opened a PR to update the example to use module.karpenter.service_account. This "link" makes it clearer to users (like me, who tend to change names) that the name must match the service account name from the karpenter submodule. This could have saved me a few days of investigation.
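A sketch of what that wiring can look like in the helm_release is shown here. It assumes the module.karpenter.service_account output is available in the module version in use, and that the Karpenter chart's serviceAccount.name value controls the name of the service account it creates. The "kube-system" namespace is taken from the upstream example and is an assumption for any other setup.

```hcl
resource "helm_release" "karpenter" {
  # Assumed to match the namespace of the Pod Identity Association.
  namespace = "kube-system"

  # With the service account name set explicitly below, the release name
  # should no longer determine the service account name.
  name       = "karpenter-${var.cluster_name}"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "0.37.0"

  values = [
    <<-EOT
    serviceAccount:
      name: ${module.karpenter.service_account}
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
      interruptionQueue: ${module.karpenter.queue_name}
    EOT
  ]
}
```

Referencing the module output instead of a literal string means a renamed service account in the submodule flows through to the chart automatically.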

Thanks for your help and patience explaining things to me about the roles 🙏

antonbabenko commented on August 16, 2024

This issue has been resolved in version 20.14.0 🎉

github-actions commented on August 16, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
