
particuleio / teks


Full feature EKS cluster with Terragrunt/Terraform

Home Page: https://particuleio.github.io/teks/

License: Apache License 2.0

HCL 88.67% Shell 11.33%
kubernetes kubernetes-cluster aws eks terragrunt terraform external-dns kiam cluster-autoscaler addons

teks's Introduction

tEKS


tEKS is a set of Terraform / Terragrunt modules designed to give you everything you need to run a production EKS cluster on AWS. It ships with sensible defaults and adds a lot of common addons, with configurations that work out of the box.

This is our opinionated view of what a well-structured infrastructure-as-code repository should look like.

⚠️ v5 and later versions of this project have been completely revamped and now offer a skeleton to use as a base for your infrastructure projects around EKS. All the modules have been moved outside this repository and get their own versioning. The old README is accessible here

⚠️ The Terraform implementation is no longer maintained, due to time constraints and mostly because it has become quite difficult to keep feature parity with Terragrunt. The archive branch is available here

Terraform/Terragrunt

  • Terragrunt implementation is available in the terragrunt folder.

Contributing

Contributions are welcome, as are issues; we are usually quite responsive. If you need more support for your project, do not hesitate to reach out to us directly.

Requirements

Terragrunt

Quickstart

The quickstart guide is available here or on the official documentation website.

Main purposes

The main goal of this project is to glue together commonly used tooling with Kubernetes/EKS and to get from an AWS Account to a production cluster with everything you need without any manual configuration.

What you get

A production cluster, fully defined in IaC with Terraform/Terragrunt.

Everything is tied together with Terragrunt, which lets you deploy a multi-cluster architecture in a matter of minutes.

Curated Features

The additional features are provided by tEKS here as well as by our curated addons module, which supports a wide range of configurations.

Bottlerocket support

Bottlerocket OS is available for node groups (see example here). Bottlerocket is a container-centric OS with a reduced attack surface and no default shell.
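
As an illustration only, here is a minimal sketch of what enabling Bottlerocket for a managed node group can look like, assuming the terraform-aws-eks module's eks_managed_node_groups input (names and sizes are placeholders, not the repository's actual configuration):

# Hypothetical node group definition showing Bottlerocket usage.
eks_managed_node_groups = {
  bottlerocket = {
    ami_type       = "BOTTLEROCKET_x86_64" # Bottlerocket AMI instead of Amazon Linux
    platform       = "bottlerocket"        # render Bottlerocket-style (TOML) user data
    instance_types = ["t3a.medium"]
    min_size       = 1
    max_size       = 3
    desired_size   = 1
  }
}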

AWS Session Manager by default

All instances (Bottlerocket or Amazon Linux) are registered with AWS Session Manager. No SSH keys are deployed and no SSH access is open on instances. Shell access to any instance can be granted through SSM for added security.

aws ssm start-session --target INSTANCE_ID

From and to Zero scaling with EKS Managed Node Groups

tEKS supports scaling to and from zero, even when using well-known Kubernetes labels; there are a number of ongoing issues with Cluster Autoscaler support for EKS managed node groups. Thanks to automatic ASG tagging, tEKS adds the necessary tags on autoscaling groups to balance similar node groups and lets you scale to and from zero, and even use well-known labels such as node.kubernetes.io/instance-type or topology.kubernetes.io/zone. The logic can be extended to support other well-known labels.
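
To illustrate the idea (this is not the exact tEKS code), the tags Cluster Autoscaler expects on an autoscaling group can be set with the aws_autoscaling_group_tag resource; the ASG name, label and value below are placeholders:

# Cluster Autoscaler reads "k8s.io/cluster-autoscaler/node-template/label/..." tags
# to know which labels nodes of this group would carry when scaling up from zero.
resource "aws_autoscaling_group_tag" "instance_type" {
  autoscaling_group_name = "eks-default-a-20240101" # placeholder ASG name

  tag {
    key                 = "k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type"
    value               = "t3a.medium"
    propagate_at_launch = false
  }
}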

Automatic dependencies upgrade

We use renovate to automatically open PRs with the latest dependency updates (Terraform module upgrades), so you never miss an upgrade and are always up to date with the latest features.

Enforced security

  • Encryption by default for root volume on instances with Custom KMS Key
  • AWS EBS CSI volumes encrypted by default with Custom KMS Key
  • No IAM credentials on instances, everything is enforced with IRSA.
  • Each addon is deployed in its own namespace with sensible default network policies (see the sketch after this list).
  • Calico Tigera Operator for network policy.
  • PSPs are enabled but not enforced because of their deprecation.
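
For illustration, a minimal sketch (not the exact addons-module code) of the kind of default-deny policy created per addon namespace, using the Terraform kubernetes provider; the namespace name is a placeholder:

# Deny all ingress traffic to the addon namespace by default;
# explicit allow rules are then added for the flows the addon needs.
resource "kubernetes_network_policy" "default_deny" {
  metadata {
    name      = "default-deny"
    namespace = "ingress-nginx" # placeholder addon namespace
  }

  spec {
    pod_selector {} # empty selector matches every pod in the namespace
    policy_types = ["Ingress"]
  }
}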

Out of the box logging

Three stacks are supported:

Out of the box monitoring

  • Prometheus Operator with default dashboards
  • Addons that expose metrics are enabled along with their ServiceMonitor
  • Custom Grafana dashboards are available by default

Two stacks are supported:

Long term storage with Thanos

With Prometheus, tEKS includes Thanos by default. Thanos uses S3 to store and query metrics, offering long-term storage without the usual costs. For more information, check out our article on the CNCF Blog.

Support for ARM instances

With either Amazon Linux or Bottlerocket, you can use a mix of ARM and AMD64 instances. Check out our example.
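
As a hedged sketch (again assuming the terraform-aws-eks eks_managed_node_groups input; instance types are examples), mixing AMD64 and ARM node groups can look like this:

eks_managed_node_groups = {
  amd64 = {
    ami_type       = "AL2_x86_64"
    instance_types = ["t3a.medium"]
  }
  arm64 = {
    ami_type       = "AL2_ARM_64" # Graviton (ARM) node group
    instance_types = ["t4g.medium"]
  }
}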

Helm v3 provider

  • All addons support Helm v3 configuration
  • All charts are easily customizable (see the sketch after this list)
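
For example, a hedged sketch of overriding chart values for an addon through the addons module's extra_values input (the ingress-nginx chart key shown is only illustrative):

ingress-nginx = {
  enabled      = true
  extra_values = <<-VALUES
    controller:
      replicaCount: 2 # illustrative Helm value override
  VALUES
}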

Other and not limited to

  • priorityClasses for addons and critical addons
  • a lot of manual steps have been automated under the hood

Always up to date

We always support the latest modules and features for our addons module.

Our cutting-edge addons include (but are not limited to):

Requirements

Terragrunt is not a hard requirement but all the modules are tested with Terragrunt.

Pre-commit

This repository uses pre-commit hooks; please see this on how to set up the tooling.

ASDF

ASDF is a package manager that is great for managing cloud-native tooling. More info here (in French).

Enabling plugins

for p in $(cut -d " " .tool-versions -f1); do asdf plugin add $p; done

Installing tools

asdf install

Examples

The terragrunt/live folder provides an opinionated directory structure for a production environment.

Additional infrastructure blocks

If you wish to extend your infrastructure you can pick up additional modules on the particuleio github page. Some modules can also be found on the clusterfrak-dynamics github page.

Branches

  • main: Backward incompatible with v1.X but compatible with v2.X, releases bumped to v3.X because a lot has changed.
  • release-1.X: Compatible with Terraform < 0.12 and Terragrunt < 0.19. Be sure to target the same modules version.
  • release-2.X: Compatible with Terraform >= 0.12 and Terragrunt >= 0.19. Be sure to target the same modules version.

License

Apache License 2.0

teks's People

Contributors

applike-ss, archifleks, bocan, fossabot, neki, renovate-bot, renovate[bot], rguichard, svg153, tbobm


teks's Issues

Error: Invalid Configuration for Read-Only Attribute

Hi TEKS team,

I am encountering an error with eks-addons; I see it come up numerous times after updating to the latest addons / teks release. I get this error both when destroying eks-addons-critical and when applying it to an already existing one.

Am I missing something or doing something wrong?

╷
│ Error: Invalid Configuration for Read-Only Attribute
│
│   with tls_cert_request.thanos-tls-querier-cert-csr,
│   on thanos-tls-querier.tf line 138, in resource "tls_cert_request" "thanos-tls-querier-cert-csr":
│  138:   key_algorithm   = "ECDSA"
│
│ Cannot set value for this attribute as the provider has marked it as
│ read-only. Remove the configuration line setting the value.
│
│ Refer to the provider documentation or contact the provider developers for
│ additional information about configurable and read-only attributes that are
│ supported.
╵
╷
│ Error: Invalid Configuration for Read-Only Attribute
│
│   with tls_self_signed_cert.thanos-tls-querier-ca-cert,
│   on thanos.tf line 350, in resource "tls_self_signed_cert" "thanos-tls-querier-ca-cert":
│  350:   key_algorithm     = "ECDSA"
│
│ Cannot set value for this attribute as the provider has marked it as
│ read-only. Remove the configuration line setting the value.
│
│ Refer to the provider documentation or contact the provider developers for
│ additional information about configurable and read-only attributes that are
│ supported.

terragrunt version v0.38.6
Terraform version v1.2.5

demo cluster is not working

Hello All,
I am trying to spin up the demo cluster and, by default, it fails to create the managed worker nodes. Is this a bug in the latest AMI, or is it an issue with something I am doing?

Custom VPC endpoints list and the module resources?

When deploying w/o NAT gateway, what is the expected pattern for giving a list of required VPC endpoints, with EKS cluster security group ID(s) and policies (aws_iam_policy_document) maybe?

To my understanding, endpoints would be the right place to add them?

FTR, I want to follow that guide to deploy on a private VPC and Fargate workers instead of EC2, so I need the following PrivateLinks (VPC endpoints):

  • Interface endpoints for ECR (both ecr.api and ecr.dkr) to pull container images
  • A gateway endpoint for S3 to pull the actual image layers
  • An interface endpoint for EC2
  • An interface endpoint for STS to support Fargate and IAM Roles for Service Accounts
  • An interface endpoint for CloudWatch logging (logs) if CloudWatch logging is required

What I couldn't figure out is how (and whether) I should then specify the other module inputs for endpoints, such as aws_iam_policy_document and security_group, as shown in the example (a sketch follows at the end of this issue)?

Please confirm if that can be done like that:

  • the rules defined in node_security_group_additional_rules, merged with defaults that terraform-eks-aws provides via local cluster_security_group_rules, should be pre-created by tEKS. And then passed into terraform-aws-eks via external cluster_security_group_id = "sg-xxxx" and setting create_cluster_security_group = false
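
For reference, a hedged sketch (assuming the vpc-endpoints submodule of terraform-aws-modules/vpc; IDs, version and policies are placeholders) of declaring the endpoints listed above:

module "vpc_endpoints" {
  source  = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"
  version = "~> 5.0" # placeholder version

  vpc_id             = "vpc-0123456789abcdef0"  # placeholder
  security_group_ids = ["sg-0123456789abcdef0"] # placeholder, must allow 443 from the nodes

  endpoints = {
    s3      = { service = "s3", service_type = "Gateway" } # image layers
    ecr_api = { service = "ecr.api" }
    ecr_dkr = { service = "ecr.dkr" }
    ec2     = { service = "ec2" }
    sts     = { service = "sts" }  # Fargate / IRSA
    logs    = { service = "logs" } # CloudWatch logging
  }
}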

how to manage secrets in terragrunt

Hello @ArchiFleKs,

how would you manage secrets that are part of the extra_values in the Helm addons you extend in the terragrunt.hcl files?

We would prefer to use AWS SSM rather than Vault (because Vault would be part of the cluster).

Thank you for your advice.
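
One possible approach, sketched under the assumption of a SecureString parameter named /eks/demo/grafana-admin-password (purely hypothetical), is to read the secret with Terragrunt's run_cmd at parse time and interpolate it into extra_values:

locals {
  # "--terragrunt-quiet" keeps the decrypted value out of Terragrunt's log output.
  grafana_admin_password = run_cmd(
    "--terragrunt-quiet",
    "aws", "ssm", "get-parameter",
    "--name", "/eks/demo/grafana-admin-password",
    "--with-decryption",
    "--query", "Parameter.Value",
    "--output", "text"
  )
}

inputs = {
  kube-prometheus-stack = {
    enabled      = true
    extra_values = <<-VALUES
      grafana:
        adminPassword: ${local.grafana_admin_password}
    VALUES
  }
}

Note that the value still ends up in the Terraform state, so the state backend needs to be protected accordingly.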

Thanos Query not able to fetch data from Thanos Store

Only 2 hours of data is visible in the Grafana dashboard. I have checked Prometheus and am also getting only 2 hours of data there, and data is being pushed to the S3 bucket.
My terragrunt.hcl file in the eks-addons folder is:
include {
path = "${find_in_parent_folders()}"
}

terraform {
source = "github.com/particuleio/terraform-kubernetes-addons.git//modules/aws?ref=v2.1.0"
}

dependency "eks" {
config_path = "../eks"

mock_outputs = {
cluster_id = "cluster-name"
cluster_oidc_issuer_url = "https://oidc.eks.eu-west-3.amazonaws.com/id/0000000000000000"
}
}

dependency "vpc" {
config_path = "../vpc"

mock_outputs = {
private_subnets_cidr_blocks = [
"privateip.cidr",
"privateip.cidr"
]
}
}

generate "provider" {
path = "provider.tf"
if_exists = "overwrite"
contents = <<-EOF
provider "aws" {
region = "${local.aws_region}"
}
provider "kubectl" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
load_config_file = false
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
}
data "aws_eks_cluster" "cluster" {
name = var.cluster-name
}
data "aws_eks_cluster_auth" "cluster" {
name = var.cluster-name
}
EOF
}

locals {
aws_region = yamldecode(file("${find_in_parent_folders("region_values.yaml")}"))["aws_region"]
custom_tags = merge(
yamldecode(file("${find_in_parent_folders("global_tags.yaml")}")),
yamldecode(file("${find_in_parent_folders("env_tags.yaml")}"))
)
default_domain_name = yamldecode(file("${find_in_parent_folders("global_values.yaml")}"))["default_domain_name"]
default_domain_suffix = "${local.custom_tags["Env"]}.${local.custom_tags["Project"]}.${local.default_domain_name}"
}

inputs = {

cluster-name = dependency.eks.outputs.cluster_id

tags = merge(
local.custom_tags
)

eks = {
"cluster_oidc_issuer_url" = dependency.eks.outputs.cluster_oidc_issuer_url
}

aws-ebs-csi-driver = {
enabled = true
is_default_class = true
}

aws-for-fluent-bit = {
enabled = true
}

# test this with nginx controller

aws-load-balancer-controller = {
enabled = true
}

aws-node-termination-handler = {
enabled = false
}

calico = {
enabled = true
}

cert-manager = {
enabled = false
acme_email = "[email protected]"
acme_http01_enabled = true
acme_http01_ingress_class = "nginx"
acme_dns01_enabled = true
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
experimental_csi_driver = true
}

cluster-autoscaler = {
enabled = true
}

cni-metrics-helper = {
enabled = false
}

external-dns = {
external-dns = {
enabled = true
},
}

ingress-nginx = {
enabled = true
use_l7 = true
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
}

istio-operator = {
enabled = false
}

karma = {
enabled = false
}

keycloak = {
enabled = false
}

kong = {
enabled = false
}

kube-prometheus-stack = {
enabled = true
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
thanos_sidecar_enabled = true
thanos_bucket_force_destroy = true
extra_values = <<-EXTRA_VALUES
grafana:
deploymentStrategy:
type: Recreate
ingress:
enabled: true
#paths:
# - /grafana
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: "letsencrypt"
hosts:
- grafana.${local.default_domain_suffix}
#tls:
# - secretName: grafana.${local.default_domain_suffix}
# hosts:
# - grafana.${local.default_domain_suffix}
persistence:
enabled: true
storageClassName: ebs-sc
accessModes:
- ReadWriteOnce
size: 1Gi
prometheus:
ingress:
enabled: true
#paths:
# - /prometheus
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: "letsencrypt"
hosts:
- prometheus.${local.default_domain_suffix}
#tls:
# - secretName: prometheus.${local.default_domain_suffix}
# hosts:
# - prometheus.${local.default_domain_suffix}
prometheusSpec:
additionalScrapeConfigs:
- job_name: 'divum'
scrape_interval: 5s
ec2_sd_configs:
- region: ap-south-1
port: 9100
# This should not be here!
# check: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config, prometheus/prometheus#5738, https://www.robustperception.io/automatically-monitoring-ec2-instances
access_key: xxxxxxxxxx
secret_key: xyz
relabel_configs:
- source_labels: [__meta_ec2_tag_Name]
action: keep
- source_labels: [__meta_ec2_tag_Name]
target_label: instance
- source_labels: [__meta_ec2_public_ip]
target_label: ip
- source_labels: [__meta_ec2_tag_release_env,__meta_ec2_tag_service_name]
separator: ' | '
target_label: job
replicas: 1
retention: 2d
retentionSize: "6GB"
ruleSelectorNilUsesHelmValues: false
serviceMonitorSelectorNilUsesHelmValues: false
podMonitorSelectorNilUsesHelmValues: false
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: ebs-sc
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
alertmanager:
ingress:
enabled: true
#paths:
# - /alert-manager
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: "letsencrypt"
hosts:
- alert-manager.${local.default_domain_suffix}
#tls:
# - secretName: alert-manager.${local.default_domain_suffix}
# hosts:
# - alert-manager.${local.default_domain_suffix}
EXTRA_VALUES
}

loki-stack = {
enabled = false
bucket_force_destroy = true
}

metrics-server = {
enabled = true
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
}

npd = {
enabled = false
}

sealed-secrets = {
enabled = false
}

thanos = {
enabled = true
generate_ca = true
bucket_force_destroy = true
}

}

thanos sidecar
--prometheus.url=http://127.0.0.1:9090/

  • --grpc-address=[$(POD_IP)]:10901
  • --http-address=[$(POD_IP)]:10902
  • --objstore.config=$(OBJSTORE_CONFIG)
  • --tsdb.path=/prometheus
  • --log.level=info
  • --log.format=logfmt

thanos query

  • --log.level=info
  • --log.format=logfmt
  • --grpc-address=0.0.0.0:10901
  • --http-address=0.0.0.0:10902
  • --query.replica-label=prometheus_replica
  • --store=dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.cluster.local
  • --store=dnssrv+_grpc._tcp.thanos-storegateway.monitoring.svc.cluster.local
  • --query.timeout=5m
  • --query.lookback-delta=15m
  • --query.replica-label=rule_replica

thanos store

  • --log.level=info
  • --log.format=logfmt
  • --grpc-address=0.0.0.0:10901
  • --http-address=0.0.0.0:10902
  • --data-dir=/data
  • --objstore.config-file=/conf/objstore.yml
  • --ignore-deletion-marks-delay=24h

Could you please help me out? This is running in a prod EKS cluster.
In Grafana, the datasource is Prometheus and the URL is
http://thanos-query-frontend:9090

Issue creating KMS key

For testing out tEKS I don't want to use KMS for EBS volume encryption; however, the module insists on creating the resources and fails.

This is the failing resource:

 # aws_kms_key.this will be created
  + resource "aws_kms_key" "this" {
      + arn                                = (known after apply)
      + bypass_policy_lockout_safety_check = false
      + customer_master_key_spec           = "SYMMETRIC_DEFAULT"
      + description                        = "EKS Secret Encryption Key for my-foo-cluster"
      + enable_key_rotation                = true
      + id                                 = (known after apply)
      + is_enabled                         = true
      + key_id                             = (known after apply)
      + key_usage                          = "ENCRYPT_DECRYPT"
      + multi_region                       = false
      + policy                             = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "kms:*"
                      + Effect    = "Allow"
                      + Principal = {
                          + AWS = "arn:aws:iam::123456789012:root"
                        }
                      + Resource  = "*"
                      + Sid       = "Enable IAM User Permissions"
                    },
                  + {
                      + Action    = [
                          + "kms:ReEncrypt*",
                          + "kms:GenerateDataKey*",
                          + "kms:Encrypt",
                          + "kms:DescribeKey",
                          + "kms:Decrypt",
                        ]
                      + Effect    = "Allow"
                      + Principal = {
                          + AWS = "arn:aws:iam::123456789012:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
                        }
                      + Resource  = "*"
                      + Sid       = "Allow service-linked role use of the CMK"
                    },
                  + {
                      + Action    = "kms:CreateGrant"
                      + Condition = {
                          + Bool = {
                              + "kms:GrantIsForAWSResource" = [
                                  + "true",
                                ]
                            }
                        }
                      + Effect    = "Allow"
                      + Principal = {
                          + AWS = "arn:aws:iam::123456789012:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
                        }
                      + Resource  = "*"
                      + Sid       = "Allow attachment of persistent resources"
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + tags                               = {
          + "Environment" = "foo"
          + "Owner"       = "me"
          + "Project"     = "teks"
        }
      + tags_all                           = {
          + "Environment" = "foo"
          + "Owner"       = "me"
          + "Project"     = "teks"
        }
    }

And this is the error message I get:

╷
│ Error: error creating KMS Key: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals.
│ 
│   with aws_kms_key.this,
│   on main.tf line 1, in resource "aws_kms_key" "this":
│    1: resource "aws_kms_key" "this" {
│ 
╵

The role AWSServiceRoleForAutoScaling does not exist yet.

EKS Cluster creation failed in region us-east-1

Hi,

Thank you for a great project.
While following the getting started guide, I did not fully understand the procedure.

 ➜ tg apply -auto-approve
[terragrunt] [/Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/eks] 2021/01/27 17:19:49 Running command: terraform --version
[terragrunt] 2021/01/27 17:19:49 Terraform version: 0.14.5
[terragrunt] 2021/01/27 17:19:49 Reading Terragrunt config file at /Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/eks/terragrunt.hcl
[terragrunt] [/Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/vpc] 2021/01/27 17:19:49 Generated file /Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/vpc/.terragrunt-cache/569303351/backend.tf.
[terragrunt] [/Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/vpc] 2021/01/27 17:19:49 Running command: terraform init -get=false -get-plugins=false
[terragrunt] [/Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/vpc] 2021/01/27 17:19:53 Running command: terraform output -json
Failed to load state: AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-west-3'
	status code: 400, request id: E0C2281C2098748E, host id: yOzy5JSeIdmHlj/m2M+rkKD9KY86uCrLQ+1xtp+Rp+jmYFYVaDmixcwaiy3KdvvcwKxLWkFvEJ0=
[terragrunt] 2021/01/27 17:20:01 exit status 1

I cloned the repository here and copied demo into the mycluster folder.
While running terragrunt apply, it failed with the above errors. Can you please let me know how to follow the guide?

Also, do you have a Slack channel or any community forum for discussing further contributions?

error: You must be logged in to the server (Unauthorized)

When running this, the assumed user role doesn't have authorisation when executing the second after_hook in the eks module.

after_hook "kubeconfig" {
commands = ["apply"]
execute = ["bash", "-c", "terraform output kubeconfig 2>/dev/null > ${get_terragrunt_dir()}/kubeconfig"]
}

after_hook "kube-system-label" {
commands = ["apply"]
execute = ["bash", "-c", "kubectl --kubeconfig ${get_terragrunt_dir()}/kubeconfig label ns kube-system name=kube-system --overwrite"]
}

VPC dependency datasources error

Hello, I'm trying to use your template to create an EKS cluster, but when I run the terragrunt plan command from the vpc directory, it returns the error below.

datasources is a dependency of /home/user/projects/vitta/terraform-live/aws-eks/development/us-east-1/vpc/terragrunt.hcl but detected no outputs. Either the target module has not been applied yet, or the module has no outputs. If this is expected, set the skip_outputs flag to true on the dependency block.

Even running the command terragrunt run-all plan, I get the same error.

I tried adding the skip_outputs flag to true on the dependency block, but without success.

Could you let me know if I'm doing something wrong? I'm following the README but still not getting it to work.

New AWS account has no iam role AWSServiceRoleForAutoScaling

Thanks for providing this configuration example. It helped me enormously in understanding how to configure EKS with Terraform and Terragrunt.

But when trying to create an EKS cluster in a new AWS account, I hit an issue where the KMS key for EKS root volume encryption could not be created, with the error message:

error creating KMS Key: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals.

The reason, as far as I can tell, is that in github.com/particuleio/terraform-aws-kms.git the following role is referenced by the EKS root volume encryption policy:

"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"

But the role is not available in a new AWS account and is only created if an EC2 auto scaling group is created. I could fix the issue by creating an EC2 auto scaling group and afterwards deleting it again.

I think the role can be created directly by terraform, but I am no expert. Otherwise, adding some information to the documentation regarding this pitfall could help others trying to use this template.
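
Indeed, the service-linked role can be created with Terraform up front; a minimal sketch (the resource name is arbitrary):

# Creates the AutoScaling service-linked role if the account does not have it yet,
# so the KMS key policy can reference it.
resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}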

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

github-actions
.github/workflows/mkdocs.yml
  • actions/checkout v4
.github/workflows/renovate.yml
  • actions/checkout v4
  • actions/setup-node v4
  • ubuntu 22.04
.github/workflows/terraform.yml
  • actions/checkout v4
  • asdf-vm/actions v3
  • actions/setup-python v5
  • aws-actions/configure-aws-credentials v4
  • pre-commit/action v3.0.1
  • voxmedia/github-action-slack-notify-build v2
  • actions/checkout v4
  • cycjimmy/semantic-release-action v2
  • voxmedia/github-action-slack-notify-build v2
  • voxmedia/github-action-slack-notify-build v2
  • ubuntu 22.04
  • ubuntu 22.04
terraform
terragrunt/modules/datasources/main.tf
  • aws >= 3.72
  • hashicorp/terraform >= 1.0
terragrunt/provider-config/aws/aws.tf
terragrunt/provider-config/eks-addons/eks-addons.tf
terragrunt/provider-config/eks/eks.tf
terragrunt
terragrunt/live/production/eu-west-1/clusters/demo/ebs-encryption/terragrunt.hcl
  • github.com/terraform-aws-modules/terraform-aws-kms v2.2.1
terragrunt/live/production/eu-west-1/clusters/demo/eks-addons-critical/terragrunt.hcl
  • github.com/particuleio/terraform-kubernetes-addons v15.3.0
terragrunt/live/production/eu-west-1/clusters/demo/eks-addons/terragrunt.hcl
  • github.com/particuleio/terraform-kubernetes-addons v15.3.0
terragrunt/live/production/eu-west-1/clusters/demo/eks/terragrunt.hcl
  • github.com/terraform-aws-modules/terraform-aws-eks v20.8.5
terragrunt/live/production/eu-west-1/clusters/demo/vpc-endpoints/terragrunt.hcl
  • github.com/terraform-aws-modules/terraform-aws-vpc v5.8.1
terragrunt/live/production/eu-west-1/clusters/demo/vpc/terragrunt.hcl
  • github.com/terraform-aws-modules/terraform-aws-vpc v5.8.1
terragrunt/live/production/eu-west-1/datasources/terragrunt.hcl
terragrunt/live/production/terragrunt.hcl

  • Check this box to trigger a request for Renovate to run again on this repository

Multi Cloud Support

Hi Particule IO Team!

We are getting ready to implement Terragrunt and Kubernetes in Azure and/or GCP. We love what you have done with teks and appreciate all the hard work. Do you have any suggestions or a template / way of doing teks and Terragrunt for Azure and GCP?

Thank you for all your support so far!

  • Zach

block_device_mappings.0.ebs.0.kms_key_id" (arn:::aws) is an invalid ARN: arn: not enough sections

Hi, I'm using your project to create an EKS cluster, but I'm hitting this error. Can you explain if I'm doing something wrong?

Error: "block_device_mappings.0.ebs.0.kms_key_id" (arn:::aws) is an invalid ARN: arn: not enough sections

with module.eks_managed_node_group["default-a"].aws_launch_template.this[0],
on modules/eks-managed-node-group/main.tf line 45, in resource "aws_launch_template" "this":
45: resource "aws_launch_template" "this" {

Error: "block_device_mappings.1.ebs.0.kms_key_id" (arn:::aws) is an invalid ARN: arn: not enough sections

with module.eks_managed_node_group["default-a"].aws_launch_template.this[0],
on modules/eks-managed-node-group/main.tf line 45, in resource "aws_launch_template" "this":
45: resource "aws_launch_template" "this" {

Issue with version 18.30 of terraform-aws-eks

Looks like they added data "aws_default_tags" "current" {}, which is already part of the provider-aws.tf file, and because of that we get this error:

│ Error: Duplicate data "aws_default_tags" configuration
│
│   on provider-aws.tf line 12:
│   12: data "aws_default_tags" "current" {}
│
│ A aws_default_tags data resource named "current" was already declared at
│ main.tf:3,1-34. Resource names must be unique per type in each module.

"cert-manager" has no deployed releases

I'm trying to run EKS, and after many errors, I managed to finish the apply once.

Now, if I try to destroy or apply again, it always shows this error:

helm_release.cert-manager[0]: Modifying... [id=cert-manager]
>
> Error: "cert-manager" has no deployed releases
> 
>   with helm_release.cert-manager[0],
>   on cert-manager.tf line 117, in resource "helm_release" "cert-manager":
> 117: resource "helm_release" "cert-manager" {

Has anyone had the same issue?

Issue with the eks-asg-tags.tf

Hello @ArchiFleKs, looks like I found another issue. When I provide multiple subnet_ids for a managed node group

      subnet_ids              = [dependency.vpc.outputs.private_subnets[0], dependency.vpc.outputs.private_subnets[1], dependency.vpc.outputs.private_subnets[2]]

I get this error with the eks-asg-tags.tf.

│ Error: Invalid function argument
│ 
│   on eks-asg-tags.tf line 44, in resource "null_resource" "node_groups_asg_tags":
│   44:   "Value" : one(data.aws_autoscaling_group.node_groups[each.key].availability_zones),
│     ├────────────────
│     │ data.aws_autoscaling_group.node_groups is object with 4 attributes
│     │ each.key is "gpu"
│ 
│ Invalid value for "list" parameter: must be a list, set, or tuple value
│ with either zero or one elements.

Error when running terragrunt plan in the EKS Module

First I would like to thank you for this great repository and the code here; it's pure gold.

Sadly I'm experiencing an error, and for a beginner like me it isn't easy to figure out why it happens.

Error: Failed to instantiate provider "kubectl" to obtain schema: fork/exec /Users/dnetzer/Repositories/Vonage/vgai-studio-infra/live/dev/eu-west-1/eks/.terragrunt-cache/Ug5X0EQf-ySvHn_k0tDzaYfjHuo/vo8pQqWUeCu_1_TBy7LGvx51SW0/terraform-provider-kubectl: exec format error

This is the error I get when I run terragrunt plan, after I run a successful terragrunt init in the EKS module.
I tried using kubectl 1.17.11 (similar to what you use in the repo), and I've tried with 19.0.0 (latest stable).

EDIT 1:
I updated Terragrunt to v0.23.40, which uses Terraform v0.13.2, and now when I run terragrunt init it fails with the following error:

Error: Failed to install provider

Error while installing hashicorp/kubectl: provider registry
registry.terraform.io does not have a provider named
registry.terraform.io/hashicorp/kubectl

Any help would be much appreciated.

Last version v1.7.0 not valid sample file/directory

In the latest release, published not long ago, the sample directory and files are not compatible with the latest changes and improvements to the code.

In addition, the latest version of Terragrunt, v0.18.7, is not compatible with Terraform 0.12.1 (issue).

Terragrunt Structure for China and Gov Cloud

Hi All,

Just wanted to get some of your recommendations for the terragrunt folder structure if we had prod environments in:

Same Account

  • us-west-2
  • us-east-1

Different Accounts

  • us-china
  • us-gov

I was thinking we could have us-west-2 and us-china environments in the same folder structure for terragrunt/live/prod, but it looks like that groups stacks together in the same AWS account. So it would need to be terragrunt/live/china and terragrunt/live/gov.

Just wanted to ping here for your recommendations? @ArchiFleKs

setting `aws_account_id` doesn't ensure all resources are created in that account

I am trying out this template for EKS cluster creation right now.

While doing the apply, I was wondering why my VPC endpoint resources did not show up in the new sub-account that I created.

Turns out they were created in the main account I was using, even though I set aws_account_id to the sub-account.

That is not ideal or obvious to a new user, and I assume it is also a bug?

These resources i can see in my main account, which should be in the new sub account instead:

  • vpc
  • subnets
  • routing tables
  • igw
  • egress igw
  • eip
  • endpoints
  • nat gw

I see that it says in the requirements [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) configured with the account you want to deploy into, however my assumption was that my profile should have the permissions needed to create the resources.

Why else would there be an aws_account_id variable?

It seems I will have to use the iam_role option to enforce where the resources are created; I will check that out.

When destroying the incorrectly created resources, I now get:

╷
│ Error: expected "url" url to not be empty, got 
│ 
│   with data.flux_sync.main[0],
│   on flux2.tf line 103, in data "flux_sync" "main":
│  103:   url         = local.flux2["github_url"]
│ 
╵
╷
│ Error: error reading EKS Cluster (cluster-name): couldn't find resource
│ 
│   with data.aws_eks_cluster.cluster,
│   on provider-local.tf line 33, in data "aws_eks_cluster" "cluster":
│   33: data "aws_eks_cluster" "cluster" {
│ 
╵

Not ideal, because we wanted to use flux2 without GitHub.
I will try that again with a demo URL set.

Setting a demo URL did not actually let me remove the resources, so I removed them manually.

Critical Pods (nginx-ingress, calico-node...) fail readiness checks under CPU pressure

Hi,

With the configuration provided in this repo, when a (user-scheduled) Pod without CPU limits puts a lot of CPU pressure on a node, critical pods are denied the CPU shares required to correctly pass readiness or even liveness checks (I've observed this in production; it is reproducible).
This results in nginx-ingress, calico-node and kiam not receiving traffic, or even restarting.

I've noticed that

  • calico-node
  • kiam
  • nginx-ingress

do not have CPU requests set. I guess that's the reason why the health checks time out under CPU pressure.

Should we add CPU requests for all these Pods? Is there a better way to fix this issue?

Thanks!
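
As one possible mitigation, a sketch (assuming the ingress-nginx chart's standard controller.resources values; the figures are placeholders) of setting CPU requests through the addons module's extra_values:

ingress-nginx = {
  enabled      = true
  extra_values = <<-VALUES
    controller:
      resources:
        requests:
          cpu: 100m     # reserves CPU shares so probes keep passing under node pressure
          memory: 128Mi
  VALUES
}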

getting this error in nginx after deploy cert-manager and ingress tls

I am getting an error after deploying cert-manager and ingress TLS; it works fine over HTTP.
My terragrunt.hcl file is:

 cert-manager = {
    enabled                   = true
    acme_email                = "[email protected]"
    acme_http01_enabled       = true
    acme_http01_ingress_class = "nginx"
    acme_dns01_enabled        = true
    allowed_cidrs             = local.public_subnets_cidr_blocks
    experimental_csi_driver   = true
  }
  kube-prometheus-stack = {
    enabled                     = true
    allowed_cidrs               = local.public_subnets_cidr_blocks
    thanos_sidecar_enabled      = true
    thanos_bucket_force_destroy = true
    extra_values                = <<-EXTRA_VALUES
      grafana:
        deploymentStrategy:
          type: Recreate
        ingress:
          enabled: true
          annotations:
            kubernetes.io/ingress.class: nginx
            cert-manager.io/cluster-issuer: "letsencrypt"
            kubernetes.io/tls-acme: "true"
            ingress.kubernetes.io/force-ssl-redirect: "true"    
          hosts:
            - grafana.${local.default_domain_suffix}
          tls:
            - secretName: grafana.${local.default_domain_suffix}
              hosts:
               - grafana.${local.default_domain_suffix}
        persistence:
          enabled: true
          storageClassName: ebs-sc
          accessModes:
            - ReadWriteOnce
          size: 1Gi
  }

Logs of nginx:
      "networking.k8s.io/v1beta1", ResourceVersion:"18258834", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0721 17:45:08.508516       6 backend_ssl.go:46] Error obtaining X.509 certificate: no object matching key "monitoring/prometheus.thanos.prom-stack.blackbucklabs.net" in local store
W0721 17:45:08.510744       6 controller.go:1196] Error getting SSL certificate "monitoring/prometheus.thanos.prom-stack.blackbucklabs.net": local SSL certificate monitoring/prometheus.thanos.prom-stack.blackbucklabs.net was not found. Using default certificate

The certificate is present in the nginx pod but it is using the default one; both nginx and the secret are in the same namespace.

cert-manager log

kubectl logs -f cert-manager-8df74bb89-t6d4z  -n cert-manager
I0722 17:32:09.523111       1 start.go:74] cert-manager "msg"="starting controller"  "git-commit"="614438aed00e1060870b273f2238794ef69b60ab" "version"="v1.3.1"
W0722 17:32:09.523200       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0722 17:32:09.524185       1 controller.go:171] cert-manager/controller/build-context "msg"="configured acme dns01 nameservers" "nameservers"=["172.20.0.10:53"]
I0722 17:32:09.524773       1 controller.go:72] cert-manager/controller "msg"="enabled controllers: [certificaterequests-approver certificaterequests-issuer-acme certificaterequests-issuer-ca certificaterequests-issuer-selfsigned certificaterequests-issuer-vault certificaterequests-issuer-venafi certificates-issuing certificates-key-manager certificates-metrics certificates-readiness certificates-request-manager certificates-revision-manager certificates-trigger challenges clusterissuers ingress-shim issuers orders]"
I0722 17:32:09.525467       1 controller.go:131] cert-manager/controller "msg"="starting leader election"
I0722 17:32:09.526315       1 metrics.go:166] cert-manager/controller/build-context/metrics "msg"="listening for connections on" "address"={"IP":"::","Port":9402,"Zone":""}
I0722 17:32:09.526724       1 leaderelection.go:243] attempting to acquire leader lease  kube-system/cert-manager-controller...
I0722 17:33:27.726101       1 leaderelection.go:253] successfully acquired lease kube-system/cert-manager-controller
I0722 17:33:27.728026       1 reflector.go:207] Starting reflector *v1.Secret (5m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.228590       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="challenges"
I0722 17:33:29.228839       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-approver"
I0722 17:33:29.229009       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-venafi"
I0722 17:33:29.229135       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-revision-manager"
I0722 17:33:29.229282       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="ingress-shim"
I0722 17:33:29.229423       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-vault"
I0722 17:33:29.229479       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-issuing"
I0722 17:33:29.229519       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-request-manager"
I0722 17:33:29.229561       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-acme"
I0722 17:33:29.229599       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-ca"
I0722 17:33:29.229641       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="issuers"
I0722 17:33:29.229683       1 reflector.go:207] Starting reflector *v1.Order (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.229848       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-selfsigned"
I0722 17:33:29.230078       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-key-manager"
I0722 17:33:29.230183       1 reflector.go:207] Starting reflector *v1.CertificateRequest (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230317       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-metrics"
I0722 17:33:29.230437       1 reflector.go:207] Starting reflector *v1.Certificate (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230568       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-readiness"
I0722 17:33:29.230702       1 reflector.go:207] Starting reflector *v1.Pod (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230829       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-trigger"
I0722 17:33:29.229434       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="clusterissuers"
I0722 17:33:29.229351       1 reflector.go:207] Starting reflector *v1beta1.Ingress (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.229391       1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="orders"
I0722 17:33:29.230084       1 reflector.go:207] Starting reflector *v1.Challenge (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230154       1 reflector.go:207] Starting reflector *v1.ClusterIssuer (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230663       1 reflector.go:207] Starting reflector *v1.Secret (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230921       1 reflector.go:207] Starting reflector *v1.Service (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230119       1 reflector.go:207] Starting reflector *v1.Issuer (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
W0722 17:33:29.273356       1 warnings.go:67] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0722 17:33:29.299606       1 warnings.go:67] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
I0722 17:34:16.731919       1 setup.go:90] cert-manager/controller/clusterissuers "msg"="generating acme account private key" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:16.738384       1 setup.go:90] cert-manager/controller/clusterissuers "msg"="generating acme account private key" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:16.990767       1 setup.go:178] cert-manager/controller/clusterissuers "msg"="ACME server URL host and ACME private key registration host differ. Re-checking ACME account registration" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:17.239203       1 setup.go:178] cert-manager/controller/clusterissuers "msg"="ACME server URL host and ACME private key registration host differ. Re-checking ACME account registration" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:18.771611       1 setup.go:270] cert-manager/controller/clusterissuers "msg"="verified existing registration with ACME server" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:18.771803       1 conditions.go:95] Setting lastTransitionTime for Issuer "letsencrypt-staging" condition "Ready" to 2021-07-22 17:34:18.771785968 +0000 UTC m=+129.276599161
I0722 17:34:18.835926       1 setup.go:270] cert-manager/controller/clusterissuers "msg"="verified existing registration with ACME server" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:18.835965       1 conditions.go:95] Setting lastTransitionTime for Issuer "letsencrypt" condition "Ready" to 2021-07-22 17:34:18.835958833 +0000 UTC m=+129.340771996
I0722 17:34:18.934824       1 setup.go:170] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:18.957978       1 setup.go:170] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:21.994663       1 setup.go:170] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:22.239811       1 setup.go:170] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
W0722 17:38:52.301824       1 warnings.go:67] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
I0722 17:42:48.343983       1 conditions.go:182] Setting lastTransitionTime for Certificate "grafana.thanos.prom-stack.blackbucklabs.net" condition "Ready" to 2021-07-22 17:42:48.343976156 +0000 UTC m=+638.848789319

cert-manager webhook log

kubectl logs -f cert-manager-webhook-86f4bbc997-kcfwx   -n cert-manager
W0722 17:32:07.986393       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0722 17:32:07.989574       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0722 17:32:07.989798       1 webhook.go:69] cert-manager/webhook "msg"="using dynamic certificate generating using CA stored in Secret resource"  "secret_name"="cert-manager-webhook-ca" "secret_namespace"="cert-manager"
I0722 17:32:07.990506       1 server.go:148] cert-manager/webhook "msg"="listening for insecure healthz connections"  "address"=":6080"
I0722 17:32:07.990585       1 server.go:161] cert-manager/webhook "msg"="listening for secure connections"  "address"=":10260"
I0722 17:32:07.990614       1 server.go:187] cert-manager/webhook "msg"="registered pprof handlers"
I0722 17:32:07.992240       1 reflector.go:207] Starting reflector *v1.Secret (1m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:32:09.127841       1 dynamic_source.go:199] cert-manager/webhook "msg"="Updated serving TLS certificate"

ingress

Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Name:             kube-prometheus-stack-grafana
Namespace:        monitoring
Address:          ae6a3fd83c00a490c92975527b65c33a-500584658.ap-south-1.elb.amazonaws.com
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
  grafana.thanos.prom-stack.blackbucklabs.net terminates grafana.thanos.prom-stack.blackbucklabs.net
Rules:
  Host                                         Path  Backends
  ----                                         ----  --------
  grafana.thanos.prom-stack.blackbucklabs.net
                                               /   kube-prometheus-stack-grafana:80 (10.32.37.79:3000)
Annotations:                                   cert-manager.io/cluster-issuer: letsencrypt
                                               ingress.kubernetes.io/force-ssl-redirect: true
                                               kubernetes.io/ingress.class: nginx
                                               kubernetes.io/tls-acme: true
                                               meta.helm.sh/release-name: kube-prometheus-stack
                                               meta.helm.sh/release-namespace: monitoring
Events:                                        <none>

Couldn't find EKS resource

Hi, could you please help me figure out how to properly define a data source for the cluster critical addons? I'm getting this error on the terragrunt run-all plan command:
Error: error reading EKS Cluster (cluster-name): couldn't find resource

  with data.aws_eks_cluster.cluster,
  on provider-local.tf line 22, in data "aws_eks_cluster" "cluster":
  22: data "aws_eks_cluster" "cluster" {

And this is my critical addons terragrunt.hcl file:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "${get_parent_terragrunt_dir()}/_modules/remote/critical_addons/modules/aws"
}

locals {
  environment_vars = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  region_vars      = read_terragrunt_config(find_in_parent_folders("region.hcl"))

  env                   = local.environment_vars.locals.environment
  general_name          = local.environment_vars.locals.general_name
  mock_commands         = local.environment_vars.locals.mock_commands
  cluster_version       = local.environment_vars.locals.cluster_version
  region                = local.region_vars.locals.aws_region
  extra_values          = local.environment_vars.locals.extra_values
  kube_system_namespace = local.environment_vars.locals.kube_system_namespace
}

generate "provider" {
path = "provider-local.tf"
if_exists = "overwrite"
contents = <<EOF
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.cluster.token
}

provider "kubectl" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.cluster.token
}

provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.cluster.token
}
}

data "aws_eks_cluster" "cluster" {
name = var.cluster-name
}

data "aws_eks_cluster_auth" "cluster" {
name = var.cluster-name
}
EOF
}

dependency "eks" {
config_path = find_in_parent_folders("eks")

mock_outputs_allowed_terraform_commands = local.mock_commands
mock_outputs = {
  cluster_id              = "cluster-name"
  cluster_oidc_issuer_url = "https://oidc.eks.us-east-1.amazonaws.com/id/0000000000000000"
}

}

dependency "vpc" {
config_path = find_in_parent_folders("vpc")

mock_outputs_allowed_terraform_commands = local.mock_commands
mock_outputs = {
private_subnets_cidr_blocks = ["fake","fake"]
}
}

dependencies {
paths = ["../iam/oidc"]
}

inputs = {
cluster-name = dependency.eks.outputs.cluster_id

eks = {
  "cluster_oidc_issuer_url" = dependency.eks.outputs.cluster_oidc_issuer_url
}

metrics-server = {
  enabled       = true
  extra_values  = local.extra_values
  namespace     = local.kube_system_namespace
  allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
}

cluster-autoscaler = {
  enabled       = true
  namespace     = local.kube_system_namespace
  extra_values  = local.extra_values 
}

keda = {
  enabled          = true
  extra_values     = local.extra_values 
  create_ns        = true
}

}

Not able to create aws eks cluster

When I run terragrunt run-all apply, I get this error:

 INFO[0708] Executing hook: kubeconfig                    prefix=[/Users/ramesh/zinka-monitoring/prod-deployment-2/terragrunt/live/thanos/ap-south-1/clusters/observer/eks]
ERRO[0714] Error running hook kubeconfig with message: exit status 1  prefix=[/Users/ramesh/zinka-monitoring/prod-deployment-2/terragrunt/live/thanos/ap-south-1/clusters/observer/eks]
ERRO[0714] Module /Users/ramesh/zinka-monitoring/prod-deployment-2/terragrunt/live/thanos/ap-south-1/clusters/observer/eks has finished with an error: 4 errors occurred:
  * exit status 1
  * exit status 1
  * exit status 1
  * exit status 1
  
My eks terragrunt file:

include {
  path = "${find_in_parent_folders()}"
}

terraform {
  source = "github.com/terraform-aws-modules/terraform-aws-eks?ref=master"

  after_hook "kubeconfig" {
    commands = ["apply"]
    execute  = ["bash", "-c", "terraform output --raw kubeconfig 2>/dev/null > ${get_terragrunt_dir()}/kubeconfig"]
  }

  after_hook "kubeconfig-tg" {
    commands = ["apply"]
    execute  = ["bash", "-c", "terraform output --raw kubeconfig 2>/dev/null > kubeconfig"]
  }

  after_hook "kube-system-label" {
    commands = ["apply"]
    execute  = ["bash", "-c", "kubectl --kubeconfig kubeconfig label ns kube-system name=kube-system --overwrite"]
  }

  after_hook "undefault-gp2" {
    commands = ["apply"]
    execute  = ["bash", "-c", "kubectl --kubeconfig kubeconfig patch storageclass gp2 -p '{\"metadata\": {\"annotations\":{\"storageclass.kubernetes.io/is-default-class\":\"false\"}}}'"]
  }
}

locals {
  aws_region = yamldecode(file("${find_in_parent_folders("region_values.yaml")}"))["aws_region"]
  env        = yamldecode(file("${find_in_parent_folders("env_tags.yaml")}"))["Env"]
  prefix     = yamldecode(file("${find_in_parent_folders("global_values.yaml")}"))["prefix"]
  name       = yamldecode(file("${find_in_parent_folders("cluster_values.yaml")}"))["name"]
  custom_tags = merge(
    yamldecode(file("${find_in_parent_folders("global_tags.yaml")}")),
    yamldecode(file("${find_in_parent_folders("env_tags.yaml")}"))
  )
  cluster_name = "${local.prefix}-${local.env}-${local.name}"

  vpc_id = "xxxxxxxx"
  
  # these should be private subnets
  subnet_ids = [
      "subnet-xxxxxxxxxxx",
      "subnet-xxxxxxxxxxx",
      "subnet-xxxxxxxx",
  ]
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = <<-EOF
    provider "aws" {
      region = "${local.aws_region}"
    }
    provider "kubernetes" {
      host                   = data.aws_eks_cluster.cluster.endpoint
      cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
      token                  = data.aws_eks_cluster_auth.cluster.token
    }
    data "aws_eks_cluster" "cluster" {
      name = aws_eks_cluster.this[0].id
    }
    data "aws_eks_cluster_auth" "cluster" {
      name = aws_eks_cluster.this[0].id
    }
  EOF
}

inputs = {

  aws = {
    "region" = local.aws_region
  }

  tags = merge(
    local.custom_tags
  )

  cluster_name                         = local.cluster_name
  subnet_ids                           = local.subnet_ids
  vpc_id                               = local.vpc_id
  write_kubeconfig                     = true
  enable_irsa                          = true
  kubeconfig_aws_authenticator_command = "aws"
  kubeconfig_aws_authenticator_command_args = [
    "eks",
    "get-token",
    "--cluster-name",
    local.cluster_name
  ]
  kubeconfig_aws_authenticator_additional_args = []

  cluster_version           = "1.19"
  cluster_enabled_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  # Should contain security groups for Office Access only
  # https://aws.amazon.com/blogs/containers/upcoming-changes-to-ip-assignment-for-eks-managed-node-groups/
  node_groups = {
    "default-${local.aws_region}" = {
      create_launch_template = true
      public_ip              = true
      key_name               = "awsKeyName"
      desired_capacity       = 3
      max_capacity           = 5
      min_capacity           = 3
      instance_types         = ["m5a.large"]
      disk_size              = 30
      k8s_labels = {
        pool = "default"
      }
      capacity_type = "ON_DEMAND"
    }
  }
}
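
A note on the kubeconfig hooks above, in case it helps with the exit status 1 errors: they assume the module exposes a kubeconfig output, but newer major releases of terraform-aws-eks dropped that output (and the write_kubeconfig variable), so pinning ref=master can make terraform output --raw kubeconfig fail. A minimal sketch of an alternative hook that does not rely on that output, assuming the AWS CLI is available wherever Terragrunt runs, would be to replace the after_hook inside the existing terraform block:

# Hypothetical variant of the hook above, not taken from this repo: write the
# kubeconfig with the AWS CLI instead of reading a `kubeconfig` output from the
# module, since newer terraform-aws-eks releases no longer expose that output.
after_hook "kubeconfig" {
  commands = ["apply"]
  execute = [
    "bash", "-c",
    "aws eks update-kubeconfig --name ${local.cluster_name} --region ${local.aws_region} --kubeconfig ${get_terragrunt_dir()}/kubeconfig"
  ]
}

Pinning the module source to a specific release tag instead of master would also keep the behaviour of these hooks predictable.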

Not able to deploy eks-addons: ingress-nginx is not deploying.

Can anyone please help me here?

I ran the command:

~/observer/eks-addons$ terragrunt apply
kubernetes_namespace.ingress-nginx[0]: Creating...
kubernetes_namespace.ingress-nginx[0]: Creation complete after 2s [id=ingress-nginx]
kubernetes_network_policy.ingress-nginx_default_deny[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_namespace[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_control_plane[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_ingress[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_monitoring[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_monitoring[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-allow-monitoring]
kubernetes_network_policy.ingress-nginx_allow_ingress[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-allow-ingress]
kubernetes_network_policy.ingress-nginx_default_deny[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-default-deny]
kubernetes_network_policy.ingress-nginx_allow_namespace[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-allow-namespace]
kubernetes_network_policy.ingress-nginx_allow_control_plane[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-allow-control-plane]
helm_release.kube-prometheus-stack[0]: Modifying... [id=kube-prometheus-stack]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 10s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 20s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 30s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 40s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 50s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 1m0s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 1m10s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 1m20s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 1m30s elapsed]
helm_release.kube-prometheus-stack[0]: Modifications complete after 1m39s [id=kube-prometheus-stack]
helm_release.ingress-nginx[0]: Creating...
helm_release.ingress-nginx[0]: Still creating... [10s elapsed]
helm_release.ingress-nginx[0]: Still creating... [20s elapsed]
[... identical "Still creating" messages repeated every 10 seconds ...]
helm_release.ingress-nginx[0]: Still creating... [12m20s elapsed]
╷
│ Error: Kubernetes cluster unreachable: the server has asked for the client to provide credentials
│ 
│   with helm_release.ingress-nginx[0],
│   on ingress-nginx.tf line 131, in resource "helm_release" "ingress-nginx":
│  131: resource "helm_release" "ingress-nginx" {
│ 
╵
ERRO[0976] Hit multiple errors:
Hit multiple errors:
exit status 1 

The ingress-nginx-controller service is stuck in a pending state, and I am not sure how to debug this.
Can someone help here?

~/observer/eks-addons$  kubectl get svc -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   172.20.19.197   <pending>     80:30064/TCP,443:30531/TCP   18m
ingress-nginx-controller-admission   ClusterIP      172.20.32.162   <none>        443/TCP                      18m
ingress-nginx-controller-metrics     ClusterIP      172.20.49.233   <none>        10254/TCP                    18m
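
For context on the "Kubernetes cluster unreachable: the server has asked for the client to provide credentials" error: the Helm provider needs valid EKS credentials for the whole apply, and a token fetched once via the aws_eks_cluster_auth data source expires after roughly 15 minutes, which would line up with a release that was still installing after 12+ minutes. A minimal sketch of a Helm provider using exec-based authentication, which requests a fresh token on every call (an illustration under that assumption, not the exact provider this repository generates):

# Assumes a data "aws_eks_cluster" "cluster" data source exists, as in the
# generated provider.tf shown earlier.
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.name]
    }
  }
}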

Installing Loki fails because the ingress value is seemingly incorrect

I have tried to let tEKS create Loki via the terraform-kubernetes-addons repo.

However, I get this error:

╷
│ Error: failed to create resource: Ingress.extensions "loki" is invalid: spec.rules[0].host: Invalid value: "map[host:logz.my.domain.tld paths:[/]]": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
│ 
│   with helm_release.loki-stack[0],
│   on loki-stack.tf line 136, in resource "helm_release" "loki-stack":
│  136: resource "helm_release" "loki-stack" {
│ 
╵

The hostname itself matches the given regex, so I don't see why it should fail.
What I noticed, however, is that a Go object seems to be rendered as the hostname (which doesn't make sense to me when looking at the templated manifests).
Or am I interpreting this incorrectly?
I am a little confused about this anyway, because when I render the loki-stack chart at the given version (2.8.2) with the given values locally, it does not create an ingress resource at all.
That is because the loki-stack helm chart pulls in the loki helm chart as a dependency and therefore expects its configuration under loki.*.
When I do nest the values that way in the extra_values, no ingress resource is created either, so it seems that Terraform's helm and my local one (v3.10.0) behave differently?

I only removed the annotations from the terragrunt.hcl for the loki stack, as I don't want to use the nginx ingress class.
So my extra_values look like this:

    extra_values         = <<-VALUES
      resources:
        requests:
          cpu: 1
          memory: 2Gi
        limits:
          cpu: 2
          memory: 4Gi
      config:
        limits_config:
          ingestion_rate_mb: 320
          ingestion_burst_size_mb: 512
          max_streams_per_user: 100000
        chunk_store_config:
          max_look_back_period: 2160h
        table_manager:
          retention_deletes_enabled: true
          retention_period: 2160h
      ingress:
        enabled: true
        hosts:
          - host: logz.${include.root.locals.merged.default_domain_name}
            paths: ["/"]
        tls:
          - secretName: logz.${include.root.locals.merged.default_domain_name}
            hosts:
              - logz.${include.root.locals.merged.default_domain_name}
        VALUES

I'd be really glad for some hints on where to look.
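
For reference, the re-nesting described above (everything meant for the loki subchart moved under a top-level loki: key) would look roughly like the sketch below; the keys are copied from the values above and are not verified against the loki-stack 2.8.2 schema:

    # Hypothetical re-nesting only: same values as above, moved under `loki:`.
    extra_values = <<-VALUES
      loki:
        resources:
          requests:
            cpu: 1
            memory: 2Gi
          limits:
            cpu: 2
            memory: 4Gi
        config:
          limits_config:
            ingestion_rate_mb: 320
            ingestion_burst_size_mb: 512
            max_streams_per_user: 100000
          chunk_store_config:
            max_look_back_period: 2160h
          table_manager:
            retention_deletes_enabled: true
            retention_period: 2160h
        ingress:
          enabled: true
          hosts:
            - host: logz.${include.root.locals.merged.default_domain_name}
              paths: ["/"]
          tls:
            - secretName: logz.${include.root.locals.merged.default_domain_name}
              hosts:
                - logz.${include.root.locals.merged.default_domain_name}
      VALUES

Separately, the error message shows the whole map[host:... paths:[/]] object being rendered as the host, which would be consistent with the chart's ingress template expecting plain hostname strings under ingress.hosts rather than host/paths objects; that may be worth checking against the template of the chart version actually in use.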
