
RADAR-K8s-Infrastructure

This repository aims to provide IaC templates for RADAR-Kubernetes users who intend to deploy the platform to Kubernetes clusters supported by cloud providers such as AWS.



Dependencies

Terraform >= 1.7.0, < 1.8.0
AWS CLI >= 2.11

Usage

It is recommended that you use RADAR-K8s-Infrastructure as a template and create your own IaC repository from it (ideally starting with a private one). Make sure to customise the enclosed templates to your needs before creating the desired infrastructure.


Configure credentials

export TF_VAR_AWS_REGION=$AWS_REGION
export TF_VAR_AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export TF_VAR_AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
# For temporary credentials and SSO
export TF_VAR_AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN
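
To confirm that the exported credentials resolve to the expected account before running any Terraform command, you can optionally query AWS STS (a standard AWS CLI call, not specific to this repository):

# Verify the active identity and account
aws sts get-caller-identity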

Workspaces

The definition of resources required for running RADAR-base components is located in the cluster directory, while other optional resources are defined in the config directory. Please treat each directory as a separate workspace and perform terraform operations individually. The cluster resources need to be created and made fully available before you proceed with the creation of the config ones.

To retain the user-specific configurations for future infrastructure updates, modify terraform.tfvars within the workspace and push the change to your repository. If needed, additional variables defined in variables.tf can also be included there.
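
As a minimal sketch, a customised cluster/terraform.tfvars could look like the following; environment and eks_cluster_name are variables referenced elsewhere in this repository, but the values shown here are placeholders:

# cluster/terraform.tfvars -- placeholder values, adjust to your deployment
environment      = "dev"            # name of your environment
eks_cluster_name = "radar-base-dev" # name given to the EKS cluster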

ℹ️ Important Notice
As a best practice, never save raw values of secret variables in your repository; always encrypt them before committing. If your cluster is no longer in use, run terraform destroy to delete all the associated resources and reduce your cloud spending. If you have resources created within config, run terraform destroy in that directory before running the counterpart in cluster, as shown below.
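
For example, a full tear-down in the correct order might look like this (a sketch; run from the repository root):

# Destroy the optional resources first, then the cluster itself
cd config && terraform destroy --auto-approve
cd ../cluster && terraform destroy --auto-approve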

Create the infrastructure

cd cluster

# Initialise the working directory
terraform init

# Review the changes going to be made
terraform plan

# Create/update the infrastructure
terraform apply --auto-approve

Created resources:

  • VPC featuring both public and private subnets
  • VPC endpoints for privately accessing AWS services
  • Internet and NAT gateways
  • EKS cluster with a default worker node group
  • EKS coredns, kube-proxy, vpc-cni and aws-ebs-csi-driver addons
  • EBS storage classes referenced by PVCs
  • IRSAs for VPC CNI and EBS CSI controllers
  • Initial EC2 instances launched with Spot capacity
  • Default network ACLs and route tables
  • KMS keys and CloudWatch log groups
  • Essential IAM policies, roles, users and user groups for accessing the aforementioned resources

Connect to and verify the cluster

# Use --region if the cluster is deployed in a non-default region
# and --profile if it is deployed in a non-default AWS account
aws eks update-kubeconfig --name [eks_cluster_name]
kubectl get nodes
kubectl get pods -A

Once the infrastructure update has finished successfully, you can start deploying RADAR-base components to the newly created cluster by following the Installation Guide. Before running helmfile sync, you will need to configure certain resource values which are required by production.yaml but only known after the infrastructure has been created. We have exported the values of those resources, and you can get them by simply running:

terraform output

You could also automate this value injection by implementing your own templating strategy to customise production.yaml.
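
One possible approach (a sketch, not part of this repository) is to export the outputs as JSON and substitute them into a template of production.yaml; the output name eks_cluster_name and the template file name are hypothetical:

# Render production.yaml from a template using Terraform outputs (hypothetical file/output names)
terraform output -json > outputs.json
export CLUSTER_NAME=$(jq -r '.eks_cluster_name.value' outputs.json)
envsubst '${CLUSTER_NAME}' < production.yaml.tmpl > production.yaml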

Configure the cluster (optional)

N.B.: To get external DNS, Cert Manager and SMTP working via Route 53 (if chosen as your DNS service), you need to configure your registered top-level domain and its corresponding hosted zone ID via the domain_name variable in config/terraform.tfvars. Additionally, set enable_route53 to true.

cd config
terraform init
terraform plan
terraform apply --auto-approve

Optional resource creation is disabled by default. To enable the creation of a specific resource named X, navigate to config/terraform.tfvars and update the value of enable_X to true before applying the template.
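
For instance, a config/terraform.tfvars enabling the Route 53 resources could look like this; domain_name and enable_route53 are mentioned above, while any other enable_X flag names should be checked against variables.tf:

# config/terraform.tfvars -- placeholder values
domain_name    = "example.org" # your registered top-level domain
enable_route53 = true          # create the Route53 zone, records and IRSAs
# enable_X     = true          # other optional resources follow the same pattern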

Created resources (if all enabled):

  • EIP allocated for the load balancer created by Ingress-NGINX
  • Karpenter provisioner, the node template and the SQS interruption queue
  • Metrics Server along with the Kubernetes Dashboard and the read-only user
  • MSK cluster featuring Kafka brokers and ZooKeeper nodes
  • RDS instance running managementportal, appserver and rest_sources_auth databases
  • Route53 zone and records accompanied by IRSAs for external DNS and Cert Manager
  • S3 buckets for intermediate-output-storage, output-storage and velero-backups
  • SES SMTP endpoint
  • CloudWatch event rules and targets
  • Essential IAM policies, roles and users for the aforementioned resources

Known limitations

  • Since EBS has been chosen as the default storage, node groups are created in a single AZ because an EBS volume can only be attached within the AZ it was created in.
  • Sometimes Terraform tries to replace the existing MSK cluster when the templates are re-applied, even if nothing on the cluster has changed. Mitigate this with terraform untaint aws_msk_cluster.msk_cluster.
  • Prior to terraform destroy, infrastructure resources that were created by pods/controllers and may not be visible to Terraform need to be deleted first, e.g., the NLB created by nginx-ingress. A good practice is to always begin by running helmfile destroy.
  • If Karpenter is used for node provisioning, ensure that the nodes it created are not lingering around before running terraform destroy (see the sketch below).
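
A quick way to check for such lingering nodes (a sketch, assuming the karpenter.sh/provisioner-name label that Karpenter's Provisioner API attaches to the nodes it launches):

# List Karpenter-managed nodes; deleting the provisioners drains them
kubectl get nodes -l karpenter.sh/provisioner-name
kubectl delete provisioners --all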


Issues

Can't destroy the Terraform environment due to DependencyViolation

I'm trying to destroy the environment via terraform destroy but I'm getting this error message after multiple attempts:

│ Error: deleting EC2 Subnet (subnet-0926a18c09585dd05): DependencyViolation: The subnet 'subnet-0926a18c09585dd05' has dependencies and cannot be deleted.
│   status code: 400, request id: 53305f8a-addd-48de-a246-d0ade3cc432c
│
│ Error: deleting EC2 Subnet (subnet-01635dae617c6f131): DependencyViolation: The subnet 'subnet-01635dae617c6f131' has dependencies and cannot be deleted.
│   status code: 400, request id: 4f884fc5-f4ee-4503-9e29-7c8e995e6578
│
│ Error: deleting EC2 Subnet (subnet-00353977044e645d5): DependencyViolation: The subnet 'subnet-00353977044e645d5' has dependencies and cannot be deleted.
│   status code: 400, request id: 1bc4c253-2f86-4d2b-8b84-ff7a5ee97298
╵

Use cluster name for resource creation

S3 bucket names must be globally unique, and values commonly used for ${var.environment}, such as dev and prod, won't be unique.

Edit: ${var.environment} is used in other places as well. I think its description and examples could be improved to suggest that it is the name of the environment in general, not just dev or prod. Alternatively, the value could be merged with ${var.eks_cluster_name}, as sketched below.
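
As an illustrative sketch of that merging suggestion (a hypothetical local, not current repository code):

# Derive globally unique bucket names from the cluster name and the environment
locals {
  bucket_prefix = "${var.eks_cluster_name}-${var.environment}"
}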

Compatibility with newer Terraform versions

The repository is configured to use Terraform 1.4; however, version 1.7 is about to be released. It would be good to officially support newer versions. I have created an EKS cluster via version 1.6 and, other than one error, didn't have any issues, so hopefully it won't be too much effort to support the newer versions.

Consider merging cluster and config directories

I'm not sure what the best practice is, but since we have made the extra components optional, I think merging the two directories would be useful: it would keep all of the code in one place and make managing state and variables easier as well.

Allow configuring default storage class

Currently the default storage class in the cluster is gp2 and it is not configurable. It would be good to make it configurable during the installation. I saw some suggestions to change the class manually after cluster creation, but that wouldn't be ideal, especially as it's not something we can add to RADAR-Kubernetes. I didn't find an easy way to change the default setting with the modules currently in use, so more investigation might be needed.
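
For reference, the manual workaround mentioned above relies on the standard Kubernetes default-class annotation, e.g. switching the default from gp2 to an existing gp3 class:

# Unset gp2 as the default, then mark gp3 (if present) as the default storage class
kubectl patch storageclass gp2 -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
kubectl patch storageclass gp3 -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'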

Getting credentials: exec: executable aws failed with exit code 255

Running terraform apply in the cluster directory fails with this error message:

│ Error: Have got the following error while validating the existence of the ConfigMap "aws-auth": Get "https://xxx.gr7.eu-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps/aws-auth": getting credentials: exec: executable aws failed with exit code 255
│ 
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 553, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  553: resource "kubernetes_config_map_v1_data" "aws_auth" {

Upon rerunning terraform apply it appears that it's failing to create this resource:

  # module.eks.kubernetes_config_map_v1_data.aws_auth[0] will be created
  + resource "kubernetes_config_map_v1_data" "aws_auth" {
      + data          = {
          + "mapAccounts" = jsonencode([])
          + "mapRoles"    = <<-EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::xxx:role/dmz-eks-node-group-xxx"
                  "username": "system:node:{{EC2PrivateDNSName}}"
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::xxx:role/worker-eks-node-group-xxx"
                  "username": "system:node:{{EC2PrivateDNSName}}"
                - "groups":
                  - "system:masters"
                  "rolearn": "arn:aws:iam::xxx:role/connect-prod-radar-base-admin-role"
                  "username": "xxx-radar-base-admin-role"
            EOT
          + "mapUsers"    = jsonencode([])
        }
      + field_manager = "Terraform"
      + force         = true
      + id            = (known after apply)

      + metadata {
          + name      = "aws-auth"
          + namespace = "kube-system"
        }
    }

RDS Creation

As discussed today, control is needed over RDS database creation. A few options were suggested:

  1. wait and list the RDS database to see if it is online before progressing, or
  2. add an issue to create the RDS using a Terraform post-install hook

Architecture documentation

I think it would be good to have some documentation on the architecture of the resources created by this repository.
For example, I see that two node groups are created during cluster creation: one in the dmz group with a public subnet, and the other in the worker group with a private subnet. So I'm wondering:

  • Why is this necessary?
  • Are there any changes needed in RADAR-Kubernetes to take advantage of it? (I can see taints being defined in the dmz group.)
