Experience with blue/green using this module? about terraform-aws-eks HOT 7 CLOSED

terraform-aws-modules commented on September 10, 2024

Experience with blue/green using this module?

from terraform-aws-eks.

Comments (7)

brandonjbjelland commented on September 10, 2024

Awww man, you're too kind @hobbsh ! This is the first I'd heard the project contrasted with kubespray. Perhaps a dedicated doc on certain operational aspects like this is warranted. I'll keep that in mind moving forward.

brandonjbjelland commented on September 10, 2024

Thanks for that output... not the best of situations but let's see what we can't come up with.

Not the most practical solution, but perhaps a slight improvement on what you've got: consider reusing the orphaned map left by a blanked-out worker group entry, so that a pair of entries is dedicated to a service in blue-green fashion. A deployment becomes somewhat heavy in that it requires 3 distinct changes, but it's at least somewhat better than leaving a long trail of deployment corpses.

State 1: Green is live; blue offline

                  map(
                      "name", "k8s-worker-blue",
                      "ami_id", "ami-179fc16f",
                      "asg_desired_capacity", "0",
                      "asg_max_size", "0",
                      "asg_min_size", "0"
                      ),
                  map(
                      "name", "k8s-worker-green",
                      "ami_id", "ami-67a0841f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      )

State 2: transitional; both blue and green are live. Once containers have rebalanced onto blue, monitor, verify, and drain green.

                  map(
                      "name", "k8s-worker-blue",
                      "ami_id", "ami-179fc16f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      ),
                  map(
                      "name", "k8s-worker-green",
                      "ami_id", "ami-67a0841f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      )

State 3: spin down green; blue takes all traffic

                  map(
                      "name", "k8s-worker-blue",
                      "ami_id", "ami-179fc16f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      ),
                  map(
                      "name", "k8s-worker-green",
                      "ami_id", "ami-67a0841f",
                      "asg_desired_capacity", "0",
                      "asg_max_size", "0",
                      "asg_min_size", "0"
                      )

If this seems like a sound enough pattern for executing an update on worker node clusters, it probably makes sense to add a quick blurb in the readme.
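
The three states above could also be driven from a single input, so that each step is a one-line change. This is only a rough sketch in the Terraform 0.11 syntax used in this thread: `live_colors` and the `local.*` names are hypothetical and are not inputs of the module, and only the name, AMI, and capacity keys from the maps above are shown.

```hcl
# Hypothetical toggle, not a module input: which groups should run nodes.
#   ["green"]          -> State 1
#   ["blue", "green"]  -> State 2
#   ["blue"]           -> State 3
variable "live_colors" {
  type    = "list"
  default = ["green"]
}

locals {
  # Capacity is "0" for any group that is not listed in live_colors.
  blue_desired  = "${contains(var.live_colors, "blue") ? "5" : "0"}"
  blue_max      = "${contains(var.live_colors, "blue") ? "8" : "0"}"
  green_desired = "${contains(var.live_colors, "green") ? "5" : "0"}"
  green_max     = "${contains(var.live_colors, "green") ? "8" : "0"}"

  # Both entries always stay in the list, in the same order, so the
  # positions of the two worker groups never change between states.
  worker_groups = "${list(
    map(
      "name", "k8s-worker-blue",
      "ami_id", "ami-179fc16f",
      "asg_desired_capacity", local.blue_desired,
      "asg_max_size", local.blue_max,
      "asg_min_size", local.blue_desired
    ),
    map(
      "name", "k8s-worker-green",
      "ami_id", "ami-67a0841f",
      "asg_desired_capacity", local.green_desired,
      "asg_max_size", local.green_max,
      "asg_min_size", local.green_desired
    )
  )}"
}
```

Assuming the result is passed to the module as `worker_groups = "${local.worker_groups}"`, moving between states means editing only `live_colors`. The AMI of the offline group would still be updated by hand before it is brought back up.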

brandonjbjelland commented on September 10, 2024

Hey @hobbsh - totally valid points here. Let's see if we can't find a way.

Thinking about this a bit, the situation wouldn't be so bad if we had a load balancer enforcing health in tandem with create_before_destroy and a minimum healthy instance count. Alas, not in this brave new world...

I haven't quite put together how removing a dead asg entry from the list ends up being disruptive especially if you've explicitly named your groups as you show above. Does terraform tear down everything and recreate the ones that remain?

hobbsh commented on September 10, 2024

@brandoconnor Yes, Terraform sees that the list indexes have changed, reassigns the resource names based on the new indexes, and recreates everything. It's a bit confusing to me why a one-element worker_group list does not have [0] appended to the resource name; it shows up as just module.eks.aws_autoscaling_group.workers. Only after adding a second worker_group does the index appear, but that may be irrelevant and just something I hadn't noticed in Terraform until now.
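
To illustrate the index shift, here is a minimal sketch in Terraform 0.11 syntax. It uses a `null_resource` as a hypothetical stand-in for the module's ASG and launch configuration resources, and the group names are made up for the example.

```hcl
# Hypothetical stand-in for the module's worker_groups input.
variable "worker_groups" {
  type = "list"

  default = [
    {
      name = "k8s-worker-old"
    },
    {
      name = "k8s-worker-new"
    }
  ]
}

# One instance per list element, addressed by position:
#   null_resource.workers[0] -> k8s-worker-old
#   null_resource.workers[1] -> k8s-worker-new
resource "null_resource" "workers" {
  count = "${length(var.worker_groups)}"

  # A change to any trigger value forces the instance to be replaced.
  triggers = {
    name = "${lookup(var.worker_groups[count.index], "name")}"
  }
}
```

If the first element is removed from the list, index 0 now holds k8s-worker-new, so that instance is planned for replacement and index 1 is planned for destruction, even though the configuration of the surviving group did not change.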

I have been thinking about possibly adding a flag like delete = true so all that's left of an old ASG map is map("delete", "true") but that would require reworking the count parameter on all the ASG resources. I have also done targeted destroys but then that gets messy with resources wanting to recreate since the ASG still exists in the worker_groups list, again requiring some sort of flag to tell Terraform not to recreate. Maybe more things will be possible when Terraform v0.12 is released.
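
For reference, Terraform 0.12.6 and later support `for_each` on resources, which addresses instances by key instead of by list position. The following is a minimal sketch under that assumption, again using a hypothetical `null_resource` stand-in and a hypothetical map-typed variable. It is not how the module is written in this thread, which uses `count`.

```hcl
# Requires Terraform 0.12.6 or later (resource-level for_each).
# Hypothetical map-typed variable, keyed by worker group name.
variable "worker_groups" {
  type = map(object({
    ami_id = string
  }))

  default = {
    "k8s-worker-blue"  = { ami_id = "ami-179fc16f" }
    "k8s-worker-green" = { ami_id = "ami-67a0841f" }
  }
}

# Instances are addressed by key rather than by position:
#   null_resource.workers["k8s-worker-blue"]
#   null_resource.workers["k8s-worker-green"]
resource "null_resource" "workers" {
  for_each = var.worker_groups

  triggers = {
    ami_id = each.value.ami_id
  }
}
```

With keyed addresses, deleting one entry from the map plans the destruction of that instance only; the remaining instance keeps its address and is left alone.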

I dug back in the terraform history and found an example of what deleting the original ASG looks like (this was before I explicitly named with the AMI ID but iirc it did not make a difference - I've killed several worker groups by accident in staging this way):

~ module.eks.aws_autoscaling_group.workers
      launch_configuration:                      "staging-k8s-worker2018072300313993610000000b" => "${element(aws_launch_configuration.workers.*.id, count.index)}"

  - module.eks.aws_autoscaling_group.workers[1]

-/+ module.eks.aws_launch_configuration.workers (new resource required)
      id:                                        "staging-k8s-worker2018072300313993610000000b" => <computed> (forces new resource)
      associate_public_ip_address:               "false" => "false"
      ebs_block_device.#:                        "0" => <computed>
      ebs_optimized:                             "true" => "true"
      enable_monitoring:                         "true" => "true"
      iam_instance_profile:                      "staging20180723003138205600000007" => "staging20180723003138205600000007"
      image_id:                                  "ami-179fc16f" => "ami-c82004b0" (forces new resource)
      instance_type:                             "m4.large" => "m4.large"
      key_name:                                  "infra-deployer" => "infra-deployer"
      name:                                      "staging-k8s-worker2018072300313993610000000b" => <computed>
      name_prefix:                               "staging-k8s-worker" => "staging-k8s-worker"
      root_block_device.#:                       "1" => "1"
      root_block_device.0.delete_on_termination: "true" => "true"
      root_block_device.0.iops:                  "0" => "0"
      root_block_device.0.volume_size:           "20" => "48" (forces new resource)
      root_block_device.0.volume_type:           "gp2" => "gp2"
      security_groups.#:                         "2" => "2"
      security_groups.1093865381:                "sg-cf611ebf" => "sg-cf611ebf"
      security_groups.3825257995:                "sg-45093e3b" => "sg-45093e3b"
      user_data_base64:                          "REDACTED"

  - module.eks.aws_launch_configuration.workers[1]

brandonjbjelland commented on September 10, 2024

The good news is that the unit of deployment in a k8s-centric system shouldn't be the worker nodes themselves, so needing to roll out a refreshed worker group frequently seems unlikely. It's still an eventuality, though, given that AMIs all need updates and retirement.

hobbsh commented on September 10, 2024

@brandoconnor thanks for the thoughts! I was so caught up in having one worker group and no extra resources that I kinda glossed over the concept of alternating the worker groups (instead of creating a new one and trying to delete all the old ones completely). I'll give this a shot next time I need to roll out a new AMI (probably in a few weeks).

Compared to something like kubespray, this module combined with a managed control plane allows much greater flexibility (I can create a mirror cluster in a different region in 20 minutes!), so I really appreciate the work put in here! This is probably good to close and hopefully helps other people in a similar situation.

github-actions commented on September 10, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
