Comments (7)

brandonjbjelland commented on September 10, 2024

Awww man, you're too kind @hobbsh ! This is the first I'd heard the project contrasted with kubespray. Perhaps a dedicated doc on certain operational aspects like this is warranted. I'll keep that in mind moving forward.

from terraform-aws-eks.

brandonjbjelland commented on September 10, 2024

Thanks for that output... not the best of situations but let's see what we can't come up with.

Not the most practical solution, but perhaps a slight improvement on what you've got: consider reusing the orphaned map left by a blanked-out worker group entry, so that a pair of entries is dedicated to a service in a blue-green fashion. A deployment becomes somewhat heavy in that it requires 3 distinct changes, but it's at least somewhat better than leaving a long trail of deployment corpses.

State 1: Green is live; blue offline

                  map(
                      "name", "k8s-worker-blue",
                      "ami_id", "ami-179fc16f",
                      "asg_desired_capacity", "0",
                      "asg_max_size", "0",
                      "asg_min_size", "0"
                      ),
                  map(
                      "name", "k8s-worker-green",
                      "ami_id", "ami-67a0841f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      )

State 2: transitional; both blue and green live. Once containers are balanced across both groups, monitor, verify, and drain green.

                  map(
                      "name", "k8s-worker-blue",
                      "ami_id", "ami-179fc16f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      ),
                  map(
                      "name", "k8s-worker-green",
                      "ami_id", "ami-67a0841f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      )

State 3: spin down green; blue takes all traffic

                  map(
                      "name", "k8s-worker-blue",
                      "ami_id", "ami-179fc16f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      ),
                  map(
                      "name", "k8s-worker-green",
                      "ami_id", "ami-67a0841f",
                      "asg_desired_capacity", "0",
                      "asg_max_size", "0",
                      "asg_min_size", "0"
                      )

If this seems like a sound enough pattern for executing an update on worker node clusters, it probably makes sense to add a quick blurb in the readme.
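To make the rotation easier to follow, here is a minimal Python sketch that mirrors the three map listings above. It is only an illustration, not part of the module: the `worker_groups` helper and the `LIVE`/`OFFLINE` constants are hypothetical names, while the group names, AMI IDs, and ASG sizes are copied from the listings. The point it shows is that the list always has two entries in the same order, so no list index ever shifts between states.

```python
# Hypothetical helper that models the three blue/green states as plain data.
LIVE = {"asg_desired_capacity": "5", "asg_max_size": "8", "asg_min_size": "5"}
OFFLINE = {"asg_desired_capacity": "0", "asg_max_size": "0", "asg_min_size": "0"}


def worker_groups(blue_live, green_live):
    """Return the two-entry list; blue is always index 0, green always index 1."""
    blue = {"name": "k8s-worker-blue", "ami_id": "ami-179fc16f"}
    green = {"name": "k8s-worker-green", "ami_id": "ami-67a0841f"}
    blue.update(LIVE if blue_live else OFFLINE)
    green.update(LIVE if green_live else OFFLINE)
    return [blue, green]


states = [
    worker_groups(blue_live=False, green_live=True),   # State 1
    worker_groups(blue_live=True, green_live=True),    # State 2 (transitional)
    worker_groups(blue_live=True, green_live=False),   # State 3
]

for number, groups in enumerate(states, start=1):
    summary = ", ".join(
        f"[{i}] {g['name']} desired={g['asg_desired_capacity']}"
        for i, g in enumerate(groups)
    )
    print(f"State {number}: {summary}")
```

Only the capacity values change from one state to the next; the entries themselves are never added or removed.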

brandonjbjelland commented on September 10, 2024

Hey @hobbsh - totally valid points here. Let's see if we can't find a way.

Thinking about this a bit, the situation wouldn't be so bad if we had a load balancer enforcing health in tandem with create_before_destroy and a minimum healthy instance count. Alas, not in this brave new world...

I haven't quite put together how removing a dead ASG entry from the list ends up being disruptive, especially if you've explicitly named your groups as you show above. Does Terraform tear down everything and recreate the ones that remain?

hobbsh commented on September 10, 2024

@brandoconnor Yes, Terraform sees that the list indexes have changed, reassigns the resource names based on the new indexes, and recreates everything. It's a bit confusing to me why a one-element worker_groups list does not have [0] appended in the resource name; it shows up as just module.eks.aws_autoscaling_group.workers. Only after adding a second worker group does the index appear, but that may be irrelevant and just something I hadn't noticed in Terraform until now.
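The index shift can be illustrated with a small Python toy model. This is not Terraform's actual diff engine, just a sketch under the assumption that each resource is addressed purely by its position in the list; the `plan` function and the group names `k8s-worker-old`/`k8s-worker-new` are hypothetical, and addresses are always written with an index for simplicity.

```python
def plan(old_groups, new_groups):
    """Diff two lists position by position, as an index-addressed resource would be."""
    actions = {}
    for i in range(max(len(old_groups), len(new_groups))):
        address = f"aws_autoscaling_group.workers[{i}]"
        if i >= len(new_groups):
            actions[address] = "destroy"   # index no longer exists
        elif i >= len(old_groups):
            actions[address] = "create"    # index is new
        elif old_groups[i] != new_groups[i]:
            actions[address] = "change"    # same index, different config
        else:
            actions[address] = "no-op"
    return actions


old = [
    {"name": "k8s-worker-old", "ami_id": "ami-179fc16f"},
    {"name": "k8s-worker-new", "ami_id": "ami-c82004b0"},
]
new = old[1:]  # drop the retired group from the front of the list

result = plan(old, new)
for address, action in result.items():
    print(address, action)
```

In this model, dropping the first entry does not destroy index 0. Instead, index 0 is changed to carry the surviving group's settings and index 1, where the surviving group used to live, is destroyed, which is why the running group gets disrupted.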

I have been thinking about possibly adding a flag like delete = true, so all that's left of an old ASG map is map("delete", "true"), but that would require reworking the count parameter on all the ASG resources. I have also done targeted destroys, but that gets messy because the resources want to be recreated while the ASG still exists in the worker_groups list, again requiring some sort of flag to tell Terraform not to recreate them. Maybe more things will be possible when Terraform v0.12 is released.

I dug back in the terraform history and found an example of what deleting the original ASG looks like (this was before I explicitly named with the AMI ID but iirc it did not make a difference - I've killed several worker groups by accident in staging this way):

~ module.eks.aws_autoscaling_group.workers
      launch_configuration:                      "staging-k8s-worker2018072300313993610000000b" => "${element(aws_launch_configuration.workers.*.id, count.index)}"

  - module.eks.aws_autoscaling_group.workers[1]

-/+ module.eks.aws_launch_configuration.workers (new resource required)
      id:                                        "staging-k8s-worker2018072300313993610000000b" => <computed> (forces new resource)
      associate_public_ip_address:               "false" => "false"
      ebs_block_device.#:                        "0" => <computed>
      ebs_optimized:                             "true" => "true"
      enable_monitoring:                         "true" => "true"
      iam_instance_profile:                      "staging20180723003138205600000007" => "staging20180723003138205600000007"
      image_id:                                  "ami-179fc16f" => "ami-c82004b0" (forces new resource)
      instance_type:                             "m4.large" => "m4.large"
      key_name:                                  "infra-deployer" => "infra-deployer"
      name:                                      "staging-k8s-worker2018072300313993610000000b" => <computed>
      name_prefix:                               "staging-k8s-worker" => "staging-k8s-worker"
      root_block_device.#:                       "1" => "1"
      root_block_device.0.delete_on_termination: "true" => "true"
      root_block_device.0.iops:                  "0" => "0"
      root_block_device.0.volume_size:           "20" => "48" (forces new resource)
      root_block_device.0.volume_type:           "gp2" => "gp2"
      security_groups.#:                         "2" => "2"
      security_groups.1093865381:                "sg-cf611ebf" => "sg-cf611ebf"
      security_groups.3825257995:                "sg-45093e3b" => "sg-45093e3b"
      user_data_base64: "REDACTED "

  - module.eks.aws_launch_configuration.workers[1]

brandonjbjelland commented on September 10, 2024

The good news is that the unit of deployment in a k8s-centric system shouldn't be the worker nodes themselves, so you shouldn't have to roll out a refreshed worker group often. It is still an eventuality, though, given that all AMIs need updates and eventual retirement.

hobbsh commented on September 10, 2024

@brandoconnor thanks for the thoughts! I was so caught up in having one worker group and no extra resources that I kinda glossed over the concept of alternating the worker groups (instead of creating a new one and trying to delete all the old ones completely). I'll give this a shot next time I need to roll out a new AMI (probably in a few weeks).

Compared to something like kubespray, this module combined with a managed control plane allows much greater flexibility (I can create a mirror cluster in a different region in 20 minutes!), so I really appreciate the work put in here! This is probably good to close, and it hopefully helps other people in a similar situation.

github-actions commented on September 10, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
