Comments (7)
Awww man, you're too kind, @hobbsh! This is the first I'd heard the project contrasted with kubespray. Perhaps a dedicated doc on certain operational aspects like this is warranted. I'll keep that in mind moving forward.
from terraform-aws-eks.
Thanks for that output... not the best of situations but let's see what we can't come up with.
Not the most practical solution, but perhaps a slight improvement on what you've got: consider reusing the orphaned map left by a blanked-out worker group entry, so that a pair of entries is dedicated to a service in a blue-green fashion. A deployment becomes somewhat heavy in that it requires three distinct changes, but it's at least somewhat better than leaving a long trail of deployment corpses.
State 1: green is live; blue offline (note the capacities below).

```hcl
map(
  "name", "k8s-worker-blue",
  "ami_id", "ami-179fc16f",
  "asg_desired_capacity", "0",
  "asg_max_size", "0",
  "asg_min_size", "0"
),
map(
  "name", "k8s-worker-green",
  "ami_id", "ami-67a0841f",
  "asg_desired_capacity", "5",
  "asg_max_size", "8",
  "asg_min_size", "5",
  "instance_type", "${lookup(var.worker_sizes, terraform.workspace)}",
  "key_name", "${aws_key_pair.infra-deployer.key_name}",
  "root_volume_size", "48"
)
```
State 2: transitional; both blue and green live. When containers are balanced, monitor, verify, and drain green.

```hcl
map(
  "name", "k8s-worker-blue",
  "ami_id", "ami-179fc16f",
  "asg_desired_capacity", "5",
  "asg_max_size", "8",
  "asg_min_size", "5",
  "instance_type", "${lookup(var.worker_sizes, terraform.workspace)}",
  "key_name", "${aws_key_pair.infra-deployer.key_name}",
  "root_volume_size", "48"
),
map(
  "name", "k8s-worker-green",
  "ami_id", "ami-67a0841f",
  "asg_desired_capacity", "5",
  "asg_max_size", "8",
  "asg_min_size", "5",
  "instance_type", "${lookup(var.worker_sizes, terraform.workspace)}",
  "key_name", "${aws_key_pair.infra-deployer.key_name}",
  "root_volume_size", "48"
)
```
State 3: green spun down; blue takes all traffic.

```hcl
map(
  "name", "k8s-worker-blue",
  "ami_id", "ami-179fc16f",
  "asg_desired_capacity", "5",
  "asg_max_size", "8",
  "asg_min_size", "5",
  "instance_type", "${lookup(var.worker_sizes, terraform.workspace)}",
  "key_name", "${aws_key_pair.infra-deployer.key_name}",
  "root_volume_size", "48"
),
map(
  "name", "k8s-worker-green",
  "ami_id", "ami-67a0841f",
  "asg_desired_capacity", "0",
  "asg_max_size", "0",
  "asg_min_size", "0"
)
```
If this seems like a sound enough pattern for executing an update on worker node clusters, it probably makes sense to add a quick blurb to the README.
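For completeness, here is a hedged sketch of how a pair like this might be wired into the module's `worker_groups` input in HCL 0.11, following the `map()` interpolation style of the snippets above. Treat it as illustrative, not the module's documented interface:

```hcl
# Sketch only (HCL 0.11): both entries stay in the list permanently so their
# indexes never shift; only capacities and AMI IDs change between states.
worker_groups = ["${list(
  map(
    "name", "k8s-worker-blue",
    "ami_id", "ami-179fc16f",
    "asg_desired_capacity", "0",
    "asg_max_size", "0",
    "asg_min_size", "0"
  ),
  map(
    "name", "k8s-worker-green",
    "ami_id", "ami-67a0841f",
    "asg_desired_capacity", "5",
    "asg_max_size", "8",
    "asg_min_size", "5",
    "instance_type", lookup(var.worker_sizes, terraform.workspace),
    "key_name", aws_key_pair.infra-deployer.key_name,
    "root_volume_size", "48"
  )
)}"]
```

The point of the shape is that list position, not content, is what Terraform keys resources on in 0.11, so the blue/green pair must always occupy the same two slots.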
Hey @hobbsh - totally valid points here. Let's see if we can't find a way.
Thinking about this a bit, the situation wouldn't be so bad if we had a load balancer enforcing health in tandem with create_before_destroy and a minimum healthy instance count. Alas, not in this brave new world...

I haven't quite put together how removing a dead ASG entry from the list ends up being disruptive, especially if you've explicitly named your groups as you show above. Does Terraform tear down everything and recreate the ones that remain?
@brandoconnor Yes, Terraform sees that the list indexes have changed, reassigns the resource names based on the new indexes, and recreates everything. It's a bit confusing to me why a one-element worker_group list does not have [0] appended in the resource name; it shows up as just module.eks.aws_autoscaling_group.workers. Only after adding a second worker group does the index appear, but that may be irrelevant and something I hadn't noticed in Terraform until now.
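As an aside (my assumption, not something tried in this thread): when an entry really must be removed from the middle of the list, `terraform state mv` can re-home the surviving resources to their shifted addresses before the next plan, so Terraform updates them in place instead of destroying and recreating them. The addresses below are illustrative and should be confirmed against `terraform state list` first:

```
# After deleting worker_groups[0] from the list, move the survivor down one
# index so the plan sees an in-place update rather than a destroy/create.
terraform state mv \
  'module.eks.aws_autoscaling_group.workers[1]' \
  'module.eks.aws_autoscaling_group.workers[0]'
terraform state mv \
  'module.eks.aws_launch_configuration.workers[1]' \
  'module.eks.aws_launch_configuration.workers[0]'
```

This is manual surgery on state and easy to get wrong, which is part of why keeping the list indexes stable (as above) is attractive.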
I have been thinking about possibly adding a flag like delete = true, so that all that's left of an old ASG map is map("delete", "true"), but that would require reworking the count parameter on all the ASG resources. I have also done targeted destroys, but that gets messy, with resources wanting to recreate since the ASG still exists in the worker_groups list, again requiring some sort of flag to tell Terraform not to recreate. Maybe more things will be possible when Terraform v0.12 is released.
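To make the tombstone idea concrete, here is a rough, hypothetical sketch of what the caller's list might look like after a deletion; this is not something the module supports, and the module side would still need reworked count logic to skip the flagged entry:

```hcl
# Hypothetical: the dead group shrinks to a single-flag tombstone map, so the
# live group keeps its list index and is not recreated.
worker_groups = ["${list(
  map("delete", "true"),
  map(
    "name", "k8s-worker-green",
    "ami_id", "ami-67a0841f",
    "asg_desired_capacity", "5",
    "asg_max_size", "8",
    "asg_min_size", "5"
  )
)}"]
```

Each module resource would then need its count to ignore entries where lookup(..., "delete", "false") is "true" while preserving the indexes of the survivors, which is awkward to express in HCL 0.11; hence the hope for v0.12.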
I dug back in the Terraform history and found an example of what deleting the original ASG looks like (this was before I explicitly named the groups with the AMI ID, but IIRC it did not make a difference; I've killed several worker groups by accident in staging this way):
```
~ module.eks.aws_autoscaling_group.workers
    launch_configuration:          "staging-k8s-worker2018072300313993610000000b" => "${element(aws_launch_configuration.workers.*.id, count.index)}"

- module.eks.aws_autoscaling_group.workers[1]

-/+ module.eks.aws_launch_configuration.workers (new resource required)
    id:                            "staging-k8s-worker2018072300313993610000000b" => <computed> (forces new resource)
    associate_public_ip_address:   "false" => "false"
    ebs_block_device.#:            "0" => <computed>
    ebs_optimized:                 "true" => "true"
    enable_monitoring:             "true" => "true"
    iam_instance_profile:          "staging20180723003138205600000007" => "staging20180723003138205600000007"
    image_id:                      "ami-179fc16f" => "ami-c82004b0" (forces new resource)
    instance_type:                 "m4.large" => "m4.large"
    key_name:                      "infra-deployer" => "infra-deployer"
    name:                          "staging-k8s-worker2018072300313993610000000b" => <computed>
    name_prefix:                   "staging-k8s-worker" => "staging-k8s-worker"
    root_block_device.#:           "1" => "1"
    root_block_device.0.delete_on_termination: "true" => "true"
    root_block_device.0.iops:      "0" => "0"
    root_block_device.0.volume_size: "20" => "48" (forces new resource)
    root_block_device.0.volume_type: "gp2" => "gp2"
    security_groups.#:             "2" => "2"
    security_groups.1093865381:    "sg-cf611ebf" => "sg-cf611ebf"
    security_groups.3825257995:    "sg-45093e3b" => "sg-45093e3b"
    user_data_base64:              "REDACTED"

- module.eks.aws_launch_configuration.workers[1]
```
The good news is that the unit of deployment in a k8s-centric system shouldn't be the worker nodes themselves, so having to roll out a refreshed worker group often doesn't seem likely, though it's an eventuality given that AMIs all need updates and retirement.
@brandoconnor thanks for the thoughts! I was so caught up in having one worker group and no extra resources that I kind of glossed over the concept of alternating the worker groups (instead of creating a new one and trying to delete all old ones completely). I'll give this a shot next time I need to roll out a new AMI (probably in a few weeks).

Compared to something like kubespray, this module combined with a managed control plane allows much greater flexibility (I can create a mirror cluster in a different region in 20 minutes!), so I really appreciate the work put in here! This is probably good to close, and hopefully it helps other people in a similar situation.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.