Description
I tried to deploy an EMR cluster with a custom autoscaling policy. The cluster is created successfully, but the custom automatic scaling policy fails to attach.
To debug this, I started with the EMR events; these two were the most meaningful:
Then I looked into the CloudTrail logs and found that the EMR Cluster Service Role was not able to assume the EMR Cluster Autoscaling Role. The error message was: Unable to assume IAM role: arn:aws:iam::aws-account-id:role/Spark-ETL-autoscaling
After that, I checked the trust relationship of the Autoscaling Role, which looked like this:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "EMRAssumeRole",
"Effect": "Allow",
"Principal": {
"Service": [
"elasticmapreduce.amazonaws.com",
"application-autoscaling.amazonaws.com"
]
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "123456"
},
"ArnLike": {
"aws:SourceArn": "arn:aws:elasticmapreduce:eu-central-1:123456:*"
}
}
}
]
}
I also checked the AWS documentation for the trust relationship that the autoscaling role for EMR must have:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "application-autoscaling.amazonaws.com"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "<account-id>"
},
"ArnLike": {
"aws:SourceArn": "arn:aws:application-autoscaling:<region>:<account-id>:scalable-target/*"
}
}
}
]
}
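Comparing the two, the trust policy generated by the module is missing the "aws:SourceArn" value "arn:aws:application-autoscaling:<region>:<account-id>:scalable-target/*". Its ArnLike condition only allows source ARNs under arn:aws:elasticmapreduce, so a request whose source ARN is an Application Auto Scaling scalable target does not match.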
To solve the issue, I had to change the trust relationship of the autoscaling role for EMR as follows:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "EMRAssumeRole",
"Effect": "Allow",
"Principal": {
"Service": [
"elasticmapreduce.amazonaws.com",
"application-autoscaling.amazonaws.com"
]
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "123456"
},
"ArnLike": {
"aws:SourceArn": [
"arn:aws:elasticmapreduce:eu-central-1:123456:*",
"arn:aws:application-autoscaling:eu-central-1:123456:scalable-target/*"
]
}
}
}
]
}
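Until the module generates this trust policy itself, one way to avoid the manual console change (and the resulting drift) might be to manage the autoscaling role in Terraform outside the module. The following is a minimal sketch, not the module's own implementation. The resource names and the role name are hypothetical, and the module inputs `create_autoscaling_iam_role` and `autoscaling_iam_role_arn` mentioned in the closing comment are assumptions that should be checked against the module documentation for v2.0.0.

```hcl
# Look up the account and region instead of hard-coding them.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.name
}

# Trust policy equivalent to the manually fixed JSON above.
data "aws_iam_policy_document" "emr_autoscaling_assume" {
  statement {
    sid     = "EMRAssumeRole"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type = "Service"
      identifiers = [
        "elasticmapreduce.amazonaws.com",
        "application-autoscaling.amazonaws.com",
      ]
    }

    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values   = [local.account_id]
    }

    # Both source ARN patterns are listed, so requests originating from
    # EMR and from Application Auto Scaling both satisfy the condition.
    condition {
      test     = "ArnLike"
      variable = "aws:SourceArn"
      values = [
        "arn:aws:elasticmapreduce:${local.region}:${local.account_id}:*",
        "arn:aws:application-autoscaling:${local.region}:${local.account_id}:scalable-target/*",
      ]
    }
  }
}

# Hypothetical role name; pick one that does not clash with the
# role the module may already have created.
resource "aws_iam_role" "emr_autoscaling" {
  name               = "Spark-ETL-autoscaling-custom"
  assume_role_policy = data.aws_iam_policy_document.emr_autoscaling_assume.json
}

# AWS managed policy for the EMR automatic scaling role.
resource "aws_iam_role_policy_attachment" "emr_autoscaling" {
  role       = aws_iam_role.emr_autoscaling.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforAutoScalingRole"
}

# Assumed module inputs (verify before use), set inside module "emr":
#   create_autoscaling_iam_role = false
#   autoscaling_iam_role_arn    = aws_iam_role.emr_autoscaling.arn
```

With the role defined this way, the trust relationship lives in the Terraform state, so a later plan should not try to revert it.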
Versions
- Provider version(s): provider registry.terraform.io/hashicorp/aws v5.44.0
Reproduction Code [Required]
module "emr" {
source = "terraform-aws-modules/emr/aws"
version = "v2.0.0"
name = var.cluster_name
release_label = var.release_label
applications = var.applications
bootstrap_action = var.bootstrap_action
vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
log_uri = var.log_uri
ebs_root_volume_size = var.ebs_root_volume_size
step_concurrency_level = var.step_concurrency_level
termination_protection = var.termination_protection
ec2_attributes = {
subnet_id = var.subnet_id
key_name = "airflow"
}
configurations_json = var.configurations_json
iam_role_use_name_prefix = false
iam_instance_profile_policies = {
AmazonElasticMapReduceforEC2Role = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
AWSGlueConsoleFullAccess = "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess"
SecretManagerProductionReadWrite = aws_iam_policy.secret_manager_read_write.arn
AppFlow = aws_iam_policy.app_flow.arn
}
# Master Group
master_instance_group = {
name = "Master - 1"
instance_count = var.master_instance_count
instance_type = var.master_instance_type
}
# Core Group
core_instance_group = {
name = "Core - 1"
instance_count = var.core_instance_count
instance_type = var.core_instance_type
autoscaling_policy = jsonencode({
"Constraints" : {
"MinCapacity" : 2,
"MaxCapacity" : 8
},
"Rules" : [
{
"Action" : {
"SimpleScalingPolicyConfiguration" : {
"ScalingAdjustment" : 1,
"CoolDown" : 1200,
"AdjustmentType" : "CHANGE_IN_CAPACITY"
}
},
"Trigger" : {
"CloudWatchAlarmDefinition" : {
"MetricName" : "ContainerPending",
"ComparisonOperator" : "GREATER_THAN_OR_EQUAL",
"Statistic" : "AVERAGE",
"Period" : 300,
"EvaluationPeriods" : 3,
"Unit" : "COUNT",
"Namespace" : "AWS/ElasticMapReduce",
"Threshold" : 6
}
},
"Name" : "prod_emr_core_scale_out"
},
{
"Action" : {
"SimpleScalingPolicyConfiguration" : {
"ScalingAdjustment" : -1,
"CoolDown" : 600,
"AdjustmentType" : "CHANGE_IN_CAPACITY"
}
},
"Trigger" : {
"CloudWatchAlarmDefinition" : {
"MetricName" : "ContainerPending",
"ComparisonOperator" : "LESS_THAN_OR_EQUAL",
"Statistic" : "AVERAGE",
"Period" : 300,
"EvaluationPeriods" : 8,
"Unit" : "COUNT",
"Namespace" : "AWS/ElasticMapReduce",
"Threshold" : 5
}
},
"Name" : "prod_emr_core_scale_in"
}
]
})
}
# Security Groups
managed_security_group_use_name_prefix = false
master_security_group_rules = [ ... ]
slave_security_group_rules = [ ... ]
}
Steps to reproduce the behavior:
- Create an EMR cluster using the above code (add a variables.tf with some values)
- Note that the custom automatic scaling policy has the failed status
Expected behavior
The Service Role for EMR is able to assume the Autoscaling Role, and there is no Terraform drift.
Actual behavior
The Service Role for EMR is not able to assume the Autoscaling Role because the trust relationship of the Autoscaling Role is misconfigured. I also have Terraform drift, since I had to change the trust relationship manually in the AWS Console.