The management guide has a fairly extensive section on IAM permissions (including its sub-pages), but as far as I can tell it lacks a fairly important piece of information: what EMR nodes actually need to do their job, independent of the tasks running on top of them.
Right now the guidance seems to be roughly, "use the arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role managed policy, or you can customize your permissions, especially as it pertains to a security configuration that configures AssumeRole for EMRFS".
But arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role is actually pretty powerful:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "cloudwatch:*",
        "dynamodb:*",
        "ec2:Describe*",
        "elasticmapreduce:Describe*",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListSteps",
        "kinesis:CreateStream",
        "kinesis:DeleteStream",
        "kinesis:DescribeStream",
        "kinesis:GetRecords",
        "kinesis:GetShardIterator",
        "kinesis:MergeShards",
        "kinesis:PutRecord",
        "kinesis:SplitShard",
        "rds:Describe*",
        "s3:*",
        "sdb:*",
        "sns:*",
        "sqs:*",
        "glue:CreateDatabase",
        "glue:UpdateDatabase",
        "glue:DeleteDatabase",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersions",
        "glue:CreatePartition",
        "glue:BatchCreatePartition",
        "glue:UpdatePartition",
        "glue:DeletePartition",
        "glue:BatchDeletePartition",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition",
        "glue:CreateUserDefinedFunction",
        "glue:UpdateUserDefinedFunction",
        "glue:DeleteUserDefinedFunction",
        "glue:GetUserDefinedFunction",
        "glue:GetUserDefinedFunctions"
      ]
    }
  ]
}
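Basically, it can do anything to S3, SNS, SQS, SDB, DynamoDB, and CloudWatch, plus several other potentially scary things, such as deleting Kinesis streams and deleting Glue databases, tables, and partitions.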
So if you don't fully trust your EMR tasks not to delete or corrupt all your S3 buckets and DynamoDB tables, you probably want to customize that policy. But the documentation doesn't make a clear distinction between what EMR itself needs and what has been granted speculatively because tasks running on top of it might want it.
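To make the "can do anything" point concrete, here is a rough Python sketch that lists the services for which a policy allows every action. The function names and the shortened policy excerpt are mine, for illustration only; the excerpt is a subset of the actions quoted above, not the full managed policy.

```python
def as_list(value):
    """IAM allows a single string or a list for Statement and Action."""
    return value if isinstance(value, list) else [value]


def wildcard_services(policy):
    """Return the sorted services for which the policy allows every action.

    A bare "*" action is reported as the pseudo-service "*".
    Only Allow statements are inspected; Deny and NotAction are ignored.
    """
    services = set()
    for statement in as_list(policy["Statement"]):
        if statement.get("Effect") != "Allow":
            continue
        for action in as_list(statement.get("Action", [])):
            service, _, name = action.partition(":")
            if action == "*" or name == "*":
                services.add(service)
    return sorted(services)


# Illustrative excerpt of the managed policy quoted above (not the full list).
POLICY_EXCERPT = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
                "cloudwatch:*",
                "dynamodb:*",
                "ec2:Describe*",
                "kinesis:PutRecord",
                "rds:Describe*",
                "s3:*",
                "sdb:*",
                "sns:*",
                "sqs:*",
                "glue:GetTable",
            ],
        }
    ],
}

print(wildcard_services(POLICY_EXCERPT))
```

Note that this only catches full `service:*` grants; partial wildcards such as `ec2:Describe*` are deliberately not flagged.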
As far as I've been able to tell, these are completely unused by EMR itself:
And S3 is at least partially used to upload logs to the configured logging bucket. Of course, if I tell my EMR job to fetch from s3://foo/bar, I'll need to also include permissions for that in my policy, but that separation is not very crisp right now.
It's also very hard for me to assess whether SNS/SQS is used internally by EMR today. Both services have cross-account support, so even if I see no relevant queues or topics in my account, I can't say with confidence that I'm not hobbling some uncommon EMR feature by not granting EMR access to those services.
The best experiment I've been able to run is to put the whole thing in a private subnet with no internet access and an S3 VPC endpoint (VPCE) to send logs to S3. The EMR cluster seems quite content in that scenario, which suggests to me that everything but S3 is optional. But obviously if I were to tell an EMR package to fetch from Glue, that would break.
Ultimately, it would be nice to have a broken-down table in the documentation saying things like:
- You always need S3 PutObject and ListBucket powers over your configured logging prefix.
- If you want to use our Glue integration, you need permissions X, Y, Z on the instance IAM role.
- If you want to use our EMRFS AssumeRole powers, you need to grant AssumeRole powers to the instance IAM role.
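As a sketch of what such a breakdown could translate into, the Python below assembles an instance-role policy from per-feature pieces. Everything here is an assumption on my part rather than documented EMR requirements: the function names, the bucket name, the role ARN, and the Glue action list are hypothetical placeholders, and the logging statements simply encode the PutObject/ListBucket guess from the first bullet.

```python
import json


def logging_statements(bucket, prefix):
    """S3 statements scoped to the logging prefix (assumed, not documented)."""
    prefix = prefix.strip("/")
    return [
        {
            "Sid": "WriteClusterLogs",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
        },
        {
            "Sid": "ListLogPrefix",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{bucket}",
            "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
        },
    ]


def build_instance_policy(log_bucket, log_prefix,
                          glue_actions=(), assume_role_arns=()):
    """Start from logging only and add a statement per opted-in feature."""
    statements = logging_statements(log_bucket, log_prefix)
    if glue_actions:
        # Which Glue actions are really needed is the open question ("X, Y, Z").
        statements.append({
            "Sid": "GlueCatalog",
            "Effect": "Allow",
            "Action": sorted(glue_actions),
            "Resource": "*",
        })
    if assume_role_arns:
        statements.append({
            "Sid": "EmrfsAssumeRole",
            "Effect": "Allow",
            "Action": ["sts:AssumeRole"],
            "Resource": sorted(assume_role_arns),
        })
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical bucket, prefix, Glue actions, and role ARN for illustration.
policy = build_instance_policy(
    "example-log-bucket",
    "emr/logs",
    glue_actions=["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
    assume_role_arns=["arn:aws:iam::123456789012:role/example-emrfs-role"],
)
print(json.dumps(policy, indent=2))
```

The point of the structure is that the baseline is only what EMR itself needs, and each optional feature adds exactly one clearly named statement, so it is obvious what can be dropped.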
Or, absent that (though this isn't a documentation thing), a cleaner separation between "task powers" and "EMR machinery powers" like what we have in ECS.