aws-samples / ecs-cid-sample

In the code provided with the blog post, we demonstrate how to use the draining state to update the AMI used by the EC2 instances in your cluster by updating the launch configuration of your Auto Scaling group. The process also ensures that tasks are drained from each EC2 instance and launched on a new container instance before the instance is terminated.

License: Apache License 2.0

ecs-cid-sample's Introduction

ECS Container draining

Amazon EC2 Container Service (Amazon ECS) is a container management service that makes it easy to run, stop, and manage Docker containers on a cluster of Amazon EC2 instances. When you run tasks using Amazon ECS, you place them on a cluster, which is a logical grouping of EC2 instances. Amazon ECS downloads your container images from a registry that you specify, and runs those images on the container instances within your cluster.

There are times when EC2 instances need to be removed from the cluster, for example cluster scale-down or updating an AMI. Today we have delivered Container Instance Draining to simplify these scenarios. The draining state prevents new tasks from being started on the container instance, notifies the service scheduler to move tasks that are running on the instance to other instances in the cluster, and enables you to wait until tasks have successfully moved before terminating the instance.
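
As a minimal sketch of the draining call described above: the function below sets one container instance to DRAINING through the ECS API. The `update_container_instances_state` call is the one the sample's Lambda code uses; the function name and the idea of passing the client in as a parameter are assumptions made here for illustration.

```python
def drain_container_instance(ecs_client, cluster, container_instance_arn):
    """Put one container instance into DRAINING and return its reported status."""
    response = ecs_client.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
    # The response lists the container instances that were updated.
    instances = response.get("containerInstances", [])
    return instances[0]["status"] if instances else None
```

With boto3 installed, this could be called as `drain_container_instance(boto3.client("ecs"), cluster_name, container_instance_arn)`, where both arguments are values from your own cluster.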

Overview of steps

  1. Download the CloudFormation template

  2. Launch the CloudFormation template that creates the following AWS resources:

  • The VPC and associated network elements (subnets, security groups, route table, etc)
  • ECS Cluster, ECS service, a sample ECS task definition
  • Auto scaling group with two EC2 instances and a termination lifecycle hook
  • Lambda function, permissions to invoke lambda, and lambda execution roles
  • SNS topic, policy

For the full solution overview visit Blog link.

CloudFormation template

  • cform/ecs.yaml

Copyright 2016-2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at

http://aws.amazon.com/apache2.0/

or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

ecs-cid-sample's People

Contributors

cbbarclay, hyandell, masneyb, mperi

ecs-cid-sample's Issues

The TTL in the architecture diagram is not in the code

The architecture diagram shows the lambda code doing a TTL check. It doesn't seem to be doing that anymore.

I'd guess the TTL isn't necessary because the lifecycle hook has a HeartbeatTimeout of 900. And from what I can tell, the Lambda function never sends a heartbeat. It only sends a completion message. So after 900 seconds Auto Scaling will terminate the instance. Then the Lambda function will see that the instance doesn't exist anymore and will stop repeating.
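
For reference, a hedged sketch of what sending a heartbeat could look like, in case waiting longer than the HeartbeatTimeout is wanted. This is not what the sample does. `record_lifecycle_action_heartbeat` and `complete_lifecycle_action` are boto3 Auto Scaling client calls; the function name, its parameters, and the return strings are hypothetical.

```python
def extend_or_complete(asg_client, hook_name, asg_name, instance_id, tasks_running):
    """Send a heartbeat while tasks remain, otherwise complete the lifecycle action."""
    if tasks_running:
        # A heartbeat restarts the hook's timeout for this instance.
        asg_client.record_lifecycle_action_heartbeat(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=asg_name,
            InstanceId=instance_id,
        )
        return "HEARTBEAT"
    # No tasks left: let Auto Scaling proceed with termination.
    asg_client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
    return "COMPLETED"
```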

'ECS' object has no attribute 'update_container_instances_state': AttributeError

Seems impossible from the code, but it is happening and stopping it from working. Perhaps this function isn't available in lambda?

'ECS' object has no attribute 'update_container_instances_state': AttributeError
Traceback (most recent call last):
File "/var/task/asg_drain_ecs_tasks.py", line 166, in lambda_handler
tasksRunning, tmpMsgAppend = checkContainerInstanceTaskStatus(Ec2InstanceId)
File "/var/task/asg_drain_ecs_tasks.py", line 90, in checkContainerInstanceTaskStatus
ecsResponse = ecsClient.update_container_instances_state(cluster=clusterName,containerInstances=[containerInstanceId],status='DRAINING')
AttributeError: 'ECS' object has no attribute 'update_container_instances_state'
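
One possible explanation, not confirmed anywhere in this issue, is that the boto3/botocore version available to the function predates the draining API, so the client class simply lacks the method. A small guard like the sketch below would at least make that condition explicit; the function name is hypothetical.

```python
def supports_draining(ecs_client):
    """Return True if this client exposes the container instance draining call."""
    method = getattr(ecs_client, "update_container_instances_state", None)
    return callable(method)
```

If this returns False, logging `boto3.__version__` and `botocore.__version__` from inside the function would show which SDK version the Lambda is actually running.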

Invalid type for parameter cluster

Parameter validation failed:
Invalid type for parameter cluster, value: None, type: <type 'NoneType'>, valid types: <type 'basestring'>: ParamValidationError
Traceback (most recent call last):
File "/var/task/index.py", line 141, in lambda_handler
clusterListResp = ecsClient.list_container_instances(cluster=clusterName)
File "/var/task/botocore/client.py", line 253, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/task/botocore/client.py", line 517, in _make_api_call
api_params, operation_model, context=request_context)
File "/var/task/botocore/client.py", line 572, in _convert_to_request_dict
api_params, operation_model)
File "/var/task/botocore/validate.py", line 270, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
ParamValidationError: Parameter validation failed:
Invalid type for parameter cluster, value: None, type: <type 'NoneType'>, valid types: <type 'basestring'>
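
The traceback shows `clusterName` being None when `list_container_instances` is called. The issue does not say why the value was never set, so the sketch below only shows a guard that fails early with a clearer message; the function name and error text are hypothetical.

```python
def require_cluster_name(cluster_name):
    """Return a usable cluster name or raise before any ECS API call is made."""
    if not isinstance(cluster_name, str) or not cluster_name.strip():
        raise ValueError(
            "ECS cluster name could not be determined; refusing to call the ECS API"
        )
    return cluster_name.strip()
```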

Instance is terminated despite the existence of a task in Deactivating status

The ECS task lifecycle has changed so that tasks now transition through the Deactivating status.
A task in that status is not counted in the runningTasksCount result, so complete_lifecycle_action is performed even though a task is still in the Deactivating state.

As a result, the instance is terminated even though the task has not stopped.
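
A hedged sketch of one way to avoid this: instead of relying on runningTasksCount, count every task whose `lastStatus` is anything other than STOPPED. It assumes `tasks` is the `tasks` list from an ECS `describe_tasks` response; the function name is hypothetical.

```python
def count_unfinished_tasks(tasks):
    """Count tasks that have not reached STOPPED, whatever state they are in."""
    # DEACTIVATING, STOPPING and DEPROVISIONING all still count as unfinished here.
    return sum(1 for task in tasks if task.get("lastStatus") != "STOPPED")


example_tasks = [
    {"taskArn": "task-1", "lastStatus": "RUNNING"},
    {"taskArn": "task-2", "lastStatus": "DEACTIVATING"},
    {"taskArn": "task-3", "lastStatus": "STOPPED"},
]
unfinished = count_unfinished_tasks(example_tasks)
```

Here the lifecycle action would only be completed once the count reaches zero.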

Various issues with the sample code

    • The code can post messages to the wrong SNS topic when retrying. It looks for the first SNS topic in the account that has a Lambda function subscribed to it and posts the retry message to that topic. See line # 161 in index.py for the implementation. The SNS message that is received by the Lambda already has the ARN of the SNS topic.
    • The code does not do any kind of pagination against the ECS API when reading the list of EC2 instances. So if it couldn't find the instance ID that was about to be terminated on the first page, then the instance was not set to DRAINING and the end users would see 50X messages when the operation timed out and autoscaling killed the instance.
    • There is a large amount of unused code and variables in the code. The index.py file can be cut by half. #19 should resolve the unused variables.
    • The retry logic did not put any kind of delay in place. The Lambda function would be invoked about 5-10 times a second, and each Lambda function invocation would probably make close to a dozen AWS API calls. This could result in accounts being throttled.
    • The .zip archive code differs from what is in VCS (in particular, there are changes to code/index.py that are not included in the bundled archive). This could produce unexpected results for users following the blog article's recommendation.
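
The first, second and fourth points can be sketched roughly as follows. This is an illustration under assumptions, not the repository's fix: all function names are hypothetical, the delay value is arbitrary, and sleeping inside the Lambda is only one of several ways to slow the retry loop (it also adds to billed execution time).

```python
import time


def topic_arn_from_event(event):
    """Read the ARN of the topic that delivered this SNS event to Lambda."""
    return event["Records"][0]["Sns"]["TopicArn"]


def find_container_instance_arn(ecs_client, cluster, ec2_instance_id):
    """Search every page of container instances for the given EC2 instance ID."""
    kwargs = {"cluster": cluster}
    while True:
        page = ecs_client.list_container_instances(**kwargs)
        arns = page.get("containerInstanceArns", [])
        if arns:
            described = ecs_client.describe_container_instances(
                cluster=cluster, containerInstances=arns
            )
            for instance in described["containerInstances"]:
                if instance["ec2InstanceId"] == ec2_instance_id:
                    return instance["containerInstanceArn"]
        token = page.get("nextToken")
        if not token:
            return None  # Every page was checked without a match.
        kwargs["nextToken"] = token


def republish_after_delay(sns_client, event, delay_seconds=5, sleep=time.sleep):
    """Wait, then republish the message to the same topic it arrived on."""
    sleep(delay_seconds)
    record = event["Records"][0]["Sns"]
    return sns_client.publish(TopicArn=record["TopicArn"], Message=record["Message"])
```

Taking the topic ARN from the incoming event removes the need to call `list_subscriptions` at all.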

SNS List Subscription is a paginated response

https://github.com/awslabs/ecs-cid-sample/blob/9b026517c9190f013c655e6fa99360754fabcc4b/code/index.py#L166

The List Subscription response is paginated, however, the code does not account for this.

Example Response:

<ListSubscriptionsResponse 
  xmlns="http://sns.amazonaws.com/doc/2010-03-31/">
  <ListSubscriptionsResult>
    <Subscriptions>
      <member>
        <Owner>xxxxxxxxx</Owner>
        <Endpoint>https://alert.victorops.com/integrations/cloudwatch/xxxxxxx/alert/80160dea-e842-457f-b6d6-46c8e2af5b53/taxonline</Endpoint>
        <Protocol>https</Protocol>
        <SubscriptionArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:taxonline-redis-prod-redis-sns-alarms:87ef97ff-f038-4aed-a95f-524b87b7d185</SubscriptionArn>
        <TopicArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:taxonline-redis-prod-redis-sns-alarms</TopicArn>
      </member>
      <member>
        <Owner>xxxxxxxxx</Owner>
        <Endpoint>https://alert.victorops.com/integrations/cloudwatch/xxxxxxx/alert/80160dea-e842-457f-b6d6-46c8e2af5b53/taxonline</Endpoint>
        <Protocol>https</Protocol>
        <SubscriptionArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:taxonline-stack-prod-i-alb-sns-topic:e376a7c0-2e15-4929-ad63-1bc5e7588f52</SubscriptionArn>
        <TopicArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:taxonline-stack-prod-i-alb-sns-topic</TopicArn>
      </member>
      <member>
        <Owner>xxxxxxxxx</Owner>
        <Endpoint>https://alert.victorops.com/integrations/cloudwatch/xxxxxxx/alert/80160dea-e842-457f-b6d6-46c8e2af5b53/taxonline</Endpoint>
        <Protocol>https</Protocol>
        <SubscriptionArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:tax-api-prod-sns-alarms:7b793a6f-1ee0-4c27-a1f6-d934fc183e43</SubscriptionArn>
        <TopicArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:tax-api-prod-sns-alarms</TopicArn>
      </member>
      <member>
        <Owner>xxxxxxxxx</Owner>
        <Endpoint>arn:aws:sqs:ap-southeast-2:xxxxxxxxx:lifecycled-i-0e5020994255d0b79</Endpoint>
        <Protocol>sqs</Protocol>
        <SubscriptionArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:buildkite-tax-online-prod-agents-AgentLifecycleTopic-1TSGM5VYT7GKC:a07db5b7-ccb6-4e68-ae21-97550680aff8</SubscriptionArn>
        <TopicArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:buildkite-tax-online-prod-agents-AgentLifecycleTopic-1TSGM5VYT7GKC</TopicArn>
      </member>
      <member>
        <Owner>xxxxxxxxx</Owner>
        <Endpoint>arn:aws:sqs:ap-southeast-2:xxxxxxxxx:lifecycled-i-03706f88220376c56</Endpoint>
        <Protocol>sqs</Protocol>
        <SubscriptionArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:buildkite-tax-online-prod-agents-AgentLifecycleTopic-1TSGM5VYT7GKC:f1845ba5-00d3-45c2-b29e-25d912a23f0a</SubscriptionArn>
        <TopicArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:buildkite-tax-online-prod-agents-AgentLifecycleTopic-1TSGM5VYT7GKC</TopicArn>
      </member>
      <member>
        <Owner>xxxxxxxxx</Owner>
        <Endpoint>https://alert.victorops.com/integrations/cloudwatch/xxxxxxx/alert/80160dea-e842-457f-b6d6-46c8e2af5b53/taxonline</Endpoint>
        <Protocol>https</Protocol>
        <SubscriptionArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:taxonline-taxapi-db-prod-db-sns-alarms:a0f9075f-bf52-483c-ba8b-bc7f125d39e7</SubscriptionArn>
        <TopicArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:taxonline-taxapi-db-prod-db-sns-alarms</TopicArn>
      </member>
      <member>
        <Owner>xxxxxxxxx</Owner>
        <Endpoint>https://alert.victorops.com/integrations/cloudwatch/xxxxxxx/alert/80160dea-e842-457f-b6d6-46c8e2af5b53/taxonline</Endpoint>
        <Protocol>https</Protocol>
        <SubscriptionArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:taxonline-stack-prod-ecs-sns-topic:e72f5958-96e2-4f97-9520-5baad5e60f9b</SubscriptionArn>
        <TopicArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:taxonline-stack-prod-ecs-sns-topic</TopicArn>
      </member>
      <member>
        <Owner>xxxxxxxxx</Owner>
        <Endpoint>arn:aws:sqs:ap-southeast-2:xxxxxxxxx:lifecycled-i-0abfb17ea24a975d3</Endpoint>
        <Protocol>sqs</Protocol>
        <SubscriptionArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:buildkite-tax-online-prod-agents-AgentLifecycleTopic-1TSGM5VYT7GKC:ced09db8-b3cc-417a-9d4f-9dca0d74272c</SubscriptionArn>
        <TopicArn>arn:aws:sns:ap-southeast-2:xxxxxxxxx:buildkite-tax-online-prod-agents-AgentLifecycleTopic-1TSGM5VYT7GKC</TopicArn>
      </member>
    </Subscriptions>
    <NextToken>AAFCsV6MDdkuvnFSqkLx1Sz5r+BoZb+4d4whkgiXP5l6zw==</NextToken>
  </ListSubscriptionsResult>
  <ResponseMetadata>
    <RequestId>116daafe-533e-5bd1-ae82-f8651ceeb4f3</RequestId>
  </ResponseMetadata>
</ListSubscriptionsResponse>
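
A minimal sketch of following `NextToken`, assuming the topic ARN is already known: it uses the SNS `list_subscriptions_by_topic` call to look only at one topic's subscriptions and keeps requesting pages until no token is returned. The function name is hypothetical.

```python
def lambda_subscriptions_for_topic(sns_client, topic_arn):
    """Collect all Lambda subscriptions of one topic across every result page."""
    subscriptions = []
    kwargs = {"TopicArn": topic_arn}
    while True:
        page = sns_client.list_subscriptions_by_topic(**kwargs)
        for subscription in page.get("Subscriptions", []):
            if subscription.get("Protocol") == "lambda":
                subscriptions.append(subscription)
        token = page.get("NextToken")
        if not token:
            return subscriptions
        # Ask for the next page using the token from the previous response.
        kwargs["NextToken"] = token
```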

unable to create "ASGTerminateHook" via CloudFormation in cn-north-1 region

debug log attached:

2017-12-30 22:13:46,474 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/1.14.9 Python/3.6.3 Darwin/16.7.0 botocore/1.8.13
2017-12-30 22:13:46,475 - MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['autoscaling', 'put-lifecycle-hook', '--lifecycle-hook-name', 'my-hook', '--auto-scaling-group-name', 'drain-test-EcsInstanceAsg-6TJ96189ZKY3', '--lifecycle-transition', 'autoscaling:EC2_INSTANCE_TERMINATING', '--notification-target-arn', 'arn:aws-cn:sns:cn-north-1:157197624774:drain-test-ASGSNSTopic-C9V6GOPIDIN7', '--role-arn', 'arn:aws-cn:iam::157197624774:role/drain-test-SNSLambdaRole-1EPINIZB33OMT', '--debug']
2017-12-30 22:13:46,475 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_scalar_parsers at 0x1057f4378>
2017-12-30 22:13:46,475 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,476 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_assume_role_provider_cache at 0x1052ff840>
2017-12-30 22:13:46,476 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,476 - MainThread - botocore.session - DEBUG - Loading variable credentials_file from defaults.
2017-12-30 22:13:46,476 - MainThread - botocore.session - DEBUG - Loading variable config_file from defaults.
2017-12-30 22:13:46,476 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,476 - MainThread - botocore.session - DEBUG - Loading variable metadata_service_timeout from defaults.
2017-12-30 22:13:46,476 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,476 - MainThread - botocore.session - DEBUG - Loading variable metadata_service_num_attempts from defaults.
2017-12-30 22:13:46,477 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,477 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function attach_history_handler at 0x1056a8d90>
2017-12-30 22:13:46,477 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,477 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,478 - MainThread - botocore.session - DEBUG - Loading variable api_versions from defaults.
2017-12-30 22:13:46,478 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/autoscaling/2011-01-01/service-2.json
2017-12-30 22:13:46,482 - MainThread - botocore.hooks - DEBUG - Event service-data-loaded.autoscaling: calling handler <function register_retries_for_service at 0x104e180d0>
2017-12-30 22:13:46,482 - MainThread - botocore.handlers - DEBUG - Registering retry handlers for service: autoscaling
2017-12-30 22:13:46,484 - MainThread - botocore.hooks - DEBUG - Event building-command-table.autoscaling: calling handler <function add_waiters at 0x1057fd620>
2017-12-30 22:13:46,490 - MainThread - awscli.clidriver - DEBUG - OrderedDict([('lifecycle-hook-name', <awscli.arguments.CLIArgument object at 0x1059abdd8>), ('auto-scaling-group-name', <awscli.arguments.CLIArgument object at 0x1059abe10>), ('lifecycle-transition', <awscli.arguments.CLIArgument object at 0x1059abe48>), ('role-arn', <awscli.arguments.CLIArgument object at 0x1059abeb8>), ('notification-target-arn', <awscli.arguments.CLIArgument object at 0x1059a3320>), ('notification-metadata', <awscli.arguments.CLIArgument object at 0x1059abe80>), ('heartbeat-timeout', <awscli.arguments.CLIArgument object at 0x1059abf98>), ('default-result', <awscli.arguments.CLIArgument object at 0x1059b4048>)])
2017-12-30 22:13:46,490 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.autoscaling.put-lifecycle-hook: calling handler <function add_streaming_output_arg at 0x1057f48c8>
2017-12-30 22:13:46,490 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.autoscaling.put-lifecycle-hook: calling handler <function add_cli_input_json at 0x105302048>
2017-12-30 22:13:46,490 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.autoscaling.put-lifecycle-hook: calling handler <function unify_paging_params at 0x105778d08>
2017-12-30 22:13:46,501 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/autoscaling/2011-01-01/paginators-1.json
2017-12-30 22:13:46,502 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.autoscaling.put-lifecycle-hook: calling handler <function add_generate_skeleton at 0x10575e620>
2017-12-30 22:13:46,502 - MainThread - botocore.hooks - DEBUG - Event before-building-argument-table-parser.autoscaling.put-lifecycle-hook: calling handler <bound method OverrideRequiredArgsArgument.override_required_args of <awscli.customizations.cliinputjson.CliInputJSONArgument object at 0x1059b40b8>>
2017-12-30 22:13:46,502 - MainThread - botocore.hooks - DEBUG - Event before-building-argument-table-parser.autoscaling.put-lifecycle-hook: calling handler <bound method GenerateCliSkeletonArgument.override_required_args of <awscli.customizations.generatecliskeleton.GenerateCliSkeletonArgument object at 0x1059b40f0>>
2017-12-30 22:13:46,504 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.lifecycle-hook-name: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,504 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.autoscaling.put-lifecycle-hook: calling handler <awscli.argprocess.ParamShorthandParser object at 0x1052c7860>
2017-12-30 22:13:46,504 - MainThread - awscli.arguments - DEBUG - Unpacked value of 'my-hook' for parameter "lifecycle_hook_name": 'my-hook'
2017-12-30 22:13:46,504 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.auto-scaling-group-name: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,504 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.autoscaling.put-lifecycle-hook: calling handler <awscli.argprocess.ParamShorthandParser object at 0x1052c7860>
2017-12-30 22:13:46,504 - MainThread - awscli.arguments - DEBUG - Unpacked value of 'drain-test-EcsInstanceAsg-6TJ96189ZKY3' for parameter "auto_scaling_group_name": 'drain-test-EcsInstanceAsg-6TJ96189ZKY3'
2017-12-30 22:13:46,504 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.lifecycle-transition: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,504 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.autoscaling.put-lifecycle-hook: calling handler <awscli.argprocess.ParamShorthandParser object at 0x1052c7860>
2017-12-30 22:13:46,505 - MainThread - awscli.arguments - DEBUG - Unpacked value of 'autoscaling:EC2_INSTANCE_TERMINATING' for parameter "lifecycle_transition": 'autoscaling:EC2_INSTANCE_TERMINATING'
2017-12-30 22:13:46,505 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.role-arn: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,505 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.autoscaling.put-lifecycle-hook: calling handler <awscli.argprocess.ParamShorthandParser object at 0x1052c7860>
2017-12-30 22:13:46,505 - MainThread - awscli.arguments - DEBUG - Unpacked value of 'arn:aws-cn:iam::157197624774:role/drain-test-SNSLambdaRole-1EPINIZB33OMT' for parameter "role_arn": 'arn:aws-cn:iam::157197624774:role/drain-test-SNSLambdaRole-1EPINIZB33OMT'
2017-12-30 22:13:46,505 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.notification-target-arn: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,505 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.autoscaling.put-lifecycle-hook: calling handler <awscli.argprocess.ParamShorthandParser object at 0x1052c7860>
2017-12-30 22:13:46,505 - MainThread - awscli.arguments - DEBUG - Unpacked value of 'arn:aws-cn:sns:cn-north-1:157197624774:drain-test-ASGSNSTopic-C9V6GOPIDIN7' for parameter "notification_target_arn": 'arn:aws-cn:sns:cn-north-1:157197624774:drain-test-ASGSNSTopic-C9V6GOPIDIN7'
2017-12-30 22:13:46,505 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.notification-metadata: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,505 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.heartbeat-timeout: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,505 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.default-result: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,506 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.cli-input-json: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,506 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.autoscaling.put-lifecycle-hook.generate-cli-skeleton: calling handler <function uri_param at 0x10528ce18>
2017-12-30 22:13:46,506 - MainThread - botocore.hooks - DEBUG - Event calling-command.autoscaling.put-lifecycle-hook: calling handler <bound method CliInputJSONArgument.add_to_call_parameters of <awscli.customizations.cliinputjson.CliInputJSONArgument object at 0x1059b40b8>>
2017-12-30 22:13:46,506 - MainThread - botocore.hooks - DEBUG - Event calling-command.autoscaling.put-lifecycle-hook: calling handler <bound method GenerateCliSkeletonArgument.generate_json_skeleton of <awscli.customizations.generatecliskeleton.GenerateCliSkeletonArgument object at 0x1059b40f0>>
2017-12-30 22:13:46,506 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,506 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,506 - MainThread - botocore.session - DEBUG - Loading variable region from config file with value 'cn-north-1'.
2017-12-30 22:13:46,506 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,506 - MainThread - botocore.session - DEBUG - Loading variable ca_bundle from defaults.
2017-12-30 22:13:46,506 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,506 - MainThread - botocore.session - DEBUG - Loading variable api_versions from defaults.
2017-12-30 22:13:46,507 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: env
2017-12-30 22:13:46,507 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: assume-role
2017-12-30 22:13:46,509 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: shared-credentials-file
2017-12-30 22:13:46,511 - MainThread - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2017-12-30 22:13:46,512 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-12-30 22:13:46,515 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2017-12-30 22:13:46,515 - MainThread - botocore.hooks - DEBUG - Event choose-service-name: calling handler <function handle_service_name_alias at 0x104e00400>
2017-12-30 22:13:46,519 - MainThread - botocore.hooks - DEBUG - Event creating-client-class.autoscaling: calling handler <function add_generate_presigned_url at 0x104d96f28>
2017-12-30 22:13:46,521 - MainThread - botocore.args - DEBUG - The s3 config key is not a dictionary type, ignoring its value of: None
2017-12-30 22:13:46,536 - MainThread - botocore.endpoint - DEBUG - Setting autoscaling timeout as (60, 60)
2017-12-30 22:13:46,536 - MainThread - botocore.client - DEBUG - Registering retry handlers for service: autoscaling
2017-12-30 22:13:46,537 - MainThread - botocore.hooks - DEBUG - Event before-parameter-build.autoscaling.PutLifecycleHook: calling handler <function generate_idempotent_uuid at 0x104e139d8>
2017-12-30 22:13:46,537 - MainThread - botocore.endpoint - DEBUG - Making request for OperationModel(name=PutLifecycleHook) (verify_ssl=True) with params: {'url_path': '/', 'query_string': '', 'method': 'POST', 'headers': {'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8', 'User-Agent': 'aws-cli/1.14.9 Python/3.6.3 Darwin/16.7.0 botocore/1.8.13'}, 'body': {'Action': 'PutLifecycleHook', 'Version': '2011-01-01', 'LifecycleHookName': 'my-hook', 'AutoScalingGroupName': 'drain-test-EcsInstanceAsg-6TJ96189ZKY3', 'LifecycleTransition': 'autoscaling:EC2_INSTANCE_TERMINATING', 'RoleARN': 'arn:aws-cn:iam::157197624774:role/drain-test-SNSLambdaRole-1EPINIZB33OMT', 'NotificationTargetARN': 'arn:aws-cn:sns:cn-north-1:157197624774:drain-test-ASGSNSTopic-C9V6GOPIDIN7'}, 'url': 'https://autoscaling.cn-north-1.amazonaws.com.cn/', 'context': {'client_region': 'cn-north-1', 'client_config': <botocore.config.Config object at 0x105a6cf28>, 'has_streaming_input': False, 'auth_type': None}}
2017-12-30 22:13:46,537 - MainThread - botocore.hooks - DEBUG - Event request-created.autoscaling.PutLifecycleHook: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x105a6ce10>>
2017-12-30 22:13:46,537 - MainThread - botocore.hooks - DEBUG - Event choose-signer.autoscaling.PutLifecycleHook: calling handler <function set_operation_specific_signer at 0x104e138c8>
2017-12-30 22:13:46,538 - MainThread - botocore.auth - DEBUG - Calculating signature using v4 auth.
2017-12-30 22:13:46,538 - MainThread - botocore.auth - DEBUG - CanonicalRequest:
POST
/

content-type:application/x-www-form-urlencoded; charset=utf-8
host:autoscaling.cn-north-1.amazonaws.com.cn
x-amz-date:20171230T141346Z

content-type;host;x-amz-date
dac71e31547c0406fbe5f9b0a25c0cbea1685f6222b5d84728a5c66e6c9df674
2017-12-30 22:13:46,538 - MainThread - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20171230T141346Z
20171230/cn-north-1/autoscaling/aws4_request
404656a0fac924bedb409a3e87e707d16fe270756723c8f1c94100ae87f43ec6
2017-12-30 22:13:46,539 - MainThread - botocore.auth - DEBUG - Signature:
c197cc793db4b313fa053d983ff7ae8fcb4145dbbcd98944484d851754253a21
2017-12-30 22:13:46,541 - MainThread - botocore.endpoint - DEBUG - Sending http request: <PreparedRequest [POST]>
2017-12-30 22:13:46,542 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - INFO - Starting new HTTPS connection (1): autoscaling.cn-north-1.amazonaws.com.cn
2017-12-30 22:13:46,942 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "POST / HTTP/1.1" 400 556
2017-12-30 22:13:46,947 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amzn-requestid': 'a4ef28f8-ed6b-11e7-8518-77f8caf04164', 'content-type': 'text/xml', 'content-length': '556', 'date': 'Sat, 30 Dec 2017 14:13:43 GMT', 'connection': 'close'}
2017-12-30 22:13:46,947 - MainThread - botocore.parsers - DEBUG - Response body:
b'\n \n Sender\n ValidationError\n Unable to publish test message to notification target arn:aws-cn:sns:cn-north-1:157197624774:drain-test-ASGSNSTopic-C9V6GOPIDIN7 using IAM role arn:aws-cn:iam::157197624774:role/drain-test-SNSLambdaRole-1EPINIZB33OMT. Please check your target and role configuration and try to put lifecycle hook again.\n \n a4ef28f8-ed6b-11e7-8518-77f8caf04164\n\n'
2017-12-30 22:13:46,947 - MainThread - botocore.hooks - DEBUG - Event needs-retry.autoscaling.PutLifecycleHook: calling handler <botocore.retryhandler.RetryHandler object at 0x1058b2ef0>
2017-12-30 22:13:46,947 - MainThread - botocore.retryhandler - DEBUG - No retry needed.
2017-12-30 22:13:46,949 - MainThread - awscli.clidriver - DEBUG - Exception caught in main()
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/awscli/clidriver.py", line 207, in main
return command_table[parsed_args.command](remaining, parsed_args)
File "/usr/local/lib/python3.6/site-packages/awscli/clidriver.py", line 347, in call
return command_table[parsed_args.operation](remaining, parsed_globals)
File "/usr/local/lib/python3.6/site-packages/awscli/clidriver.py", line 519, in call
call_parameters, parsed_globals)
File "/usr/local/lib/python3.6/site-packages/awscli/clidriver.py", line 639, in invoke
client, operation_name, parameters, parsed_globals)
File "/usr/local/lib/python3.6/site-packages/awscli/clidriver.py", line 651, in _make_client_call
**parameters)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 317, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 615, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ValidationError) when calling the PutLifecycleHook operation: Unable to publish test message to notification target arn:aws-cn:sns:cn-north-1:157197624774:drain-test-ASGSNSTopic-C9V6GOPIDIN7 using IAM role arn:aws-cn:iam::157197624774:role/drain-test-SNSLambdaRole-1EPINIZB33OMT. Please check your target and role configuration and try to put lifecycle hook again.
2017-12-30 22:13:46,953 - MainThread - awscli.clidriver - DEBUG - Exiting with rc 255

An error occurred (ValidationError) when calling the PutLifecycleHook operation: Unable to publish test message to notification target arn:aws-cn:sns:cn-north-1:157197624774:drain-test-ASGSNSTopic-C9V6GOPIDIN7 using IAM role arn:aws-cn:iam::157197624774:role/drain-test-SNSLambdaRole-1EPINIZB33OMT. Please check your target and role configuration and try to put lifecycle hook again.

zip build instructions missing

It would be useful to have the index.zip build instructions in the README. It looks like some files are missing, such as a requirements.txt to pin the specific pip package versions that the Lambda is compatible with.
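
Until official instructions exist, a rough sketch of building the archive from a single handler file could look like this. It assumes the handler is the only file needed and places it at the archive root; it does not install or bundle any pip dependencies. The function name is hypothetical.

```python
import zipfile
from pathlib import Path


def build_lambda_zip(source_file, zip_path):
    """Write a zip archive containing the handler file at the archive root."""
    source_file = Path(source_file)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops the directory part so the module sits at the top level.
        archive.write(source_file, arcname=source_file.name)
    return zip_path
```

For this repository that might be `build_lambda_zip("code/index.py", "index.zip")`, assuming it is run from the repository root.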

paging snsClient.list_subscriptions()

snsClient.list_subscriptions() seems to return only 6 endpoints on the first page, with the 7th and later moved to the next page.
Workaround as follows:

@@ -163,14 +163,17 @@ def lambda_handler(event, context):
 
             # If tasks are still running...
             if tasksRunning == 1:
-                response = snsClient.list_subscriptions()
-                for key in response['Subscriptions']:
-                    logger.info("Endpoint %s AND TopicArn %s and protocol %s ",key['Endpoint'], key['TopicArn'],
-                                                                                  key['Protocol'])
-                    if TopicArn == key['TopicArn'] and key['Protocol'] == 'lambda':
-                        logger.info("TopicArn match, publishToSNS function...")
-                        msgResponse = publishToSNS(message, key['TopicArn'])
-                        logger.debug("msgResponse %s and time is %s",msgResponse, datetime.datetime)
+                paginator = snsClient.get_paginator('list_subscriptions')
+                subscriptionListPages = paginator.paginate()
+                for subscriptionListResp in subscriptionListPages:
+                    #response = snsClient.list_subscriptions()
+                    for key in subscriptionListResp['Subscriptions']:
+                        logger.info("Endpoint %s AND TopicArn %s and protocol %s ",key['Endpoint'], key['TopicArn'],
+                                                                                      key['Protocol'])
+                        if TopicArn == key['TopicArn'] and key['Protocol'] == 'lambda':
+                            logger.info("TopicArn match, publishToSNS function...")
+                            msgResponse = publishToSNS(message, key['TopicArn'])
+                            logger.debug("msgResponse %s and time is %s",msgResponse, datetime.datetime)
             # If tasks are NOT running...
             elif tasksRunning == 0:
                 completeHook = 1
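
The same paging idea can be pulled out into a small helper. This is only a sketch, not the repository's code: the function name `find_lambda_subscriptions` is made up for illustration, and it assumes the caller passes in an SNS client (for example `boto3.client("sns")`) and the topic ARN that the handler already has.

```python
def find_lambda_subscriptions(sns_client, topic_arn):
    """Return every Lambda subscription on topic_arn, across all result pages.

    sns_client is assumed to be a boto3 SNS client; the helper name is
    hypothetical and not part of the original sample.
    """
    matches = []
    # The paginator follows NextToken, so subscriptions beyond the first
    # page are not silently skipped.
    paginator = sns_client.get_paginator("list_subscriptions")
    for page in paginator.paginate():
        for sub in page.get("Subscriptions", []):
            if sub["TopicArn"] == topic_arn and sub["Protocol"] == "lambda":
                matches.append(sub)
    return matches
```

In the handler this would be called as `find_lambda_subscriptions(snsClient, TopicArn)`, publishing once per returned entry. boto3 also offers `list_subscriptions_by_topic`, which might narrow the listing to the one topic, though that has not been tried against this sample.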

Production ready solution

This code is only an example, and it doesn't seem to be maintained either.

We at GetSocial wrote a production-ready (we have been using it in production for years) and more feature-rich (spot instance draining) version: https://github.com/getsocial-rnd/ecs-drain-lambda

What do you think about archiving this repo and creating a link to our tool? Thanks!

ECS instance terminates before docker container exits on host with extended ECS_CONTAINER_STOP_TIMEOUT

When a container instance is draining, the boto3 API for list-tasks defaults to including all tasks that have a desiredStatus of RUNNING. However, the API (as of 2019-03-06) will return a desiredStatus of STOPPED before the container has exited on the instance. If the ECS cluster has a longer-than-default (30 seconds) ECS_CONTAINER_STOP_TIMEOUT, and your container takes a while to clean up and exit, this template will complete the Auto Scaling Lifecycle Hook before the container exits.

This leads to hard-to-diagnose issues where the container instance (and its logs) are terminated earlier than expected.

Is there a way to run an equivalent of docker ps from the Lambda job to confirm a task has exited?
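
One possible direction, short of an actual `docker ps`, is to look at each task's lastStatus instead of relying on the desiredStatus filter. The sketch below is untested against a real cluster and rests on an assumption: that ECS only reports a lastStatus of STOPPED once the containers have exited. The function name and its parameters are made up for illustration.

```python
def tasks_not_yet_stopped(ecs_client, cluster, container_instance):
    """Return ARNs of tasks on the instance whose lastStatus is not STOPPED.

    ecs_client is assumed to be a boto3 ECS client. Assumption (not verified
    here): lastStatus becomes STOPPED only after the containers have exited.
    """
    task_arns = []
    paginator = ecs_client.get_paginator("list_tasks")
    # list_tasks filters on desiredStatus, so query both values to include
    # tasks that were told to stop but may still be shutting down.
    for desired in ("RUNNING", "STOPPED"):
        pages = paginator.paginate(
            cluster=cluster,
            containerInstance=container_instance,
            desiredStatus=desired,
        )
        for page in pages:
            task_arns.extend(page.get("taskArns", []))

    pending = []
    # describe_tasks accepts at most 100 task ARNs per call.
    for i in range(0, len(task_arns), 100):
        resp = ecs_client.describe_tasks(cluster=cluster, tasks=task_arns[i:i + 100])
        for task in resp.get("tasks", []):
            if task.get("lastStatus") != "STOPPED":
                pending.append(task["taskArn"])
    return pending
```

The Lambda could then hold off on completing the lifecycle hook until this list is empty, rather than until list-tasks with the default filter returns nothing.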

CFN Template has Mac CRs

Hey Madhuri,

Thanks for the great project & blog post.
It looks like your text editor left Mac CRs (^M) in the CFN template.
ecs-cid-sample/cform/ecs.yaml

AWS Lambda 'recursive loop protection' might consider the lambda function a problem

I have a variant of this code that has been working well for several years.

In the last 24h I got an email from AWS saying that AWS Lambda is introducing 'recursive loop detection'.

Starting July 5, 2023, AWS Lambda is launching recursive loop detection, a capability that automatically detects and stops certain recursive invocations. With this launch, Lambda will stop invocations for functions that utilize supported SDK versions [1] and are part of a recursive loop composed of the underlying Lambda function and Amazon SQS queue and/or Amazon SNS topic after 16 recursive calls.

To prevent potential disruption to your account, we have turned off this new feature for the accounts in the regions listed in this notification because we previously detected that you have one or more Lambda functions in the xxxxx Region that are being invoked in a recursive loop with other AWS resources. These recursive loops were detected some time during the last three months. Recursive invocations within these accounts will not be stopped.

[1] https://docs.aws.amazon.com/lambda/latest/dg/invocation-recursion.html

Code such as this draining example will be considered by AWS as a recursive loop because it publishes a message to the same SNS topic that launches the lambda function as per here.

You might want to update the example code with a suitable mitigation (perhaps some sort of decreasing counter?)
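
To make the counter idea concrete, here is a minimal sketch. Everything in it is hypothetical: the `RepublishCount` field, the cap value, and the helper name are not in the sample. It also only bounds how often the function republishes; whether that is enough to keep Lambda's own detection from stepping in has not been checked, and a low cap would end the wait early if draining takes a long time.

```python
MAX_REPUBLISHES = 10  # hypothetical cap, chosen only for illustration


def next_message(message, max_republishes=MAX_REPUBLISHES):
    """Return a copy of message with its republish counter advanced.

    Returns None once the counter has reached max_republishes, which the
    caller should treat as "stop republishing". The RepublishCount key is
    an invented field, not something the original sample carries.
    """
    count = int(message.get("RepublishCount", 0))
    if count >= max_republishes:
        return None
    updated = dict(message)  # leave the caller's dict untouched
    updated["RepublishCount"] = count + 1
    return updated
```

In the handler, the result would be checked before calling `publishToSNS`: publish the returned message if there is one, otherwise log and give up (or complete the hook) instead of looping further.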

Boto3 version on lambda doesn't support the API

Thank you for this example, but it would be useful to add in the readme that the current version of Boto3 installed on the lambda AMI doesn't include the update_container_instances_state() method for the ecs client.
I know that adding the zip file with all boto3 included solves the problem, but when I tried to just deploy the python code without realizing that there was a reason for putting the entire zip there, I was a bit confused.
Hopefully the lambda AMI will be updated soon, and this can be closed.
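
A quick way to tell whether the bundled SDK is new enough is to check for the method before calling it. This is just a sketch; the helper name is made up, and it assumes the caller passes in whatever `boto3.client("ecs")` returns in the Lambda environment.

```python
def supports_instance_draining(ecs_client):
    """Return True if the client exposes update_container_instances_state.

    ecs_client is assumed to be a boto3 ECS client; older SDK versions
    do not generate this method, so the attribute lookup fails there.
    """
    method = getattr(ecs_client, "update_container_instances_state", None)
    return callable(method)
```

Logging the result together with `boto3.__version__` at the top of the handler would make it obvious when a deployment is missing the bundled SDK from the zip.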
