Giter Site home page Giter Site logo

ecscale's Introduction

ECSCALE

A serverless app removing underutilized hosts from ECS clusters

Scaling ECS down is not a straightforward task;Based on one metric solely, an instance could be taken down causing 2 effects:

  1. Forcefully removing a host when a container is running will cut off active connections causing service downtime
  2. Removing an instance based on utilization / capacity metric may cause an endless loop of scale

To such an end, this tool will look for scaleable clusters based on multiple metrics

Once identified, the target is moved to "draining" state, where a new instance of the same task is raised on an available host. Once the new containers are ready, the draining instsnce will start draining connection from active tasks. Once the draining process is complete, the instance will be terminated.

Usage:

  1. Throw ecscale.py code to AWS Lambda providing relevant role to handle ECS and autoscaling (Instrcutions ahead)
  2. Set repeated run (recommended every 60 minutes using a cloudwatch events trigger for Lambda)
  3. That's it... Your ECS hosts are being gracefully removed if needed. No metrics/alarms needed

Changable Parameters:

  • SCALE_IN_CPU_TH = 30 # Below this EC2 average metric scaling would take action
  • SCALE_IN_MEM_TH = 60 # Below this cluster average metric scaling would take action
  • FUTURE_MEM_TH = 70 # Below this future metric scaling would take action
  • DRAIN_ALL_EMPTY_INSTANCES = True # Set to False to prevent scaling in more than one instance at a time
  • ASG_PREFIX = '' # Use this when your ASG naming convention requires a prefix (e.g. 'ecs-')
  • ASG_SUFFIX = '' # Use this when your ASG naming convention requires a suffix (e.g. '-live')
  • ECS_AVOID_STR = 'awseb' # Use this to avoid clusters containing a specific string (i.e ElasticBeanstalk clusters)
How to create a role to run ecscale:
  1. When creating the Lambda function, you'll be asked to select a role or create a new one, choose a new role
  2. Provide the json from policy.json to the role policy
  3. All set to allow ecscale to do its work
Creating a Lambda function step by step:

Flow logic

  • Iterate over existing ECS clusters
  • Check a cluster's ability to scale-in based on predicted future memory reservation capacity
  • Look for empty hosts the can be scaled
  • Look for least utilized host
  • Choose a candidate and put in draining state
  • Terminate a draining host that has no running tasks and decrease the desired number of instances

Read about it some more on Medium

ecscale's People

Contributors

cbankston avatar nathanpeck avatar omerxx avatar sannithibalaji avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

ecscale's Issues

Sequencing of actions

More of an observation that an out-and-out issue but whilst experimenting with your lambda something struck me. In the main procedure you currently:

  • Evaluate minimum state of the cluster
  • Evaluate if scaling-in is appropriate on reserved memory grounds
  • Evaluate if scaling-in is appropriate on reserved CPU grounds
  • Terminate any already-drained host instances

So in our scenario I have two hosts running tasks (one per availability zone), so a cluster with a minimum and desired size of 2. I then manually trigger our scale-up alarm (which sets desired size to 4) to force some new hosts to be added. The on the first time I run the lambda then it sees the new hosts as being surplus, and starts draining them.

The interesting part is on the next run of the lambda where the minimum state evaluation (asg_on_min_state) doesn't take into account the number of draining instances about to be terminated (i.e. desired size is still deemed to be 4). Now as we step further into the code then (as i'm in currently only experimenting in a dev environment and the containers i'm running do pretty much nothing) then the reserved memory evaluation then actually decides to start draining one of my remaining minimum 2 active boxes! Finally it also terminates our two instances that were set to draining from the first run.

So with this kind of scenario in mind would it not make sense to make termination of drained instances one of the first things to be evaluated (or at least do it before evaluating minimum state so we have a desired size figure reflective of hosts that aren't on the cusp of termination) and not the last? Either that or make the asg_on_min_state procedure take into account instances that are draining/about to be terminated.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.