artilleryio / chaos-lambda

Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥

License: Mozilla Public License 2.0

JavaScript 100.00%
reliability-engineering aws fault-tolerance chaos-monkey

chaos-lambda's Introduction

      _                       _                 _         _
  ___| |__   __ _  ___  ___  | | __ _ _ __ ___ | |__   __| | __ _
 / __| '_ \ / _` |/ _ \/ __| | |/ _` | '_ ` _ \| '_ \ / _` |/ _` |
| (__| | | | (_| | (_) \__ \ | | (_| | | | | | | |_) | (_| | (_| |
 \___|_| |_|\__,_|\___/|___/ |_|\__,_|_| |_| |_|_.__/ \__,_|\__,_|

Meet Chaos Lambda

Chaos Lambda is a serverless implementation of Netflix's Chaos Monkey.

It will wreak havoc* on your AWS infrastructure to help you build systems that are lean, mean, and resilient to failure.

* In an extremely controlled manner: Chaos Lambda is disabled by default.

About

Chaos Lambda is a small tool for testing the resiliency and recoverability of AWS-based architectures. Once configured and deployed, it randomly terminates, or otherwise interferes with the operation of, your EC2 instances and ECS tasks (at present only instance termination is supported; see Current Limitations). It is inspired by Netflix's Chaos Monkey, but instead of requiring an EC2 instance to run on, it uses AWS Lambda. Think of it as Chaos Monkey rebuilt with modern tech.

Installation

You need Node.js to use Chaos Lambda (we will rewrite the CLI in Golang at some point):

# npm comes bundled with Node.js
npm install -g chaos-lambda

Setting Up

AWS Configuration

An IAM user and a role for the Lambda function need to be set up first.

IAM User

An IAM user must be set up, with its credentials stored in ~/.aws/credentials.

Lambda Role

Required policies:

  • AmazonEC2FullAccess
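The role above is granted full EC2 access. An issue further down this page points out that the Lambda code appears to call only ec2.describeInstances() and ec2.terminateInstances(). If that holds for your version, a narrower inline policy might look like the following minimal sketch. The policy and its Sid names are hypothetical, the CloudWatch Logs actions are an assumption (the usual permissions a Lambda function needs to write its logs), and it has not been verified against chaos-lambda itself.

```javascript
// Hypothetical least-privilege policy for the Chaos Lambda execution role.
// Assumption: the function's only EC2 calls are DescribeInstances and
// TerminateInstances, and it writes its logs to CloudWatch Logs.
const chaosLambdaPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "ChaosLambdaEc2",
      Effect: "Allow",
      Action: ["ec2:DescribeInstances", "ec2:TerminateInstances"],
      Resource: "*",
    },
    {
      Sid: "ChaosLambdaLogs",
      Effect: "Allow",
      Action: ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      Resource: "*",
    },
  ],
};

// Print the JSON document, e.g. to paste into the IAM console or to pass to
// `aws iam put-role-policy --policy-document`.
console.log(JSON.stringify(chaosLambdaPolicy, null, 2));
```

Note that Resource "*" still allows the function to terminate any instance in the account; the Chaosfile rules are what actually limit its reach.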

Setting up Chaos Lambda

To create the AWS Lambda function, run the following, where $LAMBDA_ROLE_ARN holds the ARN of the Lambda role:

chaos-lambda deploy -r $LAMBDA_ROLE_ARN

This will create a state file (chaos_lambda_config.json), which is needed for subsequent redeploys, and deploy Chaos Lambda to AWS. It will be configured to run once an hour, but until termination rules are configured it won't do anything when it runs.

To configure termination rules, run deploy with a Chaosfile:

chaos-lambda deploy -c Chaosfile.json

Chaosfile.json

Example Chaosfile.json:

{
  "interval": "60",
  "enableForASGs": [
  ],
  "disableForASGs": [
  ]
}

Options:

  • interval (in minutes) - how frequently Chaos Lambda should run. Minimum value is 5. Default value is 60.
  • enableForASGs - whitelist of names of ASGs to pick an instance from. Instances in other ASGs will be left alone. Empty list ([]) means Chaos Lambda won't do anything.
  • disableForASGs - names of ASGs that should not be touched; instances in any other ASG are eligible for termination.

If both enableForASGs and disableForASGs are specified, then only enableForASGs rules are applied.
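The options above can be read as a small set of rules. The following is a minimal sketch of one reading of them, not the chaos-lambda source: all function names are hypothetical, instances are assumed to carry the name of their Auto Scaling group (AWS tags ASG members with aws:autoscaling:groupName), and instances outside any ASG are assumed never to be touched. The interval is turned into an EventBridge/CloudWatch Events rate expression.

```javascript
// Minimal sketch of Chaosfile handling. All function names are hypothetical;
// this is one reading of the documented rules, not the chaos-lambda source.
const MIN_INTERVAL_MINUTES = 5;
const DEFAULT_INTERVAL_MINUTES = 60;

// "interval" may be given as a string (e.g. "60"), so parse it explicitly.
function parseInterval(value) {
  if (value === undefined) return DEFAULT_INTERVAL_MINUTES;
  const minutes = Number.parseInt(value, 10);
  if (!Number.isInteger(minutes) || minutes < MIN_INTERVAL_MINUTES) {
    throw new Error(`interval must be at least ${MIN_INTERVAL_MINUTES} minutes, got ${value}`);
  }
  return minutes;
}

// Schedule expression for a CloudWatch Events / EventBridge rule.
// Because the minimum is 5, the plural unit "minutes" is always valid.
function toScheduleExpression(minutes) {
  return `rate(${minutes} minutes)`;
}

// enableForASGs takes precedence whenever it is present (even if empty);
// otherwise disableForASGs acts as a blacklist. With neither, nothing is eligible.
function isEligible(asgName, chaosfile) {
  if (!asgName) return false; // assumption: instances outside any ASG are never touched
  const { enableForASGs, disableForASGs } = chaosfile;
  if (Array.isArray(enableForASGs)) return enableForASGs.includes(asgName);
  if (Array.isArray(disableForASGs)) return !disableForASGs.includes(asgName);
  return false;
}

// Pick one eligible instance at random; `random` is injectable for testing.
function pickInstance(instances, chaosfile, random = Math.random) {
  const candidates = instances.filter((i) => isEligible(i.asgName, chaosfile));
  if (candidates.length === 0) return null;
  return candidates[Math.floor(random() * candidates.length)];
}

// Example: only instances in the "api" ASG are candidates.
const chaosfile = { interval: "60", enableForASGs: ["api"], disableForASGs: [] };
const instances = [
  { id: "i-0aaa", asgName: "web" },
  { id: "i-0bbb", asgName: "api" },
];
console.log(toScheduleExpression(parseInterval(chaosfile.interval))); // rate(60 minutes)
console.log(pickInstance(instances, chaosfile)); // { id: 'i-0bbb', asgName: 'api' }
```

Treating an empty enableForASGs list as "match nothing" (rather than "ignore this option") is what makes the example Chaosfile above a safe no-op.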

Enable/Disable/Status: once deployed, Chaos Lambda can be enabled and disabled without redeploying.

  • chaos-lambda disable - disables Chaos Lambda
  • chaos-lambda enable - enables Chaos Lambda
  • chaos-lambda status - displays the current status

Chaos Lambda vs Chaos Monkey

Chaos Lambda is inspired by Netflix’s Chaos Monkey. Curious about the differences? Here’s a handy summary:

Chaos Lambda                                              | Chaos Monkey
----------------------------------------------------------|---------------------------------------
Serverless (runs on AWS Lambda), no maintenance           | Needs EC2 instances to run on
Extremely easy to deploy                                  | Needs quite a bit of setup and config
Small codebase, easy to understand and extend (<400 SLOC) | Large codebase (thousands of SLOC)
Written in JS                                             | Written in Go
New on the scene                                          | Mature project
Small feature set                                         | Many features
Open source under MPL 2.0 / MIT                           | Open source under Apache 2.0
Developed by Shoreditch Ops                               | Developed by Netflix

Why Use Chaos Lambda?

Failures happen, and they inevitably happen when least desired. If your application can't tolerate a system failure would you rather find out by being paged at 3am or after you are in the office having already had your morning coffee? Even if you are confident that your architecture can tolerate a system failure, are you sure it will still be able to next week, how about next month? Software is complex and dynamic, that "simple fix" you put in place last week could have undesired consequences. Do your traffic load balancers correctly detect and route requests around system failures? Can you reliably rebuild your systems? Perhaps an engineer "quick patched" a live system last week and forgot to commit the changes to your source repository?

(source: Chaos Monkey wiki)

Further reading: Principles Of Chaos Engineering

Current Limitations

Supported AWS Regions

Chaos Lambda will only work in these regions (due to a limitation with AWS Lambda Schedules):

  • US East (Northern Virginia)
  • US West (Oregon)
  • Europe (Ireland)
  • Asia Pacific (Tokyo)

Features

Right now, Chaos Lambda only knows how to terminate instances and does not support more advanced interference modes, like introducing extra latency (but it's on the roadmap and being worked on, see Issue #4).

Bonus Points

Want to go further in your pursuit of indestructible systems? Combine Chaos Lambda with stress tests run with Artillery.io to ship systems that just keep going.

Support

File an issue or drop us a line at [email protected].

Contributing

Please see the Contributor's Guide

License

MPL 2.0 - see LICENSE.txt for details.

The lambda/index.js file is dual-licensed under MPL 2.0 and MIT and can be used under the terms of either of those licenses.

A project by Shoreditch Ops, creators of artillery.io ⚡️ - simple & powerful load-testing framework.

chaos-lambda's Issues

InvalidParameterValueException: runtime no longer supported

InvalidParameterValueException: The runtime parameter of nodejs is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs4.3) while creating or updating functions.

Node 6.9

I'm guessing I'm not alone here.

Error upon deploying: AccessDeniedException: Cross-account pass role is not allowed.

Our accounts are set up as organisational accounts, with no users directly in the development accounts. This is common practice: users switch role into the desired account.

Perhaps this is why deploying into a development account fails:

chaos-lambda deploy -r arn:aws:iam::50000000:role/ChaosLambda-Accessrole
AWS_REGION not set, defaulting to eu-west-1
Something went wrong:
{ AccessDeniedException: Cross-account pass role is not allowed.
    at Object.extractError (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/protocol/json.js:43:27)
    at Request.extractError (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/protocol/rest_json.js:37:8)
    at Request.callListeners (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/request.js:596:14)
    at Request.transition (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/request.js:21:10)
    at AcceptorStateMachine.runTo (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/request.js:37:9)
    at Request.<anonymous> (/usr/local/lib/node_modules/chaos-lambda/node_modules/aws-sdk/lib/request.js:598:12)
  message: 'Cross-account pass role is not allowed.',
  code: 'AccessDeniedException',
  time: 2019-06-21T13:53:13.545Z,
  requestId: 'e992148c-942b-11e9-9727-097a78ea9fd9',
  statusCode: 403,
  retryable: false,

Only run during business hours by default

It's best to only trigger failures when there are people around to fix them. Llama should only interfere with the environment during business hours (specifically Mon-Fri, 9:30am to 3pm - to allow for the morning coffee and to avoid being the reason someone would have to stay behind to fix a failure)
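A guard like the one proposed could run at the start of each invocation and skip the run outside the window. The following is a minimal sketch: isBusinessHours is a hypothetical helper, and the time zone parameter is an assumption, since the issue does not say which zone "business hours" refers to.

```javascript
// Minimal sketch of the proposed business-hours guard (Mon-Fri, 09:30-15:00).
// `isBusinessHours` is hypothetical; the time zone is an assumption because
// the issue does not say which zone "business hours" refers to.
function isBusinessHours(date, timeZone = "UTC") {
  // Read the weekday, hour and minute as seen in the given IANA time zone.
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23", // 00-23, avoids "24" for midnight
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;

  const isWeekday = !["Sat", "Sun"].includes(get("weekday"));
  const minutesOfDay = Number(get("hour")) * 60 + Number(get("minute"));
  const start = 9 * 60 + 30; // 09:30, after the morning coffee
  const end = 15 * 60; // 15:00, so nobody has to stay late fixing a failure
  return isWeekday && minutesOfDay >= start && minutesOfDay < end;
}

// Usage: skip the run entirely outside business hours.
console.log(isBusinessHours(new Date("2024-01-15T10:00:00Z"))); // true (a Monday)
console.log(isBusinessHours(new Date("2024-01-13T10:00:00Z"))); // false (a Saturday)
```

The end of the window is exclusive, so a run scheduled exactly at 15:00 does nothing.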

Adopt C4

👉 ZeroMQ's C4 Process, a collaboration model for free software projects that focuses on community success and shared ownership.

What needs to be done before we can say that we follow C4:

  • Pre-commit hooks to run go fmt for Go, and eslint for JS code
  • Add automated tests + CI integration

index.js does not get updated

I think this is related to git and how Node.js handles readFileSync() (but I may be wrong; I only know a little about Node.js).

What I've done: I cloned the repo, created a new branch, and edited the code. When I redeployed it (by running chaos-lambda deploy -c Chaosfile.json), the corresponding code in the AWS Console didn't show the changes. The same happens when working on the 'master' branch.

I had to remove the .git folder, make the changes, and then deploy.

chaos-lambda in EU-West (Ireland)

Hi,

Given that scheduled events for Lambda (AWS Lambda Schedules) are available in the EU-West region, I'm wondering why chaos-lambda would not work there.

Can you please shed some light on this, and advise whether any other limitations would prevent us from using its full feature set in that region?

Terminate instances by tags

Add support in the Chaosfile.json to terminate instances by tags instead of names (implementing 'simianarmy.chaos.ASGtag.key' and 'simianarmy.chaos.ASGtag.value').
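Tag-based selection could work on the instance list that the Lambda already fetches with ec2.describeInstances(). The following minimal sketch filters a response of that shape by one tag key/value pair; the function name and the idea of Chaosfile options carrying the tag are hypothetical. It matches instance tags, whereas Simian Army's ASGtag settings match tags on the Auto Scaling group, which only reach instances when the group propagates them at launch.

```javascript
// Minimal sketch of tag-based selection, in the spirit of Simian Army's
// simianarmy.chaos.ASGtag.key / .value settings. The function name and any
// Chaosfile options for the tag are hypothetical.
// Input has the shape of an EC2 DescribeInstances response:
// { Reservations: [{ Instances: [{ InstanceId, Tags: [{ Key, Value }] }] }] }
function instancesWithTag(response, tagKey, tagValue) {
  const instances = (response.Reservations || []).flatMap((r) => r.Instances || []);
  return instances.filter((instance) =>
    (instance.Tags || []).some((tag) => tag.Key === tagKey && tag.Value === tagValue)
  );
}

const sample = {
  Reservations: [
    { Instances: [{ InstanceId: "i-0aaa", Tags: [{ Key: "chaos", Value: "enabled" }] }] },
    {
      Instances: [
        { InstanceId: "i-0bbb", Tags: [{ Key: "chaos", Value: "disabled" }] },
        { InstanceId: "i-0ccc" }, // untagged instance
      ],
    },
  ],
};
console.log(instancesWithTag(sample, "chaos", "enabled").map((i) => i.InstanceId)); // [ 'i-0aaa' ]
```

The filtering could also be pushed to EC2 itself: DescribeInstances accepts a tag:<key> filter, e.g. Filters: [{ Name: "tag:chaos", Values: ["enabled"] }], which avoids fetching every instance in the region.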

Add "up" & "down" commands

It should be possible to disable the llama for a while (e.g. for when a failure is being dealt with). It is currently possible with enableForASGs set to [] but there should be a convenient command for stopping the llama altogether.

About that AmazonEC2FullAccess requirement

I was wondering how you would justify such a far-ranging authorization policy. In the code I only see ec2.describeInstances() and ec2.terminateInstances() being used. From what I can see, if I give those two -- and only those -- to my Lambda, it will work. So why would you say that full access is required?

Error: connect ETIMEDOUT

Hi, I'm trying to deploy this to my AWS account. Everything is set up on the AWS side. I'm behind a corporate proxy, but I have set my HTTP-PROXY & HTTPS-PROXY env vars in my shell and can run the AWS CLI without a problem, yet I still get this error. Any suggestions to get around this? Thanks.

LIBP45P-16600WL:.aws n0224265$ llama deploy -r $LAMBDA_ROLE_ARN

Something went wrong:
{ Error: connect ETIMEDOUT 54.236.138.8:443
at Object.exports._errnoException (util.js:1022:11)
at exports._exceptionWithHostPort (util.js:1045:20)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1090:14)
message: 'connect ETIMEDOUT 54.236.138.8:443',
code: 'NetworkingError',
errno: 'ETIMEDOUT',
syscall: 'connect',
address: '54.236.138.8',
port: 443,
region: 'us-east-1',
hostname: 'lambda.us-east-1.amazonaws.com',
retryable: true,
time: 2017-03-16T18:55:39.380Z }

Rewrite the CLI in Go

The current CLI is written in Node.js (to help get a working version out quickly). However, as popular as Node.js is, it's not installed everywhere, and it's even less likely to be installed on the machines of DevOps engineers (who would be the ones setting up Llama), which limits the CLI's appeal.

Going forward, the CLI should be rewritten in Go:

  • A self-contained binary is easier to distribute - as a download or as a deb/rpm package
  • There's an official AWS SDK for Go
  • There are a number of solid libraries for building CLIs to make the job easier

Update README w/ Slack Integration Instructions

Hey guys, really enjoying using this project.

Noticed you built a Slack integration in this PR: Add slack notification #21

It would be fantastic if the README details how I might be able to wire up Chaos Lambda running on my AWS account with my Slack. Any thoughts on how soon this can get done?

InvalidParameterValueException: The runtime parameter of nodejs6.10 is no longer supported

Hello, I'm getting this error when I try to execute chaos-lambda deploy:
'The runtime parameter of nodejs6.10 is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs10.x) while creating or updating functions.',
code: 'InvalidParameterValueException'
Is there updated code that fixes this?

Thanks in advance,
EDIT: I updated the deploy.js file to use nodejs10.x and now the issue is fixed. Thanks!

Slack integration

It would be useful to have Llama post to a Slack channel (with a Slack Webhook) to announce when an instance has been terminated.

  1. The Slack webhook would be an option in Llamafile, e.g. slackWebhook
  2. The lambda definition would use that (if defined) to post a message.
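The two steps above could be sketched as follows. This is not the implementation from the "Add slack notification #21" PR mentioned earlier: slackWebhook is the option name suggested in the issue, while buildSlackMessage, notifySlack and the message wording are hypothetical. It relies on Slack incoming webhooks accepting a JSON body with a "text" field, and assumes a Node.js runtime with a global fetch (Node.js 18 or later).

```javascript
// Minimal sketch of the proposed Slack notification. `slackWebhook` is the
// option name suggested in the issue; helper names and message text are
// hypothetical. Assumes a Node.js runtime with global fetch (18+).

// Slack incoming webhooks accept a JSON body with a "text" field.
function buildSlackMessage(instanceId, asgName) {
  return { text: `Chaos Lambda terminated instance ${instanceId} (ASG: ${asgName})` };
}

// Post only if the option is defined; returns true if a message was sent.
async function notifySlack(slackWebhook, instanceId, asgName) {
  if (!slackWebhook) return false; // option not set: stay silent
  const res = await fetch(slackWebhook, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSlackMessage(instanceId, asgName)),
  });
  if (!res.ok) {
    throw new Error(`Slack webhook returned HTTP ${res.status}`);
  }
  return true;
}

console.log(JSON.stringify(buildSlackMessage("i-0bbb", "api")));
// {"text":"Chaos Lambda terminated instance i-0bbb (ASG: api)"}
```

Calling notifySlack only after the terminate call has succeeded keeps the channel from announcing terminations that never happened.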
