alexcasalboni / aws-lambda-power-tuning

AWS Lambda Power Tuning is an open-source tool that can help you visualize and fine-tune the memory/power configuration of Lambda functions. It runs in your own AWS account - powered by AWS Step Functions - and it supports three optimization strategies: cost, speed, and balanced.

License: Apache License 2.0


aws-lambda-power-tuning's Introduction

AWS Lambda Power Tuning


AWS Lambda Power Tuning is a state machine powered by AWS Step Functions that helps you optimize your Lambda functions for cost and/or performance in a data-driven way.

The state machine is designed to be easy to deploy and fast to execute. It's also language-agnostic, so you can optimize any Lambda function in your account.

You provide a Lambda function ARN as input, and the state machine invokes that function with multiple power configurations (from 128MB to 10GB; you decide which values). It then analyzes all the execution logs and suggests the best power configuration to minimize cost and/or maximize performance.

Please note that the input function will be executed in your AWS account - performing real HTTP requests, SDK calls, cold starts, etc. The state machine also supports cross-region invocations and you can enable parallel execution to generate results in just a few seconds.

What does the state machine look like?

It's pretty simple and you can visually inspect each step in the AWS management console.

state-machine

What results can I expect from Lambda Power Tuning?

The state machine will generate a visualization of average cost and speed for each power configuration.

For example, this is what the results look like for two CPU-intensive functions, which become cheaper AND faster with more power:

visualization1

How to interpret the chart above: execution time goes from 35s with 128MB to less than 3s with 1.5GB, while being 14% cheaper to run.

visualization2

How to interpret the chart above: execution time goes from 2.4s with 128MB to 300ms with 1GB, for the very same average cost.

How to deploy the state machine

There are five options for deploying the tool using Infrastructure as Code (IaC):

  1. Via the AWS Serverless Application Repository (SAR), the easiest way
  2. With the AWS SAM CLI
  3. With the AWS CDK
  4. With Terraform by HashiCorp and SAR
  5. With native Terraform

Read more about the deployment options here.

State machine configuration (at deployment time)

The CloudFormation template (used for options 1 to 4) accepts the following parameters:

PowerValues (type: list of numbers; default: [128,256,512,1024,1536,3008])
    These power values (in MB) are used as the default in case no powerValues input parameter is provided at execution time.
visualizationURL (type: string; default: lambda-power-tuning.show)
    The base URL for the visualization tool; you can bring your own visualization tool.
totalExecutionTimeout (type: number; default: 300)
    The timeout in seconds applied to all functions of the state machine.
lambdaResource (type: string; default: *)
    The Resource used in IAM policies; it's * by default, but you could restrict it to a prefix or a specific function ARN.
permissionsBoundary (type: string)
    The ARN of a permissions boundary (policy) applied to all functions of the state machine.
payloadS3Bucket (type: string)
    The S3 bucket name used for large payloads (>256KB); if provided, it's added to a customer-managed IAM policy that grants read-only permission to the S3 bucket; more details below in the S3 payloads section.
payloadS3Key (type: string; default: *)
    The S3 object key used for large payloads (>256KB); the default value grants access to all S3 objects in the bucket specified with payloadS3Bucket; more details below in the S3 payloads section.
layerSdkName (type: string)
    The name of the SDK layer, in case you need to customize it (optional).
logGroupRetentionInDays (type: number; default: 7)
    The number of days to retain log events in the Lambda log groups. Before this parameter existed, log events were retained indefinitely.
securityGroupIds (type: list of SecurityGroup IDs)
    List of Security Groups to use in every Lambda function's VPC configuration (optional); note that your VPC should be configured to allow public internet access (via NAT Gateway) or include VPC endpoints to the Lambda service.
subnetIds (type: list of Subnet IDs)
    List of Subnets to use in every Lambda function's VPC configuration (optional); the same VPC requirements apply.
stateMachineNamePrefix (type: string; default: powerTuningStateMachine)
    Allows you to customize the name of the state machine. Maximum 43 characters, only alphanumeric (plus - and _). The last portion of the AWS::StackId is appended to this value, so the full name looks like powerTuningStateMachine-89549da0-a4f9-11ee-844d-12a2895ed91f. Note: StateMachineName has a maximum of 80 characters, and 36+1 from the StackId are appended, leaving 43 for a custom prefix.

Please note that the total execution time should stay below 300 seconds (5 min), which is the default timeout. You can easily estimate the total execution timeout based on the average duration of your functions. For example, if your function's average execution time is 5 seconds and you haven't enabled parallelInvocation, you should set totalExecutionTimeout to at least num * 5: 50 seconds if num=10, 500 seconds if num=100, and so on. If you have enabled parallelInvocation, you usually don't need to tune totalExecutionTimeout unless your average execution time is above 5 min. If you have set a sleep between invocations, you should include that in your timeout calculations.
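
As a rough sketch of that arithmetic (a hypothetical helper for illustration, not part of the tool):

```javascript
// Back-of-the-envelope estimate of totalExecutionTimeout, in seconds.
// Hypothetical helper, not part of the project.
function estimateTimeout(num, avgDurationSec, { parallel = false, sleepMs = 0 } = {}) {
    if (parallel) {
        // all invocations for a power value run at once
        return avgDurationSec;
    }
    // sequential: every invocation (plus any sleep) adds up
    return num * (avgDurationSec + sleepMs / 1000);
}
```

For example, estimateTimeout(100, 5) returns 500, matching the example above.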

How to execute the state machine

You can execute the state machine manually or programmatically, see the documentation here.
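
For example, a manual execution via the AWS CLI looks like this (a command fragment: the state machine ARN, region, and account ID are placeholders for your own deployment outputs):

```shell
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine \
  --input '{"lambdaARN": "your-lambda-function-arn", "num": 50, "payload": {}}'

# then poll until the execution completes and read its output
aws stepfunctions describe-execution --execution-arn <execution-arn>
```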

State machine input (at execution time)

Each execution of the state machine requires an input where you can define the following parameters:

lambdaARN (required; type: string)
    Unique identifier of the Lambda function you want to optimize.
num (required; type: integer)
    The number of invocations for each power configuration (minimum 5, recommended between 10 and 100).
powerValues (type: string or list of integers)
    The list of power values to be tested; if not provided, the default values configured at deploy time are used; you can provide any power values between 128MB and 10,240MB (⚠️ new AWS accounts have reduced concurrency and memory quotas, 3008MB max).
payload (type: string, object, or list)
    The static payload that will be used for every invocation (object or string); when using a list, a weighted payload is expected in the shape of [{"payload": {...}, "weight": X }, {"payload": {...}, "weight": Y }, {"payload": {...}, "weight": Z }], where the weights X, Y, and Z are treated as relative weights (not percentages); more details below in the Weighted Payloads section.
payloadS3 (type: string)
    A reference to Amazon S3 for large payloads (>256KB), formatted as s3://bucket/key; it requires read-only IAM permissions, see the payloadS3Bucket and payloadS3Key deployment parameters above and find more details in the S3 payloads section.
parallelInvocation (type: boolean; default: false)
    If true, all the invocations are executed in parallel (note: depending on the value of num, you may experience throttling when setting parallelInvocation to true).
strategy (type: string; default: "cost")
    It can be "cost", "speed", or "balanced"; with "cost" the state machine suggests the cheapest option (disregarding its performance), with "speed" it suggests the fastest option (disregarding its cost), and with "balanced" it chooses a compromise between cost and speed according to the balancedWeight parameter.
balancedWeight (type: number; default: 0.5)
    Expresses the trade-off between cost and time. The value is between 0 and 1, where 0.0 is equivalent to the "speed" strategy and 1.0 is equivalent to the "cost" strategy.
autoOptimize (type: boolean; default: false)
    If true, the state machine applies the optimal configuration at the end of its execution.
autoOptimizeAlias (type: string)
    If provided (and only if autoOptimize is true), the state machine creates or updates this alias with the new optimal power value.
dryRun (type: boolean; default: false)
    If true, the state machine executes the input function only once and disables every functionality related to log analysis, auto-tuning, and visualization; the dry-run mode is intended for testing purposes, for example to verify that IAM permissions are set up correctly.
preProcessorARN (type: string)
    The ARN of a Lambda function; if provided, the function is invoked before every invocation of lambdaARN; more details below in the Pre/Post-processing functions section.
postProcessorARN (type: string)
    The ARN of a Lambda function; if provided, the function is invoked after every invocation of lambdaARN; more details below in the Pre/Post-processing functions section.
discardTopBottom (type: number; default: 0.2)
    By default, the state machine discards the top/bottom 20% of "outliers" (the fastest and slowest invocations) to filter out the effects of cold starts that would bias the overall averages. You can customize this parameter with a value between 0 and 0.4, where 0 means no results are discarded and 0.4 means 40% of the top/bottom results are discarded (i.e. only 20% of the results are considered).
sleepBetweenRunsMs (type: integer; default: 0)
    If provided, the time in milliseconds that the tuner function sleeps after invoking your function, but before carrying out the post-processing step (if any). This can be used if you have aggressive downstream rate limits you need to respect. By default, the function doesn't sleep between invocations. This value has no effect when running the invocations in parallel.
disablePayloadLogs (type: boolean; default: false)
    If set to a truthy value, suppresses the payload from error messages and logs. If preProcessorARN is provided, this also suppresses the output payload of the pre-processor.
includeOutputResults (type: boolean; default: false)
    If set to true, the average cost and average duration for every power value configuration are included in the state machine output.
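
The discardTopBottom trimming can be sketched like this (an illustrative sketch, not the project's actual implementation):

```javascript
// Discard the fastest and slowest fraction of results before averaging.
// Illustrative sketch of the discardTopBottom behavior described above.
function trimOutliers(durations, discardTopBottom = 0.2) {
    const sorted = [...durations].sort((a, b) => a - b);
    // number of results to drop at EACH end
    const k = Math.floor(sorted.length * discardTopBottom);
    return sorted.slice(k, sorted.length - k);
}
```

With the default 0.2, a sample of 10 durations keeps only the middle 6 values, so a couple of cold starts at the slow end don't skew the average.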

Here's a typical execution input with basic parameters:

{
    "lambdaARN": "your-lambda-function-arn",
    "powerValues": [128, 256, 512, 1024],
    "num": 50,
    "payload": {}
}
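
And here's an example with a weighted payload; the event bodies and weights below are made up for illustration:

```json
{
    "lambdaARN": "your-lambda-function-arn",
    "num": 50,
    "payload": [
        { "payload": {"type": "standard"}, "weight": 15 },
        { "payload": {"type": "premium"}, "weight": 3 },
        { "payload": {"type": "admin"}, "weight": 1 }
    ]
}
```

The weights are relative, not percentages: out of every 19 invocations, roughly 15 use the first event, 3 the second, and 1 the third.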

State Machine Output

The state machine will return the following output:

{
  "results": {
    "power": "128",
    "cost": 0.0000002083,
    "duration": 2.9066666666666667,
    "stateMachine": {
      "executionCost": 0.00045,
      "lambdaCost": 0.0005252,
      "visualization": "https://lambda-power-tuning.show/#<encoded_data>"
    },
    "stats": [{ "averagePrice": 0.0000002083, "averageDuration": 2.9066666666666667, "value": 128}, ... ]
  }
}

More details on each value:

  • results.power: the optimal power configuration (RAM)
  • results.cost: the corresponding average cost (per invocation)
  • results.duration: the corresponding average duration (per invocation)
  • results.stateMachine.executionCost: the AWS Step Functions cost corresponding to this state machine execution (fixed value for "worst" case)
  • results.stateMachine.lambdaCost: the AWS Lambda cost corresponding to this state machine execution (depending on num and average execution time)
  • results.stateMachine.visualization: if you visit this autogenerated URL, you will be able to visualize and inspect average statistics about cost and performance; important note: average statistics are NOT shared with the server since all the data is encoded in the URL hash (example), which is available only client-side
  • results.stats: the average duration and cost for every tested power value configuration (only included if includeOutputResults is set to a truthy value)

Data visualization

You can visually inspect the tuning results to identify the optimal tradeoff between cost and performance.

visualization

The data visualization tool has been built by the community: it's a static website deployed via AWS Amplify Console and it's free to use. If you don't want to use the visualization tool, you can simply ignore the visualization URL provided in the execution output. No data is ever shared or stored by this tool.

Website repository: matteo-ronchetti/aws-lambda-power-tuning-ui

Optionally, you could deploy your own custom visualization tool and configure the CloudFormation Parameter named visualizationURL with your own URL.

Additional features, considerations, and internals

Here you can find out more about some advanced features of this project, its internals, and some considerations about security and execution cost.

Contributing

Feature requests and pull requests are more than welcome!

How to get started with local development?

For this repository, install dev dependencies with npm install. You can run tests with npm test, linting with npm run lint, and coverage with npm run coverage. Unit tests will run automatically on every commit and PR.

aws-lambda-power-tuning's People

Contributors

alexcasalboni, andrestoll, andybkay, arishlabroo, bobsut, claudiopastorini, cledevedec, clementmarcilhacy, clete2, dependabot[bot], dz902, ellisms, fhightower, gino247, gliptak, grzegorzpapkala, hscheib, lavanya0513, ldcorentin, matteo-ronchetti, mettke, neiljed, parro, rrhodes, smosek, tam-alex, teknogeek0, tljdebrouwer, tonysherman


aws-lambda-power-tuning's Issues

Missing or empty optimal value

Running the tuning app with version 3.2.3, I get an error possibly related to the new dryRun parameter:

{
  "errorType": "Error",
  "errorMessage": "Missing or empty optimal value",
  "trace": [
    "Error: Missing or empty optimal value",
    "    at validateInput (/var/task/optimizer.js:40:15)",
    "    at Runtime.module.exports.handler (/var/task/optimizer.js:14:5)",
    "    at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"
  ]
}

when the input is:

{
  "lambdaARN": "lambdaARN",
  "powerValues": [
    128,
    256,
    512,
    1024,
    2048,
    3008
  ],
  "num": 30,
  "payload": {},
  "strategy": "speed",
  "dryRun": true,
  "parallelInvocation": false,
  "stats": [
    {
      "averagePrice": 2.08e-7,
      "averageDuration": 36.37111111111112,
      "totalCost": 0.0000066560000000000045,
      "value": 128
    },
    {
      "averagePrice": 4.16e-7,
      "averageDuration": 2.3211111111111116,
      "totalCost": 0.000012480000000000008,
      "value": 256
    },
    {
      "averagePrice": 8.32e-7,
      "averageDuration": 2.4144444444444444,
      "totalCost": 0.000024960000000000015,
      "value": 512
    },
    {
      "averagePrice": 0.000001664,
      "averageDuration": 2.355,
      "totalCost": 0.00004992000000000003,
      "value": 1024
    },
    {
      "averagePrice": 0.000003328,
      "averageDuration": 2.364444444444444,
      "totalCost": 0.00009984000000000006,
      "value": 2048
    },
    {
      "averagePrice": 0.0000048880000000000005,
      "averageDuration": 2.255555555555555,
      "totalCost": 0.00014664000000000005,
      "value": 3008
    }
  ],
  "analysis": null
}

I'm not sure why the analysis field is null. I don't define it in my input to the Step Function, so whatever generates it seems to have a problem with dryRun.
Removing dryRun, the run works fine.

Set of test payloads

Let's take the "hello world" of the Lambda world as an example: compressing an image from S3 using sharp.

A function that does complicated things and depends heavily on the input forces us to test it with multiple different variants, e.g.:

  • big file
  • small file
  • format 1, format 2, format 3
  • extremely big file
  • transformations passed as options

It would be nice if Power Tuning could take multiple events as input and, before recommending a power value, take into consideration the output from all the different tests.

Ideally, I could specify the percentage of runs for any given test (e.g. medium file with JPEG format: 50%, extremely big file: 3%, transformations: 5%), so that the extremely big file would not skew the results too much (averages like to do that).

Restrict Lambda IAM permissions

The current role has full access to AWS Lambda:

iamRoleStatements:
    - Effect: Allow
      Action:
        - 'lambda:*'
      Resource: '*'

Since we want the lambdaARN to be given at runtime, we can't really restrict the Resource parameter. We could restrict the set of actions, though. Also, experienced users can always force Resource to be the Lambda Function(s) they want to optimize.

As far as actions are concerned, Initializer, Executor, Finalizer and Cleaner need the following Lambda permissions (only 7 out of 28):

  • GetAlias
  • UpdateFunctionConfiguration
  • PublishVersion
  • DeleteFunction (always with Qualifier)
  • CreateAlias
  • DeleteAlias
  • Invoke
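
Sketched as an IAM statement in the serverless.yml shape used earlier in this issue (note: the IAM action name for invocation is lambda:InvokeFunction, not "Invoke"):

```yaml
iamRoleStatements:
    - Effect: Allow
      Action:
        - 'lambda:GetAlias'
        - 'lambda:UpdateFunctionConfiguration'
        - 'lambda:PublishVersion'
        - 'lambda:DeleteFunction'
        - 'lambda:CreateAlias'
        - 'lambda:DeleteAlias'
        - 'lambda:InvokeFunction'
      Resource: '*'   # or a specific function ARN / prefix, if known
```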

Regional base price selection (Step Functions)

Similarly to #77, we should use the correct regional price of Step Functions based on where the state machine is executed (which might be different from the input function's region!).

Each state transition costs $0.000025 ($0.025 per 1,000 transitions) in almost every region, with some exceptions. Per 1,000 transitions:

  • default: $0.025
  • eu-south-1: $0.02625
  • us-west-1: $0.0279
  • af-south-1: $0.02975
  • ap-east-1: $0.0275
  • ap-south-1: $0.0285
  • ap-northeast-2: $0.0271
  • eu-west-3: $0.0297
  • me-south-1: $0.0275
  • sa-east-1: $0.0375
  • us-gov-east-1: $0.03
  • us-gov-west-1: $0.03

The state machine execution cost is computed here:

module.exports.stepFunctionsCost = (nPower) => +(0.000025 * (6 + nPower)).toFixed(5);

The formula to compute the # of state transitions is: 6 + COUNT(POWERVALUES), therefore the Step Functions cost will be REGIONAL_COST * NUMBER_OF_TRANSITIONS.
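
A possible fix, sketched (the price map is built from the per-1,000 figures in this issue and is illustrative only; verify against current AWS pricing, and note the real implementation may differ):

```javascript
// Per-transition Step Functions price by region (USD), i.e. the per-1,000
// prices from this issue divided by 1,000. Illustrative only.
const TRANSITION_PRICE = {
    'default': 0.025 / 1000,
    'eu-south-1': 0.02625 / 1000,
    'us-west-1': 0.0279 / 1000,
    'af-south-1': 0.02975 / 1000,
    'ap-east-1': 0.0275 / 1000,
    'ap-south-1': 0.0285 / 1000,
    'ap-northeast-2': 0.0271 / 1000,
    'eu-west-3': 0.0297 / 1000,
    'me-south-1': 0.0275 / 1000,
    'sa-east-1': 0.0375 / 1000,
    'us-gov-east-1': 0.03 / 1000,
    'us-gov-west-1': 0.03 / 1000,
};

// 6 fixed transitions plus one per tested power value
const stepFunctionsCost = (nPower, region) => {
    const price = TRANSITION_PRICE[region] || TRANSITION_PRICE['default'];
    return +(price * (6 + nPower)).toFixed(7);
};
```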

Use the correct regional base price

The base price for Lambda executions (128MB, 100ms) is $0.0000002083 in almost every region.

Here are the regions where the price is slightly different:

  • Hong Kong (ap-east-1): $0.0000002865 (+37%)
  • Cape Town (af-south-1): $0.0000002763 (+32%)
  • Bahrain (me-south-1): $0.0000002583 (+24%)

This difference should be considered in two places:

  • in the state machine output results.stateMachine.lambdaCost
  • in the visualization (charts)

We should update the utilities utils.computePrice(...) and utils.computeTotalCost(...), used by the Executor function here.

Thanks #75 for bringing this up.
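
A region-aware lookup might be sketched like this (the function signature is hypothetical and may differ from the real utils.computePrice; prices are the per-100ms, 128MB figures listed in this issue and should be verified against current AWS pricing):

```javascript
// Base price for 100ms at 128MB, by region (USD). Illustrative only.
const BASE_PRICE = {
    'default': 0.0000002083,
    'ap-east-1': 0.0000002865,
    'af-south-1': 0.0000002763,
    'me-south-1': 0.0000002583,
};

// Average cost of one invocation: Lambda price scales linearly with the
// memory setting, and (at the time of this issue) billed in 100ms blocks.
function computePrice(region, memoryMB, durationMs) {
    const base = BASE_PRICE[region] || BASE_PRICE['default'];
    const billedBlocks = Math.ceil(durationMs / 100); // round up to 100ms
    return base * (memoryMB / 128) * billedBlocks;
}
```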

ResourceNotFoundException: Functions from 'us-east-1' are not reachable in this region ('us-west-1')

It seems cross-region usage is not available for Lambda, so the Step Function has to be deployed in each region. Am I right, or is it just a code restriction that could be lifted?

Full trace from CloudWatch Logs:

START RequestId: c48d4975-d571-46d7-9835-9a7f84e0300f Version: $LATEST
2019-05-14T13:32:20.857Z c48d4975-d571-46d7-9835-9a7f84e0300f { ResourceNotFoundException: Functions from 'us-east-1' are not reachable in this region ('us-west-1')
at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:48:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:52:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request. (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request. (/var/runtime/node_modules/aws-sdk/lib/request.js:685:12)
message: 'Functions from 'us-east-1' are not reachable in this region ('us-west-1')',
code: 'ResourceNotFoundException',
time: 2019-05-14T13:32:20.857Z,
requestId: 'b3418c32-764c-11e9-ab60-7505d4afde13',
statusCode: 404,
retryable: false,
retryDelay: 52.181729400208376 }
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.934Z c48d4975-d571-46d7-9835-9a7f84e0300f
{
"errorMessage": "Interrupt",
"errorType": "Error",
"stackTrace": [
"/var/task/initializer.js:68:27",
"",
"process._tickDomainCallback (internal/process/next_tick.js:228:7)"
]
}

END RequestId: c48d4975-d571-46d7-9835-9a7f84e0300f
REPORT RequestId: c48d4975-d571-46d7-9835-9a7f84e0300f Duration: 1510.06 ms Billed Duration: 1600 ms Memory Size: 128 MB Max Memory Used: 59 MB

AWS no longer supports nodejs4.3

Deploying gives this error: An error occurred: CleanerLambdaFunction - The runtime parameter of nodejs4.3 is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs8.10) while creating or updating functions. (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: aa86828b-4748-11e9-9731-7d5c960221fe)

Changing all references to nodejs8.10 appears to work.

Implement dynamic parallelism

This new feature will make Lambda Power Tuning much more flexible: https://aws.amazon.com/blogs/aws/new-step-functions-support-for-dynamic-parallelism/

Currently, you can't easily test new memory configurations for each state machine execution, because memory configurations are hard-coded in the state machine structure.

With dynamic parallelism, we'll be able to provide a list of memory configurations as input and dynamically test only those configurations (without any deploy-time parameter).

Provide a context to the Lambda function when running Lambda Power Tuning

The Lambda function being tuned requires a context.

How can we pass the context to this Lambda function when running the tuning step function with the following input?

{
  "lambdaARN": "lambda ARN",
  "powerValues": [128, 256],
  "num": 5,
  "payload": {},
  "parallelInvocation": true,
  "strategy": "balanced"
}

Invocation error with Cannot read property of undefined

Though the execution succeeded, it shows an error and an empty object in the output.

The input I passed is below, as mentioned in the guide (I changed the ARN to my Lambda ARN):

{
  "lambdaARN": "your-lambda-function-arn",
  "powerValues": [128, 256, 512, 1024, 2048, 3008],
  "num": 10,
  "payload": {},
  "parallelInvocation": true,
  "strategy": "cost"
}

Should I pass a payload?

Here is the complete error:

{
  "error": "Error",
  "cause": {
    "errorType": "Error",
    "errorMessage": "Invocation error: {\"errorType\":\"TypeError\",\"errorMessage\":\"Cannot read property 'id' of undefined\",\"trace\":[\"TypeError: Cannot read property 'id' of undefined\",\"    at Runtime.module.exports.get [as handler] (/var/task/todos/get.js:13:32)\",\"    at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)\"]}",
    "trace": [
      "Error: Invocation error: {\"errorType\":\"TypeError\",\"errorMessage\":\"Cannot read property 'id' of undefined\",\"trace\":[\"TypeError: Cannot read property 'id' of undefined\",\"    at Runtime.module.exports.get [as handler] (/var/task/todos/get.js:13:32)\",\"    at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)\"]}",
      "    at /var/task/executor.js:114:19",
      "    at processTicksAndRejections (internal/process/task_queues.js:97:5)",
      "    at async Promise.all (index 1)",
      "    at async runInParallel (/var/task/executor.js:119:5)",
      "    at async Runtime.module.exports.handler (/var/task/executor.js:31:19)"
    ]
  }
}

Use a unique payload for every run

Hi @alexcasalboni

Issue #85 didn't seem to convey my problem properly.

Currently, when 6 power values are given and num is 5, the Lambda function is invoked 30 times in total. If the function does something like deleting one DB record by the ID passed in the payload on each invoke, you need to prepare at least as many payload variants as the total number of calls.

However, with the current specification, you cannot provide more payloads than the value of num.

Therefore, when the first power value finishes executing, the record with the ID specified in the payload has already been deleted from the DB, so that ID no longer exists when the second and subsequent power values run. The correct processing is not performed, and the required power value cannot be determined.

To solve this, the payload handling needs to be modified so that more payloads than the total number of executions can be given.

Alternatively, implementing a feature that lets you specify a Lambda function to run before and after each execution would make it possible to execute with only one type of payload. I think such a feature would be very useful.

Make Node.js 10.x runtime an option

As of May 15, 2019, AWS added support for Node.js 10.x in Lambda. I think it would be nice to have an option to run the power-tuning Lambdas in a runtime similar to the Lambda someone is writing.

Link to the announcement: AWS What's New

Weighted optimization strategy

Based on @pavelloz's feedback in #31, we could have a configurable weight between speed and cost.

Since there are many different use cases and very subjective ways to optimize for cost vs. speed, such a weight would need to be very well documented, imho.

In the long-term, we might be able to "categorize" a given function into some sort of optimization class based on the speed-cost relation across memory configurations, and come up with a globally optimal strategy for each class.

FYI @matteo-ronchetti is already working on the first iteration of this :)

ResourceNotFoundException: Function not found

I see the error below when I run Power Tuning with parallelInvocation: false, but when I run with parallelInvocation: true, it works.

My input:

{
  "lambdaARN": "<LambdaARN>",
  "powerValues": [128, 256, 512, 1024, 2048, 3008],
  "num": 100,
  "payload": {
    "headers": {
      "Authorization": "<Auth Token>",
      "x-api-key": "<API Key>"
    }
  },
  "parallelInvocation": false,
  "strategy": "balanced",
  "balanceWeight": 0.5
}

Error:

{
  "error": "Lambda.Unknown",
  "cause": "The cause could not be determined because Lambda did not return an error type."
}
{
  "error": "ResourceNotFoundException",
  "cause": {
    "errorType": "ResourceNotFoundException",
    "errorMessage": "Function not found: arn:aws:lambda:<region>:<accno>:function:GetCustomerProfile:RAM256",
    "trace": [
      "ResourceNotFoundException: Function not found: arn:aws:lambda:<region>-<accno>:function:GetCustomerProfile:RAM256",
      "    at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:51:27)",

Can you please help with this?

Payload does not support GET Methods

I am trying to test Lambda functions that expect query parameters. I am passing the params via the payload, but it throws invocation errors with null object references. Any chance we can support query params passed with the payload?

Reset $LATEST memory configuration after state machine execution

@alexcasalboni Great job on the tool, very handy & useful!

I ran the test on my Lambda function with all possible memory settings. Initially, my function had 512MB of memory assigned. After the tests completed (I confirmed that Cleaner & Finalizer are green), the function was left with 3008 MB of memory. I also checked that the versions created during the test were removed (which is expected), but the memory was set to the maximum value tested.

Is this expected?

Thanks!

API Gateway end-to-end testing?

How could we support this?

Would it be mutually exclusive wrt Lambda or maybe a totally independent branch?

APIGW comes with many configurations that might make performance tuning less reliable such as caching, WAF, endpoint type (regional or edge-optimized), etc.

We could simply invoke the API endpoint instead of the Lambda function, but I'm not 100% sure of what the benefit would be.

Optimize multiple functions at once

As stated in #55, it would be nice to be able to optimize multiple chained functions at once. @alexcasalboni stated that, in his opinion, the optimum for the overall chain correlates with the optimum of all individual functions.

I researched a bit, and there is research (a bachelor thesis I sadly can't share) showing that the Pareto optimum for all functions does indeed differ from just optimizing the single functions individually.

It would be nice to be able to optimize a chain of functions, or even a Step Functions state machine, with this tool.

Is the relation between RAM configuration and cost per ms correct?

Hi Alex!

After some testing with the tool I have seen incorrect numbers on the graph.

I have compared the time/cost results of one Lambda execution in my region (EU-Ireland) with the result that appears when hovering over the graph. And...

Here is my calculation:

  1. Data from AWS Lambda Pricing (EU-Ireland):
  • RAM configuration: 128MB
  • Cost per 100ms: $0.0000002083
  • Cost per 1ms: $0.0000000020830
  2. State machine execution graph results:
  • Size: 128MB
  • Time: 366ms
  • Cost: $0.00000083
    image

I guess cost is incorrect because...

Cost with Lambda Pricing (EU-Ireland): 366ms * $0.0000000020830 per ms = $0.0000007623780

The comparison with my calculations also fails for other RAM configurations...

I don't know if I'm doing something wrong... it was only a doubt that came up while researching Lambda performance. 👍

Thank you in advance!
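One plausible explanation (an assumption, not confirmed by the tool's authors): the cost on the chart may be computed on the *billed* duration, which Lambda rounded up to the nearest 100ms at the time:

```javascript
// 128MB tier price in EU-Ireland, expressed per ms of billed duration
const PRICE_PER_MS_128MB = 0.0000000020830;

function billedCost(durationMs) {
    // Lambda billing rounded execution time up to the nearest 100ms
    const billedMs = Math.ceil(durationMs / 100) * 100; // 366ms -> 400ms
    return billedMs * PRICE_PER_MS_128MB;
}

console.log(billedCost(366).toFixed(10)); // 0.0000008332 — close to the chart's 0.00000083
```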

memorySize of the lambdas should be 128.

There is no reason for the tester functions to be so big:

functions:
  initializer:
    handler: lambda/initializer.handler
    memorySize: 128
    timeout: 60
  executor:
    handler: lambda/executor.handler
    memorySize: 128
    timeout: 300
  cleaner:
    handler: lambda/cleaner.handler
    memorySize: 128
    timeout: 60
  finalizer:
    handler: lambda/finalizer.handler
    memorySize: 128
    timeout: 60

Optionally configure a single Lambda ARN at deploy-time

The current IAM statement for initializer, executor and optimizer looks like this:

          Statement:
            - Effect: Allow
              Action:
                - lambda:GetAlias
                - lambda:PublishVersion
                - lambda:UpdateFunctionConfiguration
                - lambda:CreateAlias
                - lambda:UpdateAlias
              Resource: '*'

I think some security-concerned users would rather avoid that Resource: '*'.

We could allow them to optionally configure a Lambda ARN (or prefix) so that these IAM policies are a bit more fine-grained.

Technically, this would be a CFN Parameter (e.g. lambdaResource), directly referenced via !Ref.
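A sketch of how that could look (the parameter name lambdaResource comes from the proposal above; the default preserves the current permissive behaviour):

```yaml
Parameters:
  lambdaResource:
    Type: String
    Default: '*'   # default keeps today's behaviour

# ...then, in each affected policy:
          Statement:
            - Effect: Allow
              Action:
                - lambda:GetAlias
                - lambda:PublishVersion
                - lambda:UpdateFunctionConfiguration
                - lambda:CreateAlias
                - lambda:UpdateAlias
              Resource: !Ref lambdaResource
```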

Compute and report total cost of state machine execution

The state machine could return its own total cost of execution.

I think this would add more transparency to Lambda Power Tuning.

The cost should include both Lambda execution costs and Step Functions execution costs (even though the max step transitions is always around 15-20, which means less than $0.0005 per state machine execution).

Depending on the value of num, Lambda costs will likely outweigh Step Functions cost. For example, even with a "no-op" function and num: 100, we can expect the overall Lambda cost to be around $0.001.

I will add some more documentation about costs too.
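As a back-of-the-envelope sketch (all prices are assumptions based on the numbers above: $0.000025 per state transition, ~20 transitions per execution, and a 128MB no-op billed at ~100ms per invocation; real runs also invoke higher, more expensive memory tiers):

```javascript
const SFN_PRICE_PER_TRANSITION = 0.000025;        // Step Functions standard workflows
const LAMBDA_PRICE_PER_100MS_128MB = 0.000000208; // 128MB tier, per 100ms billed

function estimateTuningCost(numPowerValues, num) {
    const sfnCost = 20 * SFN_PRICE_PER_TRANSITION; // ~ $0.0005 per execution
    // every power value is invoked `num` times; assume 100ms billed each
    const lambdaCost = numPowerValues * num * LAMBDA_PRICE_PER_100MS_128MB;
    return sfnCost + lambdaCost;
}

// e.g. 6 power values, num = 100:
console.log(estimateTuningCost(6, 100)); // ≈ $0.0006 total
```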

Customizable execution timeout

Currently, the Executor timeout is 300 seconds (5 minutes) and there is no way to customize it if you deploy via SAR.

Also, the same timeout should be configured on the state machine task to make sure the timeout error is properly handled (otherwise Lambda.Unknown is detected instead of States.Timeout).

It could be a simple CloudFormation parameter, used for both values.
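A sketch of what that could look like in the SAM template (the parameter name totalExecutionTimeout is illustrative):

```yaml
Parameters:
  totalExecutionTimeout:
    Type: Number
    Default: 300   # seconds, used for both the Executor and the state machine task

Resources:
  executor:
    Type: AWS::Serverless::Function
    Properties:
      Timeout: !Ref totalExecutionTimeout
# ...and the same value injected as "TimeoutSeconds" on the Executor
# task in the state machine definition, so a timeout surfaces as
# States.Timeout instead of Lambda.Unknown.
```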

InvalidParameterValueException on v3.2.3

Just deployed the latest v3.2.3 and every time I run the new state machine I get the following error:

"error": {
    "Error": "InvalidParameterValueException",
    "Cause": "{\"errorType\":\"InvalidParameterValueException\",\"errorMessage\":\"The role defined for the function cannot be assumed by Lambda.\",\"trace\":[\"InvalidParameterValueException: The role defined for the function cannot be assumed by Lambda.\",\"    at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:51:27)\",\"    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:55:8)\",\"    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\",\"    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)\",\"    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)\",\"    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)\",\"    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)\",\"    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10\",\"    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)\",\"    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:685:12)\"]}"
  }
}

Question around parallelInvocations

When I run the framework with parallelInvocation=false, the invocation times are great (in the milliseconds range). But when we run with the same flag set to true, times are in the hundreds of seconds. Ours is a business-critical app and needs to stay under 75 ms. Can you please explain a bit more how parallelInvocation is implemented and what happens under the hood? Though reviewing the code is easy, please share your thoughts and insights as well, so we can learn and adjust our code accordingly.
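A conceptual sketch (not the actual Executor code) of how the two modes differ: in parallel mode all invocations start at once, which can trigger many concurrent cold starts and therefore much higher per-invocation latency.

```javascript
// Sequential: each invocation waits for the previous one to finish.
async function runSequential(invoke, num) {
    const results = [];
    for (let i = 0; i < num; i++) {
        results.push(await invoke());
    }
    return results;
}

// Parallel: all `num` invocations are fired concurrently.
async function runParallel(invoke, num) {
    return Promise.all(Array.from({ length: num }, () => invoke()));
}
```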

Cannot add memory option

When I deployed via the console, I added a 2048MB option to the comma-separated list:

image

But the state machine doesn't reflect this:

image

Is it supposed to work? (I thought you were doing something along the lines of deploying a custom resource and then using it in the same stack, like what SLS does for EventBridge)

Default nodejs version runtime

Looking at https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html

I wonder if https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/template.yml#L27 should be 10.x by default.

I know there are differences between the Amazon Linux and Amazon Linux 2 AMIs, and in my own case I could not migrate one Lambda that uses aws-chrome-lambda because of incompatibilities.

But there is a case to be made for either setting it to the current AWS-recommended version, or adding one sentence about it in the documentation as a heads-up. :)

Or maybe I don't understand how it works yet; I'm not the best at reading SAM/CloudFormation manifests.

--dry-run

Per discussion in #37 (comment)

This might be overkill, because num=1 would cover it, but having a way to verify that the user's setup is correct could benefit a lot of people.

I personally always run dry runs if they exist, especially when I'm a beginner in an area and/or the stakes of a mistake are high.

Document IAM role permissions

As mentioned in #5, the Resource attribute of the default IAM role could be restricted so that the state machine can only interact with the configured Lambda Function(s).

Resource is set to * by default because the original goal was to provide any lambdaARN at runtime.

We should document how to update such configuration manually, or eventually implement an additional parameter at generation-time.

The new parameter could look like this:

$ npm run generate -- -A ACCOUNT_ID -L arn:aws:lambda:*:*:function:MyFunctionName

Cleanup after failed execution

I noticed that if a Lambda function runs long (e.g. 200 seconds), running it in the default non-parallel mode will cause the Executor to time out, because the invocations run in sequence.

note: the documentation is wrong, the parameter is parallelInvocation, not enableParallel

The failure results in the Executor exceeding its 300-second run time after the 2nd invocation. If you set num to 10, it will take 200 * 10 seconds to finish, which it never will.

In case of failure, it should still clean up the aliases/versions it created.

Also, the error message is unhelpful; it should say that the Executor timed out, or something to indicate that the person should run the invocations in parallel mode.

Error: Lambda.Unknown
Cause: The cause could not be determined because Lambda did not return an error type.

Side note: it would be nice to finish the rest and not cancel on the first failure. If you have an array of power values from 128 upwards and 128 is too small, it breaks the testing for all the rest.

3.1.1: Execution output: {} after running execute.sh

Steps:

  1. ./deploy.sh
  2. Result (already ran the deploy, just wanted to make sure it was deployed):
Waiting for changeset to be created..
Error: No changes to deploy. Stack lambda-power-tuning is up to date
  3. ./execute.sh
  4. Result:
-n .
// etc
SUCCEEDED
Execution output:
{}

Checked the logs: aws stepfunctions get-execution-history --profile default --execution-arn $EXECUTION_ARN

There are 96 entries in the logs like:

        {
            "timestamp": 1578505766.01,
            "type": "LambdaFunctionFailed",
            "id": 124,
            "previousEventId": 96,
            "lambdaFunctionFailedEventDetails": {
                "error": "Error",
                "cause": "{\"errorType\":\"Error\",\"errorMessage\":\"Invocation error: {\\\"errorType\\\":\\\"string\\\",\\\"errorMessage\\\":\\\"{\\\\\\\"statusCode\\\\\\\":\\\\\\\"500\\\\\\\",\\\\\\\"message\\\\\\\":\\\\\\\"An unexpected error occurred\\\\\\\"}\\\",\\\"trace\\\":[]}\",\"trace\":[\"Error: Invocation error: {\\\"errorType\\\":\\\"string\\\",\\\"errorMessage\\\":\\\"{\\\\\\\"statusCode\\\\\\\":\\\\\\\"500\\\\\\\",\\\\\\\"message\\\\\\\":\\\\\\\"An unexpected error occurred\\\\\\\"}\\\",\\\"trace\\\":[]}\",\"    at utils.range.map (/var/task/executor.js:67:19)\",\"    at process._tickCallback (internal/process/next_tick.js:68:7)\"]}"
            }
        },

So it appears to have failed, but the failure was not caught, and an empty output was returned.

Config:

{
    "lambdaARN": "arn:aws:lambda:us-west-2:etcetc",
    "powerValues": "ALL",
    "num": 5,
    "parallelInvocation": true,
    "strategy": "speed",
    "payload": [ {...} ]
}

Tried with explicit powerValues and parallelInvocation: false as well.

Refactor Node.js code (new ES)

The current implementation is not very readable because of promise/callback hell.

I'd like to refactor it to use async/await syntax.
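Illustrative only (the function and helpers below are made up, not the repo's actual code) of the kind of change intended:

```javascript
// Before: nested promise chain.
function totalCostThen(fetchDurations, pricePerMs) {
    return fetchDurations()
        .then((durations) => durations.reduce((sum, ms) => sum + ms, 0))
        .then((totalMs) => totalMs * pricePerMs);
}

// After: same logic with async/await, reading top to bottom.
async function totalCost(fetchDurations, pricePerMs) {
    const durations = await fetchDurations();
    const totalMs = durations.reduce((sum, ms) => sum + ms, 0);
    return totalMs * pricePerMs;
}
```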

Add state machine invocation command

For now, you have to manually start the state machine and provide the correct input.

There should be a simple command that would take care of:

  • generate the input object based on user-provided params
  • start the state machine and monitor its status
  • fetch the state machine output and clearly visualize it
  • eventually, the script could also set the new power level to the optimal one (or reset it to the original value)
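The start-and-monitor part could be sketched like this (a sketch only; all names are illustrative, and the Step Functions client is passed in so it can be stubbed):

```javascript
// Start the state machine, poll until it finishes, return the parsed output.
// `sfn` is an AWS.StepFunctions-like client (AWS SDK v2 style).
async function runTuning(sfn, stateMachineArn, input, pollMs = 1000) {
    const { executionArn } = await sfn.startExecution({
        stateMachineArn,
        input: JSON.stringify(input),
    }).promise();

    for (;;) {
        const exec = await sfn.describeExecution({ executionArn }).promise();
        if (exec.status !== 'RUNNING') {
            if (exec.status !== 'SUCCEEDED') {
                throw new Error(`Execution ended with status ${exec.status}`);
            }
            return JSON.parse(exec.output);
        }
        await new Promise((r) => setTimeout(r, pollMs)); // wait before polling again
    }
}
```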

Improve weighted payload logging in case of invocation error

I prepared one function for each CRUD operation on my data.
Among them, I wanted to collect statistics for the delete function, but the deletion target provided in the payload is deleted by the first execution, so an error occurs from the second invocation onwards.
To avoid this, I prepared more records to delete than the total number of executions and gave every payload a weight of 1, but I could not run it because of an "Invalid payload weight (num is too small)" error.

Please tell me a good way to handle such a function.
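For reference, a weighted payload input looks something like this (sketch; as far as I understand, num must be at least as large as the number of weighted payloads so that each one is invoked at least once):

```json
{
  "lambdaARN": "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:deleteRecord",
  "num": 50,
  "payload": [
    { "payload": { "id": "record-1" }, "weight": 1 },
    { "payload": { "id": "record-2" }, "weight": 1 }
  ]
}
```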

Feature Request - Leverage X-Ray to analyze different segments

I have a Lambda that makes an external network call as part of its execution, which is a variable I can't control. It would be awesome if this tool leveraged AWS X-Ray to report on the effect of memory tuning on the various segments of execution.

image

Questions this approach could answer:

  • How do different memory allocations affect my initialization segment? This is a primary factor in cold start times.
  • How do different memory allocations affect TLS connection setup times? I've heard the effect can be significant; it would be helpful to quantify it.

Multiple optimization strategies

As mentioned in #30, new optimization strategies could be more useful in specific use cases.

The default strategy could remain cost, but a few more can be implemented.

The second most straightforward strategy is speed, and we should implement it in a way that makes it easy for new contributors to add further strategies.

The Finalizer function takes all the statistics as input and will return the optimal configuration, so everything can be implemented there.
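A sketch of how the Finalizer could dispatch on strategy (the stats shape below is assumed for illustration, not the actual data model):

```javascript
// Assumed shape: [{ power, averageDuration, averageCost }, ...]
const strategies = {
    cost: (stats) =>
        stats.reduce((best, s) => (s.averageCost < best.averageCost ? s : best)),
    speed: (stats) =>
        stats.reduce((best, s) => (s.averageDuration < best.averageDuration ? s : best)),
};

// Adding a new strategy only requires adding an entry to the map above.
function findOptimal(stats, strategy = 'cost') {
    const pick = strategies[strategy];
    if (!pick) throw new Error(`Unknown strategy: ${strategy}`);
    return pick(stats);
}
```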

Integrate stats visualization (chart)

@matteo-ronchetti has developed a simple web interface that we can integrate into the state machine output. This way, users can simply click on a link/URL and visualize useful numbers about cost and performance.

This should be easy to implement in the finalizer function, or maybe as a third parallel step.

I've considered making this an opt-in feature, but I think most users will benefit from it and I can't see any relevant data privacy concern since you can simply not click on that link.

The UI is currently hosted as an Amplify Console app here: https://master.d19f2a8daatc3f.amplifyapp.com

You can provide input data including it in the URL hash: https://master.d19f2a8daatc3f.amplifyapp.com/index.html#gAAAAQACAAQABg==;AACAQQAAAEEAAIBAMzMzQGZmBkA=;CtcjPG8SAzwK16M7vHQTPKabRDw=

The hash structure is as follows: <encode(power_values)>;<encode(execution_time)>;<encode(execution_cost)>.

For example:

let sizes = [128, 256, 512, 1024, 1536];
let times = [16.0, 8.0, 4.0, 2.8, 2.1];
let costs = [0.01, 0.008, 0.005, 0.009, 0.012];
let hash = encode(sizes, Int16Array) + ";" + encode(times) + ";" + encode(costs);

where

const base64js = require('base64-js');

function encode(input, c = Float32Array) {
    input = new c(input);
    if (!(input instanceof Uint8Array)) {
        input = new Uint8Array(input.buffer);
    }
    return base64js.fromByteArray(input);
}

Can we use the tool for stress testing?

If you test only one power configuration and use a very large num, this tool could be used to stress-test Lambda functions and visualize average cost and execution time.

We could design a different visualization to be used when testing only one power configuration, where we could visualize more detailed statistics.
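For example, a sketch of an input that turns the tool into a simple load test (the ARN is a placeholder):

```json
{
  "lambdaARN": "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:myFunction",
  "powerValues": [1024],
  "num": 1000,
  "parallelInvocation": true
}
```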

Environment variable minRAM must contain string

July 4, 2018
serverless/serverless#5094 (comment)
After updating today to 1.28.0, Serverless (or a dependency) now expects all environment variables to be strings. This sounds reasonable, but it's a breaking change so I'm making people aware.

Serverless: Excluding development dependencies...

  Serverless Error ---------------------------------------

  Environment variable minRAM must contain string

  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com

  Your Environment Information -----------------------------
     OS:                     win32
     Node Version:           8.11.1
     Serverless Version:     1.28.0

Manually changing the values in the serverless.base.yml file to strings fixes the issue.

    minRAM: '128'
    minCost: '0.000000208'
