Inspired by this article - sometimes you may find the need to process very large newline-delimited text files within AWS - for example, large CSV or JSONL files. Or your files may not be that large, but you may want to do something time-consuming or computationally intensive with each line. Either way, the limits imposed on Lambda functions, although recently increased, may preclude you from processing your very large file in a single invocation.
Enter AWS Step Functions - a simple but powerful way to stitch cloud services together and orchestrate complex workflows from serverless components. This project uses a step function and a simple Lambda to process, line by line, an arbitrarily sized file of newline-delimited text.
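The core of the pattern is a Lambda that processes a bounded batch of lines per invocation and returns a cursor, so the state machine can loop (typically via a Choice state) until the whole file has been consumed. Below is a minimal Python sketch of that batching logic - the function names and the `offset`/`finished` fields are illustrative assumptions, not the repository's actual contract, and in a real handler `body` would come from an S3 `GetObject` range read rather than an in-memory string:

```python
def handle_line(line: str) -> None:
    # Placeholder for the time-consuming per-line work.
    pass

def process_batch(body: str, offset: int, max_lines: int = 1000) -> dict:
    """Process up to max_lines newline-delimited records starting at a
    character offset; return the new offset and whether we hit the end."""
    lines = body[offset:].splitlines(keepends=True)
    batch = lines[:max_lines]
    for line in batch:
        handle_line(line.rstrip("\n"))
    new_offset = offset + sum(len(l) for l in batch)
    return {"offset": new_offset, "finished": new_offset >= len(body)}
```

Each invocation's output feeds back in as the next invocation's input, and the state machine checks `finished` to decide whether to invoke the Lambda again or end, keeping every single invocation comfortably under the Lambda timeout.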
To use this, you'll need an AWS account and the CLI installed and configured with an access key. In addition, you'll need the serverless framework and npm installed.
Then follow these steps:
Clone the repository
https://github.com/changamire/stepfunction-lambda-fileprocessor.git
Install serverless plugins
npm install
Deploy to the cloud
sls deploy
Once the application is deployed, the step function can be started by passing it an event containing details of the file in S3 to be processed, in the structure of an S3 event, e.g.:
{
  "Records": [
    {
      "s3": {
        "object": {
          "key": "BigBoy.csv"
        },
        "bucket": {
          "name": "my-bucket"
        }
      }
    }
  ]
}
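One way to kick off an execution with an event of this shape is to build the payload programmatically and hand it to the `aws stepfunctions start-execution` CLI command. A small sketch, assuming the bucket, key, and state machine ARN shown are placeholders you would substitute with your own:

```python
import json

# Hypothetical bucket and key -- substitute the S3 object you deployed.
event = {
    "Records": [
        {
            "s3": {
                "object": {"key": "BigBoy.csv"},
                "bucket": {"name": "my-bucket"},
            }
        }
    ]
}

payload = json.dumps(event)

# Print the equivalent CLI invocation; <your-state-machine-arn> is a
# placeholder for the ARN that `sls deploy` reports.
print(
    "aws stepfunctions start-execution "
    "--state-machine-arn <your-state-machine-arn> "
    f"--input '{payload}'"
)
```

The same payload could equally be passed to `start_execution` in an AWS SDK; the only requirement is that it matches the S3-event structure the step function expects.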