stepfunction-lambda-fileprocessor's Introduction

Process arbitrarily large newline-delimited text files on AWS

Inspired by this article. Sometimes you need to process very large newline-delimited text files within AWS, such as large CSV or JSONL files. Or your files may not be that large, but you want to do something time-consuming or computationally intensive with each line. Either way, the limits imposed on Lambda functions, although recently increased, may prevent you from processing a very large file in a single Lambda invocation.

Enter AWS Step Functions: a simple but powerful way to stitch cloud services together and orchestrate more complex workflows from serverless components. This project uses a step function and a simple lambda to process, line by line, an arbitrarily sized file of newline-delimited text.
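To make the idea concrete, here is a minimal sketch of one way a step function loop can walk through a file in bounded chunks. It is an illustration under assumptions, not necessarily how this project's lambda is written: the names `process_chunk`, `run_to_completion`, `max_lines` and the `offset`/`done` state fields are hypothetical, and the file is held in memory as bytes where a real lambda would read from S3.

```python
import io


def process_chunk(data: bytes, offset: int, max_lines: int, handle_line) -> dict:
    """Process up to max_lines lines starting at a byte offset.

    Hypothetical stand-in for a single lambda invocation. `data` is
    in-memory here; a real lambda would read the object from S3 instead.
    """
    stream = io.BytesIO(data)
    stream.seek(offset)
    processed = 0
    while processed < max_lines:
        line = stream.readline()
        if not line:  # end of file reached
            break
        handle_line(line.rstrip(b"\r\n").decode("utf-8"))
        processed += 1
    new_offset = stream.tell()
    # This state is what the next iteration of the loop would receive.
    return {"offset": new_offset, "done": new_offset >= len(data)}


def run_to_completion(data: bytes, max_lines: int, handle_line) -> int:
    """Stand-in for the step function: invoke the chunk step until done."""
    state = {"offset": 0, "done": len(data) == 0}
    iterations = 0
    while not state["done"]:
        state = process_chunk(data, state["offset"], max_lines, handle_line)
        iterations += 1
    return iterations
```

Because each invocation only carries a small state object forward, no single invocation has to run long enough to read the whole file.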

To use this, you'll need an AWS account and the AWS CLI installed and configured with an access key. You'll also need the Serverless Framework and npm installed.

Then follow these steps:

Clone the repository

git clone https://github.com/changamire/stepfunction-lambda-fileprocessor.git

Install serverless plugins

npm install

Deploy to the cloud

sls deploy

Once the application is deployed, the step function can be started by passing it an event that describes the file in S3 to be processed, in the structure of an S3 event, e.g.:

{
  "Records": [
    {
      "s3": {
        "object": {
          "key": "BigBoy.csv"
        },
        "bucket": {
          "name": "my-bucket"
        }
      }
    }
  ]
}
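If you start executions from a script, the input can be generated rather than written by hand. The sketch below builds the same structure for a given bucket and key; the helper name `build_s3_event` is hypothetical and not part of this project.

```python
import json


def build_s3_event(bucket: str, key: str) -> str:
    """Return a JSON string shaped like the S3 event shown above."""
    event = {
        "Records": [
            {
                "s3": {
                    "object": {"key": key},
                    "bucket": {"name": bucket},
                }
            }
        ]
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Example values taken from the sample event above.
    print(build_s3_event("my-bucket", "BigBoy.csv"))
```

The resulting string can be supplied as the execution input, for example through the `--input` option of `aws stepfunctions start-execution`, together with the ARN of the deployed state machine (the ARN depends on your deployment and is not shown here).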
