
brnaba-aws / project-lakechain


This project forked from awslabs/project-lakechain


⚡ A cloud-native, AI-powered, document processing framework built on top of the AWS CDK.

Home Page: https://awslabs.github.io/project-lakechain/

License: Apache License 2.0

Shell 0.24% JavaScript 3.55% Python 12.86% TypeScript 82.72% Dockerfile 0.63%

Project Lakechain

A cloud-native, AI-powered, document processing framework built on top of the AWS CDK.



🔖 Features

  • 🤖 Composable - Composable API to express document processing pipelines using middlewares.
  • ☁️ Scalable - Scales out of the box. Process millions of documents, scale to zero automatically when done.
  • ⚡ Cost Efficient - Uses cost-optimized architectures to reduce costs and drive a pay-as-you-go model.
  • 🚀 Ready to use - 40+ built-in middlewares for common document processing tasks, ready to be deployed.
  • 🦎 GPU and CPU Support - Use the right compute type to balance between performance and cost.
  • 📦 Bring Your Own - Create your own transform middlewares to process documents and extend Lakechain.
  • 📙 Ready Made Examples - Quick-start your journey by leveraging 40+ examples we've built for you.

🚀 Getting Started

Important

👉 Head to our documentation (https://awslabs.github.io/project-lakechain/), which contains all the information required to understand the project and quickly start building!

What's Lakechain ❓

Project Lakechain is an experimental framework based on the AWS Cloud Development Kit (CDK) that makes it easy to express and deploy scalable document processing pipelines on AWS using infrastructure-as-code. It emphasizes pipeline modularity and provides 40+ ready-to-use components for prototyping complex document pipelines that scale out of the box to millions of documents.

This project is designed to help AWS customers build and scale different types of document processing pipelines, covering a wide array of use cases including Metadata Extraction, Document Conversion, NLP analysis, Text Summarization, Text Translation, Audio Transcription, Computer Vision, Retrieval Augmented Generation pipelines, and much more!

Show me the code ❗

Below is an example of a pipeline built with Lakechain that deploys the infrastructure required to automatically transcribe audio files uploaded to S3, in just a few lines of code.

👇 This pipeline will scale to millions of documents.

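import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
// The Lakechain middlewares used below (S3EventTrigger, TranscribeAudioProcessor,
// S3StorageConnector) are imported from their respective Lakechain packages.

export class TranscriptionStack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    // A derived class must call super() before using `this`.
    super(scope, id);

    // `cache`, `bucket`, and `destination` are assumed to be created
    // earlier in the stack; their definitions are omitted for brevity.

    // Listens for new documents on S3.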
    const trigger = new S3EventTrigger.Builder()
      .withScope(this)
      .withIdentifier('Trigger')
      .withCacheStorage(cache)
      .withBucket(bucket)
      .build();

    // Transcribes uploaded audio files with Amazon Transcribe,
    // and stores the result in a destination bucket.
    trigger
      .pipe(new TranscribeAudioProcessor.Builder()
        .withScope(this)
        .withIdentifier('Transcribe')
        .withCacheStorage(cache)
        .build()
      )
      .pipe(new S3StorageConnector.Builder()
        .withScope(this)
        .withIdentifier('Storage')
        .withCacheStorage(cache)
        .withDestinationBucket(destination)
        .build()
      );
  }
}
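
To make the chaining in the example above easier to follow, here is a toy, self-contained sketch of how a `pipe()` method that returns its argument lets stages be connected left to right. This is not Lakechain's implementation and deploys nothing to AWS: `ToyDocument`, `ToyMiddleware`, `accepts`, `transform`, and `run` are hypothetical names introduced purely for illustration, and the MIME-type filter is an assumption about how a stage might select its inputs.

```typescript
// Toy model of a composable pipeline. NOT Lakechain's implementation:
// every name below is hypothetical and exists only for illustration.
type ToyDocument = { url: string; type: string; log: string[] };

class ToyMiddleware {
  private readonly next: ToyMiddleware[] = [];

  constructor(
    readonly id: string,
    private readonly accepts: (doc: ToyDocument) => boolean,
    private readonly transform: (doc: ToyDocument) => ToyDocument
  ) {}

  // Connects this stage to `target` and returns `target`,
  // which is what allows chaining: a.pipe(b).pipe(c).
  pipe(target: ToyMiddleware): ToyMiddleware {
    this.next.push(target);
    return target;
  }

  // Processes a document, records this stage in its log, and forwards
  // it downstream. Returns the documents that reached the end of a chain.
  run(doc: ToyDocument): ToyDocument[] {
    if (!this.accepts(doc)) {
      return [];
    }
    const out = this.transform(doc);
    const result: ToyDocument = { ...out, log: [...out.log, this.id] };
    if (this.next.length === 0) {
      return [result];
    }
    const collected: ToyDocument[] = [];
    for (const stage of this.next) {
      collected.push(...stage.run(result));
    }
    return collected;
  }
}

// Three stages mirroring the shape of the example above.
const trigger = new ToyMiddleware('Trigger', () => true, (d) => d);
const transcribe = new ToyMiddleware(
  'Transcribe',
  (d) => d.type.startsWith('audio/'),
  (d) => ({ ...d, url: d.url + '.vtt', type: 'text/vtt' })
);
const storage = new ToyMiddleware('Storage', () => true, (d) => d);

trigger.pipe(transcribe).pipe(storage);

const results = trigger.run({ url: 's3://bucket/talk.mp3', type: 'audio/mpeg', log: [] });
const skipped = trigger.run({ url: 's3://bucket/notes.txt', type: 'text/plain', log: [] });

console.log(results[0].log.join(' -> ')); // prints "Trigger -> Transcribe -> Storage"
console.log(skipped.length); // prints 0
```

In this sketch the audio file passes through all three stages, while the text file is dropped at the transcription stage because that stage does not accept its type.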

LICENSE

See LICENSE.

Contributors

hqarroum, brnaba-aws, github-actions[bot]
