aws-samples / serverless-retrieval-augmented-generation-rag-on-aws

A full-stack serverless RAG workflow, intended for running PoCs and prototypes and for bootstrapping your MVP.

License: MIT No Attribution

Python 16.01% Shell 0.68% JavaScript 59.58% HTML 0.29% CSS 1.52% Dockerfile 0.48% TypeScript 21.44%
aws aws-lambda full-stack lancedb reactjs retrieval-augmented-generation serverless

serverless-retrieval-augmented-generation-rag-on-aws's People

Contributors

amazon-auto, brnaba-aws, dependabot[bot], giusedroid, kirtandudhatra, shafkevi


serverless-retrieval-augmented-generation-rag-on-aws's Issues

Bug: Allow users to ask a question even when the table does not exist.

Currently, upon user registration, their LanceDB table does not exist. This causes an issue if they try to ask a question against an empty knowledge base. Fix: use a Cognito post-confirmation trigger to create an empty table, or handle the exception and show a user-friendly message.
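A minimal sketch of the exception-handling option. The function and exception names here are hypothetical stand-ins for the real LanceDB calls used by the project:

```python
# Sketch: answer gracefully when a user's LanceDB table does not exist yet.
# `open_user_table` and `MissingTableError` are hypothetical stand-ins for
# the actual table-opening call and the error it raises.

class MissingTableError(Exception):
    """Raised when the user's knowledge-base table has not been created yet."""

def ask_question(open_user_table, user_id: str, question: str) -> str:
    try:
        table = open_user_table(user_id)
    except MissingTableError:
        # Return a friendly message instead of surfacing a 500 error.
        return "Your knowledge base is empty. Upload a document before asking questions."
    return f"answering {question!r} with table {table}"
```

The post-confirmation-trigger option avoids this branch entirely, at the cost of an extra Lambda in the sign-up path.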

Feature: Introducing Kúzu graph database for extra-planar relationships

Kúzu is an embedded graph database. We want to explore graph-RAG capabilities to include more relevant information when retrieving semantically.

A good MVP for this would be mapping obvious relationships at ingestion that we cannot store semantically as vectors, for example:

page -> next() : page
page -> previous() : page
page -> belongsToDocument() : document
page -> sectionStartsAt() : page
page -> sectionEndsAt() : page
document -> relatesToDocument() : document[]
document -> belongsToCollection() : document[] 
document -> abstract() : string

and at semantic retrieval, use the relationships mapped for the retrieved vectors to provide additional context, or to exclude other vectors from context placement if they are not related to the most relevant collection, probably in the context of re-ranking.
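Before introducing Kúzu itself, the mapping above could be prototyped as a plain edge list. This is only a sketch; the node identifiers and helper names are illustrative, not the project's actual schema:

```python
# Sketch: the ingestion-time relationships as a plain (src, relation, dst)
# edge list, as a stand-in for a Kúzu graph while prototyping.

edges = []

def add_edge(src: str, relation: str, dst: str) -> None:
    edges.append((src, relation, dst))

# At ingestion, record structural relationships we cannot store as vectors.
add_edge("doc1/page2", "next", "doc1/page3")
add_edge("doc1/page2", "previous", "doc1/page1")
add_edge("doc1/page2", "belongsToDocument", "doc1")

def related(node: str, relation: str) -> list:
    """At retrieval time, expand a retrieved page via its mapped relations."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```

At re-ranking time, `related(page, "belongsToDocument")` would let us group candidate vectors by document before deciding what goes into the context window.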

Show which documents are still processing

As of right now, you get a toast when a document starts processing and another when it completes, but there is no state showing what is currently processing. If a document takes a long time and you refresh the page, you have no way of knowing whether it finished.

A simple column in the documents table showing the current state would solve this from the user-experience side.
I believe the list is retrieved from S3, which makes this slightly trickier... Maybe use object metadata, or join the data with the list stored in DynamoDB?
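The join option could look something like the sketch below. The record shapes are assumptions, not the project's actual schema:

```python
# Sketch: merge the S3 object listing with per-document status records
# (e.g. read from DynamoDB) so the UI can render a "status" column.

def merge_status(s3_keys, status_records):
    """status_records: dict mapping S3 key -> 'processing' | 'done' | 'failed'.

    Keys present in S3 but absent from the status table fall back to
    'unknown' rather than being hidden from the user.
    """
    return [
        {"key": key, "status": status_records.get(key, "unknown")}
        for key in s3_keys
    ]
```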

Additional Knowledge Bases

Allow users to create additional knowledge bases.
Currently users can have one knowledge base. This is implemented as a path on S3.
We want to change this by allowing users to create a new LanceDB path and decide which one to plug in at inference time. A few changes are needed to allow this.

  • change the authenticated user's IAM policy so that they have access to S3 paths like kb/${cognito_id}/${kb-name}
  • allow creating and selecting knowledge bases on the front-end
  • send the kb id along with the inference request, verifying that the user has access to the selected path
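As a sketch, the scoped policy statement for the first bullet might look like the following. The bucket name is a placeholder, and the example uses the standard `${cognito-identity.amazonaws.com:sub}` IAM policy variable for the per-user path segment:

```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:PutObject"],
  "Resource": "arn:aws:s3:::MY-KB-BUCKET/kb/${cognito-identity.amazonaws.com:sub}/*"
}
```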

We should also consider how to share knowledge bases with other users; we probably won't be able to do it via IAM alone...

Share knowledge bases with other users

depends on #26
Once users are able to create additional knowledge bases, create a management system to share a knowledge base with other users in read-only, contributor, or admin mode.
Read-only mode will allow users only to read from the LanceDB table.
Contributor mode will allow users to upload documents to a specific knowledge base.
Admin mode will allow users to delete documents from a knowledge base.

This implies a change on the way we ingest documents and maintain them in the document registry. For example, we'll have to assert uniqueness based on MD5(kb_id, content) rather than MD5(user_id, content). This should not be a big deal, as we're currently using a user's cognito_id to identify their default knowledge base.
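The uniqueness change is small in code terms. A sketch of the new key derivation, assuming content arrives as bytes:

```python
import hashlib

def document_key(kb_id: str, content: bytes) -> str:
    """Uniqueness key for the document registry: MD5 over (kb_id, content).

    Today the same role is played by the user's cognito_id, which doubles
    as their default knowledge-base id.
    """
    digest = hashlib.md5()
    digest.update(kb_id.encode("utf-8"))
    digest.update(content)
    return digest.hexdigest()
```

The same document uploaded to two different knowledge bases yields two distinct registry entries, which is exactly what sharing requires.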

Architecture: evaluate VPC endpoints and network topology impacts

We want to evaluate the performance gain and cost savings versus the cold-start impact and operational overhead of introducing VPC endpoints (S3, Bedrock). The downside is that customers would end up dealing with VPCs, NAT gateways, ENIs for Lambda, and subnets, but cost and performance should improve. We need to evaluate changes to the overall network topology and check whether a NAT gateway is needed. A NAT gateway is a showstopper for me: it introduces static charges and defeats the purpose of a fully serverless architecture.

Load Test: concurrent usage

Right now we have limited the concurrency of the ingestion function to 1. LanceDB has a native locking system based on DynamoDB, currently in beta. We should experiment with it and remove the artificial limit of 1 max concurrent execution for the writer.

Once this is implemented, we should run load tests to understand impact on retrieval performance against the same (user+knowledge-base).
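For reference, a sketch of what enabling the lock might look like: LanceDB's DynamoDB commit store is selected through an `s3+ddb://` connection URI. The bucket, prefix, and table names below are examples, and the exact syntax should be checked against the current LanceDB beta docs:

```python
# Sketch: build the connection URI for LanceDB's DynamoDB-backed commit
# store (beta). Names are illustrative; verify against the LanceDB docs.

def lancedb_uri(bucket: str, prefix: str, lock_table: str) -> str:
    return f"s3+ddb://{bucket}/{prefix}?ddbTableName={lock_table}"

# Usage (not executed here):
#   db = lancedb.connect(lancedb_uri("my-kb-bucket", "kb/user-123", "lance-locks"))
```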

Model Identifier Issue

Description:

I'm encountering an issue with the lambdaDocumentProcessorFun function. When invoking the model, I receive the following error:

Error raised by inference endpoint: An error occurred (ResourceNotFoundException) when calling the InvokeModel operation: Could not resolve the foundation model from the provided model identifier

Steps to reproduce:

  1. Deploy the solution as described in the repository using eu-west-3 (with all Bedrock models activated).
  2. Update AWS_REGION env from us-west-2 to eu-west-3.
  3. Invoke the lambdaDocumentProcessorFun function.

It seems the model identifier isn't recognized in this region. Could you provide guidance on how to correctly select a model for regions other than us-west-2?

Environment:

  • AWS Region: eu-west-3

Model Identifier Test KO (failing):

  • anthropic.claude-3-haiku-20240307-v1:0
  • anthropic.claude-3-haiku-20240307-v1:0:200k
  • amazon.titan-text-express-v1

Model Identifier Test OK (working):

  • amazon.titan-text-lite-v1
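One hedged way to handle this: query the models actually available in the deployment region (e.g. via boto3's `bedrock` client and `list_foundation_models`) and pick the first match from a preference list. The helper below keeps the selection logic pure so it can be tested without AWS credentials; the preference order is just an example:

```python
# Sketch: choose a model id that exists in the current region.
# In practice `available_ids` would be built from
#   boto3.client("bedrock").list_foundation_models()
# by collecting each summary's "modelId".

def pick_model(preferred, available_ids):
    """Return the first preferred model id available in this region."""
    for model_id in preferred:
        if model_id in available_ids:
            return model_id
    raise ValueError("No preferred Bedrock model is available in this region")
```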

Thank you for your assistance!

Feature: web scraper

Give users the ability to scrape content from a URL and include it into their knowledge base.

  • what does this mean for the Document registry?
  • on demand? scheduled? leave it up to the user?

Enhancement: large PDF splitting

Running some tests, we found that embedding large documents causes the system to time out; the ingestion Lambda's timeout is set to 300 seconds. Rather than just increasing it, we would like to split large PDFs into a few predictable parts and process them in parallel. We are also artificially limiting the processor function's concurrency to 1; we'd love to remove this once the LanceDB locking system is out of beta.
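The "predictable parts" could simply be fixed-size page ranges, each handed to a separate Lambda invocation. A minimal sketch, where the chunk size of 20 pages is an arbitrary example:

```python
# Sketch: split an N-page PDF into predictable half-open page ranges
# so each range can be embedded by a separate invocation.

def page_chunks(num_pages: int, pages_per_chunk: int = 20):
    """Return (start, end) ranges covering pages [0, num_pages)."""
    return [
        (start, min(start + pages_per_chunk, num_pages))
        for start in range(0, num_pages, pages_per_chunk)
    ]
```

The number of parts is known up front from the page count, which makes fan-out/fan-in bookkeeping (e.g. via SQS or Step Functions) straightforward.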

Chore: Split main construct into sub constructs

lib/serverless-rag-on-aws-stack.ts could use some love to break it down into sub-constructs.
Thinking out loud: a front-end stack (hosting and building), an ML stack (queues, processors), and a front-end support stack (WebSockets, Cognito, and IAM).

Load Test: scale to billions of vectors

  • retrieval performance impact when getting to billions of vectors
  • any network bottlenecks?
  • what is the maximum number of vectors per user per knowledge base we can withstand?

Chore: bump up dependencies

We have (at the time of writing) 5 open PRs for dependency management. We should address each of them independently, run acceptance tests, and merge.
