aws-samples / serverless-retrieval-augmented-generation-rag-on-aws

A full-stack serverless RAG workflow, intended for running PoCs and prototypes and for bootstrapping your MVP.

License: MIT No Attribution

Python 16.01% Shell 0.68% JavaScript 59.58% HTML 0.29% CSS 1.52% Dockerfile 0.48% TypeScript 21.44%
aws aws-lambda full-stack lancedb reactjs retrieval-augmented-generation serverless

serverless-retrieval-augmented-generation-rag-on-aws's People

Contributors

amazon-auto, brnaba-aws, dependabot[bot], giusedroid, kirtandudhatra, shafkevi


serverless-retrieval-augmented-generation-rag-on-aws's Issues

Bug: Allow users to ask a question even when the table does not exist.

Currently, upon user registration, their LanceDB table does not exist. This causes an issue if they try to ask a question against an empty knowledge base. Fix: use a Cognito post-confirmation trigger to create an empty table, or handle the exception and show a user-friendly message.
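A minimal sketch of the exception-handling option. The function and exception names here are hypothetical stand-ins for the real LanceDB calls used by the project:

```python
# Sketch: answer gracefully when a user's LanceDB table does not exist yet.
# `open_user_table` and `MissingTableError` are hypothetical stand-ins for
# the actual table-opening call and the error it raises.

class MissingTableError(Exception):
    """Raised when the user's knowledge-base table has not been created yet."""

def ask_question(open_user_table, user_id: str, question: str) -> str:
    try:
        table = open_user_table(user_id)
    except MissingTableError:
        # Return a friendly message instead of surfacing a 500 error.
        return "Your knowledge base is empty. Upload a document before asking questions."
    return f"answering {question!r} with table {table}"
```

The post-confirmation-trigger option avoids this branch entirely, at the cost of an extra Lambda in the sign-up path.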

Feature: Introducing Kúzu graph database for extra-planar relationships

Kúzu is an embedded graph database. We want to explore graph-RAG capabilities to include more relevant information when retrieving semantically.

A good MVP for this would be mapping obvious relationships at ingestion that we cannot store semantically as vectors, for example:

page -> next() : page
page -> previous() : page
page -> belongsToDocument() : document
page -> sectionStartsAt() : page
page -> sectionEndsAt() : page
document -> relatesToDocument() : document[]
document -> belongsToCollection() : document[] 
document -> abstract() : string

and at semantic retrieval, use the relationships mapped for the retrieved vectors to provide additional context, or to exclude other vectors from context placement if they are not related to the most relevant collection, probably in the context of re-ranking.
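Before introducing Kúzu itself, the mapping above could be prototyped as a plain edge list. This is only a sketch; the node identifiers and helper names are illustrative, not the project's actual schema:

```python
# Sketch: the ingestion-time relationships as a plain (src, relation, dst)
# edge list, as a stand-in for a Kúzu graph while prototyping.

edges = []

def add_edge(src: str, relation: str, dst: str) -> None:
    edges.append((src, relation, dst))

# At ingestion, record structural relationships we cannot store as vectors.
add_edge("doc1/page2", "next", "doc1/page3")
add_edge("doc1/page2", "previous", "doc1/page1")
add_edge("doc1/page2", "belongsToDocument", "doc1")

def related(node: str, relation: str) -> list:
    """At retrieval time, expand a retrieved page via its mapped relations."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```

At re-ranking time, `related(page, "belongsToDocument")` would let us group candidate vectors by document before deciding what goes into the context window.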

Show which documents are still processing

As of right now, you get a toast when a document starts processing and another when it completes, but there is no state showing what is currently processing. If a document takes a long time and you refresh the page, you have no way of knowing whether it finished.

A simple column in the documents table showing the current state would solve this from the user-experience side.
I believe the list is retrieved from S3, which makes this slightly trickier... Maybe use object metadata, or join the data with the list stored in DynamoDB?
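The join option could look something like the sketch below. The record shapes are assumptions, not the project's actual schema:

```python
# Sketch: merge the S3 object listing with per-document status records
# (e.g. read from DynamoDB) so the UI can render a "status" column.

def merge_status(s3_keys, status_records):
    """status_records: dict mapping S3 key -> 'processing' | 'done' | 'failed'.

    Keys present in S3 but absent from the status table fall back to
    'unknown' rather than being hidden from the user.
    """
    return [
        {"key": key, "status": status_records.get(key, "unknown")}
        for key in s3_keys
    ]
```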

Additional Knowledge Bases

Allow users to create additional knowledge bases.
Currently users can have one knowledge base. This is implemented as a path on S3.
We want to change this by allowing users to create a new LanceDB path and decide which one to plug in at inference time. A few changes are needed to allow this.

  • change the authenticated user's IAM policy so that they have access to S3 paths like kb/${cognito_id}/${kb-name}
  • allow creating and selecting knowledge bases on the front-end
  • send the kb id along with the inference request, verifying that the user has access to the selected path
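As a sketch, the scoped policy statement for the first bullet might look like the following. The bucket name is a placeholder, and the example uses the standard `${cognito-identity.amazonaws.com:sub}` IAM policy variable for the per-user path segment:

```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:PutObject"],
  "Resource": "arn:aws:s3:::MY-KB-BUCKET/kb/${cognito-identity.amazonaws.com:sub}/*"
}
```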

We should also consider how to share knowledge bases with other users; we probably won't be able to do it via IAM alone...

Share knowledge bases with other users

depends on #26
Once users are able to create additional knowledge bases, create a management system to share a knowledge base with other users in read-only, contributor, or admin mode.
Read-only mode will allow users only to read from the LanceDB table.
Contributor mode will allow users to upload documents to a specific knowledge base.
Admin mode will allow users to delete documents from a knowledge base.

This implies a change on the way we ingest documents and maintain them in the document registry. For example, we'll have to assert uniqueness based on MD5(kb_id, content) rather than MD5(user_id, content). This should not be a big deal, as we're currently using a user's cognito_id to identify their default knowledge base.
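The uniqueness change is small in code terms. A sketch of the new key derivation, assuming content arrives as bytes:

```python
import hashlib

def document_key(kb_id: str, content: bytes) -> str:
    """Uniqueness key for the document registry: MD5 over (kb_id, content).

    Today the same role is played by the user's cognito_id, which doubles
    as their default knowledge-base id.
    """
    digest = hashlib.md5()
    digest.update(kb_id.encode("utf-8"))
    digest.update(content)
    return digest.hexdigest()
```

The same document uploaded to two different knowledge bases yields two distinct registry entries, which is exactly what sharing requires.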

Architecture: evaluate VPC endpoints and network topology impacts

We want to evaluate the performance gain and cost savings versus the cold-start impact and operational overhead of introducing VPC endpoints (S3, Bedrock). The downside is that customers would end up dealing with VPCs, NAT gateways, ENIs for Lambda, and subnets, but cost and performance should improve. We need to evaluate changes to the overall network topology and check whether a NAT gateway is needed. A NAT gateway is a showstopper for me: it introduces static charges and defeats the purpose of a fully serverless architecture.

Load Test: concurrent usage

Right now we have limited the concurrency of the ingestion function to 1. LanceDB has a native locking system based on DynamoDB, currently in beta. We should experiment with it and remove the artificial limit of 1 max concurrent execution for the writer.

Once this is implemented, we should run load tests to understand impact on retrieval performance against the same (user+knowledge-base).
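For reference, a sketch of what enabling the lock might look like: LanceDB's DynamoDB commit store is selected through an `s3+ddb://` connection URI. The bucket, prefix, and table names below are examples, and the exact syntax should be checked against the current LanceDB beta docs:

```python
# Sketch: build the connection URI for LanceDB's DynamoDB-backed commit
# store (beta). Names are illustrative; verify against the LanceDB docs.

def lancedb_uri(bucket: str, prefix: str, lock_table: str) -> str:
    return f"s3+ddb://{bucket}/{prefix}?ddbTableName={lock_table}"

# Usage (not executed here):
#   db = lancedb.connect(lancedb_uri("my-kb-bucket", "kb/user-123", "lance-locks"))
```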

Model Identifier Issue

Description:

I'm encountering an issue with the lambdaDocumentProcessorFun function. When invoking the model, I receive the following error:

Error raised by inference endpoint: An error occurred (ResourceNotFoundException) when calling the InvokeModel operation: Could not resolve the foundation model from the provided model identifier

Steps to reproduce:

  1. Deploy the solution as described in the repository using eu-west-3 (with all Bedrock models activated).
  2. Update AWS_REGION env from us-west-2 to eu-west-3.
  3. Invoke the lambdaDocumentProcessorFun function.

It seems the model identifier isn't recognized in this region. Could you provide guidance on how to correctly select a model for regions other than us-west-2?

Environment:

  • AWS Region: eu-west-3

Model Identifier Test KO (failing):

  • anthropic.claude-3-haiku-20240307-v1:0
  • anthropic.claude-3-haiku-20240307-v1:0:200k
  • amazon.titan-text-express-v1

Model Identifier Test OK (working):

  • amazon.titan-text-lite-v1
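One hedged way to handle this: query the models actually available in the deployment region (e.g. via boto3's `bedrock` client and `list_foundation_models`) and pick the first match from a preference list. The helper below keeps the selection logic pure so it can be tested without AWS credentials; the preference order is just an example:

```python
# Sketch: choose a model id that exists in the current region.
# In practice `available_ids` would be built from
#   boto3.client("bedrock").list_foundation_models()
# by collecting each summary's "modelId".

def pick_model(preferred, available_ids):
    """Return the first preferred model id available in this region."""
    for model_id in preferred:
        if model_id in available_ids:
            return model_id
    raise ValueError("No preferred Bedrock model is available in this region")
```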

Thank you for your assistance!

Feature: web scraper

Give users the ability to scrape content from a URL and include it into their knowledge base.

  • what does this mean for the Document registry?
  • on demand? scheduled? leave it up to the user?

Enhancement: large PDF splitting

Running some tests, we found that embedding large documents causes the system to time out; the ingestion Lambda's timeout is set to 300 seconds. Rather than just increasing it, we would like to split large PDFs into a few predictable parts and process them in parallel. We are also artificially limiting the processor function's concurrency to 1; we'd love to remove this once the LanceDB locking system is out of beta.
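The "predictable parts" could simply be fixed-size page ranges, each handed to a separate Lambda invocation. A minimal sketch, where the chunk size of 20 pages is an arbitrary example:

```python
# Sketch: split an N-page PDF into predictable half-open page ranges
# so each range can be embedded by a separate invocation.

def page_chunks(num_pages: int, pages_per_chunk: int = 20):
    """Return (start, end) ranges covering pages [0, num_pages)."""
    return [
        (start, min(start + pages_per_chunk, num_pages))
        for start in range(0, num_pages, pages_per_chunk)
    ]
```

The number of parts is known up front from the page count, which makes fan-out/fan-in bookkeeping (e.g. via SQS or Step Functions) straightforward.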

Chore: Split main construct into sub constructs

lib/serverless-rag-on-aws-stack.ts could use some love to break it down into sub-constructs.
Thinking out loud: a front-end stack (hosting and building), an ML stack (queues, processors), and a front-end support stack (WebSockets, Cognito, and IAM).

Load Test: scale to billions of vectors

  • retrieval performance impact when getting to billions of vectors
  • any network bottlenecks?
  • what is the maximum number of vectors per user per knowledge base we can withstand?

Chore: bump up dependencies

We have (at the time of writing) 5 open PRs for dependency management. We should address each of them independently, run acceptance tests, and merge.
