The issue I raised here is more like a question than a bug (how do we adjust/cap the input to the FM), but I couldn't remove the bug label, so I updated the title instead.
from generative-ai-application-builder-on-aws.
Thanks @ihmaws! The information you've provided here was super helpful!
> Hi @joyfulelement , thanks for the great details! It really helps us debug the situation with you!
> We have a note about Bedrock model access in the IG here, but if you weren't able to easily find it, it means we can do a better job at showing it. Thanks for the feedback, and we'll see what we can do.
Thanks for the pointer to the documentation that points out the need to request access to models in Amazon Bedrock, I had definitely missed that!
> Regarding the failure:
> The final input that we send to the model is the completed prompt. The prompt can contain the user's input, previous interactions (i.e., chat history), and document excerpts sourced from the configured knowledge base. (Since you are using a RAG configuration, the document excerpts are an important consideration!)
I see. When I enabled RAG, the S3 bucket that Kendra is connected to contained only 27 documents, totaling just 79.3 KB.
(Screenshot 2023-12-21 at 11 19 56 PM)
(Screenshot 2023-12-21 at 11 19 47 PM)
So I had the Maximum number of documents configuration set to 30. The label of the setting in the UI, "Optional: the max number of documents to use from the knowledge base", gave me the impression that this number should closely match the number of documents uploaded to S3, which the FM will use in RAG. It seems I misunderstood this.
Relationship: Maximum number of documents configuration vs. quality of the response from the FM?
What is this setting's relationship to the quality of the answer we can get from the FM?
e.g.
- The higher the maximum number of documents to retrieve, the more relevant/accurate context the prompt will be formulated with, so we should expect a better answer from the FM? However, the overall prompt will be larger (and likely to exceed the maximum allowable input imposed by the FM) and could take a while for the FM to respond.
- The lower the maximum number of documents to retrieve, the less relevant context the prompt will be formulated with, so we should expect a less accurate answer from the FM? However, the overall prompt will be smaller (and unlikely to exceed the maximum allowable input imposed by the FM), and the FM should respond quickly.

Is my above interpretation correct?
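The trade-off in the bullets above can be sketched in a few lines. This is a hypothetical illustration (the prompt template and names are made up, not the solution's actual code): each retrieved excerpt is appended to the prompt, so a higher max-documents setting directly inflates the final prompt size.

```python
# Hypothetical sketch: how the max-documents setting affects prompt size.
# The prompt template and helper names are illustrative assumptions.

def build_prompt(question, history, excerpts, max_docs):
    """Assemble a RAG prompt from the user input, chat history,
    and up to max_docs knowledge-base excerpts."""
    context = "\n\n".join(excerpts[:max_docs])
    return f"Context:\n{context}\n\nHistory:\n{history}\n\nQuestion: {question}"

# Simulate 30 excerpts of ~500 characters each.
excerpts = [f"Excerpt {i}: " + "x" * 500 for i in range(30)]

small = build_prompt("Who won?", "", excerpts, max_docs=2)
large = build_prompt("Who won?", "", excerpts, max_docs=30)

# More excerpts -> a much longer prompt, and a higher chance of
# exceeding the model's input-token limit.
print(len(small), len(large))
```

So both directions of the interpretation hold mechanically; whether the extra context actually improves answer quality is a separate question, discussed below.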
> So if using the Amazon Titan Express model, the max input prompt allowed is ~8000 tokens. 8000 tokens is in the ballpark of about 42000 characters, which is what I believe the error message is indicating.
> My initial guess as to why you are seeing the error so soon is because you have a large number set for the max documents (in the Knowledge Base configuration page in the wizard). If this is the case, I would recommend starting off with a much smaller number (try the default of 2). Here are some tips we provide within the IG: https://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/use-the-solution.html#tips-for-managing-model-token-limits
What's the general rule of thumb for configuring the maximum number of documents to retrieve setting correctly?
E.g. if the content of the knowledge base grows, and say it has 1000 documents, I assume the prompt could only be generated based on a fraction of the total size of the knowledge base?
My Experiment
I tried again by setting it to just 2, and indeed, as you pointed out, I am no longer getting those Chat service failed to respond... errors, but I did find the responses from the bot quite concise.
I also tried using the Claude v2.1 FM with the maximum number of documents to retrieve setting at 80. I sometimes get the Chat service failed to respond... error, but when I retry the same question a couple of times, it eventually runs through. The answers I got were a lot more comprehensive compared to the Titan + 2 max docs combination.
Suggestion to Improve the Robustness
I wonder if the Lambda we use in the solution could help cap the prompt being passed to the FM, to reduce the errors that occur from misconfigurations like the ones I made above? Thanks!
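The capping idea could look something like the sketch below. To be clear, this is not the solution's code: the function name, the 4-characters-per-token heuristic, and the keep-excerpts-until-the-budget-runs-out policy are all assumptions used for illustration.

```python
# Hypothetical sketch of prompt capping before model invocation:
# drop trailing document excerpts until the estimated size fits the
# model's input limit. The chars-per-token ratio is a rough heuristic;
# real tokenizers vary.

CHARS_PER_TOKEN = 4

def cap_prompt_parts(question, history, excerpts, max_input_tokens):
    """Keep as many excerpts as fit within the estimated token budget,
    after accounting for the question and chat history."""
    budget_chars = max_input_tokens * CHARS_PER_TOKEN
    used = len(question) + len(history)
    kept = []
    for ex in excerpts:
        if used + len(ex) > budget_chars:
            break  # this excerpt would push the prompt over the limit
        kept.append(ex)
        used += len(ex)
    return kept

# 80 excerpts of ~2000 characters would blow an ~8000-token budget;
# the cap keeps only the prefix that fits.
excerpts = ["doc excerpt " + "y" * 2000 for _ in range(80)]
kept = cap_prompt_parts("What is X?", "", excerpts, max_input_tokens=8000)
print(f"kept {len(kept)} of {len(excerpts)} excerpts")
```

A real implementation would also need to decide what to trim first (history vs. excerpts) and whether to truncate within an excerpt, which is part of why the maintainers note below that a blanket cap has side effects.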
Hi @joyfulelement , sorry for the delay in getting back to you, the team has been out for the end of year holidays. Happy new year!
> The label of the setting in the UI, "Optional: the max number of documents to use from the knowledge base", gave me the impression that this number should closely match the number of documents uploaded to S3, which the FM will use in RAG. It seems I misunderstood this.
Yea it's tricky because there is a bit of a dependency on the specific knowledge base being used. For now since it is Kendra, it isn't the number of documents directly in your S3 bucket, but instead the number of document excerpts that the Kendra Retrieve API returns. Kendra can return multiple excerpts from the same document, so this would filter on that final return set from Kendra. Sorry for the confusion, we'll have a look at the wording to see how we can help make it more clear.
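The distinction described above can be made concrete with a small sketch. The dictionaries loosely mirror the shape of the `ResultItems` returned by Kendra's Retrieve API (`DocumentId` plus excerpt `Content`); the data and helper are hypothetical, but they show why the cap counts excerpts, not S3 objects.

```python
# Illustrative sketch: the max-documents cap applies to the excerpts
# returned by Kendra's Retrieve API, not to the objects in S3. Kendra
# can return several excerpts from the same source document.

def take_top_excerpts(result_items, max_docs):
    """Keep the first max_docs excerpts from the retrieval results,
    even if several come from the same source document."""
    return result_items[:max_docs]

# Mock Retrieve-style results: two excerpts from the same document.
result_items = [
    {"DocumentId": "s3://bucket/faq.pdf",   "Content": "excerpt A"},
    {"DocumentId": "s3://bucket/faq.pdf",   "Content": "excerpt B"},
    {"DocumentId": "s3://bucket/guide.pdf", "Content": "excerpt C"},
]

top = take_top_excerpts(result_items, max_docs=2)
# Both kept excerpts can come from the same S3 object.
print([r["DocumentId"] for r in top])
```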
> Is my above interpretation correct?
Yes, the general gist of it is correct. The only thing I would caution is around assuming that more documents automatically means more relevant/accurate. The nuances really depend on your use case, i.e., whether or not synthesizing a response across a large document set is useful.
For example, imagine you were running a web search to determine who won the most gold medals at the Rio Olympics. Multiple results probably aren't that useful, and in fact you'd probably only look at the top result.
However, if you were searching for how to make the best chocolate chip cookies, then you may choose to look at the top 3-5 results to get inspiration and assess which recipes, techniques, and ingredients would yield the best results. Hope that makes sense.
> What's the general rule of thumb for configuring the maximum number of documents to retrieve setting correctly?
Personally, I'd recommend starting off with a smaller number (like the default of 2) and then slowly working your way up or down based on some tests. If you are finding that additional documents yield more complete responses, then increase it. However, if you start to see more failures or longer response times, then consider reducing it. It really is a tuning exercise you'll need to go through specific to your use cases.
> My Experiment
The results make sense and are as expected. The Claude v2.1 model supports (last I checked!) 200,000 input tokens, which is significantly larger than some of the other models. So if your use case benefits from the larger context window, then consider using that family of models. However, as I'm sure you are aware, the larger models typically have higher latency and higher costs, so you want to ensure you select the right model for your needs (which I hope is what the solution is helping you to do in the first place!)
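A back-of-the-envelope way to size the max-documents setting to a model's context window is sketched below. The token limits come from the figures discussed in this thread (~8K input tokens for Titan Text Express, ~200K for Claude v2.1), and the chars-per-token ratio, reserve, and excerpt size are assumptions for illustration only.

```python
# Rough sizing sketch: how many average-sized excerpts fit in a model's
# context window, after reserving room for the question, chat history,
# and prompt template. All constants are illustrative assumptions.

CHARS_PER_TOKEN = 4

def max_docs_for_model(context_tokens, avg_excerpt_chars, reserve_tokens=1000):
    """Estimate how many average-sized excerpts fit in the window."""
    budget_chars = (context_tokens - reserve_tokens) * CHARS_PER_TOKEN
    return budget_chars // avg_excerpt_chars

# A Titan-sized (~8K token) window fits far fewer ~2000-char excerpts
# than a Claude-sized (~200K token) window.
print(max_docs_for_model(8_000, avg_excerpt_chars=2_000))
print(max_docs_for_model(200_000, avg_excerpt_chars=2_000))
```

This lines up with the experiment above: a setting of 80 intermittently overflows a Titan-sized window but sits comfortably inside a Claude-sized one.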
> Suggestion to Improve the Robustness
We have a couple of ideas in our backlog to cap various elements of the prompt (similar to the document cap), but none that specifically look to manually cap the overall prompt. There are a lot of considerations here; we don't think a manual cap would improve the customer experience, and it could result in unintended side effects.
However, we are open to ideas, so if you have some thoughts about what can be done, please open a separate feature request ticket to discuss.
Hi @joyfulelement,
> We do maintain a Cost Section in our Implementation Guide to try and help customers gauge what the cost would be. I'm sorry to hear that it came as a surprise. Please do let me know if you have any ideas to make the costs more clear so that others don't have to learn the hard way as well.
The issue I found with the cost section in the documentation is that its example is a bit misleading. E.g. the table entry for Amazon Kendra mentions that it will cost $1,008.00 for 8,000 queries a day and up to 100,000 documents with Kendra Enterprise Edition with 50 data sources.
I was then under the false assumption that since my serverless app wasn't issuing as many queries as the example provided in the table, it shouldn't cost ~$1,000. But in fact, the cost was pretty close to that even with just 27 documents totaling just 79.3 KB over only a few days, which is where I found the cost to be prohibitive for prototyping purposes.
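The surprise is explained by Kendra's billing model: it charges per index-hour, not per query or per document. The hourly rates below are approximations inferred from the $1,008/month Enterprise figure in the IG (720 hours x $1.40), so verify them against the current Kendra pricing page before relying on them.

```python
# Rough Kendra cost arithmetic: billing is per index-hour regardless of
# index size or query volume. Hourly rates are approximations inferred
# from the $1,008/month figure quoted above, not authoritative pricing.

HOURS_PER_MONTH = 720

enterprise_hourly = 1.40   # ~$1,008 per 720-hour month
developer_hourly = 1.125   # approximate Developer Edition rate

# A tiny 27-document index still accrues the full hourly rate.
print(f"Enterprise, 1 month: ${enterprise_hourly * HOURS_PER_MONTH:,.2f}")
print(f"Developer, 1 week:   ${developer_hourly * 24 * 7:,.2f}")
```

At these rates, even a week of a Developer Edition index lands in the "couple hundred dollars" range regardless of how little data it holds, which matches the experience described here.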
> Something that the team is looking to explore and consider adding to our roadmap is integration with Amazon Bedrock Knowledge Bases. This service feature was made generally available at re:Invent, and one of the key datastores it integrates with is OpenSearch Serverless (OSS). I don't think OSS is 100% serverless, as it may have some base costs that are billed hourly; however, it may provide a lower base cost when compared to Kendra.
Thank you, I also tried Amazon Bedrock Knowledge Bases; it wasn't obvious from the surface which search service it uses, so thanks for pointing out that it integrates with OSS. And indeed, it didn't cost a lot less compared to Kendra.
> All that being said, we don't have an exact timeline for when other knowledge bases will be natively supported by the solution. I'll try to keep this issue in mind and share an update when that time comes.
> Hope that helps send you in the right direction! If you don't mind, please open a feature request so that we can formally track this request.
Thanks for the follow-up replies. I think I'll have to investigate a bit more to understand the available options for building a serverless gen AI app on AWS. I believe it would be beneficial and attractive to AWS customers if AWS continued making different tech stack + price options available, especially for those who need to try out a prototype vs. those who need to build a production-ready app.
Hi @joyfulelement , thanks for the great details! It really helps us debug the situation with you!
We have a note about Bedrock model access in the IG here, but if you weren't able to easily find it, it means we can do a better job at showing it. Thanks for the feedback, and we'll see what we can do.
Regarding the failure:
The final input that we send to the model is the completed prompt. The prompt can contain the user's input, previous interactions (i.e., chat history), and document excerpts sourced from the configured knowledge base. (Since you are using a RAG configuration, the document excerpts are an important consideration!)
So if using the Amazon Titan Express model, the max input prompt allowed is ~8000 tokens. 8000 tokens is in the ballpark of about 42000 characters, which is what I believe the error message is indicating.
My initial guess as to why you are seeing the error so soon is because you have a large number set for the max documents (in the Knowledge Base configuration page in the wizard). If this is the case, I would recommend starting off with a much smaller number (try the default of 2). Here are some tips we provide within the IG: https://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/use-the-solution.html#tips-for-managing-model-token-limits
If none of those help, take a look at your CloudWatch logs (you may need to enable verbosity through the wizard) and see what the final prompt sent to the LLM was. From there, see where you can decrease the size.
Let me know if any of those help and we can go from there.
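When inspecting the final prompt in the logs, a quick character-based estimate can serve as a pre-flight check against the token limit. The ratio is tokenizer-dependent (the ~42,000-character figure above implies roughly 5.25 characters per token for Titan), so treat this as a rough estimate, not an exact count; the function names are illustrative.

```python
# Rough pre-flight check against a model's input-token limit, based on
# prompt length in characters. The chars-per-token ratio is an estimate
# derived from the ~8000-token / ~42000-character figure above.

def estimate_tokens(prompt, chars_per_token=5.25):
    """Approximate token count from character count."""
    return int(len(prompt) / chars_per_token)

def fits(prompt, max_tokens=8_000, chars_per_token=5.25):
    """True if the prompt is estimated to fit the model's input limit."""
    return estimate_tokens(prompt, chars_per_token) <= max_tokens

prompt = "x" * 42_000  # right at the estimated Titan Express limit
print(estimate_tokens(prompt), fits(prompt))
```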
@ihmaws thanks for the comments and validation of my experiment. Really appreciate your guidance along the way.
Cost Feedback - Lessons I Learned The Hard Way
I later realised that even with the super-small-scale knowledge base from above, with only 27 documents totaling just 79.3 KB, the use of the Amazon Kendra Developer Edition already caused a huge spike in my AWS bill of a couple hundred dollars. As long as there is an index created with Kendra, regardless of its usage and size, the charge is the same and will continue unless the index is completely removed.
This prohibitive cost from Amazon Kendra hinders the current prototyping effort, and I'm now considering switching away from the current setup to find an alternative way to build a truly serverless RAG-enabled AI app without relying on Amazon Kendra, one that is really billed based on actual usage (pay for what you use). Would love to hear your thoughts if you know of another AWS service we could use other than Amazon Kendra for building a serverless RAG-enabled AI app, thank you.
Hi @joyfulelement,
We do maintain a Cost Section in our Implementation Guide to try and help customers gauge what the cost would be. I'm sorry to hear that it came as a surprise. Please do let me know if you have any ideas to make the costs more clear so that others don't have to learn the hard way as well.
Something that the team is looking to explore and consider adding to our roadmap is integration with Amazon Bedrock Knowledge Bases. This service feature was made generally available at re:Invent, and one of the key datastores it integrates with is OpenSearch Serverless (OSS). I don't think OSS is 100% serverless, as it may have some base costs that are billed hourly; however, it may provide a lower base cost when compared to Kendra.
All that being said, we don't have an exact timeline for when other knowledge bases will be natively supported by the solution. I'll try to keep this issue in mind and share an update when that time comes.
Hope that helps send you in the right direction! If you don't mind, please open a feature request so that we can formally track this request.