The issue I raised here is more like a question than a bug (how do we adjust/cap the input to the FM), but I couldn't remove the bug label, so I updated the title instead.
from generative-ai-application-builder-on-aws.
Thanks @ihmaws! The information you've provided here was super helpful!
> Hi @joyfulelement , thanks for the great details! It really helps us debug the situation with you!
> We have a note about Bedrock model access in the IG here, but if you weren't able to easily find it, it means we can do a better job at showing it. Thanks for the feedback, and we'll see what we can do.
Thanks for the pointer to the documentation that points out the need to request access to models in Amazon Bedrock, I had definitely missed that!
> Regarding the failure:
> The final input that we send to the model is the completed prompt. The prompt can contain the user's input, previous interactions (i.e., chat history), and document excerpts sourced from the configured knowledge base. (Since you are using a RAG configuration, the document excerpts are an important consideration!)
I see. When I enabled RAG, the S3 bucket that Kendra is connected to contained only 27 documents, totaling just 79.3 KB.
(Screenshot 2023-12-21 at 11 19 56 PM)
(Screenshot 2023-12-21 at 11 19 47 PM)
So I had the Maximum number of documents configuration set to 30. The label of the setting in the UI, "Optional: the max number of documents to use from the knowledge base", gave me the impression that this number should closely match the number of documents uploaded to S3, which the FM will use in RAG. It seems I misunderstood this.
Relationship: Maximum number of documents configuration vs. quality of the response from the FM?
What is this setting's relationship to the quality of the answer we can get from the FM?
e.g.
- The higher the maximum number of documents to retrieve, the more relevant/accurate context the prompt will be formulated with, so we should expect a better answer from the FM? However, the overall prompt will be larger (and likely to exceed the maximum allowable input imposed by the FM) and could take a while for the FM to respond.
- The lower the maximum number of documents to retrieve, the less relevant context the prompt will be formulated with, so we should expect a less accurate answer from the FM? However, the overall prompt will be smaller (and unlikely to exceed the maximum allowable input imposed by the FM), and the FM should respond quickly.

Is my above interpretation correct?
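The trade-off in the bullets above can be sketched in a few lines. This is a hypothetical illustration (the prompt template and names are made up, not the solution's actual code): each retrieved excerpt is appended to the prompt, so a higher max-documents setting directly inflates the final prompt size.

```python
# Hypothetical sketch: how the max-documents setting affects prompt size.
# The prompt template and helper names are illustrative assumptions.

def build_prompt(question, history, excerpts, max_docs):
    """Assemble a RAG prompt from the user input, chat history,
    and up to max_docs knowledge-base excerpts."""
    context = "\n\n".join(excerpts[:max_docs])
    return f"Context:\n{context}\n\nHistory:\n{history}\n\nQuestion: {question}"

# Simulate 30 excerpts of ~500 characters each.
excerpts = [f"Excerpt {i}: " + "x" * 500 for i in range(30)]

small = build_prompt("Who won?", "", excerpts, max_docs=2)
large = build_prompt("Who won?", "", excerpts, max_docs=30)

# More excerpts -> a much longer prompt, and a higher chance of
# exceeding the model's input-token limit.
print(len(small), len(large))
```

So both directions of the interpretation hold mechanically; whether the extra context actually improves answer quality is a separate question, discussed below.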
> So if using the Amazon Titan Express model, the max input prompt allowed is ~8000 tokens. 8000 tokens is in the ballpark of about 42000 characters, which is what I believe the error message is indicating.
> My initial guess as to why you are seeing the error so soon is because you have a large number set for the max documents (in the Knowledge Base configuration page in the wizard). If this is the case, I would recommend starting off with a much smaller number (try the default of 2). Here are some tips we provide within the IG: https://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/use-the-solution.html#tips-for-managing-model-token-limits
What's the general rule of thumb for configuring the maximum number of documents to retrieve setting correctly?
E.g. if the content of the knowledge base grows, and say it has 1000 documents, I assume the prompt could only be generated based on a fraction of the total size of the knowledge base?
My Experiment
I tried again by setting it to just 2, and indeed, as you pointed out, I am no longer getting those Chat service failed to respond... errors, but I did find the responses from the bot quite concise.
I also tried using the Claude v2.1 FM with the maximum number of documents to retrieve setting at 80. I sometimes get the Chat service failed to respond... error, but when I retry the same question a couple of times, it eventually runs through. The answers I got were a lot more comprehensive compared to the Titan + 2 max docs combination.
Suggestion to Improve the Robustness
I wonder if the Lambda we use in the solution could help cap the prompt being passed to the FM, to reduce the errors that occur from misconfigurations like the ones I made above? Thanks!
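The capping idea could look something like the sketch below. To be clear, this is not the solution's code: the function name, the 4-characters-per-token heuristic, and the keep-excerpts-until-the-budget-runs-out policy are all assumptions used for illustration.

```python
# Hypothetical sketch of prompt capping before model invocation:
# drop trailing document excerpts until the estimated size fits the
# model's input limit. The chars-per-token ratio is a rough heuristic;
# real tokenizers vary.

CHARS_PER_TOKEN = 4

def cap_prompt_parts(question, history, excerpts, max_input_tokens):
    """Keep as many excerpts as fit within the estimated token budget,
    after accounting for the question and chat history."""
    budget_chars = max_input_tokens * CHARS_PER_TOKEN
    used = len(question) + len(history)
    kept = []
    for ex in excerpts:
        if used + len(ex) > budget_chars:
            break  # this excerpt would push the prompt over the limit
        kept.append(ex)
        used += len(ex)
    return kept

# 80 excerpts of ~2000 characters would blow an ~8000-token budget;
# the cap keeps only the prefix that fits.
excerpts = ["doc excerpt " + "y" * 2000 for _ in range(80)]
kept = cap_prompt_parts("What is X?", "", excerpts, max_input_tokens=8000)
print(f"kept {len(kept)} of {len(excerpts)} excerpts")
```

A real implementation would also need to decide what to trim first (history vs. excerpts) and whether to truncate within an excerpt, which is part of why the maintainers note below that a blanket cap has side effects.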
Hi @joyfulelement , sorry for the delay in getting back to you, the team has been out for the end of year holidays. Happy new year!
> The label of the setting in the UI, "Optional: the max number of documents to use from the knowledge base", gave me the impression that this number should closely match the number of documents uploaded to S3, which the FM will use in RAG. It seems I misunderstood this.
Yea it's tricky because there is a bit of a dependency on the specific knowledge base being used. For now since it is Kendra, it isn't the number of documents directly in your S3 bucket, but instead the number of document excerpts that the Kendra Retrieve API returns. Kendra can return multiple excerpts from the same document, so this would filter on that final return set from Kendra. Sorry for the confusion, we'll have a look at the wording to see how we can help make it more clear.
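The distinction described above can be made concrete with a small sketch. The dictionaries loosely mirror the shape of the `ResultItems` returned by Kendra's Retrieve API (`DocumentId` plus excerpt `Content`); the data and helper are hypothetical, but they show why the cap counts excerpts, not S3 objects.

```python
# Illustrative sketch: the max-documents cap applies to the excerpts
# returned by Kendra's Retrieve API, not to the objects in S3. Kendra
# can return several excerpts from the same source document.

def take_top_excerpts(result_items, max_docs):
    """Keep the first max_docs excerpts from the retrieval results,
    even if several come from the same source document."""
    return result_items[:max_docs]

# Mock Retrieve-style results: two excerpts from the same document.
result_items = [
    {"DocumentId": "s3://bucket/faq.pdf",   "Content": "excerpt A"},
    {"DocumentId": "s3://bucket/faq.pdf",   "Content": "excerpt B"},
    {"DocumentId": "s3://bucket/guide.pdf", "Content": "excerpt C"},
]

top = take_top_excerpts(result_items, max_docs=2)
# Both kept excerpts can come from the same S3 object.
print([r["DocumentId"] for r in top])
```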
> Is my above interpretation correct?
Yes, the general gist of it is correct. The only thing I would caution is around assuming that more documents automatically means more relevant/accurate. The nuances really depend on your use case, i.e., whether or not synthesizing a response across a large document set is useful.
For example, imagine you were running a web search to determine who won the most gold medals at the Rio Olympics. Multiple results probably aren't that useful, and in fact you'd probably only look at the top result.
However, if you were searching for how to make the best chocolate chip cookies, then you may choose to look at the top 3-5 results to get inspiration and assess which recipes, techniques, and ingredients would yield the best results. Hope that makes sense.
> What's the general rule of thumb for configuring the maximum number of documents to retrieve setting correctly?
Personally, I'd recommend starting off with a smaller number (like the default of 2) and then slowly working your way up or down based on some tests. If you are finding that additional documents yield more complete responses, then increase it. However, if you start to see more failures or longer response times, then consider reducing it. It really is a tuning exercise you'll need to go through specific to your use cases.
> My Experiment
The results make sense and are as expected. The Claude v2.1 model supports (last I checked!) 200,000 input tokens, which is significantly larger than some of the other models. So if your use case benefits from the larger context window, then consider using that family of models. However, as I'm sure you are aware, the larger models typically have higher latency and higher costs, so you want to ensure you select the right model for your needs (which I hope is what the solution is helping you to do in the first place!)
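A back-of-the-envelope way to size the max-documents setting to a model's context window is sketched below. The token limits come from the figures discussed in this thread (~8K input tokens for Titan Text Express, ~200K for Claude v2.1), and the chars-per-token ratio, reserve, and excerpt size are assumptions for illustration only.

```python
# Rough sizing sketch: how many average-sized excerpts fit in a model's
# context window, after reserving room for the question, chat history,
# and prompt template. All constants are illustrative assumptions.

CHARS_PER_TOKEN = 4

def max_docs_for_model(context_tokens, avg_excerpt_chars, reserve_tokens=1000):
    """Estimate how many average-sized excerpts fit in the window."""
    budget_chars = (context_tokens - reserve_tokens) * CHARS_PER_TOKEN
    return budget_chars // avg_excerpt_chars

# A Titan-sized (~8K token) window fits far fewer ~2000-char excerpts
# than a Claude-sized (~200K token) window.
print(max_docs_for_model(8_000, avg_excerpt_chars=2_000))
print(max_docs_for_model(200_000, avg_excerpt_chars=2_000))
```

This lines up with the experiment above: a setting of 80 intermittently overflows a Titan-sized window but sits comfortably inside a Claude-sized one.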
> Suggestion to Improve the Robustness
We have a couple of ideas in our backlog to cap various elements of the prompt (similar to the document cap), but none that specifically look to manually cap the overall prompt. There are a lot of considerations here; we don't think a manual cap would improve the customer experience, and it could result in unintended side effects.
However, we are open to ideas, so if you have some thoughts about what can be done, please open a separate feature request ticket to discuss.
Hi @joyfulelement,
> We do maintain a Cost Section in our Implementation Guide to try and help customers gauge what the cost would be. I'm sorry to hear that it came as a surprise. Please do let me know if you have any ideas to make the costs more clear so that others don't have to learn the hard way as well.
The issue I found with the cost section in the documentation is that its example is a bit misleading. E.g. the table entry for Amazon Kendra mentions that it will cost $1,008.00 for 8,000 queries a day and up to 100,000 documents with Kendra Enterprise Edition with 50 data sources.
I was then under the false assumption that since my serverless app wasn't issuing as many queries as the example provided in the table, it shouldn't cost ~$1,000. But in fact, the cost was pretty close to that even with just 27 documents totaling just 79.3 KB over only a few days, which is where I found the cost to be prohibitive for prototyping purposes.
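The surprise is explained by Kendra's billing model: it charges per index-hour, not per query or per document. The hourly rates below are approximations inferred from the $1,008/month Enterprise figure in the IG (720 hours x $1.40), so verify them against the current Kendra pricing page before relying on them.

```python
# Rough Kendra cost arithmetic: billing is per index-hour regardless of
# index size or query volume. Hourly rates are approximations inferred
# from the $1,008/month figure quoted above, not authoritative pricing.

HOURS_PER_MONTH = 720

enterprise_hourly = 1.40   # ~$1,008 per 720-hour month
developer_hourly = 1.125   # approximate Developer Edition rate

# A tiny 27-document index still accrues the full hourly rate.
print(f"Enterprise, 1 month: ${enterprise_hourly * HOURS_PER_MONTH:,.2f}")
print(f"Developer, 1 week:   ${developer_hourly * 24 * 7:,.2f}")
```

At these rates, even a week of a Developer Edition index lands in the "couple hundred dollars" range regardless of how little data it holds, which matches the experience described here.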
> Something that the team is looking to explore and consider adding to our roadmap is integration with Amazon Bedrock Knowledge Bases. This service feature was made generally available at re:Invent, and one of the key datastores it integrates with is OpenSearch Serverless (OSS). I don't think OSS is 100% serverless, as it may have some base costs that are billed hourly; however, it may provide a lower base cost when compared to Kendra.
Thank you, I also tried Amazon Bedrock Knowledge Bases; it wasn't obvious from the surface which search service it uses, so thanks for pointing out that it integrates with OSS. And indeed, it didn't cost a lot less compared to Kendra.
> All that being said, we don't have an exact timeline for when other knowledge bases will be natively supported by the solution. I'll try to keep this issue in mind and share an update when that time comes.
> Hope that helps send you in the right direction! If you don't mind, please open a feature request so that we can formally track this request.
Thanks for the follow-up replies. I think I'll have to investigate a bit more to understand the available options for building a serverless gen AI app on AWS. I believe it would be beneficial and attractive to AWS customers if AWS continued making different tech stack + price options available, especially for those who need to try out a prototype vs. those who need to build a production-ready app.
Hi @joyfulelement , thanks for the great details! It really helps us debug the situation with you!
We have a note about Bedrock model access in the IG here, but if you weren't able to easily find it, it means we can do a better job at showing it. Thanks for the feedback, and we'll see what we can do.
Regarding the failure:
The final input that we send to the model is the completed prompt. The prompt can contain the user's input, previous interactions (i.e., chat history), and document excerpts sourced from the configured knowledge base. (Since you are using a RAG configuration, the document excerpts are an important consideration!)
So if using the Amazon Titan Express model, the max input prompt allowed is ~8000 tokens. 8000 tokens is in the ballpark of about 42000 characters, which is what I believe the error message is indicating.
My initial guess as to why you are seeing the error so soon is because you have a large number set for the max documents (in the Knowledge Base configuration page in the wizard). If this is the case, I would recommend starting off with a much smaller number (try the default of 2). Here are some tips we provide within the IG: https://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/use-the-solution.html#tips-for-managing-model-token-limits
If none of those help, take a look at your CloudWatch logs (you may need to enable verbosity through the wizard) and see what the final prompt sent to the LLM was. From there, see where you can decrease the size.
Let me know if any of those help and we can go from there.
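When inspecting the final prompt in the logs, a quick character-based estimate can serve as a pre-flight check against the token limit. The ratio is tokenizer-dependent (the ~42,000-character figure above implies roughly 5.25 characters per token for Titan), so treat this as a rough estimate, not an exact count; the function names are illustrative.

```python
# Rough pre-flight check against a model's input-token limit, based on
# prompt length in characters. The chars-per-token ratio is an estimate
# derived from the ~8000-token / ~42000-character figure above.

def estimate_tokens(prompt, chars_per_token=5.25):
    """Approximate token count from character count."""
    return int(len(prompt) / chars_per_token)

def fits(prompt, max_tokens=8_000, chars_per_token=5.25):
    """True if the prompt is estimated to fit the model's input limit."""
    return estimate_tokens(prompt, chars_per_token) <= max_tokens

prompt = "x" * 42_000  # right at the estimated Titan Express limit
print(estimate_tokens(prompt), fits(prompt))
```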
@ihmaws thanks for the comments and validation of my experiment. Really appreciate your guidance along the way.
Cost Feedback - Lessons I Learned The Hard Way
I later realised that even with the super-small-scale knowledge base from above, with only 27 documents totaling just 79.3 KB, the use of the Amazon Kendra Developer Edition already caused a huge spike in my AWS bill of a couple hundred dollars. As long as there is an index created with Kendra, regardless of its usage and size, the charge is the same and will continue unless the index is completely removed.
This prohibitive cost from Amazon Kendra hinders the current prototyping effort, and I'm now considering switching away from the current setup to find an alternative way to build a truly serverless RAG-enabled AI app without relying on Amazon Kendra, one that is really billed based on actual usage (pay for what you use). Would love to hear your thoughts if you know of another AWS service we could use other than Amazon Kendra for building a serverless RAG-enabled AI app, thank you.
Hi @joyfulelement,
We do maintain a Cost Section in our Implementation Guide to try and help customers gauge what the cost would be. I'm sorry to hear that it came as a surprise. Please do let me know if you have any ideas to make the costs more clear so that others don't have to learn the hard way as well.
Something that the team is looking to explore and consider adding to our roadmap is integration with Amazon Bedrock Knowledge Bases. This service feature was made generally available at re:Invent, and one of the key datastores it integrates with is OpenSearch Serverless (OSS). I don't think OSS is 100% serverless, as it may have some base costs that are billed hourly; however, it may provide a lower base cost when compared to Kendra.
All that being said, we don't have an exact timeline for when other knowledge bases will be natively supported by the solution. I'll try to keep this issue in mind and share an update when that time comes.
Hope that helps send you in the right direction! If you don't mind, please open a feature request so that we can formally track this request.