aws-samples / amazon-bedrock-rag-workshop

Home Page: https://catalog.us-east-1.prod.workshops.aws/workshops/77e0888c-7086-478b-af44-4562c55b1faf/en-US

License: MIT No Attribution

Jupyter Notebook 99.39% Python 0.61%

amazon-bedrock-rag-workshop's Introduction

Amazon Bedrock Retrieval-Augmented Generation (RAG) Workshop

The goal of this workshop is to give you hands-on experience leveraging foundation models (FMs) and retrieval-augmented generation (RAG) with Amazon Bedrock. Amazon Bedrock is a fully managed service that provides access to FMs from Amazon and third-party providers via an API. With Bedrock, you can choose from a variety of models to find the one that's best suited for your use case.

This series of labs takes you through some of the most common generative AI RAG usage patterns we see with our customers. We explore RAG techniques for generating text that create value for organizations by improving productivity, leveraging foundation models to help retrieve and search documents. You will gain hands-on experience with the Bedrock APIs and SDKs, as well as open-source software such as LangChain and FAISS, to implement these usage patterns.

This workshop is intended for developers and solution builders.

Workshop Overview

There are two primary types of knowledge for LLMs:

  • Parametric knowledge: refers to everything the LLM learned during training and acts as a frozen snapshot of the world for the LLM.
  • Source (external) knowledge: covers any information fed into the LLM via the input prompt.

Fine-tuning, explored in other workshops, improves the LLM's parametric knowledge. Since fine-tuning is a resource-intensive operation, it is best suited for infusing static domain-specific information such as domain-specific language or writing styles (medical domain, science domain, ...) or optimizing performance towards a very specific task (classification, sentiment analysis, RLHF, instruction fine-tuning, ...).

Retrieval-augmented generation (RAG) dynamically pulls source (external) knowledge from domain-specific data stores to feed up-to-date and relevant information to the LLM through the prompt. This is often called prompt augmentation. RAG is particularly well suited for dynamic data that is updated relatively frequently.

This workshop focuses on retrieval-augmented generation (RAG): ingesting domain-specific information as source knowledge and using it to augment the prompt that is passed to the LLM.
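
To make prompt augmentation concrete, here is a minimal sketch in plain Python. The `retrieve_relevant_chunks` helper and the prompt wording are hypothetical stand-ins; in the labs the retrieval step is backed by an embeddings model and a vector store, but the pattern of stuffing retrieved text into the prompt is the same.

```python
# Minimal prompt-augmentation sketch. The retriever here is a stub;
# in the labs it is backed by an embeddings model and a vector store.
def retrieve_relevant_chunks(question: str, k: int = 3) -> list[str]:
    # Placeholder: return canned chunks instead of doing a real vector search.
    corpus = [
        "Amazon Bedrock is a fully managed service for foundation models.",
        "RAG augments the prompt with retrieved, domain-specific context.",
        "FAISS is an in-memory library for similarity search over vectors.",
    ]
    return corpus[:k]

def build_augmented_prompt(question: str) -> str:
    context = "\n\n".join(retrieve_relevant_chunks(question))
    # The retrieved context is injected into the prompt before calling the LLM.
    return (
        "Use only the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_augmented_prompt("What is Amazon Bedrock?"))
```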

Abstractions

The key abstractions used in this workshop are as follows:

  • LLM (Large Language Model): Anthropic Claude V2 available through Amazon Bedrock

    This model is used to understand the document chunks and provide an answer in a human-friendly manner.

  • Embeddings Model: Amazon Titan Embeddings available through Amazon Bedrock

    This model is used to generate a numerical representation (embedding) of the textual documents.

  • Document Loader:

    • PDF Loader available through LangChain for PDFs
    • TextLoader available through LangChain for txts

    These loaders load the documents from a source; in these notebooks the sample files are loaded from a local path. They could easily be replaced with loaders that pull documents from internal enterprise systems.

  • Vector Store: FAISS, LlamaIndex, ChromaDB - available through LangChain

    Most labs use an in-memory vector store to hold both the documents and their embeddings. In an enterprise context this could be replaced with a persistent store such as Amazon OpenSearch Service, RDS for PostgreSQL with pgvector, ChromaDB, Pinecone, or Weaviate.

  • Chunking: Splits of data

    The original data is split into smaller chunks of text for more fine-grained relevancy searching. The chunk size is something that needs to be determined based on your dataset and use case.

  • Index: VectorIndex

    The index helps to compare the input embedding and the document embeddings to find relevant documents.

  • Wrapper: wraps index, vector store, embeddings model and the LLM to abstract away the logic from the user.

  • Retrieval and Search: Retrieval Question-Answer (QA), Semantic Similarity Search

  • Orchestrator: LangChain, LlamaIndex

    The orchestrator coordinates all parts of the RAG workflow. A minimal sketch wiring these abstractions together is shown below this list.
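
The following is a minimal sketch, under assumptions, of how these abstractions fit together using LangChain and Bedrock. Exact import paths, model IDs, and class names vary by LangChain version (this mirrors the classic `langchain` 0.0.x layout the labs were written against), and the PDF path, chunk sizes, and question are placeholders.

```python
import boto3
from langchain.document_loaders import PyPDFLoader                   # Document Loader
from langchain.text_splitter import RecursiveCharacterTextSplitter   # Chunking
from langchain.embeddings import BedrockEmbeddings                   # Embeddings Model
from langchain.llms import Bedrock                                   # LLM
from langchain.vectorstores import FAISS                             # Vector Store / Index
from langchain.chains import RetrievalQA                             # Wrapper: Retrieval QA

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Load and chunk the source documents (path is a placeholder).
docs = PyPDFLoader("data/sample.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed the chunks with Titan and index them in an in-memory FAISS store.
embeddings = BedrockEmbeddings(client=bedrock_runtime, model_id="amazon.titan-embed-text-v1")
vector_store = FAISS.from_documents(chunks, embeddings)

# Wrap the LLM (Claude v2) and the retriever into a Retrieval QA chain.
llm = Bedrock(client=bedrock_runtime, model_id="anthropic.claude-v2")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vector_store.as_retriever())

print(qa.run("What does this document say about quarterly revenue?"))
```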

Labs

Each lab handles its own data ingestion (PDFs, text, etc.), vector storage (FAISS, LlamaIndex, ChromaDB, etc.), and RAG orchestration (LangChain, LlamaIndex). As such, each lab can be run independently and does not depend on a previous lab.

Below is a high-level overview of the labs in this workshop, which follow the RAG workflow described above:

Introduction [5 mins]

Introduction to the lab environment which includes prerequisites.

Semantic search uses vector embedding representations of documents to perform searches in higher-dimensional vector spaces. Semantic similarity search often outperforms basic keyword search, which simply compares the number of keywords and phrases shared between documents.
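
As an illustration of search in embedding space, here is a small self-contained sketch using numpy. The embedding vectors are made up for the example; in the labs they come from Amazon Titan Embeddings.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Semantic search scores documents by vector similarity, not keyword overlap.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings (in practice: high-dimensional vectors from an embeddings model).
query_vec = np.array([0.9, 0.1, 0.3])
doc_vecs = {
    "doc_about_revenue": np.array([0.8, 0.2, 0.4]),
    "doc_about_hiring":  np.array([0.1, 0.9, 0.2]),
}
scores = {name: cosine_similarity(query_vec, v) for name, v in doc_vecs.items()}
print(max(scores, key=scores.get))  # the semantically closest document
```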

When ingesting data into your system, you can add optional metadata such as "year" or "department". This metadata can be used to filter your queries when retrieving documents through Semantic Similarity Search, for example. This reduces the data used to augment the prompt - and ultimately helps to improve the relevancy of the results from the LLM.
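
A sketch of metadata filtering with a LangChain vector store follows. It assumes a vector store whose similarity_search accepts a metadata filter argument (support and syntax vary by store and version); the documents, metadata keys, and query are illustrative, and `embeddings` refers to the embeddings object constructed in the earlier sketch.

```python
from langchain.schema import Document
from langchain.vectorstores import FAISS

# Attach metadata (e.g. "year", "department") to each document at ingestion time.
docs = [
    Document(page_content="2023 shareholder letter ...",
             metadata={"year": 2023, "department": "finance"}),
    Document(page_content="2021 shareholder letter ...",
             metadata={"year": 2021, "department": "finance"}),
]
vector_store = FAISS.from_documents(docs, embeddings)  # embeddings: see earlier sketch

# Restrict the semantic search to documents whose metadata matches the filter.
results = vector_store.similarity_search("How has AWS evolved?", k=2, filter={"year": 2023})
```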

When ingesting your documents, you can build an index of document summaries along with the document. These summaries can be used by the Semantic Similarity Search algorithm (e.g. K-Nearest Neighbor) to improve retrieval results and reduce retrieval latency.
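
A minimal, self-contained sketch of the summary-index idea: embed short summaries, search over those, and return the full parent document for prompt augmentation. The embed_text function is a hypothetical stand-in for a real embeddings model.

```python
import numpy as np

def embed_text(text: str) -> np.ndarray:
    # Hypothetical stand-in for a real embeddings model (e.g. Titan Embeddings).
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

# The index maps a summary embedding back to the full document it summarizes.
documents = {"doc-1": "Full text of the 2022 shareholder letter ...",
             "doc-2": "Full text of the cloud migration whitepaper ..."}
summaries = {"doc-1": "Summary: 2022 financial results and AWS growth.",
             "doc-2": "Summary: how to plan a migration to the cloud."}
summary_index = {doc_id: embed_text(s) for doc_id, s in summaries.items()}

def retrieve(query: str) -> str:
    q = embed_text(query)
    # Nearest-neighbor search over the (small) summary embeddings,
    # then return the full parent document for prompt augmentation.
    best = max(summary_index, key=lambda d: float(np.dot(q, summary_index[d])))
    return documents[best]
```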

You can improve result relevancy by adding an extra re-rank step in the retrieval process. The re-ranking often includes a diversification factor to introduce a bit of diversity in the results. This allows for some results to rank higher - even if they're not scored highest by the Semantic Similarity Search algorithm.
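
Maximal marginal relevance (MMR) is one common way to add such a diversification step. The sketch below uses LangChain's FAISS wrapper, assuming the vector_store built in the earlier sketch; fetch_k and lambda_mult control how many candidates are considered and how strongly diversity is weighted.

```python
# Re-rank with maximal marginal relevance: fetch a larger candidate set,
# then pick results that are relevant to the query but dissimilar to each other.
diverse_results = vector_store.max_marginal_relevance_search(
    "How has AWS evolved?",
    k=4,              # number of results to return
    fetch_k=20,       # candidate pool scored by plain similarity first
    lambda_mult=0.5,  # 1.0 = pure relevance, 0.0 = pure diversity
)
```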

amazon-bedrock-rag-workshop's People

Contributors

amazon-auto, ari-vedant-jain, cfregly, danystinson, evandrofranco, gauravsinghgs, giuseppe-zappia, noobtron, smart-patrol, windson


amazon-bedrock-rag-workshop's Issues

Lab 05 got IndexError: list index out of range when calling get_retrieved_nodes()

For the notebooks in 05_Semantic_Search_with_Reranking, calling get_retrieved_nodes() with with_reranker=True raises IndexError: list index out of range.
Detailed error message:

IndexError Traceback (most recent call last)
Cell In[26], line 1
----> 1 retrieved_nodes1_withreranker = get_retrieved_nodes(
2 "How has AWS evolved?",
3 vector_top_k=1,
4 reranker_top_n=1,
5 with_reranker=True,
6 )

Cell In[22], line 30, in get_retrieved_nodes(query_str, vector_top_k, reranker_top_n, with_reranker)
21 if with_reranker:
22 # configure reranker
23 reranker = LLMRerank(
24 choice_batch_size=10, # 5,
25 top_n=reranker_top_n,
(...)
28
29 )
---> 30 retrieved_nodes = reranker.postprocess_nodes(retrieved_nodes, query_bundle)
32 return retrieved_nodes

File /opt/conda/lib/python3.10/site-packages/llama_index/indices/postprocessor/llm_rerank.py:85, in LLMRerank.postprocess_nodes(self, nodes, query_bundle)
78 # call each batch independently
79 raw_response = self.service_context.llm_predictor.predict(
80 self.choice_select_prompt,
81 context_str=fmt_batch_str,
82 query_str=query_str,
83 )
---> 85 raw_choices, relevances = self._parse_choice_select_answer_fn(
86 raw_response, len(nodes_batch)
87 )
88 choice_idxs = [int(choice) - 1 for choice in raw_choices]
89 choice_nodes = [nodes_batch[idx] for idx in choice_idxs]

File /opt/conda/lib/python3.10/site-packages/llama_index/indices/utils.py:104, in default_parse_choice_select_answer_fn(answer, num_choices, raise_error)
98 else:
99 raise ValueError(
100 f"Invalid answer line: {answer_line}. "
101 "Answer line must be of the form: "
102 "answer_num: , answer_relevance: "
103 )
--> 104 answer_num = int(line_tokens[0].split(":")[1].strip())
105 if answer_num > num_choices:
106 continue

IndexError: list index out of range
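
The failure happens when the reranking LLM's answer does not match the `answer_num: , answer_relevance:` format that llama_index expects to parse. One defensive workaround (a sketch around the notebook code, not a fix to the library) is to fall back to the plain similarity-ranked nodes when parsing fails:

```python
# Sketch: fall back to the un-reranked nodes if the LLM reranker's output
# cannot be parsed (IndexError / ValueError from the answer parser).
try:
    retrieved_nodes = reranker.postprocess_nodes(retrieved_nodes, query_bundle)
except (IndexError, ValueError):
    # Keep the nodes ranked by plain vector similarity instead.
    pass
```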

Pydantic Error

Getting the following error on import for Setup.ipynb after the build.

Error:

---------------------------------------------------------------------------
PydanticUserError                         Traceback (most recent call last)
Cell In[2], line 9
      7 module_path = ".."
      8 sys.path.append(os.path.abspath(module_path))
----> 9 from utils import bedrock, print_ww
     12 # ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----
     14 os.environ["AWS_DEFAULT_REGION"] =  # E.g. "us-east-1"

File ~/BedrockRag/amazon-bedrock-rag-workshop/utils/bedrock.py:91
     87 class BedrockModel(Enum):
     88     STABLE_DIFFUSION = "stability.stable-diffusion-xl"
---> 91 class Bedrock:
     92     __DEFAULT_EMPTY_EMBEDDING = [
     93         0.0
     94     ] * 4096  # - we need to return an array of floats 4096 in size
     95     __RETRY_BACKOFF_SEC = 3

File ~/BedrockRag/amazon-bedrock-rag-workshop/utils/bedrock.py:105, in Bedrock()
    102         assert str(type(client)) == "<class 'botocore.client.Bedrock'>", f"The client passed in not a valid boto3 bedrock client, got {type(client)}"
    103         self.client = client
--> 105 @root_validator()
    106 def validate_environment(cls, values: Dict) -> Dict:
    107     bedrock_client = get_bedrock_client(assumed_role=None) #boto3.client("bedrock")
    108     values["client"] = bedrock_client

File /opt/conda/lib/python3.10/site-packages/pydantic/deprecated/class_validators.py:228, in root_validator(pre, skip_on_failure, allow_reuse, *__args)
    226 mode: Literal['before', 'after'] = 'before' if pre is True else 'after'
    227 if pre is False and skip_on_failure is not True:
--> 228     raise PydanticUserError(
    229         'If you use `@root_validator` with pre=False (the default) you MUST specify `skip_on_failure=True`.'
    230         ' Note that `@root_validator` is deprecated and should be replaced with `@model_validator`.',
    231         code='root-validator-pre-skip',
    232     )
    234 wrap = partial(_decorators_v1.make_v1_generic_root_validator, pre=pre)
    236 def dec(f: Callable[..., Any] | classmethod[Any, Any, Any] | staticmethod[Any, Any]) -> Any:

PydanticUserError: If you use `@root_validator` with pre=False (the default) you MUST specify `skip_on_failure=True`. Note that `@root_validator` is deprecated and should be replaced with `@model_validator`.

For further information visit https://errors.pydantic.dev/2.3/u/root-validator-pre-skip
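
As the error message itself suggests, one workaround is to add skip_on_failure=True to the decorator on validate_environment() in utils/bedrock.py (or migrate to Pydantic v2's @model_validator). The snippet below is an illustrative, self-contained model showing the required decorator usage, not the workshop's actual class:

```python
from typing import Dict
from pydantic import BaseModel, root_validator

class ExampleSettings(BaseModel):
    client: str = ""

    # Pydantic v2 keeps @root_validator only in deprecated form and requires
    # skip_on_failure=True when pre=False (the default). The same one-line
    # change applies to validate_environment() in utils/bedrock.py.
    @root_validator(skip_on_failure=True)
    def validate_environment(cls, values: Dict) -> Dict:
        values["client"] = values.get("client") or "bedrock-client-placeholder"
        return values

print(ExampleSettings().client)
```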

'Bedrock' object has no attribute 'invoke_model'

Getting the following error on 01_RAG_with_Semantic_Search_Titan_Embeddings_Claude.ipynb

It looks like the API has changed.

Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/langchain/embeddings/bedrock.py:121, in BedrockEmbeddings._embedding_func(self, text)
120 try:
--> 121 response = self.client.invoke_model(
122 body=body,
123 modelId=self.model_id,
124 accept="application/json",
125 contentType="application/json",
126 )
127 response_body = json.loads(response.get("body").read())

File ~/.local/lib/python3.11/site-packages/botocore/client.py:888, in BaseClient.__getattr__(self, item)
886 return event_response
--> 888 raise AttributeError(
889 f"'{self.class.name}' object has no attribute '{item}'"
890 )

AttributeError: 'Bedrock' object has no attribute 'invoke_model'

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
/home/user/amazon-bedrock-rag-workshop-main/02_Semantic_Search/01_RAG_with_Semantic_Search_Titan_Embeddings_Claude.ipynb Cell 23 line 1
----> 1 sample_embedding = np.array(bedrock_embeddings.embed_query(docs[0].page_content))
...
128 return response_body.get("embedding")
129 except Exception as e:
--> 130 raise ValueError(f"Error raised by inference endpoint: {e}")

ValueError: Error raised by inference endpoint: 'Bedrock' object has no attribute 'invoke_model'
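
The invoke_model operation is exposed by the Bedrock runtime client, not the control-plane client, so newer boto3 versions raise this AttributeError when the wrong client is passed to LangChain. A sketch of the likely fix (region and model ID are illustrative):

```python
import boto3
from langchain.embeddings import BedrockEmbeddings

# invoke_model lives on the "bedrock-runtime" client;
# the "bedrock" client only offers control-plane operations.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

bedrock_embeddings = BedrockEmbeddings(
    client=bedrock_runtime,
    model_id="amazon.titan-embed-text-v1",
)
sample_embedding = bedrock_embeddings.embed_query("How has AWS evolved?")
```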
