
zhoule / chat-with-document-s-using-chatgpt-api-and-text-embedding

This project forked from appointat/chat-with-document-s-using-chatgpt-api-and-text-embedding


Chat with Document(s) using ChatGPT API and Text Embedding

License: MIT License



Chat-with-Document-s-using-ChatGPT-API-and-Text-Embedding

Main idea

The short answer is that such systems convert documents, even ones that are 100 or 1,000 pages long, into a numeric representation of the data and its related context (a vector embedding) and store it in a vector search engine. When a user chats with the document (i.e., asks a question), the system searches for and returns text similar to the question using an algorithm called Approximate Nearest Neighbor (ANN) search. The flow looks something like Fig. 1.

Fig. 1: What is vector search? Resources (images, documents, audio) → vector representation → nearest-neighbor search → output/results.

The program then includes the returned text that is similar to the question in a prompt and asks the question again, this time against the OpenAI GPT-3.5 API. This produces the kind of polished response you are used to from ChatGPT. The benefits of this approach are that the prompt is much smaller (not 1,000 pages of documents) and the user gets the answer they are looking for.
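A minimal sketch of this retrieve-then-ask flow, assuming the pre-1.0 openai Python client; search_similar is a hypothetical helper standing in for the vector search engine:

import openai

def answer_from_documents(question, search_similar):
    # search_similar is a hypothetical helper: it runs an ANN query
    # against the vector search engine and returns the most similar chunks
    context = "\n\n".join(search_similar(question, top_k=3))

    # The prompt contains only the retrieved chunks, not the whole document
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only this context:\n\n" + context},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]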

On a side note, if you are worried about ChatGPT providing incorrect answers to a question, you can point it at your organization's knowledge base and have it answer from there. The accuracy of the answers will only be as good as your knowledge base.

Instance of Chatbot using ChatGPT API and Text Embedding

We provide an example to help you get hands-on with the relevant techniques: this project (repo) is a chatbot built on the GPT-3.5 API and ChromaDB. The chatbot covers conversation topics related to data science, machine learning, and deep learning. The data (documents/text) comes from Ocademy (an open-source project) and is continually updated to optimize and improve the project, including using a more appropriate database, adding more user-friendly prompts, and increasing the number of GPT channels, among other improvements. In the near future, we plan to deploy the chatbot to AWS and combine it with a data science education website to provide users with more services.

It is worth mentioning that we do not simply call the API, but we also focus on the engineering implementation and management of the API. We take a comprehensive approach in our implementation of the API, ensuring that the chatbot can handle various scenarios with high accuracy and efficiency. For instance, we have carefully designed the vector database to provide optimal search and retrieval performance. We also leverage advanced techniques in machine learning and natural language processing to refine the responses generated by the chatbot.

To cater to diverse conversation scenarios, we have set up multiple prompt-engineering strategies and GPT-3.5 pipeline/chain types:

  1. Stuff chain type
  2. Map_reduce chain type
  3. Refine chain type

At the same time, we also consider how to handle special conversation scenarios, such as inhumane questions or questions entirely unrelated to data science.

The Stuff chain type focuses on retrieving and presenting the most relevant information in a concise manner. The Map_reduce chain type is used for summarization, allowing the chatbot to provide a brief overview of a particular topic or concept. The Refine chain type is dedicated to improving the coherence and fluency of the chatbot's responses, ensuring a smooth and natural conversation.
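These three types correspond to the chain_type argument of LangChain's question-answering helper; a sketch, assuming the early-2023 LangChain API used in the tutorial below:

from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI(temperature=0, model_name="gpt-3.5-turbo")

# "stuff" puts all retrieved chunks into a single prompt;
# "map_reduce" processes chunks separately, then combines the partial results;
# "refine" walks through the chunks, improving the answer at each step
stuff_chain = load_qa_chain(llm, chain_type="stuff")
map_reduce_chain = load_qa_chain(llm, chain_type="map_reduce")
refine_chain = load_qa_chain(llm, chain_type="refine")

# Usage: chain.run(input_documents=retrieved_docs, question=user_question)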

Moreover, we understand that conversations may deviate from the expected topics, or may contain inappropriate or irrelevant content. Therefore, we have implemented special mechanisms to filter out such questions and provide appropriate responses to steer the conversation back to the intended topic.
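One way such a filter could be implemented is with OpenAI's moderation endpoint (a sketch against the pre-1.0 openai client; the fallback reply is our own):

import openai

def is_flagged(text):
    # Ask the moderation endpoint whether the input violates the usage policies
    result = openai.Moderation.create(input=text)
    return result["results"][0]["flagged"]

def guard(question):
    if is_flagged(question):
        return "Let's keep the conversation respectful and focused on data science."
    return None  # None means the question can proceed to the QA chain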

Overall, our focus on engineering implementation and management ensures that the chatbot not only delivers accurate and relevant responses but also provides a seamless user experience.

Tutorial

Tools

We will be using three tools in this tutorial:

  • OpenAI GPT-3, specifically the new ChatGPT API (gpt-3.5-turbo). Not because this model is any better than other models, but because it is cheaper ($0.002 / 1K tokens) and good enough for this use case.
  • Chroma, the AI-native open-source embedding database (i.e., vector search engine). Chroma is an easy-to-use vector database when used in conjunction with LangChain; otherwise, it's kind of unusable. If you want to deploy these types of applications in production, I recommend using Elasticsearch, because it has wide adoption and has been around for years; not because Elasticsearch is better than its competitors, but because few organizations like adding a new technology stack.
  • LangChain is a library that aims to assist developers in building applications that use Large Language Models (LLMs) by allowing them to integrate these models with other sources of computation or knowledge.

Data

We will be using the data from Project Gutenberg’s “Romeo and Juliet by William Shakespeare”, which consists of 55,985 tokens. This makes it a nicely sized dataset.
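As a quick check, you can count the tokens yourself with tiktoken (a sketch; cl100k_base is the tokenizer used by gpt-3.5-turbo and text-embedding-ada-002, and the exact count may drift as Project Gutenberg revises the file):

import urllib.request
import tiktoken

# Download the plain-text book and count its tokens
url = "https://www.gutenberg.org/cache/epub/1513/pg1513.txt"
text = urllib.request.urlopen(url).read().decode("utf-8")

enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(text)))  # on the order of 56,000 tokens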

Python code

Installation of packages:

%%writefile requirements.txt
openai
chromadb
langchain
tiktoken

!pip install -r requirements.txt

Note: administrator privileges may be required to install these packages.

Import Python Packages

import os
import platform

import openai
import chromadb
import langchain

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import TokenTextSplitter
from langchain.llms import OpenAI
from langchain.chains import ChatVectorDBChain
from langchain.document_loaders import GutenbergLoader

print('Python: ', platform.python_version())

Mount Google Drive on Colab

from google.colab import drive
drive.mount('/content/drive')

OpenAI API Key

os.environ["OPENAI_API_KEY"] = 'your openai api key'
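If you would rather not hard-code the key in a shared notebook, a small alternative using only the standard library:

import os
from getpass import getpass

# Prompt for the key interactively instead of embedding it in the notebook
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")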

Configure Chroma

Chroma uses both of my favorite technologies for its backend, DuckDB and Apache Parquet, but by default it uses an in-memory database. This is fine for this tutorial, but I want to give you the option of storing the database file on disk so you can reuse the database without paying for the embedding every single time.

persist_directory="/content/drive/My Drive/Colab Notebooks/chroma/romeo"
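On a later run you can reopen the persisted database instead of re-embedding the book; a sketch, assuming the same LangChain Chroma wrapper used below:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Reopen the persisted vectors from Google Drive; no new embedding cost
embeddings = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)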

Convert Document to Embedding

Convert the document, i.e., the book, to vector embedding and store it in a vector search engine, i.e., a vector database.

def get_gutenberg(url):
    # Load the raw book text from Project Gutenberg
    loader = GutenbergLoader(url)
    data = loader.load()
    return data

romeoandjuliet_data = get_gutenberg('https://www.gutenberg.org/cache/epub/1513/pg1513.txt')

# Split the book into 1,000-token chunks; the whole book is far too large
# to embed in a single request
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=0)
romeoandjuliet_doc = text_splitter.split_documents(romeoandjuliet_data)

# Embed each chunk with OpenAI and persist the vectors in Chroma
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(romeoandjuliet_doc, embeddings, persist_directory=persist_directory)
vectordb.persist()

  • The first step is a bit self-explanatory, but it involves using ‘from langchain.document_loaders import GutenbergLoader’ to load a book from Project Gutenberg.
  • The second step is more involved. To obtain an embedding, we send the text string, i.e., the book, to OpenAI's embeddings API endpoint along with the ID of an embedding model, e.g., text-embedding-ada-002. The response contains an embedding. However, the book consists of 55,985 tokens, while text-embedding-ada-002 accepts at most 8,191 input tokens, so we use the 'text_splitter' utility (from 'langchain.text_splitter import TokenTextSplitter') to split the book into manageable 1,000-token chunks. A sample embedding request and response are sketched just after this list. If you're wondering, the pricing for the embedding model is $0.0004 / 1K tokens.
  • The third step is pretty straightforward: we store the embedding in Chroma, our vector search engine, and persist it on a file system.
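For reference, here is roughly what a raw embedding request and response look like when calling the endpoint directly (a sketch against the pre-1.0 openai Python client; LangChain's OpenAIEmbeddings wraps this call for you, and the sample sentence is our own):

import openai

response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="But soft, what light through yonder window breaks?",
)

embedding = response["data"][0]["embedding"]
print(len(embedding))     # 1536 dimensions for text-embedding-ada-002
print(response["usage"])  # token usage, billed at $0.0004 / 1K tokens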

Configure LangChain QA

To configure LangChain QA with Chroma, use the OpenAI ChatGPT model (model_name='gpt-3.5-turbo') and have the response include the intermediary result from the vector search engine, i.e., Chroma (set return_source_documents=True).

romeoandjuliet_qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0, model_name="gpt-3.5-turbo"), vectordb, return_source_documents=True)

Questions & Answers with “Romeo and Juliet” Book

Generating questions and answers from the book is a straightforward process. To assess the accuracy of the results, I will be comparing the answers with those from SparkNotes.

SparkNotes Editors. "Romeo and Juliet." SparkNotes.com, SparkNotes LLC, 2005.
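Querying the chain looks like this (the question is our own example; the chain takes the running chat history alongside each question):

chat_history = []
query = "Who is Romeo?"
result = romeoandjuliet_qa({"question": query, "chat_history": chat_history})

print(result["answer"])
# Because return_source_documents=True, the chunks Chroma retrieved are exposed too
for doc in result["source_documents"]:
    print(doc.page_content[:100])

# Carry the exchange forward so follow-up questions have context
chat_history.append((query, result["answer"]))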

I hope you have enjoyed this simple tutorial. If you have any questions or comments, please share them.

Resources

Modules of LangChain

There are several main modules that LangChain provides support for. For each module, LangChain offers getting-started examples, how-to guides, reference docs, and conceptual guides. In increasing order of complexity, these modules are:

  • Models: LLMs, Chat models, Text embedding models
  • Prompts: LLM prompt templates, Chat prompt templates, Example selectors, Output parsers
  • Indexes: The primary index and retrieval types supported by LangChain are currently centered around vector databases, so much of the functionality dives deep into those topics.
  • Memory: LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
  • Chains: Chains go beyond a single LLM call and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, many integrations with other tools, and end-to-end chains for common applications (see the short example after this list).
  • Agents: Tools, Agents, Toolkits, Agent executor
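As a minimal illustration of the Chains module (a sketch, assuming the early-2023 LangChain import paths used elsewhere in this tutorial):

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A chain pairs a prompt template with an LLM call
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in one sentence.",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
print(chain.run("vector embeddings"))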

Contributors

  • appointat
