- This repo is my personal implementation of LangChain's sample chatbot over their documentation.
- Originally, this was a chatbot focused specifically on question answering over the LangChain documentation.
- Make this work over a few other documentation sites
- Set this up to work over a text corpus that I manually create
- Weaviate: Vector database
- OpenAI: Embeddings
There are two components: ingestion and question-answering.
Ingestion has the following steps:
- Pull HTML from the documentation site
- Parse the HTML with BeautifulSoup
- Split documents with LangChain's TextSplitter
- Create a vectorstore of embeddings, using LangChain's vectorstore wrapper (with OpenAI's embeddings and Weaviate's vectorstore).
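The ingestion steps above can be sketched end to end. This is a stdlib-only stand-in: the actual repo uses BeautifulSoup for parsing, LangChain's TextSplitter for chunking, OpenAI embeddings, and Weaviate as the vectorstore; every function here (`TextExtractor`, `split_text`, `fake_embed`, `ingest`) is an illustrative substitute, not the repo's API.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, standing in for BeautifulSoup's get_text()."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def split_text(text, chunk_size=100, overlap=20):
    """Fixed-size character splitter, standing in for LangChain's TextSplitter."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def fake_embed(chunk):
    """Toy deterministic embedding, standing in for OpenAI embeddings."""
    return [sum(ord(c) for c in chunk) % 97, len(chunk)]


def ingest(html):
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.parts)
    # The "vectorstore" is just a list of (embedding, chunk) pairs here;
    # the real pipeline writes these records into Weaviate.
    return [(fake_embed(c), c) for c in split_text(text)]


store = ingest("<html><body><h1>Docs</h1>"
               "<p>LangChain lets you build LLM apps.</p></body></html>")
```

Swapping the stand-ins for BeautifulSoup, a LangChain splitter, and a Weaviate-backed vectorstore recovers the real pipeline without changing the overall shape.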
Question-Answering has the following steps:
- Given the chat history and new user input, determine what a standalone question would be (using GPT-3).
- Given that standalone question, look up relevant documents from the vectorstore.
- Pass the standalone question and relevant documents to GPT-3 to generate a final answer.
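The question-answering flow can be sketched the same way, with stubs in place of the GPT-3 calls and the Weaviate lookup. `condense_question`, `retrieve`, and `answer` are hypothetical stand-ins for the three steps above, and the bag-of-characters embedding is purely illustrative.

```python
import math


def embed(text):
    # Toy bag-of-characters embedding, standing in for OpenAI embeddings.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def condense_question(chat_history, new_input):
    # Stand-in for the GPT-3 call that rewrites the input as a standalone
    # question; here we naively splice in the last turn of history.
    context = chat_history[-1] if chat_history else ""
    return f"{new_input} (context: {context})" if context else new_input


def retrieve(question, docs, k=2):
    # Stand-in for the vectorstore similarity search against Weaviate.
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def answer(question, relevant_docs):
    # Stand-in for the final GPT-3 call: question + documents -> answer.
    return f"Q: {question}\nSources: {relevant_docs}"


docs = [
    "Weaviate is a vector database.",
    "OpenAI provides embeddings.",
    "BeautifulSoup parses HTML.",
]
history = ["Tell me about the vector database"]
standalone = condense_question(history, "How is it used here?")
top = retrieve(standalone, docs)
final = answer(standalone, top)
```

In the real repo each stub becomes an LLM or vectorstore call, but the data flow (history + input → standalone question → retrieved documents → final answer) is the same.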