This Python application lets you chat with multiple PDF documents in natural language. It uses OpenAI's API together with Facebook's Faiss library for semantic search, so that responses are grounded in the content of the loaded PDFs.
The app follows these key steps:
- PDF Loading: The user loads one or more PDFs into the application.
- Text Extraction: The text content is extracted from the PDFs.
- Embedding Generation: The extracted text is split into chunks, and a vector embedding is generated for each chunk using OpenAI's text embedding API.
- Indexing: The chunk embeddings are indexed with Faiss for efficient similarity search.
- User Input: The user enters a natural language query in the chat interface.
- Embedding Lookup: The query is converted to an embedding vector and compared against the indexed PDF embeddings.
- Response Generation: The most similar PDF text chunks are retrieved and passed to OpenAI's text generation API to produce a response.
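The retrieval part of these steps can be sketched in plain Python. This is a minimal illustration, not the app's actual code: `toy_embed` is a hypothetical word-count stand-in for OpenAI's embedding API, the brute-force loop in `top_k` stands in for a Faiss index, and the chunk size, overlap, and prompt wording are all assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (sizes are assumptions)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for an embedding API: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Brute-force nearest-neighbour search (stand-in for a Faiss index)."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query, hits):
    """Assemble retrieved chunks into a prompt for a text generation model."""
    context = "\n---\n".join(chunk for _, chunk in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "Faiss indexes dense vectors for fast similarity search.",
    "Streamlit renders the chat interface in the browser.",
    "Invoices are due within thirty days of receipt.",
]
hits = top_k("When are invoices due?", chunks, k=1)
prompt = build_prompt("When are invoices due?", hits)
```

In the real app, the embedding and search functions would be replaced by calls to the OpenAI API and a Faiss index; the overall flow (chunk, embed, search, build prompt) stays the same.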
It's recommended to run this app in a Python 3.8 virtual environment.
Clone the repository to your local machine and change into its directory:
git clone https://github.com/corymullins/PDF_chatbot.git
cd PDF_chatbot
Create and activate a virtual environment:
python3 -m venv pdfbot
source pdfbot/bin/activate
Install requirements:
pip install -r requirements.txt
- Obtain an OpenAI API key and add it to the .env file.
- Start the app: streamlit run app.py
- Load PDFs using the upload button.
- Chat with the bot in natural language about the PDF contents.
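If the app fails at startup, a quick way to check that the key is being picked up is a small script like the one below. It is only a sketch under assumptions: the variable name `OPENAI_API_KEY` is the one the OpenAI Python client reads by default, but this repository may expect a different name, and the app itself may load .env with a library such as python-dotenv rather than the hand-written parser shown here. The helpers `load_env_file` and `get_api_key` are hypothetical.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_path=".env", name="OPENAI_API_KEY"):
    """Return the key from the environment, falling back to the .env file."""
    key = os.environ.get(name) or load_env_file(env_path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in the environment or in {env_path}")
    return key
```

Calling `get_api_key()` from the repository root either returns the key or raises an error that names the missing variable.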