DocQA is an interactive web application built using Streamlit, designed to provide question-answering capabilities on uploaded documents. It utilizes pre-trained language models from Hugging Face's Transformers library to extract answers from PDF documents.
Details about the fine-tuned BigBird model can be found here
- Model Selection: Users can use any extractive QA model from Hugging Face by copying its link.
- Document Upload: Users can upload PDF documents to the system.
- Interactive Q&A: Users can ask questions and receive answers based on the content of the uploaded document.
- Highlighted Answers: The application highlights answers directly in the uploaded document for better context.
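As an illustration of the Model Selection feature, the sketch below shows one way a pasted Hugging Face link could be turned into a model ID. It is a minimal example under stated assumptions, not the code DocQA actually uses: the function name `model_id_from_link`, the list of page segments it strips, and the example model `deepset/roberta-base-squad2` are all chosen for illustration.

```python
from urllib.parse import urlparse

# Assumption: path segments that follow the model ID on Hub pages (not an exhaustive list).
_PAGE_SEGMENTS = {"tree", "blob", "resolve", "discussions", "commits"}


def model_id_from_link(link: str) -> str:
    """Return a model ID such as 'org/name' from a Hugging Face link or a bare ID."""
    link = link.strip()
    parsed = urlparse(link)
    if parsed.scheme in ("http", "https"):
        if parsed.netloc.lower() not in ("huggingface.co", "www.huggingface.co"):
            raise ValueError(f"Not a Hugging Face link: {link!r}")
        path = parsed.path
    else:
        # No URL scheme: treat the input as a bare model ID.
        path = link

    parts = []
    for segment in path.split("/"):
        if not segment:
            continue
        if segment in _PAGE_SEGMENTS:
            break  # everything after e.g. /tree/main is not part of the ID
        parts.append(segment)

    # A model ID is either 'name' or 'org/name'.
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"Cannot extract a model ID from: {link!r}")
    return "/".join(parts)
```

The resulting ID could then be passed to `transformers.pipeline("question-answering", model=model_id)` to load the selected model.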
A demo of the system functionalities can be found here
- Clone the Repository
    git clone https://github.com/najjarfred/DocQA.git
    cd DocQA
- Install Dependencies
    Ensure you have Python 3.6+ installed, then run:
    pip install -r requirements.txt
This will install Transformers, Pinecone, Streamlit, PyPDF2, Pandas, and other necessary libraries.
To run the application:
streamlit run app.py
Navigate to the local URL provided by Streamlit, typically http://localhost:8501.
- Upload a PDF: Start by uploading a PDF file.
- Select a Model: Choose a question-answering model from the sidebar.
- Ask Questions: Type your question into the input box.
- View Answers: The application processes the document and displays the answer, highlighting the relevant section in the document.
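The last step above (process the document, then highlight the answer) can be sketched as follows. This is a hedged illustration, not the logic in `qa_system.py`: the helper names `iter_windows`, `best_answer`, and `highlight`, the character-based window sizes, and the `<mark>` tag are assumptions. The QA model is passed in as a callable `qa_fn` that returns a dict with `answer`, `score`, `start`, and `end` keys, which is the shape returned by the Transformers question-answering pipeline.

```python
import html
from typing import Callable, Dict, Iterator, Optional, Tuple

# (question, context) -> {"answer": str, "score": float, "start": int, "end": int}
QAFn = Callable[[str, str], Dict]


def iter_windows(text: str, size: int = 2000, stride: int = 1500) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs of overlapping character windows over text."""
    if size <= 0 or stride <= 0 or stride > size:
        raise ValueError("need 0 < stride <= size")
    offset = 0
    while True:
        yield offset, text[offset:offset + size]
        if offset + size >= len(text):
            break
        offset += stride


def best_answer(question: str, text: str, qa_fn: QAFn,
                size: int = 2000, stride: int = 1500) -> Optional[Dict]:
    """Run qa_fn on every window and keep the highest-scoring answer."""
    best = None
    for offset, chunk in iter_windows(text, size, stride):
        if not chunk.strip():
            continue  # skip empty or whitespace-only windows
        result = qa_fn(question, chunk)
        if best is None or result["score"] > best["score"]:
            best = {
                "answer": result["answer"],
                "score": result["score"],
                # Shift window-local offsets to positions in the full text.
                "start": result["start"] + offset,
                "end": result["end"] + offset,
            }
    return best


def highlight(text: str, start: int, end: int) -> str:
    """Escape text for HTML and wrap text[start:end] in a <mark> tag."""
    return (
        html.escape(text[:start])
        + "<mark>" + html.escape(text[start:end]) + "</mark>"
        + html.escape(text[end:])
    )
```

With a real model, `qa_fn` could be a thin wrapper such as `lambda q, c: qa(question=q, context=c)`, where `qa` is a Transformers question-answering pipeline. In a Streamlit app, the highlighted string could be rendered with `st.markdown(..., unsafe_allow_html=True)`.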
- app.py: Main Streamlit application script.
- document_uploader.py: Handles document uploading and processing.
- qa_system.py: Contains the logic for the question-answering system using Hugging Face models.
- requirements.txt: Lists all the Python libraries that the project depends on.
- style.css: Contains custom CSS for styling the Streamlit app.
Contributions to DocQA are welcome! Please read our Contributing Guide for details on how to contribute.
- BigBird for the pre-trained QA model.
- Streamlit for the web framework.
- Hugging Face's Transformers for pre-trained models.
- PyPDF2 for handling PDF files.