Extractive question-answering Streamlit portal that uses the BigBird language model and Hugging Face to answer questions interactively from uploaded PDF documents.

DocQA: Document Question Answering System

DocQA is an interactive web application built using Streamlit, designed to provide question-answering capabilities on uploaded documents. It utilizes pre-trained language models from Hugging Face's Transformers library to extract answers from PDF documents.

Details about the fine-tuned BigBird model can be found here

Features

  • Model Selection: Users can load any extractive QA model by pasting its Hugging Face link.
  • Document Upload: Users can upload PDF documents to the system.
  • Interactive Q&A: Users can ask questions and receive answers based on the content of the uploaded document.
  • Highlighted Answers: The application highlights answers directly in the uploaded document for better context.
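As an illustration of the answer-highlighting feature, here is a minimal sketch, assuming the answer is available as character offsets into the document text (the Transformers question-answering pipeline reports `start` and `end` offsets of this kind). The function name `highlight_answer` and the use of `<mark>` tags are hypothetical choices for this example, not taken from the DocQA source.

```python
import html


def highlight_answer(context: str, start: int, end: int) -> str:
    """Wrap context[start:end] in <mark> tags and HTML-escape all the text."""
    if not 0 <= start <= end <= len(context):
        raise ValueError("answer span lies outside the context")
    before = html.escape(context[:start])
    answer = html.escape(context[start:end])
    after = html.escape(context[end:])
    return f"{before}<mark>{answer}</mark>{after}"


if __name__ == "__main__":
    text = "BigBird uses sparse attention to handle long documents."
    phrase = "sparse attention"
    begin = text.index(phrase)
    print(highlight_answer(text, begin, begin + len(phrase)))
```

In a Streamlit app, a string like this could be rendered with `st.markdown(..., unsafe_allow_html=True)`. Escaping the surrounding text first keeps stray `<` or `&` characters in the PDF from being interpreted as markup.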

Demo 📽️

A demo of the system functionalities can be found here

Installation

  1. Clone the Repository

    git clone https://github.com/najjarfred/DocQA.git
    cd DocQA
  2. Install Dependencies

    Ensure you have Python 3.6+ installed, then run:

    pip install -r requirements.txt

    This will install Transformers, Pinecone, Streamlit, PyPDF2, Pandas, and other necessary libraries.
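To confirm that the installation succeeded, a small check like the following may help. It is a sketch only: the import names in `REQUIRED_MODULES` are assumed from the libraries named above and may differ from what `requirements.txt` actually pins, and the helper names are hypothetical.

```python
import importlib.util
import sys

# Assumed import names for the libraries listed above; adjust to requirements.txt.
REQUIRED_MODULES = ["transformers", "pinecone", "streamlit", "PyPDF2", "pandas"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 6)):
    """Check the running interpreter against the minimum version from the README."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.6 or newer is required.")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```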

Usage

To run the application:

streamlit run app.py

Navigate to the local URL provided by Streamlit, typically http://localhost:8501.

How It Works

  • Upload a PDF: Start by uploading a PDF file.
  • Select a Model: Choose a question-answering model from the sidebar.
  • Ask Questions: Type your question into the input box.
  • View Answers: The application processes the document and displays the answer, highlighting the relevant section in the document.
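The steps above can be sketched roughly as follows. This is not the DocQA implementation: it assumes the document text is split into overlapping character windows and that each window is passed to a question-answering callable that returns a dict with `answer`, `score`, `start`, and `end` keys (the shape returned by the Transformers question-answering pipeline). The names `split_into_chunks` and `best_answer`, and the window sizes, are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def split_into_chunks(text: str, size: int = 1000, overlap: int = 200) -> List[Tuple[int, str]]:
    """Return (offset, chunk) pairs that cover text with overlapping windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks = []
    step = size - overlap
    for offset in range(0, len(text), step):
        chunks.append((offset, text[offset:offset + size]))
        if offset + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def best_answer(question: str, text: str, answer_fn: Callable[..., Dict],
                size: int = 1000, overlap: int = 200) -> Optional[Dict]:
    """Run answer_fn on every chunk and keep the highest-scoring answer.

    The returned start/end offsets are relative to the full text, so the
    answer can be located in the original document.
    """
    best = None
    for offset, chunk in split_into_chunks(text, size, overlap):
        result = answer_fn(question=question, context=chunk)
        if best is None or result["score"] > best["score"]:
            best = {
                "answer": result["answer"],
                "score": result["score"],
                "start": offset + result["start"],
                "end": offset + result["end"],
            }
    return best

# With Transformers installed, a real answer_fn could be created like this
# (the model id is a placeholder for whichever model is selected):
#   from transformers import pipeline
#   answer_fn = pipeline("question-answering", model="<model id>")
```

Passing the model in as a callable keeps the chunking logic testable without downloading any weights. Whether chunking is needed at all depends on the model: BigBird is designed for long inputs, so short documents may fit in a single window.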

File Structure

  • app.py: Main Streamlit application script.
  • document_uploader.py: Handles document uploading and processing.
  • qa_system.py: Contains the logic for the question-answering system using Hugging Face models.
  • requirements.txt: Lists all the Python libraries that the project depends on.
  • style.css: Contains custom CSS for styling the Streamlit app.
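For orientation, the document-processing step might look something like the sketch below. The actual contents of `document_uploader.py` are not shown here, so this is an assumption: it uses `PdfReader` and `extract_text()` from PyPDF2 (available in recent PyPDF2 releases), and the function names `pages_to_text` and `read_pdf` are hypothetical.

```python
from typing import Iterable, List


def pages_to_text(pages: Iterable) -> List[str]:
    """Collect the text of each page; pages with no extractable text become ''."""
    return [(page.extract_text() or "").strip() for page in pages]


def read_pdf(pdf_file) -> List[str]:
    """Extract per-page text from a path or file-like object such as an upload."""
    # Imported lazily so that pages_to_text can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return pages_to_text(PdfReader(pdf_file).pages)
```

Keeping the text per page, rather than as one long string, makes it easier to point the reader to the page on which an answer was found.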

Contributing

Contributions to DocQA are welcome! Please read our Contributing Guide for details on how to contribute.
