Full-Stack PDF NLP Application

Overview

This project is a full-stack application that lets users upload PDF documents and ask questions about their content. The backend extracts the text of each document and uses natural language processing (NLP) to answer those questions.

Table of Contents

  1. Prerequisites
  2. Installation
  3. Setup
  4. Usage
  5. API Documentation
  6. Application Architecture
  7. Demo

Prerequisites

  • Python 3.9+: Required for the backend.
  • Node.js 14+: Required for the frontend.
  • PostgreSQL or SQLite: Required if using a database for storing document metadata.
  • Docker: Optional, for containerized deployment.

Installation

Backend

  1. Clone the repository:

    git clone https://github.com/atulyadav745/aiplanet.git
    cd aiplanet/backend
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt

Frontend

  1. Navigate to the frontend directory:

    cd ../frontend
  2. Install the dependencies:

    npm install

Setup

Backend Configuration

  1. Environment Variables: Create a .env file in the backend directory with the following:

    DATABASE_URL=sqlite:///./test.db  # or your PostgreSQL connection string
    STORAGE_PATH=./uploads
  2. Initialize the Database:

    python main.py db init
  3. Run the Backend:

    uvicorn main:app --reload
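
As a rough illustration of how the backend might consume the two variables from step 1, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from the repository, and the fallback values simply mirror the example .env above. Note that Python does not read a .env file on its own: the variables have to be exported in the shell or loaded by a helper such as python-dotenv before this code sees them.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented settings."""
    database_url: str
    storage_path: Path


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read DATABASE_URL and STORAGE_PATH, falling back to the example values."""
    # Use the real process environment unless a mapping is passed in (handy for tests).
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///./test.db"),
        storage_path=Path(env.get("STORAGE_PATH", "./uploads")),
    )
```

Calling `load_settings().storage_path.mkdir(parents=True, exist_ok=True)` at startup would make sure the upload directory exists before the first file arrives.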

Frontend Configuration

  1. Run the Frontend:

    npm start

Usage

Uploading PDFs

  1. Navigate to the home page.
  2. Click on the "Upload PDF" button.
  3. Select and upload a PDF document.

Asking Questions

  1. After uploading a PDF, go to the question input section.
  2. Enter your question in the input field and submit.
  3. View the answer below the question field.

API Documentation

Endpoints

Upload PDF

  • URL: /upload

  • Method: POST

  • Description: Uploads a PDF document.

  • Request:

    • Headers: Content-Type: multipart/form-data
    • Body: file: the PDF file.
  • Response:

    • 200: Success, returns document metadata.
    • 400: Error, returns error details.
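
For anyone calling this endpoint outside the frontend, the request described above could be assembled as follows. This is a sketch using only the Python standard library. The function name `build_upload_request` is hypothetical, and the base URL is an assumption (uvicorn listens on http://127.0.0.1:8000 unless told otherwise). Only the `file` field name, the path, and the method come from the documentation above.

```python
import urllib.request
import uuid


def build_upload_request(base_url: str, filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST to /upload with a single part named 'file'."""
    # A random boundary separates the parts of the multipart body.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/upload",
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. A 400 response surfaces as `urllib.error.HTTPError`, whose body carries the error details.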

Ask Question

  • URL: /ask

  • Method: POST

  • Description: Receives a question and returns an answer based on the uploaded PDF.

  • Request:

    • Headers: Content-Type: application/json
    • Body: { "doc_id": "<document_id>", "question": "<user_question>" }
  • Response:

    • 200: Success, returns the answer.
    • 400: Error, returns error details.
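
A matching client-side sketch for this endpoint, again with the standard library only. `build_ask_request` and `ask` are hypothetical helper names, and the base URL is an assumption. Because the response schema is not documented here, the sketch returns the raw response text rather than guessing at field names.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def build_ask_request(base_url: str, doc_id: str, question: str) -> urllib.request.Request:
    """Build the JSON POST to /ask with the documented body shape."""
    payload = json.dumps({"doc_id": doc_id, "question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, doc_id: str, question: str) -> Tuple[int, str]:
    """Send the question and return (status code, raw response text)."""
    request = build_ask_request(base_url, doc_id, question)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as the documented 400 arrive here.
        return err.code, err.read().decode("utf-8")
```

The `doc_id` value is presumably taken from the metadata returned by the upload call, although the exact field name is not specified above.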

Application Architecture

You can view Application Architecture here.

Backend

  • Framework: FastAPI
  • NLP: LangChain for processing questions and generating answers.
  • PDF Processing: PyMuPDF for extracting text from PDFs.
  • Data Management: SQLite/PostgreSQL for metadata storage.
  • File Storage: Local filesystem or AWS S3 for PDF storage.

Frontend

  • Framework: React.js
  • State Management: Context API.
  • HTTP Client: Axios for API requests.
  • Styling: CSS Modules or styled-components.

Data Flow

  1. PDF Upload: The frontend sends the PDF file to the backend.
  2. PDF Processing: The backend extracts text and stores metadata.
  3. Question Processing: The frontend sends each question to the backend, which processes it and returns an answer.
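
The three steps above can be sketched as a small in-memory pipeline. Everything here is a hypothetical stand-in, not the project's code: `DocumentStore` replaces the metadata database, `fake_extract` replaces PyMuPDF text extraction, and `first_matching_sentence` is a naive keyword match used in place of the LangChain question-answering step. The sketch only shows how the pieces hand data to each other.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DocumentStore:
    """Hypothetical in-memory stand-in for the backend's storage and NLP wiring."""
    extract_text: Callable[[bytes], str]
    answer: Callable[[str, str], str]
    docs: Dict[str, dict] = field(default_factory=dict)

    def upload(self, filename: str, pdf_bytes: bytes) -> dict:
        # Steps 1 and 2: receive the file, extract its text, store metadata.
        doc_id = uuid.uuid4().hex
        text = self.extract_text(pdf_bytes)
        self.docs[doc_id] = {"filename": filename, "text": text}
        return {"doc_id": doc_id, "filename": filename, "characters": len(text)}

    def ask(self, doc_id: str, question: str) -> str:
        # Step 3: look up the document and answer from its extracted text.
        if doc_id not in self.docs:
            raise KeyError(f"unknown doc_id: {doc_id}")
        return self.answer(self.docs[doc_id]["text"], question)


def fake_extract(pdf_bytes: bytes) -> str:
    """Placeholder extractor: treats the bytes as plain text."""
    return pdf_bytes.decode("utf-8", errors="ignore")


def first_matching_sentence(text: str, question: str) -> str:
    """Placeholder answerer: returns the first sentence sharing a keyword with the question."""
    keywords = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
    for sentence in text.split("."):
        if keywords & {w.lower().strip("?.,") for w in sentence.split()}:
            return sentence.strip()
    return "No answer found."
```

Injecting the extractor and answerer as callables keeps the flow testable without a real PDF parser or language model. A real backend would pass in functions backed by the libraries listed under Backend.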

Demo

You can view a live demo here.

Alternatively, watch a screencast here.


If you have any questions or need further assistance, feel free to open an issue.


Happy coding! 🎉
