cognitext's Introduction

Todo

  • Create Folder structure
  • Install packages
  • Create Jupyter Notebook

Project: Textbook Content Extraction, Indexing, and Retrieval

Task 1: Textbook Selection and Content Extraction

  • Select three textbooks with more than 300 pages each.

  • Extract all relevant text content from the selected textbooks.

Task 2: Hierarchical Tree-based Indexing

  • Analyze textbook structures to identify hierarchical organization (chapters, sections, subsections).

  • Create a hierarchical tree-based index for each textbook:

    • Root node represents the entire textbook.
    • Intermediate nodes represent chapters, sections, subsections.
    • Leaf nodes contain text content chunks.
  • Assign unique identifiers to nodes and establish parent-child relationships.

  • Store the hierarchical tree-based index in a suitable data structure or database.
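
The indexing steps above can be sketched as a small in-memory tree. This is a minimal sketch, not the project's actual implementation: the class names `IndexNode` and `TextbookIndex`, the `n<number>` identifier scheme, and the `kind` labels are all assumptions made for illustration.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class IndexNode:
    """One node of the textbook tree (field names are illustrative)."""
    node_id: str
    kind: str                      # "book", "chapter", "section", "subsection" or "chunk"
    title: str = ""
    text: str = ""                 # only filled in for leaf chunks
    parent_id: Optional[str] = None
    children: list[IndexNode] = field(default_factory=list)


class TextbookIndex:
    """Hypothetical container: one root per textbook, nodes looked up by id."""

    def __init__(self, book_title: str) -> None:
        self._counter = itertools.count(1)
        self.nodes: dict[str, IndexNode] = {}
        self.root = IndexNode("n0", "book", title=book_title)
        self.nodes[self.root.node_id] = self.root

    def add_child(self, parent_id: str, kind: str,
                  title: str = "", text: str = "") -> IndexNode:
        """Create a node with a fresh unique id and link it to its parent."""
        parent = self.nodes[parent_id]
        node = IndexNode(f"n{next(self._counter)}", kind, title, text,
                         parent_id=parent.node_id)
        parent.children.append(node)
        self.nodes[node.node_id] = node
        return node

    def path_to(self, node_id: str) -> list[str]:
        """Titles from the root down to the node (id used when untitled)."""
        labels = []
        node: Optional[IndexNode] = self.nodes[node_id]
        while node is not None:
            labels.append(node.title or node.node_id)
            node = self.nodes[node.parent_id] if node.parent_id else None
        return labels[::-1]

    def leaves(self, node_id: Optional[str] = None) -> Iterator[IndexNode]:
        """Yield chunk nodes under a node in document order (depth-first)."""
        stack = [self.nodes[node_id] if node_id else self.root]
        while stack:
            node = stack.pop()
            if node.kind == "chunk":
                yield node
            stack.extend(reversed(node.children))
```

Keeping a flat `nodes` dictionary alongside the nested `children` lists makes lookup by identifier cheap, and the same two views map naturally onto a JSON file or a database table with `node_id` and `parent_id` columns if persistent storage is needed.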

Task 3: Retrieval Techniques

  • Implement query expansion techniques (synonym expansion, stemming, external knowledge bases).

  • Combine BM25 with BERT/bi-encoder-based retrieval methods (DPR, SPIDER).

  • Experiment with different retrieval strategies and evaluate effectiveness.

  • Re-rank retrieved data based on relevance and similarity to the query.
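
One possible shape for the retrieval steps above, as a sketch under stated assumptions. The synonym table passed to `expand_query` is a hand-written stand-in for a real lexical resource, the BM25 scorer uses the non-negative IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`, and reciprocal rank fusion is just one way to combine a BM25 ranking with a dense ranking; the README does not prescribe a fusion method. The dense ranking itself is assumed to come from a bi-encoder such as DPR and is not computed here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def expand_query(tokens, synonyms):
    """Append synonyms from a caller-supplied table, keeping original order."""
    expanded = list(tokens)
    for tok in tokens:
        for syn in synonyms.get(tok, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs_tokens) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranking(scores):
    """Document indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list."""
    fused = Counter()
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)
```

In use, `ranking(bm25_scores(...))` would supply the sparse list and the bi-encoder would supply the dense list; `reciprocal_rank_fusion([sparse, dense])` then gives a combined order that a cross-encoder or similar model could re-rank further.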

Task 4: Multi-document/Topic/Section-based RAG

  • Develop a Retrieval Augmented Generation (RAG) system:

    • Identify relevant documents using implemented retrieval techniques.
    • Traverse hierarchical tree-based indexes to retrieve pertinent sections.
    • Generate informative and coherent responses to user queries.
  • Implement retrieval and ranking algorithms for selecting relevant content.
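
The tree traversal step above might look like the following sketch. It assumes the index is available as nested dictionaries with `title`, `text`, and `children` keys (a hypothetical layout), and it uses plain term overlap as a placeholder for whatever relevance score the retrieval stage actually provides. At each level only the best `beam` children with a non-zero score are followed.

```python
def term_overlap(query, text):
    """Number of distinct query terms that also occur in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def collect_text(node):
    """Concatenate a node's title, its text, and all descendant text."""
    parts = [node.get("title", ""), node.get("text", "")]
    parts.extend(collect_text(child) for child in node.get("children", []))
    return " ".join(p for p in parts if p)


def retrieve_chunks(node, query, beam=2, path=()):
    """Walk the tree top-down and return (section path, chunk text) pairs."""
    if node.get("title"):
        path = path + (node["title"],)
    children = node.get("children", [])
    if not children:
        # Leaf: keep the chunk only if it shares at least one query term.
        text = node.get("text", "")
        return [(path, text)] if term_overlap(query, text) > 0 else []
    # Score each child by the overlap of its whole subtree with the query.
    scored = sorted(
        ((term_overlap(query, collect_text(c)), c) for c in children),
        key=lambda pair: pair[0],
        reverse=True,
    )
    results = []
    for score, child in scored[:beam]:
        if score > 0:
            results.extend(retrieve_chunks(child, query, beam, path))
    return results
```

Returning the section path with each chunk lets the generation step cite where in the textbook a passage came from.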

Task 5: Question Answering

  • Pass the content retrieved by the RAG system to an LLM for question answering.

  • Generate accurate and relevant answers based on retrieved content.
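
A minimal sketch of the hand-off to the LLM, assuming retrieved chunks arrive as `(source, text)` pairs and that the model is reachable through some callable taking a prompt string and returning a string. The function names, the prompt wording, and the `max_chars` budget are illustrative assumptions; no particular LLM API is implied.

```python
def build_prompt(question, chunks, max_chars=4000):
    """Assemble a grounded QA prompt from (source, text) pairs."""
    context_parts = []
    used = 0
    for i, (source, text) in enumerate(chunks, start=1):
        entry = f"[{i}] ({source}) {text}"
        # Stop adding passages once the character budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the numbered context passages. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_question(question, chunks, llm):
    """Send the prompt to `llm`, any callable mapping a prompt to a reply."""
    return llm(build_prompt(question, chunks))
```

Passing the model in as a callable keeps the prompt logic testable with a stub and makes it easy to swap between hosted and local models.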

Task 6: User Interface (Optional)

  • Create a user interface using Streamlit or Gradio.

  • Allow users to input queries and display retrieved answers with relevant details.

  • Write scripts (Backend)

  • Create a FastAPI backend

  • Create a Next.js frontend

  • Create Extensive README

  • Dockerize

cognitext's People

Contributors

  • zmusaddique
