cognitext

Todo

Create folder structure
Install packages
Create Jupyter Notebook

Project: Textbook Content Extraction, Indexing, and Retrieval

Task 1: Textbook Selection and Content Extraction

Select three textbooks with more than 300 pages each.
Extract all relevant text content from the selected textbooks.
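
The README does not name an extraction tool. Suppose pages are pulled out with a PDF library; pypdf's `PdfReader(path).pages` with `page.extract_text()` is one option. In that case, the raw page text usually needs cleanup before it is chunked. The helper below, `clean_page_text`, is a hypothetical minimal sketch and is not part of the project. It re-joins words hyphenated across line breaks and normalizes whitespace, while keeping blank lines as paragraph boundaries.

```python
import re


def clean_page_text(raw: str) -> str:
    """Normalize raw text extracted from one PDF page (hypothetical helper)."""
    # Re-join words hyphenated across a line break: "aug-\nmented" -> "augmented".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # A single newline is a layout wrap, so turn it into a space;
    # blank lines (two or more newlines) are kept as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs left over from the page layout.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces that now sit next to the remaining newlines.
    text = re.sub(r" *\n *", "\n", text)
    # Cap paragraph breaks at a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

One caveat: a genuine compound split at a line end (for example `tree-` followed by `based`) is also joined. A dictionary check may therefore be worth adding before relying on this for real textbooks.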

Task 2: Hierarchical Tree-based Indexing

Analyze textbook structures to identify hierarchical organization (chapters, sections, subsections).
Create a hierarchical tree-based index for each textbook:
- Root node represents the entire textbook.
- Intermediate nodes represent chapters, sections, and subsections.
- Leaf nodes contain text content chunks.
Assign unique identifiers to nodes and establish parent-child relationships.
Store the hierarchical tree-based index in a suitable data structure or database.
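
A minimal sketch of such an index follows, assuming an in-memory Python tree. The `Node` class, its fields, and the dotted ID scheme are illustrative choices, not taken from the project. Each child's ID extends its parent's ID, so IDs are unique within a book and encode the parent-child path. `to_rows` flattens the tree into `(id, parent_id, level, title, text)` rows that could be stored in a relational table, for example with `sqlite3`'s `executemany`.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)  # compare nodes by identity, not by (recursive) field values
class Node:
    """One node of a textbook's tree index: book, chapter, section, ..., or chunk."""
    node_id: str
    level: str                       # e.g. "book", "chapter", "section", "chunk"
    title: str
    text: str = ""                   # only leaf chunks carry body text
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list, repr=False)

    def add(self, level: str, title: str, text: str = "") -> "Node":
        """Attach a child whose ID extends this node's ID: 'tb1.2' -> 'tb1.2.1'."""
        child_id = f"{self.node_id}.{len(self.children) + 1}"
        child = Node(child_id, level, title, text, parent=self)
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Pre-order traversal: this node, then each child's subtree in order."""
        yield self
        for child in self.children:
            yield from child.walk()

    def to_rows(self) -> list[tuple]:
        """Flatten to (id, parent_id, level, title, text) rows for a database table."""
        return [
            (n.node_id, n.parent.node_id if n.parent is not None else None,
             n.level, n.title, n.text)
            for n in self.walk()
        ]
```

For example, `book = Node("tb1", "book", "Example Textbook")` creates the root. Calling `book.add("chapter", "Cells")` then returns a chapter node with ID `tb1.1`. Path-based IDs are only globally unique if each textbook's root ID is unique.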

Task 3: Retrieval Techniques

Implement query expansion techniques (synonym expansion, stemming, external knowledge bases).
Combine BM25 with BERT-based bi-encoder retrieval methods (e.g., DPR, SPIDER).
Experiment with different retrieval strategies and evaluate their effectiveness.
Re-rank retrieved results by relevance and similarity to the query.
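
The README leaves the combination method open. One common option, used here as an assumption, is reciprocal rank fusion (RRF). Chunks are ranked separately with BM25 and with a dense bi-encoder such as DPR, and the two ranked lists are then merged.

The sketch below has three parts:
- synonym expansion using a hypothetical hand-written `SYNONYMS` table (stemming, e.g. NLTK's `PorterStemmer`, or a knowledge base could be layered in at the same step);
- Okapi BM25 in plain Python;
- RRF with the conventional constant k = 60.

The dense ranking is taken as an input, because computing it requires a trained encoder.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table; WordNet or a domain thesaurus could replace it.
SYNONYMS = {"car": ["automobile"], "quick": ["fast", "rapid"]}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def expand_query(tokens: list[str]) -> list[str]:
    """Synonym expansion: keep the original terms and append their synonyms."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(SYNONYMS.get(token, []))
    return expanded


class BM25:
    """Okapi BM25 over pre-tokenized chunks (k1=1.5, b=0.75 are common defaults)."""

    def __init__(self, docs: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.docs, self.k1, self.b = docs, k1, b
        self.tf = [Counter(doc) for doc in docs]
        self.avgdl = sum(len(doc) for doc in docs) / len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        n = len(docs)
        # Lucene-style IDF, which stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: list[str], i: int) -> float:
        """BM25 score of chunk i for the given query terms."""
        tf, dl = self.tf[i], len(self.docs[i])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query if t in tf
        )

    def rank(self, query: list[str]) -> list[int]:
        """Chunk indices sorted by descending BM25 score."""
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists: each list adds 1 / (k + rank) to a chunk's fused score."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

With the expanded query, a chunk that mentions only "automobile" can still match a question about a "car". RRF uses only rank positions, so it needs no score calibration between BM25 and the dense model. The fused list could then be re-scored by a stronger re-ranker.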

Task 4: Multi-document/Topic/Section-based RAG

Develop a Retrieval Augmented Generation (RAG) system:
- Identify relevant documents using implemented retrieval techniques.
- Traverse hierarchical tree-based indexes to retrieve pertinent sections.
- Generate informative and coherent responses to user queries.
Implement retrieval and ranking algorithms for selecting relevant content.
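
Once chunks have been scored, the tree index can be used to move from individual chunks to whole sections. A minimal sketch follows, assuming each node keeps a parent pointer; the `Node` shape here is a simplified, hypothetical one. Chunk scores are summed per enclosing section. Each winning section is returned with its breadcrumb path and full text, so hits from several textbooks can compete in one list. Summing is only one aggregation choice; taking the maximum chunk score per section is a common alternative that favors sections with a single strong match.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity semantics, so nodes can be used as dict keys
class Node:
    title: str
    level: str                       # "book", "chapter", "section", "chunk", ...
    text: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list, repr=False)

    def add(self, title: str, level: str, text: str = "") -> "Node":
        child = Node(title, level, text, parent=self)
        self.children.append(child)
        return child


def enclosing(node: Node, level: str) -> Node:
    """Walk up parent links to the nearest node at `level` (the root if none)."""
    while node.level != level and node.parent is not None:
        node = node.parent
    return node


def breadcrumb(node: Node) -> str:
    """Path from the book root down to `node`, e.g. 'Book > Chapter > Section'."""
    titles = []
    current: Optional[Node] = node
    while current is not None:
        titles.append(current.title)
        current = current.parent
    return " > ".join(reversed(titles))


def subtree_text(node: Node) -> str:
    """Concatenate the text of every leaf chunk under `node`, in reading order."""
    if not node.children:
        return node.text
    return "\n".join(subtree_text(child) for child in node.children)


def top_sections(hits: list[tuple[Node, float]], level: str = "section", k: int = 3):
    """Sum retrieved chunk scores per enclosing section and return the best k
    as (breadcrumb, section text, total score) triples."""
    totals: dict[Node, float] = {}
    for leaf, score in hits:
        section = enclosing(leaf, level)
        totals[section] = totals.get(section, 0.0) + score
    best = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]
    return [(breadcrumb(s), subtree_text(s), total) for s, total in best]
```

Passing `level="chapter"` widens the retrieved context to whole chapters without changing the chunk-level retrieval.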

Task 5: Question Answering

Pass retrieved content from RAG system to an LLM for question answering.
Generate accurate and relevant answers based on retrieved content.
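
The README does not specify the LLM or the prompt. The sketch below shows one way to hand off retrieved content, assuming passages arrive best-first as `(source path, text)` pairs. `build_qa_prompt` is a hypothetical helper. It numbers each passage and keeps its section path so the model can cite it. It also stops adding passages at a rough character budget, which stands in for the model's token limit. The returned string would then be sent to whichever LLM the project chooses; that call is omitted.

```python
def build_qa_prompt(question: str, passages: list[tuple[str, str]], max_chars: int = 6000) -> str:
    """Assemble a grounded QA prompt from (source_path, text) passages.

    Passages are assumed to be ordered best-first and are added until the
    approximate character budget is used up, so the most relevant context
    survives truncation. Separators are not counted toward the budget.
    """
    blocks, used = [], 0
    for i, (source, text) in enumerate(passages, start=1):
        block = f"[{i}] {source}\n{text.strip()}"
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the numbered context passages below. "
        "Cite passages by number, and say you don't know if the answer is not in them.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Asking the model to cite passage numbers makes it easier to check an answer against the retrieved sections.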

Task 6: User Interface (Optional)

Create a user interface using Streamlit or Gradio.
Allow users to input queries and display retrieved answers with relevant details.

Write backend scripts
Create a FastAPI backend
Create a Next.js frontend
Create an extensive README
Dockerize

zmusaddique / cognitext