The fastrag from toanparadox

Build and explore efficient retrieval-augmented generative models and applications

Key Features • Installation • Components • Examples • How To Use • Benchmarks

fastRAG is a research framework designed to facilitate the building of retrieval augmented generative pipelines. Its main goal is to make retrieval augmented generation as efficient as possible through the use of state-of-the-art and efficient retrieval and generative models. The framework includes a variety of sparse and dense retrieval models, as well as different extractive and generative information processing models. fastRAG aims to provide researchers and developers with a comprehensive tool-set for exploring and advancing the field of retrieval augmented generation.

🎩 Key Features

Retrieval Augmented X: A framework for developing efficient and fast retrieval augmented generative applications using the latest transformer-based NLP models (but not only).
Optimized Models: Includes optimized models of supported pipelines with greater compute efficiency.
Intel Optimizations (TBA): Leverage the latest optimizations developed by Intel for running pipelines with maximum hardware utilization, reduced latency, and increased throughput, using frameworks such as Intel Extension for PyTorch (IPEX) and Intel Extension for Transformers.
Customizable: Built using Haystack and HuggingFace. All of fastRAG's components are 100% Haystack compatible.

📍 Installation

Preliminary requirements:

Python 3.8+
PyTorch

In a new virtual environment, run:

pip install .

There are various dependencies, based on usage:

# Additional engines/components
pip install .[faiss-cpu]           # CPU-based Faiss
pip install .[faiss-gpu]           # GPU-based Faiss
pip install .[qdrant]              # Qdrant support
pip install libs/colbert           # ColBERT/PLAID indexing engine
pip install .[image-generation]    # Stable diffusion library
pip install .[knowledge_graph]     # spacy and KG libraries

# REST API + UI
pip install .[ui]

# Benchmarking
pip install .[benchmark]

# Dev tools
pip install .[dev]

📚 Components

For a short overview of the different models see Models Overview.

Unique components in fastRAG:

PLAID - An extremely efficient engine for late interaction retrieval.
ColBERT - A Retriever (used with PLAID) and re-ranker (used with dense embeddings) utilizing late interaction for relevancy scoring.
Fusion-in-Decoder (FiD) - A generative reader for multi-document retrieval augmented tasks.
Stable Diffusion Generator - A text-to-image generator. Pluggable to any pipeline output.
Retrieval-Oriented Knowledge Graph Construction - A pipeline component that extracts named entities and builds a graph of the entities mentioned in the retrieved documents, with the relations between each pair of related entities.
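
To make "late interaction" concrete, here is a minimal sketch of the MaxSim scoring idea behind ColBERT: each query token is matched to its most similar document token, and the per-token maxima are summed. This is a toy illustration in pure Python, not fastRAG's or ColBERT's actual code; the function names and the 2-dimensional "embeddings" are made up for the example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query token, take the best-matching
    document token, then sum those maxima over the query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional token embeddings (hypothetical values).
query = [(1.0, 0.0), (0.0, 1.0)]
doc_a = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]  # matches both query tokens
doc_b = [(1.0, 0.0), (0.7, 0.0)]              # matches only the first one

scores = {
    "doc_a": maxsim_score(query, doc_a),
    "doc_b": maxsim_score(query, doc_b),
}
best = max(scores, key=scores.get)
print(scores, best)
```

Because documents are encoded token by token ahead of time, only the cheap max-and-sum step has to run at query time.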

Additional components:

Retrieval Augmented Summarization with T5 family models (such as LongT5, FLAN-T5) - An encoder-decoder model based on T5 with support for long input, supporting summarization/translation prompts.

🚀 Example Use Cases

Efficient Open Domain Question-Answering

Generate answers to questions answerable by using a corpus of knowledge.

Retrieval using fast lexical search with BM25 or late-interaction dense retrieval with PLAID
Ranking with Sentence Transformers or ColBERT
Generation with Fusion-in-Decoder

flowchart LR
    id1[(Elastic<br>/PLAID)] <--> id2(BM25<br>/ColBERT) --> id3(ST<br>/ColBERT) --> id4(FiD)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
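
As a rough illustration of the lexical retrieval stage, the sketch below scores documents with the Okapi BM25 formula in pure Python. It is a toy under stated assumptions (whitespace tokenization, illustrative `k1` and `b` values, a hypothetical `bm25_scores` helper) and not the implementation used by Elasticsearch or fastRAG.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq: Dict[str, int] = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF (the "1 +" keeps it non-negative).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "the eiffel tower is in paris",
]
scores = bm25_scores("capital of france", docs)
# Indices of the documents, best match first.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In the pipeline above, the top-ranked documents from this stage would then be passed to the ranking and generation stages.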

📓 Simple generative open-domain QA with BM25 and ST
📓 Efficient and fast ODQA with PLAID, ColBERT and FiD

ChatGPT Open Domain Reranking and QA

Use the ChatGPT API both to rerank the documents for a query and to answer the query using the chosen documents.

flowchart LR
    id2(.. Retrieval pipeline ..) --> id4(ChatGPT)
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366

📓 GPT as both Reranker and Reader

Open Domain Summarization

Summarize topics given free-text input and a corpus of knowledge.

Retrieval with BM25 or other retrievers
Ranking with Sentence Transformers or other rankers
Generation using the "summarize: " prompt, all documents concatenated, and the FLAN-T5 generative model

flowchart LR
    id1[(Elastic)] <--> id2(BM25) --> id3(SentenceTransformer) -- summarize--> id4(FLAN-T5)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
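
The generation step above prepends the "summarize: " prompt to the concatenated documents. A minimal sketch of that input construction follows; the helper name and the single-space separator are assumptions for illustration, and the actual pipeline may join or truncate documents differently.

```python
from typing import List


def build_summarization_prompt(documents: List[str], prefix: str = "summarize: ") -> str:
    """Join the ranked documents into one string and prepend the task prefix."""
    # Drop empty entries and stray whitespace before joining.
    cleaned = [doc.strip() for doc in documents if doc.strip()]
    return prefix + " ".join(cleaned)


# Hypothetical ranked documents coming out of the ranking stage.
ranked_docs = [
    "fastRAG builds retrieval-augmented pipelines. ",
    "  It is built on Haystack and HuggingFace.",
]
prompt = build_summarization_prompt(ranked_docs)
print(prompt)
```

The resulting string is what would be tokenized and fed to the generative model.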

📓 Open Domain Summarization

Retrieval-Oriented Knowledge Graph Construction

Use with any retrieval pipeline to extract named entities (NER) and generate relation maps using a Relation Classification (RC) model.

flowchart LR
    id2(.. Retrieval pipeline ..) --> id4(NER) --> id5(RC)
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
    style id5 fill:#F3CECC,stroke:#B25450

📓 Knowledge Graph Construction

Retrieval-Oriented Answer Image Generation

Use with any retrieval pipeline to generate a dynamic image from the answer to the query, using a diffusion model.

flowchart LR
    id2(.. Retrieval pipeline ..) --> id4(FiD) --> id5(Diffusion)
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
    style id5 fill:#F3CECC,stroke:#B25450

📓 Answer Image Generation

🏃 How to Use

fastRAG has a modular architecture that enables the user to build retrieval-augmented pipelines with different components. The components are Python classes that take a set of parameters. We provide multiple example parameter sets for building common pipelines; the parameters are organized in YAML files in folders such as store, retriever and reader, all under the Configuration folder.

Pipeline Configuration Generation

The pipeline is built dynamically with the Haystack pipeline API, according to the components the user selects. Use the Pipeline Generation script to generate a Haystack pipeline that the stand-alone REST server can run as a service; see REST API.

Here is an example of using the script to generate a pipeline with a ColBERT retriever, an SBERT reranker and an FiD reader:

python generate_pipeline.py --path "retriever,reranker,reader"  \
       --store config/store/plaid-wiki.yaml                     \
       --retriever config/retriever/colbert-v2.yaml             \
       --reranker config/reranker/sbert.yaml                    \
       --reader config/reader/FiD.yaml                          \
       --file pipeline.yaml
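
When generating several pipeline variants, it can help to assemble the same command programmatically. The sketch below only builds the argument list from the flags shown above; `build_generate_command` is a hypothetical helper, and it assumes that each node named in `--path` has a matching `--<name>` flag, which holds for the three nodes in this example but is not confirmed for others.

```python
import shlex
import sys
from typing import Dict, List


def build_generate_command(components: Dict[str, str], store: str, output: str) -> List[str]:
    """Assemble the argument list for generate_pipeline.py.

    `components` maps a node name (e.g. "retriever") to its YAML config,
    in the order the nodes should run.
    """
    cmd = [sys.executable, "generate_pipeline.py",
           "--path", ",".join(components),
           "--store", store]
    for name, config in components.items():
        cmd += [f"--{name}", config]
    cmd += ["--file", output]
    return cmd


cmd = build_generate_command(
    {
        "retriever": "config/retriever/colbert-v2.yaml",
        "reranker": "config/reranker/sbert.yaml",
        "reader": "config/reader/FiD.yaml",
    },
    store="config/store/plaid-wiki.yaml",
    output="pipeline.yaml",
)
print(shlex.join(cmd[1:]))  # everything after the interpreter path
# To actually run it: subprocess.run(cmd, check=True)
```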

⚠️ PLAID Requirements ⚠️

If a GPU is needed, it should be an RTX 3090 or newer, and PyTorch should be installed with CUDA support using:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
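
After installing, a quick check such as the following can confirm whether PyTorch sees a CUDA device. It is a small sketch using standard PyTorch calls; the `cuda_status` helper name is made up here, and the function falls back to a message if PyTorch is not installed.

```python
def cuda_status() -> str:
    """Return a one-line description of the local PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} found, but no CUDA device is available"
    # Report the first visible GPU and the CUDA version PyTorch was built with.
    name = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__} with CUDA {torch.version.cuda} on {name}"


print(cuda_status())
```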

Running Pipelines

Pipelines can be run inline (code, service, notebook) once initialized properly. For a concrete example see this notebook.

Standalone UI Demos

See Demo for a script that creates standalone demos for several workflows; the script creates a REST service and a UI service, ready to be used. Continue reading for more details on these services.

Serve a pipeline via a REST service

One can start a REST server with a defined pipeline YAML and send queries for processing or benchmarking. The pipeline is generated as described in the Pipeline Generation step; see Usage.

Run the following:

python -m fastrag.rest_api.application --config=pipeline.yaml

This will start a uvicorn server and build a pipeline as defined in the YAML file.

There is support for Swagger. One can observe and interact with the endpoints in a simple UI by visiting http://localhost:8000/docs (you might need to forward ports locally if working on a cluster).

The following endpoints are available:

status: sanity check.
version: project version, as defined in __init__.
query: a general query, used for debugging.
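
The endpoints can also be called from a script. The sketch below uses only the standard library; the URL paths (`/status`, `/version`, `/query`), the HTTP methods, and the `{"query": ...}` payload shape are assumptions based on the endpoint names above, so check the Swagger page at http://localhost:8000/docs for the actual schema.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # address used in the text above


def build_request(endpoint: str, payload: Optional[dict] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request, or a JSON POST when a payload is given.

    Paths and payload shape are assumptions; see the Swagger UI for the
    real schema.
    """
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def call(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


status_request = build_request("status")
query_request = build_request("query", {"query": "Who wrote Hamlet?"})
# With the server running: print(call(status_request)); print(call(query_request))
```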

Run a demo UI

Define the endpoint address according to where the web server is running (e.g., localhost if you start the web server on the same machine), then run the following:

API_ENDPOINT=http://localhost:8000 \
             python -m streamlit run fastrag/ui/webapp.py

Creating Indexes

See Indexing Scripts for information about how to create different types of indexes.

Pre-training/Fine-tuning Models

We offer an array of training scripts to fine-tune models of your choice for various use cases. See Models Overview for examples, model descriptions, and more.

📈 Benchmarks

Benchmark scripts and results can be found here: Benchmarks.

License

The code is licensed under the Apache 2.0 License.

Disclaimer

This is not an official Intel product.
