intellabs / fastrag

Efficient Retrieval Augmentation and Generation Framework

License: Apache License 2.0

Python 99.86% CSS 0.11% Shell 0.03%
nlp benchmark colbert information-retrieval semantic-search sentence-transformers summarization transformers diffusion knowledge-graph

fastrag's Introduction


Build and explore efficient retrieval-augmented generative models and applications

📍 Installation • 🚀 Components • 📚 Examples • 🚗 Getting Started • 💊 Demos • ✏️ Scripts • 📊 Benchmarks

fastRAG is a research framework for efficient and optimized retrieval-augmented generative pipelines, incorporating state-of-the-art LLMs and information retrieval. fastRAG is designed to empower researchers and developers with a comprehensive toolset for advancing retrieval-augmented generation.

Comments, suggestions, issues and pull requests are welcome! ❤️

📣 Updates

Key Features

  • Optimized RAG: Build RAG pipelines with SOTA efficient components for greater compute efficiency.
  • Optimized for Intel Hardware: Leverage Intel Extension for PyTorch (IPEX), 🤗 Optimum Intel and 🤗 Optimum-Habana to run as optimally as possible on Intel® Xeon® processors and Intel® Gaudi® AI accelerators.
  • Customizable: fastRAG is built using Haystack and Hugging Face. All of fastRAG's components are 100% Haystack compatible, as the sketch below illustrates.
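
Because every fastRAG component is a Haystack node, a pipeline is assembled with plain Haystack code. Below is a minimal sketch, assuming farm-haystack 1.x and an in-memory store (the document and query are illustrative; a fastRAG reader or ranker would be added as a further node in the same way):

from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever

# Index a toy document and retrieve it with BM25
store = InMemoryDocumentStore(use_bm25=True)
store.write_documents([{"content": "fastRAG builds efficient retrieval-augmented pipelines."}])

pipe = Pipeline()
pipe.add_node(component=BM25Retriever(document_store=store), name="Retriever", inputs=["Query"])
result = pipe.run(query="What does fastRAG build?", params={"Retriever": {"top_k": 1}})
print(result["documents"][0].content)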

🚀 Components

For a brief overview of the unique components in fastRAG, refer to the Components Overview page.

LLM Backends

  • Intel Gaudi Accelerators: running LLMs on Gaudi 2
  • ONNX Runtime: running LLMs with the optimized ONNX Runtime
  • Llama-CPP: running RAG pipelines with LLMs on a Llama-CPP backend

Optimized Components

  • Embedders: optimized int8 bi-encoders
  • Rankers: optimized/sparse cross-encoders

RAG-efficient Components

  • ColBERT: token-based late interaction
  • Fusion-in-Decoder (FiD): generative multi-document encoder-decoder
  • REPLUG: improved multi-document decoder
  • PLAID: an incredibly efficient indexing engine

📍 Installation

Preliminary requirements:

  • Python 3.8 or higher.
  • PyTorch 2.0 or higher.

To set up the software, clone the project and run the following, preferably in a newly created virtual environment:

pip install .

There are several dependencies to consider, depending on your specific usage:

# Additional engines/components
pip install .[intel]               # Intel optimized backend [Optimum-intel, IPEX]
pip install .[elastic]             # Support for ElasticSearch store
pip install .[qdrant]              # Support for Qdrant store
pip install .[colbert]             # Support for ColBERT+PLAID; requires FAISS
pip install .[faiss-cpu]           # CPU-based Faiss library
pip install .[faiss-gpu]           # GPU-based Faiss library
pip install .[knowledge_graph]     # Libraries for working with spacy and KG

# User interface (for demos)
pip install .[ui]

# Benchmarking
pip install .[benchmark]

# Development tools
pip install .[dev]
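
A quick smoke test after any of the installs above is simply importing the package (a minimal check; the __version__ attribute is an assumption, hence the fallback):

import fastrag
print(getattr(fastrag, "__version__", "installed"))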

License

The code is licensed under the Apache 2.0 License.

Disclaimer

This is not an official Intel product.

fastrag's People

Contributors

alok-joshi, danielfleischer, michaelbeale-il, mihaiii, mosheber, peteriz, tuanacelik


fastrag's Issues

Cannot run the examples, haystack issues

Can your examples include the Haystack installation lines (with compatible versions)? I get errors on the newest farm-haystack and haystack-ai packages. When I downgraded to farm-haystack==1.17.2 (as suggested in another issue), basic Haystack "getting started" code no longer works. Are these projects synced?

I tried to run the notebook client_inference_with_Llama_cpp.ipynb and got the following error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 2
      1 from haystack import Pipeline
----> 2 from haystack.nodes.prompt import PromptNode
      3 from haystack.nodes import PromptModel
      4 from haystack.nodes.prompt.prompt_template import PromptTemplate

File ~/miniconda3/envs/jupyter/lib/python3.10/site-packages/haystack/nodes/__init__.py:1
----> 1 from haystack.nodes.base import BaseComponent
      3 from haystack.nodes.document_classifier import BaseDocumentClassifier, TransformersDocumentClassifier
      4 from haystack.nodes.extractor import EntityExtractor, simplify_ner_for_qa

File ~/miniconda3/envs/jupyter/lib/python3.10/site-packages/haystack/nodes/base.py:11
      8 import logging
     10 from haystack.schema import Document, MultiLabel
---> 11 from haystack.errors import PipelineSchemaError
     12 from haystack.utils.reflection import args_to_kwargs
     15 logger = logging.getLogger(__name__)

ImportError: cannot import name 'PipelineSchemaError' from 'haystack.errors' (/root/miniconda3/envs/jupyter/lib/python3.10/site-packages/haystack/errors.py)

ImportError: cannot import name 'Seq2SeqGenerator' from 'haystack.nodes'

Problem
Hi, I am trying to reproduce the knowledge graph example you provided here, but I am getting the following error:

File "xxx/haystack1.py", line 6, in
from fastrag.readers import FiDReader
File "xxx/lib/python3.8/site-packages/fastrag/init.py", line 4, in
from fastrag import image_generators, kg_creators, rankers, readers, retrievers, stores
File "xxx/lib/python3.8/site-packages/fastrag/readers/init.py", line 6, in
from fastrag.readers.FiD import FiDReader
File "xxx/lib/python3.8/site-packages/fastrag/readers/FiD.py", line 8, in
from haystack.nodes import Seq2SeqGenerator
ImportError: cannot import name 'Seq2SeqGenerator' from 'haystack.nodes' (xxx/haystack/haystack/nodes/init.py)
/usr/lib/python3.8/tempfile.py:957: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpuj6o9jw1'>
_warnings.warn(warn_message, ResourceWarning)

Steps to reproduce
git clone https://github.com/deepset-ai/haystack.git
cd haystack
pip install --upgrade pip
pip install -e '.[all-gpu]'

pip install git+https://github.com/IntelLabs/fastRAG.git

System specifications:

  • OS: Ubuntu 20.04.6 LTS
  • Python version: 3.8.10

Fine-tuned embedding model

Hello,
I just discovered this repo. I was wondering if it's possible to plug in custom fine-tuned embedding models.
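
Since fastRAG components are Haystack-compatible, one possible route is to point a standard Haystack EmbeddingRetriever at the fine-tuned checkpoint. A minimal sketch, assuming farm-haystack 1.x and a model saved in sentence-transformers format (the path and embedding dimension below are placeholders):

from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever

store = InMemoryDocumentStore(embedding_dim=384)  # match the model's output dimension
retriever = EmbeddingRetriever(
    document_store=store,
    embedding_model="/path/to/your/fine-tuned-model",  # local directory or HF Hub id
    model_format="sentence_transformers",
)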

running demo.py raises a connection error

I am trying to get the demo to work, but I get the following error:

[screenshot of a connection error]

I installed all required dependencies and executed python run_demo.py -t QA1. Following is the command shell output:

Creating services...


Server on  localhost:8000/docs   PID=63217
UI on      localhost:8501        PID=61593
62284
61950
61751

FiDReader Local Model

Hey, I noticed in your examples that you ask the user to add a local path to a trained FiD model, but as far as I can tell, we can just provide an HF model name and it works. At least, I got the knowledge graph notebook to work like that, so maybe there's no need for the assertion. This is what I did:

from fastrag.readers import FiDReader

fid_model_path = None  # change this to the local FiD model
# assert fid_model_path is not None, "Please change fid_model_path to the path of your trained FiD model"

reader = FiDReader(
    input_converter_tokenizer_max_len=256,
    model_name_or_path="Intel/fid_flan_t5_base_nq",
    num_beams=1,
    min_length=2,
    max_length=100,
    use_gpu=False,
)

Let me know if I'm missing something :) - really cool project btw! It's great to see some custom Haystack nodes 👋

Add fastRAG to our index of 'integrations'

Hey maintainers. I'm trying to curate an index of Haystack integrations and projects (which for now I've simply named 'integrations'). These are custom nodes, document stores, pipelines and so on that can be used with Haystack, or built with Haystack.
I've created a PR adding this repo to that index too: deepset-ai/haystack-integrations#3 and was wondering if any of you would like to take a look and let me know if you're OK with having this added there? The idea would be to render the markdown page somewhere on our website later.

Finetuning example

It would be greatly beneficial to have an example demonstrating how to use the fine-tuning script with a custom proprietary dataset.
Thanks in advance.

Load performance: latency/RPS

Hi,

Have you load-tested the fastRAG pipeline with ColBERT, PLAID and FiD on a CPU instance?

Could you provide example latency and RPS figures relative to the CPU instance characteristics?

Document Lister not imported

In the "GPT as both retriever and ranker" example, there is a line of code:

from fastrag.prompters.document_shapers.document_lister import DocumentLister

However, if we look at fastrag/prompters/document_shapers/__init__.py, the file is empty; DocumentLister is not imported.
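
Until the re-export is added upstream, a workaround is to add the missing export locally. A sketch of the one-line fix, assuming the module path in the import above is accurate:

# fastrag/prompters/document_shapers/__init__.py
from fastrag.prompters.document_shapers.document_lister import DocumentLister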

Not working on large list of documents.

The example you've provided, simple_odqa_pipeline.ipynb, works fine on the sample document list. But when I changed that list to one of 23 elements, each a fairly long string, it took over 6.5 minutes to generate a result.

Incorporating this overhead at runtime is infeasible. Is there a solution for this, or have you tried it?
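
One common mitigation, assuming a standard Haystack 1.x extractive pipeline (the node names are illustrative and `pipeline` is assumed to be built as in the notebook), is to cap how many documents the retriever hands to the reader, since reader latency grows with the number and length of input passages:

# `pipeline` built as in simple_odqa_pipeline.ipynb (assumption)
result = pipeline.run(
    query="your question",
    params={
        "Retriever": {"top_k": 5},  # fewer candidate passages for the reader
        "Reader": {"top_k": 1},     # fewer answers to decode
    },
)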

Confusions about REPLUG

  1. How is the retriever of REPLUG trained? How are the query and document embeddings updated?
  2. In your replug_parallel_reader.ipynb, why is the default BM25Retriever imported directly rather than the trained retriever?
  3. What does PromptModel do, and what does ReplugHFLocalInvocationLayer do? This part does not seem to be mentioned in the paper.

Thanks for your help!
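
On the first question, the REPLUG paper trains the retriever with an LM-supervised retrieval (LSR) objective: the retriever's distribution over the retrieved documents is pulled toward the distribution induced by the frozen LM's likelihood of the ground-truth output, and document embeddings are refreshed by periodically re-encoding the corpus. A rough sketch of that loss (names and temperatures are illustrative; consult the paper for the exact formulation):

import torch
import torch.nn.functional as F

def replug_lsr_loss(retrieval_scores, lm_log_likelihoods, gamma=1.0, beta=1.0):
    # retrieval_scores: (k,) query-document similarities from the retriever
    # lm_log_likelihoods: (k,) log P_LM(ground truth | doc, query), LM kept frozen
    log_p = F.log_softmax(retrieval_scores / gamma, dim=-1)            # retrieval distribution
    log_q = F.log_softmax(lm_log_likelihoods.detach() / beta, dim=-1)  # LM-induced distribution
    # KL(P_retrieval || Q_LM); gradients flow only into the retriever scores
    return torch.sum(log_p.exp() * (log_p - log_q))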

Multilingual RAG

Hi,

Thanks for this great repo!

Is there any way to use this pipeline in multilingual settings?

Are there multilingual versions of ColBERT, PLAID and FiD? If not, how would you recommend proceeding?

Is this repo still active?

Wondering if this repo is being actively worked on? Any future roadmap that can be shared with the community would be very helpful.
Thanks in advance,
Karrtik

Add Chroma

Hi there - is there any desire to add support for Chroma?

I need your help creating an example with fastRAG and MiniAutoGen: Lightweight and Flexible Agents for Multi-Agent Chats

🌐 Hello, amazing community!

I'm exploring the integration of two powerful libraries: MiniAutoGen and fastRAG, and I would greatly appreciate your help and insights!

MiniAutoGen is an innovative open-source library designed to take applications with Large Language Models (LLMs) to the next level. Its differentiators are its lightweight and flexible approach, which allows for a high degree of customization.

Here are some notable features of MiniAutoGen:

  • Multi-Agent Dialogues: The ability to create complex and nuanced interactions with multiple intelligent agents operating together.
  • Agent Coordination: A mechanism that ensures harmony and efficient management among the agents.
  • Customizable Agents: Total freedom to shape agent behaviors according to project needs.
  • Action Pipeline: Simplifies and automates agent operations, facilitating scalability and maintenance.
  • Integration with +100 LLMs: Expanding conversational capabilities with over 100 LLMs for intelligent and contextualized responses.

My Challenge: I'm seeking help from the community to develop new integrations and modules.

I Seek Your Help: Do you have examples, tips, or guidance on how I can accomplish this integration? Any insight or shared experience would be extremely valuable!

Check out MiniAutoGen on Google Colab: MiniAutoGen on Google Colab
And here is the GitHub repository for more information: GitHub - brunocapelao/miniAutoGen

I'm looking forward to your ideas and suggestions. Let's shape the future of AI conversations together! 🌟

Issues Running Different LLM Models on examples/rag_with_quantized_llm.ipynb

Hello,

I'm relatively new to working with large language models (LLMs) and am reaching out through the issues tab as I couldn't find a discussions section. I'm currently exploring the Intel/fastRAG repository to learn more about implementing RAG models with quantized LLMs, and I have encountered some challenges that I hope to get guidance on. I've been trying to run the examples/rag_with_quantized_llm.ipynb notebook on a GCP server (a c3-standard-8 instance with 8 vCPUs and 32 GB of memory, running Ubuntu 22.04).

I've successfully run the example using the facebook/opt-iml-max-1.3b model specified in the notebook. However, when attempting to experiment with other models, specifically openlm-research/open_llama_3b and openlm-research/open_llama_7b, I've encountered some challenges:

  • With the open_llama_3b model, the process gets stuck at the "Quantizing" step without progressing further.
  • With the open_llama_7b model, the process is killed immediately after the "Saving external data to one file..." message, which is surprising considering the relatively small model size.

Given my limited experience, I'm reaching out for guidance. I'm curious whether there are minimum hardware requirements for each model size, or a specific quantization precision, that I might not be aware of. Any insights or suggestions on how to run these models successfully, or adjustments to my setup that could help, would be greatly appreciated.
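
For what it's worth, a back-of-envelope estimate suggests memory is the likely culprit on a 32 GB instance, assuming fp32 weights are materialized during export before quantization (actual peak usage depends on the toolchain and may be higher):

# weights alone, before activations or temporary export copies
params_7b = 7e9
fp32_gib = params_7b * 4 / 2**30
print(f"{fp32_gib:.0f} GiB")  # ~26 GiB, close to the instance's 32 GB total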

Thank you for your time and assistance.

REPLUG compute the LM likelihood

"We use GPT-3 Curie (Brown et al., 2020b) as the supervision LM to compute the LM likelihood."

How can one obtain or estimate these probabilities from the black-box GPT-3?
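
At the time, the legacy OpenAI Completions API exposed per-token log probabilities, which is how likelihoods were commonly extracted from "black-box" GPT-3. A sketch using that now-deprecated API (the prompt layout is a placeholder):

import openai

# echo=True with max_tokens=0 returns logprobs for the prompt tokens themselves;
# summing over the answer span yields log P_LM(y | d, x).
resp = openai.Completion.create(
    model="curie",
    prompt="<document> <query> <answer>",  # concatenated scoring input (placeholder)
    max_tokens=0,
    echo=True,
    logprobs=0,
)
token_logprobs = resp["choices"][0]["logprobs"]["token_logprobs"]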
