Giter Site home page Giter Site logo

docs-qa-playground's Introduction

Docs QA Playground

This repository helps to setup a production-grade RAG workflow with the help of Truefoundry.

Architecture

To deploy the complete workflow, we need to set up various components. Here's an overview of the architecture:

Document Store:

The Document Store is where your documents will be stored. Common options include AWS S3, Google Storage Buckets, or Azure Blob Storage. In some cases, data might come in from APIs, such as Confluence docs.

Indexer Job:

The Indexer Job takes the documents as input, splits them into chunks, calls the embedding model to embed the chunks, and stores the vectors in the VectorDB. The embedding model can be loaded in the job itself or accessed via an API to ensure scalability.

Embedding Model:

If you're using OpenAI or an externally hosted model, you don't need to host a model. However, if you opt for an open-source model, you'll have to deploy it in your cloud environment.

LLM Model:

For OpenAI or hosted model APIs like Cohere and Anthropic, there's no need for additional deployment. Otherwise, you'll need to set up an open-source LLM.

Query Service:

A FastAPI service provides an API to list all indexed document collections and allows users to query over these collections. It also supports triggering new indexing jobs for additional document collections.

VectorDB:

You can use a hosted solution like PineCone or host an open-source VectorDB like Qdrant or Milvus to efficiently retrieve similar document chunks.

Metadata Store:

This store is essential for managing links to indexed documents and storing the configuration used to embed the chunks in those documents.

Setup with Truefoundry:

Truefoundry, a Kubernetes-based platform, simplifies the deployment of ML training jobs and services at an optimal cost. You can deploy all the components mentioned above on your own cloud account using Truefoundry. The final deployment will be a streamlined and powerful system ready to handle your question-answering needs.

Deploy on TrueFoundry

To be able to Query on your own documents, follow the steps below:

  1. Register at TrueFoundry, follow here

    • Fill up the form and register as an organization (let's say <org_name>)
    • On Submit, you will be redirected to your dashboard endpoint ie https://<org_name>.truefoundry.cloud
    • Complete your email verification
    • Login to the platform at your dashboard endpoint ie. https://<org_name>.truefoundry.cloud

    Note: Keep your dashboard endpoint handy, we will refer it as "TFY_HOST" and it should have structure like "https://<org_name>.truefoundry.cloud"

  2. Setup a cluster, use TrueFoundry managed for quick setup

    • Give a unique name to your Cluster and click on Launch Cluster
    • It will take few minutes to provision a cluster for you
    • On Configure Host Domain section, click Register for the pre-filled IP
    • Next, Add a Docker Registry to push your docker images to.
    • Next, Deploy a Model, you can choose to Skip this step
  3. Add a Storage Integration

  4. Create a ML Repo

    • Navigate to ML Repo tab

    • Click on + New ML Repo button on top-right

    • Give a unique name to your ML Repo (say 'docs-qa-llm')

    • Select Storage Integration

    • On Submit, your ML Repo will be created

      For more details: link

  5. Create a Workspace

    • Navigate to Workspace tab
    • Click on + New Workspace button on top-right
    • Select your Cluster
    • Give a name to your Workspace (say 'docs-qa-llm')
    • Enable ML Repo Access and Add ML Repo Access
    • Select your ML Repo and role as Project Admin
    • On Submit, a new Workspace will be created. You can copy the Workspace FQN by clicking on FQN.

    For more details: link

  6. Generate an API Key

    • Navigate to Settings > API Keys tab

    • Click on Create New API Key

    • Give any name to the API Key

    • On Generate, API Key will be gererated.

    • Please save the value or download it

      Note: we will refer it as "TFY_API_KEY"

      For more details: https://docs.truefoundry.com/docs/generate-api-key

  7. In order to use default OpenAI embedder. Please get an OpenAI API Key. You can get your API Key here

  8. Open your Terminal on parent folder

  9. Create a virtual env (**python >= 3.10 required)

      python3 -m venv ./venv
      source ./venv/bin/activate (for Linux/Mac env)
      source .\venv\Scripts\activate (for Windows env)
    
  10. Install our servicefoundry cli

    pip install servicefoundry
    
  11. Login from cli

    sfy login --host <paste your TFY_HOST here>
    

docs-qa-playground's People

Contributors

akashg3627 avatar innoavator avatar chiragjn avatar dependabot[bot] avatar srihari-tf avatar s1lv3rj1nx avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.