
LLM Chatbot with Retrieval-Augmented Generation using LlamaIndex

Home Page: https://llamaindexchat.streamlit.app/

Topics: large-language-models, llamaindex, retrieval-augmented-generation, python, streamlit, embedding-vectors, rag, chunking


Chat with 🦙 LlamaIndex Docs 🗂️

Chatbot using LlamaIndex to augment the OpenAI GPT-3.5 Large Language Model (LLM) with the LlamaIndex documentation. Main features:

  • Transparency and Evaluation: by customizing the metadata fields of documents (and nodes), the App provides links to the sources of each response, along with the author and relevance score of each source node, so answers can be cross-referenced with the original content and checked for accuracy.
  • Estimating Inference Costs: tracks 'LLM Prompt Tokens' and 'LLM Completion Tokens' to help keep inference costs under control.
  • Reducing Costs: persists the index (including embedding vectors) to disk and caches question/response pairs, reducing the number of calls to the LLM.
  • Usability: includes suggested questions and basic functionality to clear the chat history.

🦙 What's LlamaIndex?

LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. [...] It helps in preparing a knowledge base by ingesting data from different sources and formats using data connectors. The data is then represented as documents and nodes, where a node is the atomic unit of data in LlamaIndex. Once the data is ingested, LlamaIndex indexes the data into a format that is easy to retrieve. It uses different indexes such as the VectorStoreIndex, Summary Index, Tree Index, and Keyword Table Index. In the querying stage, LlamaIndex retrieves the most relevant context given a user query and synthesizes a response using a response synthesizer. [Response from our Chatbot to the query 'What's LlamaIndex?']

📋 How does it work?

LlamaIndex enriches an LLM (for simplicity, we default the ServiceContext to OpenAI GPT-3.5, which is then used for both indexing and querying) with a custom knowledge base through a process called Retrieval-Augmented Generation (RAG), which involves the following steps:

  • Connecting to an External Data Source: we use the GitHub Repository Loader, available at LlamaHub (an open-source repository of data loaders), to connect to the GitHub repository containing the markdown files of the LlamaIndex Docs:
def initialize_github_loader(github_token: str) -> GithubRepositoryReader:
    """Initialize GithubRepositoryReader"""

    # fetch the loader implementation from LlamaHub
    download_loader("GithubRepositoryReader")
    github_client = GithubClient(github_token)

    loader = GithubRepositoryReader(github_client, [...])

    return loader
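The constructor arguments are elided above; as an illustration, here is a minimal sketch of how such a loader might be configured. The owner, repo, and filter values are assumptions for the example, not the App's confirmed settings:

# Illustrative configuration only: owner, repo, and filters are assumed.
loader = GithubRepositoryReader(
    github_client,
    owner="run-llama",       # assumed organization hosting the docs
    repo="llama_index",      # assumed repository containing the markdown files
    filter_file_extensions=(
        [".md"],             # ingest only markdown files
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
    verbose=False,
)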
  • Constructing Documents: the markdown files of the GitHub repository are ingested and automatically converted to Document objects. In addition, we add the dictionary {'filename': '', 'author': ''} to the metadata of each document (which will be inherited by its nodes). This allows us to retrieve and display the data sources and scores in the chatbot responses, making the App more transparent:
def load_and_index_data(loader: GithubRepositoryReader) -> VectorStoreIndex:
    """Load Knowledge Base from GitHub Repository"""

    logging.info("Loading data from Github: %s/%s", loader._owner, loader._repo)
    docs = loader.load_data(branch="main")
    for doc in docs:
        # attach filename and author so source citations can be displayed later
        doc.metadata = {'filename': doc.extra_info['file_name'], 'author': "LlamaIndex"}
  • Parsing Nodes: nodes represent chunks of a source Document; we have defined a chunk size of 1024 tokens with an overlap of 32. Like Documents, Nodes contain metadata and relationship information with other nodes.
    [...]

    logging.info("Parsing documents into nodes...")
    parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=32)
    nodes = parser.get_nodes_from_documents(docs)
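As a quick sanity check, the inherited metadata and the document relationship can be inspected on any parsed node (illustrative output shown in the comments):

# Each node carries the metadata of its source document, plus a
# relationship pointing back to that document.
node = nodes[0]
print(node.metadata)       # e.g. {'filename': '...', 'author': 'LlamaIndex'}
print(node.relationships)  # includes a SOURCE link to the parent document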
  • Indexing: an Index is a data structure that allows relevant context to be quickly retrieved for a user query. In LlamaIndex, it is the core foundation of retrieval-augmented generation (RAG) use cases. LlamaIndex provides different types of indices, such as the VectorStoreIndex, which makes LLM calls to compute embeddings:
    [...]

    logging.info("Indexing nodes...")
    index = VectorStoreIndex(nodes)

    logging.info("Persisting index on ./storage...")
    index.storage_context.persist(persist_dir="./storage")
        
    logging.info("Data-Knowledge ingestion process is completed (OK)")
  • Querying (with cache): once the index is constructed, querying a vector store index fetches the top-k most similar Nodes (2 by default) and passes them to the Response Synthesis module, which appends them to the user's prompt before calling the LLM. We rely on the Streamlit caching mechanism to optimize performance and reduce the number of LLM calls:
@st.cache_data(max_entries=1024, show_spinner=False)
def query_chatengine_cache(prompt, _chat_engine, settings):
    # the leading underscore tells Streamlit not to hash the chat engine object
    return _chat_engine.chat(prompt)
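The construction of the chat engine itself is not shown in this README; a hedged sketch of how it might be created from the persisted index follows. The chat mode and top-k value here are assumptions, not the App's confirmed settings:

import streamlit as st
from llama_index import StorageContext, load_index_from_storage

@st.cache_resource(show_spinner=False)
def initialize_chat_engine():
    """Build the chat engine once from the persisted index."""
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context)
    # 'condense_question' and similarity_top_k=2 are illustrative assumptions
    return index.as_chat_engine(chat_mode="condense_question", similarity_top_k=2)

chat_engine = initialize_chat_engine()
# 'settings' participates in the cache key; None is used here for brevity
response = query_chatengine_cache("What's LlamaIndex?", chat_engine, settings=None)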
  • Parsing the Response: the App parses the response source nodes to extract the filename, author, and relevance score of the top-k similar Nodes from which the answer was retrieved:
def get_metadata(response):
    """Extract filename, author and relevance score from the response source nodes"""
    sources = []
    for item in response.source_nodes:
        if hasattr(item, "metadata"):
            filename = item.metadata.get('filename').replace('\\', '/')
            author = item.metadata.get('author')
            score = round(item.score, 3)
            sources.append({'filename': filename, 'author': author, 'score': score})

    return sources
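These parsed sources can then be rendered in the chat UI to provide the source citations described next. A minimal Streamlit sketch; the docs URL scheme is an assumption for illustration, and may differ from the App's actual link format:

import streamlit as st

def render_sources(sources):
    """Render each source as a link with its author and relevance score."""
    for src in sources:
        # Assumed URL scheme: the filename is appended to an assumed
        # docs repository base URL.
        url = f"https://github.com/run-llama/llama_index/blob/main/{src['filename']}"
        st.markdown(f"- [{src['filename']}]({url}) by {src['author']} (score: {src['score']})")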
  • Transparent Results with Source Citation: the metadata added during ingestion enables the App to display links to the sources from which each answer was retrieved, along with the author and relevance score of each one.
  • Estimating Inference Costs: a TokenCountingHandler callback, registered in the ServiceContext, tracks 'LLM Prompt Tokens' and 'LLM Completion Tokens' to help keep inference costs under control:

    # count tokens with the same encoding used by the gpt-3.5-turbo model
    token_counter = TokenCountingHandler(
        tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
        verbose=False
    )

    callback_manager = CallbackManager([token_counter])
    service_context = ServiceContext.from_defaults(
        llm=OpenAI(model="gpt-3.5-turbo"),
        callback_manager=callback_manager,
    )
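After each query, the running totals can be read off the handler; the attribute names below follow the legacy llama_index TokenCountingHandler API:

# These counters back the 'LLM Prompt Tokens' and 'LLM Completion Tokens'
# figures used to estimate inference costs.
print("LLM Prompt Tokens:", token_counter.prompt_llm_token_count)
print("LLM Completion Tokens:", token_counter.completion_llm_token_count)
print("Total Embedding Tokens:", token_counter.total_embedding_token_count)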

🚀 Quickstart

  1. Clone the repository:
git clone git@github.com:dcarpintero/llamaindexchat.git
  2. Create and activate a virtual environment:
Windows:

py -m venv .venv
.venv\scripts\activate

macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Ingest the knowledge base:
python ingest_knowledge.py
  5. Launch the web application:
streamlit run ./app.py

๐Ÿ‘ฉโ€๐Ÿ’ป Streamlit Web App

The demo Web App is deployed to Streamlit Cloud and available at https://llamaindexchat.streamlit.app/

