goabiaryan / llama_index

This project forked from run-llama/llama_index


LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data.

Home Page: https://gpt-index.readthedocs.io/en/latest/

License: MIT License


llama_index's Introduction

๐Ÿ—‚๏ธ LlamaIndex ๐Ÿฆ™


PyPI:

Documentation: https://gpt-index.readthedocs.io/.

Twitter: https://twitter.com/gpt_index.

Discord: https://discord.gg/dGcwcsnxhU.

Ecosystem

🚀 Overview

NOTE: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!

Context

  • LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
  • How do we best augment LLMs with our own private data?
  • One paradigm that has emerged is in-context learning (the other is fine-tuning), where we insert context into the input prompt. That way, we take advantage of the LLM's reasoning capabilities to generate a response.
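As a rough illustration of in-context learning, the sketch below pastes a few context snippets into a prompt ahead of the question, so the model answers from the supplied text rather than only from its pre-training. The template wording, the hypothetical `build_prompt` helper, and the sample facts about a fictional company "Acme" are our own assumptions, not LlamaIndex's built-in prompt.

```python
# Minimal in-context learning sketch: private context is inserted into the
# prompt text itself. Template wording and sample data are hypothetical.


def build_prompt(context_chunks: list[str], question: str) -> str:
    """Place the context chunks before the question in a single prompt string."""
    context = "\n---\n".join(context_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Given the context information and not prior knowledge, "
        f"answer the question: {question}\n"
    )


prompt = build_prompt(
    ["Acme's Q3 revenue was $12M.", "Acme was founded in 2015."],
    "When was Acme founded?",
)
print(prompt)
```

The resulting string is what would be sent to the LLM; no model weights change, which is the key difference from fine-tuning.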

To augment LLMs with our own data in a performant, efficient, and cheap manner, we need to solve two problems:

  • Data Ingestion
  • Data Indexing
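The two components can be sketched with the standard library alone: ingestion reads raw files into memory, and indexing organizes them so relevant pieces can be found later. This is a minimal sketch assuming plain `.txt` files and a simple keyword (inverted) index; the `ingest` and `build_keyword_index` helpers are hypothetical and are not part of LlamaIndex.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path


def ingest(directory: str) -> dict[str, str]:
    """Data ingestion: read every .txt file in `directory` into memory."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.txt"))
    }


def build_keyword_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Data indexing: map each lowercase word to the files that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return dict(index)


# Usage: two small files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Llamas live in the Andes.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Alpacas also live in the Andes.", encoding="utf-8")
    docs = ingest(tmp)

index = build_keyword_index(docs)
print(sorted(index["andes"]))   # ['a.txt', 'b.txt']
print(sorted(index["llamas"]))  # ['a.txt']
```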

Proposed Solution

That's where LlamaIndex comes in. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion:

  • Offers data connectors to your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
  • Provides indices over your unstructured and structured data for use with LLMs. These indices abstract away common boilerplate and pain points for in-context learning:
    • Storing context in an easy-to-access format for prompt insertion.
    • Dealing with prompt size limits (e.g., 4096 tokens for Davinci) when the context is too large.
    • Dealing with text splitting.
  • Provides an interface to query the index (feed in an input prompt) and obtain a knowledge-augmented output.
  • Offers a comprehensive toolset for trading off cost and performance.
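To make the text-splitting and prompt-limit points concrete, here is a minimal sketch that splits a long document into overlapping chunks so each piece fits a fixed budget. It assumes whitespace-separated words as a stand-in for tokens (a real pipeline would count tokens with a tokenizer such as tiktoken), and the `split_text` helper and its defaults are hypothetical.

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. Words approximate tokens here.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        # Stop once this chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in split_text(sample, chunk_size=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

The overlap keeps text that straddles a boundary visible in both neighboring chunks, at the cost of a few duplicated tokens per chunk.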

💡 Contributing

Interested in contributing? See our Contribution Guide for more details.

📄 Documentation

Full documentation can be found here: https://gpt-index.readthedocs.io/en/latest/.

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

๐Ÿ’ป Example Usage

pip install llama-index

Examples are in the examples folder. Indices are in the indices folder.

To build a simple vector store index:

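import os
# LlamaIndex uses OpenAI models by default, so set your API key first.
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
# Load every file in the ./data directory as a Document.
documents = SimpleDirectoryReader('data').load_data()
# Build an in-memory vector store index over the documents.
index = GPTVectorStoreIndex.from_documents(documents)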
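Conceptually, building a vector store index means embedding each chunk of text and storing the vector next to it. The sketch below assumes a toy hashed bag-of-words embedding in place of a real embedding model; `embed`, `build_vector_index`, and `DIM` are hypothetical names, and this is not how `GPTVectorStoreIndex` is implemented internally.

```python
import math
import re
import zlib

DIM = 64  # hypothetical embedding width; real embedding models use many more dimensions


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy embedding: hashed word counts, scaled to unit (L2) length."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_vector_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Store each chunk alongside its embedding, like an in-memory vector store."""
    return [(chunk, embed(chunk)) for chunk in chunks]


index = build_vector_index([
    "Llamas live in the Andes.",
    "Python is a programming language.",
])
```

`zlib.crc32` is used instead of `hash()` because Python randomizes string hashing per process, which would make these toy embeddings change between runs.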

To query:

query_engine = index.as_query_engine()
response = query_engine.query("<question_text>?")
print(response)
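A query against a vector index roughly amounts to embedding the question, ranking stored chunks by similarity, and passing the best matches to the LLM as context. The sketch below shows only the ranking step, using cosine similarity; the hand-picked 3-dimensional vectors, `top_k=2`, and the hypothetical `cosine` and `retrieve` helpers are illustrative assumptions rather than LlamaIndex internals.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float], store: list[tuple[str, list[float]]], top_k: int = 2
) -> list[str]:
    """Return the texts of the `top_k` chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Hypothetical 3-dimensional vectors, hand-picked for illustration.
store = [
    ("Llamas live in the Andes.", [0.9, 0.1, 0.0]),
    ("Alpacas are related to llamas.", [0.7, 0.3, 0.0]),
    ("Python is a programming language.", [0.0, 0.1, 0.9]),
]
top = retrieve([1.0, 0.0, 0.0], store, top_k=2)
print(top)  # ['Llamas live in the Andes.', 'Alpacas are related to llamas.']
```

Smaller `top_k` values keep the prompt short and cheap; larger ones give the LLM more context at the cost of more tokens.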

By default, data is stored in-memory. To persist to disk (under ./storage):

index.storage_context.persist()

To reload from disk:

from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir='./storage')
# load index
index = load_index_from_storage(storage_context)
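For intuition about what persisting and reloading involve, the sketch below round-trips a toy in-memory store of (text, vector) pairs through a JSON file. The `persist`/`load` helpers and the `vector_store.json` file name are hypothetical; this does not reproduce LlamaIndex's actual on-disk format, which `StorageContext` manages for you.

```python
import json
import tempfile
from pathlib import Path


def persist(store: list[tuple[str, list[float]]], persist_dir: str = "./storage") -> Path:
    """Write each chunk and its vector to <persist_dir>/vector_store.json (hypothetical layout)."""
    directory = Path(persist_dir)
    directory.mkdir(parents=True, exist_ok=True)
    out = directory / "vector_store.json"
    records = [{"text": text, "vector": vec} for text, vec in store]
    out.write_text(json.dumps(records), encoding="utf-8")
    return out


def load(persist_dir: str = "./storage") -> list[tuple[str, list[float]]]:
    """Rebuild the in-memory store from the file written by persist()."""
    path = Path(persist_dir) / "vector_store.json"
    records = json.loads(path.read_text(encoding="utf-8"))
    return [(r["text"], r["vector"]) for r in records]


# Usage: save to a throwaway directory and read it back.
store = [("Llamas live in the Andes.", [0.9, 0.1, 0.0])]
with tempfile.TemporaryDirectory() as tmp:
    persist(store, tmp)
    restored = load(tmp)
```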

🔧 Dependencies

The main third-party package requirements are tiktoken, openai, and langchain.

All requirements should be contained within the setup.py file. To run the package locally without building the wheel, simply run pip install -r requirements.txt.

📖 Citation

Reference to cite if you use LlamaIndex in a paper:

@software{Liu_LlamaIndex_2022,
author = {Liu, Jerry},
doi = {10.5281/zenodo.1234},
month = {11},
title = {{LlamaIndex}},
url = {https://github.com/jerryjliu/llama_index},
year = {2022}
}

