
Comments (6)

dosubot commented on August 16, 2024

Hey @Rust-Ninja-Sabi! I'm here to help you with any bugs, questions, or contributions you have for this repository. I'll assist you while waiting for a human maintainer.

index.ref_doc_info does not work with chromadb in version 0.10.41 because ref_doc_info is built from the index's docstore. ChromaVectorStore stores the node text in the Chroma collection itself (its stores_text flag is True), so VectorStoreIndex does not write the nodes to the docstore and has nothing to build ref_doc_info from [1][2].

The specific error message encountered is:

raise NotImplementedError(
    "Vector store integrations that store text in the vector store are "
    "not supported by ref_doc_info yet."
)
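The check behind this error can be modelled in a few lines. The sketch below is a simplified, hypothetical stand-in (toy_ref_doc_info is not a LlamaIndex function). It assumes, as in the 0.10.x VectorStoreIndex as far as I know, that ref_doc_info is only computed when the vector store does not store text itself, or when the index was built with store_nodes_override=True so nodes are also copied into the docstore. Please check that flag against your installed version before relying on it.

```python
from typing import Dict, List


def toy_ref_doc_info(
    stores_text: bool,
    store_nodes_override: bool,
    docstore_ref_docs: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Hypothetical model of the guard in VectorStoreIndex.ref_doc_info.

    docstore_ref_docs maps a source document id to the ids of its nodes,
    standing in for the bookkeeping a real docstore keeps.
    """
    # Stores such as Chroma keep the node text themselves (stores_text=True),
    # so nodes never reach the docstore unless the override flag is set.
    if stores_text and not store_nodes_override:
        raise NotImplementedError(
            "Vector store integrations that store text in the vector store are "
            "not supported by ref_doc_info yet."
        )
    # Otherwise the answer comes straight from the docstore bookkeeping.
    return dict(docstore_ref_docs)


if __name__ == "__main__":
    tracked = {"doc-1": ["node-a", "node-b"]}
    try:
        toy_ref_doc_info(True, False, tracked)
    except NotImplementedError as err:
        print("raised:", err)
    print(toy_ref_doc_info(True, True, tracked))  # {'doc-1': ['node-a', 'node-b']}
```

The trade-off of the override path is that every node is stored twice: once in Chroma and once in the docstore.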

For managing document references and metadata with vector stores that do not support ref_doc_info, the recommended approach is to also store the nodes in a document store, so that per-document bookkeeping has a source of truth, and to keep each node's source-document metadata intact [3][4].

To continue talking to Dosu, mention @dosu.

from llama_index.

dosubot commented on August 16, 2024

Node-level management is available for the chromadb vector database: ChromaVectorStore supports creating a collection and adding, querying, and deleting nodes in it.

Here is an example of how to set up and use the ChromaVectorStore:

import chromadb
from llama_index.core.vector_stores.types import VectorStoreQuery
from llama_index.vector_stores.chroma import ChromaVectorStore

# Create a Chroma client and collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("example_collection")

# Set up the ChromaVectorStore on top of the collection
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Example of adding nodes
nodes = [...]  # List of BaseNode instances with embeddings
vector_store.add(nodes)

# Example of querying nodes
query = VectorStoreQuery(query_embedding=[...], similarity_top_k=5)
result = vector_store.query(query)

# Example of deleting nodes
vector_store.delete_nodes(node_ids=["node_id_1", "node_id_2"])
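To make the semantics of add, query with similarity_top_k, and delete_nodes concrete, here is a hypothetical in-memory model in plain Python (ToyVectorStore is illustrative, not a LlamaIndex or Chroma class). It is not how Chroma works internally: Chroma uses an approximate nearest-neighbour index and defaults to L2 distance, while this sketch simply ranks every stored embedding by cosine similarity and keeps the top k.

```python
import math
from typing import Dict, List, Tuple


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical stand-in for a vector store: node_id -> (embedding, text)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], str]] = {}

    def add(self, node_id: str, embedding: List[float], text: str) -> None:
        self._rows[node_id] = (embedding, text)

    def delete_nodes(self, node_ids: List[str]) -> None:
        for node_id in node_ids:
            self._rows.pop(node_id, None)  # ignore ids that are not stored

    def query(
        self, query_embedding: List[float], similarity_top_k: int
    ) -> List[Tuple[str, float]]:
        # Score every stored node, then keep the k most similar ones
        scored = [
            (node_id, _cosine(query_embedding, emb))
            for node_id, (emb, _text) in self._rows.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:similarity_top_k]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("node_id_1", [1.0, 0.0], "cats")
    store.add("node_id_2", [0.0, 1.0], "dogs")
    store.add("node_id_3", [0.9, 0.1], "kittens")
    print([n for n, _ in store.query([1.0, 0.0], similarity_top_k=2)])  # ['node_id_1', 'node_id_3']
    store.delete_nodes(["node_id_1"])
    print([n for n, _ in store.query([1.0, 0.0], similarity_top_k=2)])  # ['node_id_3', 'node_id_2']
```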

Additionally, the ChromaReader class from the llama_index library allows you to retrieve documents from existing persisted Chroma collections. Here is an example of how to use the ChromaReader:

from llama_index.legacy.readers.chroma import ChromaReader

# Initialize the ChromaReader
chroma_reader = ChromaReader(
    collection_name="example_collection",
    persist_directory="/path/to/persisted/collection"
)

# Load data from the collection
documents = chroma_reader.load_data(query_embedding=[...], limit=10)

For persistent storage, you can use the PersistentClient to manage your Chroma collections. Here is an example:

from typing import Optional
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

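def get_vector_store(docset_id: str) -> Optional[ChromaVectorStore]:
    # Open (or create) an on-disk Chroma database at the given path
    db = chromadb.PersistentClient(path="/path/to/persisted/collection")
    chroma_collection = db.get_or_create_collection(docset_id)
    # The embedding model is not a vector store setting; configure it on the
    # index instead, e.g. VectorStoreIndex.from_vector_store(vector_store, embed_model=...)
    return ChromaVectorStore(chroma_collection=chroma_collection)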

Because PersistentClient writes the collection to disk at the given path, the stored nodes survive process restarts [1][2][3][4].



dosubot commented on August 16, 2024

To retrieve all nodes stored behind a ChromaVectorStore, read the underlying Chroma collection directly. A VectorStoreQuery with an empty query_embedding and similarity_top_k=0 is unlikely to work, because Chroma's query expects a positive number of results; calling get() on the collection without ids or a where filter returns every stored record instead. Here is an example:

import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

# Create a Chroma client and collection
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection("example_collection")

# Set up the ChromaVectorStore on top of the same collection
vector_store = ChromaVectorStore.from_collection(collection)

# Retrieve all records: get() with no ids/where filter returns everything
all_records = collection.get(include=["documents", "metadatas"])

# Parallel lists: ids[i], documents[i] and metadatas[i] describe one stored node
print(all_records["ids"])
print(all_records["documents"])

The result is a dictionary of parallel lists (ids, documents, metadatas) with one entry per node in the collection [1][2][3].
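For large collections, pulling every record in one call can be heavy. Chroma's collection.get() accepts limit and offset arguments, so one option is to page through the records. The helper below is a hypothetical sketch (iter_all_ids is not part of chromadb or LlamaIndex): it takes any fetch(limit, offset) callable returning a dict with an "ids" list, so it can be tried without Chroma. With a real collection you would pass something like lambda limit, offset: collection.get(limit=limit, offset=offset).

```python
from typing import Any, Callable, Dict, Iterator, List

# fetch(limit, offset) -> {"ids": [...], ...}, mirroring the shape of a Chroma get() result
Fetch = Callable[[int, int], Dict[str, List[Any]]]


def iter_all_ids(fetch: Fetch, page_size: int = 100) -> Iterator[str]:
    """Yield every record id by requesting fixed-size pages until one comes back short."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(page_size, offset)
        ids = page["ids"]
        yield from ids
        if len(ids) < page_size:
            break  # a short (or empty) page means the end was reached
        offset += page_size


if __name__ == "__main__":
    data = [f"node-{i}" for i in range(250)]

    def fake_fetch(limit: int, offset: int) -> Dict[str, List[Any]]:
        return {"ids": data[offset : offset + limit]}

    ids = list(iter_all_ids(fake_fetch, page_size=100))
    print(len(ids), ids[0], ids[-1])  # 250 node-0 node-249
```

Stopping on a short page avoids an extra request in most cases; when the total is an exact multiple of page_size, one final empty page is fetched.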



Rust-Ninja-Sabi commented on August 16, 2024

@dosu When will it work with ChromaDB? I use LlamaIndex to have one API for all available stores, but as far as I understand, document management is not available for ChromaDB. Is it available for any vector database? Thanks.


Rust-Ninja-Sabi commented on August 16, 2024

@dosu vector_store.get_nodes() is not implemented. Can you show me the query to get all nodes? Thanks.


logan-markewich commented on August 16, 2024

@Rust-Ninja-Sabi this is correct. ref_doc_info relies on having a docstore, but most vector DB integrations disable the docstore because they store all the nodes in the vector DB itself (this simplifies storage).
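What the docstore adds is a per-document index: which nodes came from which source document. The sketch below is a hypothetical, simplified model of that bookkeeping (ToyNode and ToyDocstore are illustrative names, not LlamaIndex classes). It shows why ref_doc_info and "delete everything from document X" become simple lookups once such an index exists, whereas a vector store that only keeps embeddings and text per node would have to scan its metadata.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyNode:
    node_id: str
    ref_doc_id: str  # id of the source document this chunk came from
    text: str


@dataclass
class ToyDocstore:
    """Hypothetical docstore: stores nodes plus a ref_doc_id -> node ids index."""

    nodes: Dict[str, ToyNode] = field(default_factory=dict)
    ref_docs: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))

    def add_nodes(self, nodes: List[ToyNode]) -> None:
        for node in nodes:
            self.nodes[node.node_id] = node
            self.ref_docs[node.ref_doc_id].add(node.node_id)

    def ref_doc_info(self) -> Dict[str, List[str]]:
        # Sorted node ids so the output is deterministic
        return {doc_id: sorted(ids) for doc_id, ids in self.ref_docs.items()}

    def delete_ref_doc(self, ref_doc_id: str) -> List[str]:
        """Drop a document's nodes and return their ids, so the caller can
        delete the same ids from the vector store."""
        node_ids = sorted(self.ref_docs.pop(ref_doc_id, set()))
        for node_id in node_ids:
            self.nodes.pop(node_id, None)
        return node_ids


if __name__ == "__main__":
    store = ToyDocstore()
    store.add_nodes([
        ToyNode("n1", "doc-a", "first chunk"),
        ToyNode("n2", "doc-a", "second chunk"),
        ToyNode("n3", "doc-b", "other doc"),
    ])
    print(store.ref_doc_info())  # {'doc-a': ['n1', 'n2'], 'doc-b': ['n3']}
    print(store.delete_ref_doc("doc-a"))  # ['n1', 'n2']
```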

I'd recommend using an ingestion pipeline with both a vector store and a docstore attached:

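from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.embeddings.openai import OpenAIEmbedding

# vector_store: e.g. a ChromaVectorStore; documents: a list of loaded Documents
docstore = SimpleDocumentStore()
pipeline = IngestionPipeline(
  transformations=[SentenceSplitter(), OpenAIEmbedding()],
  docstore=docstore,
  vector_store=vector_store
)

pipeline.run(documents=documents)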

https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/?h=ingestion
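With a docstore attached, the pipeline can recognise documents it has already ingested. The sketch below is a simplified, hypothetical model of that "upsert by document hash" idea (toy_ingest, the character splitter, and the dict-based stores are illustrative, not IngestionPipeline internals, and no embedding step is modelled): unchanged documents are skipped, and changed ones have their old chunks removed before the new chunks are added.

```python
import hashlib
from typing import Dict, List, Tuple


def doc_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split(text: str, size: int = 20) -> List[str]:
    # Stand-in for SentenceSplitter: fixed-size character chunks
    return [text[i : i + size] for i in range(0, len(text), size)]


def toy_ingest(
    documents: Dict[str, str],
    docstore_hashes: Dict[str, str],
    vector_rows: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Upsert documents and return the ids of documents that were (re)processed.

    docstore_hashes: doc_id -> hash of the last ingested text (the "docstore").
    vector_rows: node_id -> (doc_id, chunk text) (the "vector store").
    """
    processed = []
    for doc_id, text in documents.items():
        new_hash = doc_hash(text)
        if docstore_hashes.get(doc_id) == new_hash:
            continue  # unchanged: skip the splitting/embedding work entirely
        # New or changed: remove any old chunks of this document first
        stale = [nid for nid, (did, _) in vector_rows.items() if did == doc_id]
        for nid in stale:
            del vector_rows[nid]
        for i, chunk in enumerate(split(text)):
            vector_rows[f"{doc_id}-{i}"] = (doc_id, chunk)
        docstore_hashes[doc_id] = new_hash
        processed.append(doc_id)
    return processed


if __name__ == "__main__":
    hashes: Dict[str, str] = {}
    rows: Dict[str, Tuple[str, str]] = {}
    print(toy_ingest({"a": "hello world", "b": "x" * 30}, hashes, rows))  # ['a', 'b']
    print(toy_ingest({"a": "hello world", "b": "y" * 30}, hashes, rows))  # ['b']
    print(len(rows))  # 3
```

Without the docstore there is nothing to compare hashes against, which is why re-running ingestion against a bare vector store tends to duplicate nodes.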

