Comments (6)
Hey @Rust-Ninja-Sabi! I'm here to help you with any bugs, questions, or contributions you have for this repository. I'll assist you while waiting for a human maintainer.
index.ref_doc_info does not work with chromadb in version 0.10.41. ChromaVectorStore stores node text directly in the ChromaDB collection, and the class handles only nodes and their embeddings; it does not implement anything related to ref_doc_info [1][2].
The error raised is:
raise NotImplementedError(
    "Vector store integrations that store text in the vector store are "
    "not supported by ref_doc_info yet."
)
To track document references and metadata with a vector store that does not support ref_doc_info, also store the nodes in a document store and keep the relevant metadata (such as the source document ID) on each node [3][4].
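The docstore matters because of bookkeeping: roughly, ref_doc_info maps each source document ID to the nodes derived from it, and that mapping lives in the docstore rather than in the vector store. The pure-Python sketch below illustrates only that mapping. RefDocTracker and RefDocInfo are hypothetical names chosen for this example, not LlamaIndex classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RefDocInfo:
    """Hypothetical record of the nodes derived from one source document."""

    node_ids: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class RefDocTracker:
    """Hypothetical stand-in for the bookkeeping a docstore keeps."""

    def __init__(self) -> None:
        self._info: dict[str, RefDocInfo] = {}

    def add_node(self, node_id: str, ref_doc_id: str, metadata: Optional[dict] = None) -> None:
        # Group each node under the ID of the document it was split from.
        info = self._info.setdefault(ref_doc_id, RefDocInfo())
        info.node_ids.append(node_id)
        if metadata:
            info.metadata.update(metadata)

    def delete_ref_doc(self, ref_doc_id: str) -> list[str]:
        # Forget the document and return the node IDs that should also be
        # deleted from the vector store.
        info = self._info.pop(ref_doc_id, None)
        return info.node_ids if info else []

    def ref_doc_info(self) -> dict[str, RefDocInfo]:
        # Shallow copy so callers cannot add or remove entries by accident.
        return dict(self._info)


tracker = RefDocTracker()
tracker.add_node("n1", "doc-a", {"file_name": "a.txt"})
tracker.add_node("n2", "doc-a")
tracker.add_node("n3", "doc-b")
info = tracker.ref_doc_info()
print(info["doc-a"].node_ids)  # ['n1', 'n2']
removed = tracker.delete_ref_doc("doc-a")
print(sorted(tracker.ref_doc_info()))  # ['doc-b']
```

When all nodes live only in the vector database, nothing maintains this document-to-nodes map, which is why ref_doc_info has nothing to return.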
To continue talking to Dosu, mention @dosu.
Document management is available for the chromadb vector database. The following code shows how to create a ChromaVectorStore backed by chromadb and how to add, query, and delete nodes in it.
Here is an example of how to set up and use the ChromaVectorStore:
import chromadb
from llama_index.core.vector_stores.types import VectorStoreQuery
from llama_index.vector_stores.chroma import ChromaVectorStore
# Create an in-memory Chroma client and collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("example_collection")
# Wrap the collection in a ChromaVectorStore
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# Example of adding nodes
nodes = [...]  # List of BaseNode instances with embeddings
vector_store.add(nodes)
# Example of querying nodes
query = VectorStoreQuery(query_embedding=[...], similarity_top_k=5)
result = vector_store.query(query)
# Example of deleting nodes by ID
vector_store.delete_nodes(node_ids=["node_id_1", "node_id_2"])
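To make the add / query / delete_nodes cycle concrete without a running Chroma instance, here is a toy in-memory stand-in. ToyNode and ToyVectorStore are hypothetical names; the method names only mirror the calls above. Ranking by cosine similarity is an illustrative assumption, since Chroma's distance function is configured per collection.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a BaseNode that already carries an embedding."""

    node_id: str
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory toy mirroring the add / query / delete_nodes calls."""

    def __init__(self) -> None:
        self._nodes: dict[str, ToyNode] = {}

    def add(self, nodes: list[ToyNode]) -> list[str]:
        for node in nodes:
            self._nodes[node.node_id] = node  # same ID overwrites (upsert)
        return [node.node_id for node in nodes]

    def query(self, query_embedding: list[float], similarity_top_k: int) -> list[ToyNode]:
        # Rank every stored node by similarity and keep the best k.
        ranked = sorted(
            self._nodes.values(),
            key=lambda n: cosine(query_embedding, n.embedding),
            reverse=True,
        )
        return ranked[:similarity_top_k]

    def delete_nodes(self, node_ids: list[str]) -> None:
        for node_id in node_ids:
            self._nodes.pop(node_id, None)


store = ToyVectorStore()
store.add([
    ToyNode("node_id_1", "cats purr", [1.0, 0.0]),
    ToyNode("node_id_2", "dogs bark", [0.0, 1.0]),
    ToyNode("node_id_3", "kittens nap", [0.9, 0.1]),
])
top = store.query([1.0, 0.0], similarity_top_k=2)
print([n.node_id for n in top])  # ['node_id_1', 'node_id_3']
store.delete_nodes(["node_id_1", "node_id_2"])
remaining = store.query([1.0, 0.0], similarity_top_k=5)
print([n.node_id for n in remaining])  # ['node_id_3']
```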
Additionally, the ChromaReader class from the llama_index library lets you retrieve documents from existing persisted Chroma collections. Here is an example of how to use the ChromaReader:
from llama_index.legacy.readers.chroma import ChromaReader
# Initialize the ChromaReader
chroma_reader = ChromaReader(
collection_name="example_collection",
persist_directory="/path/to/persisted/collection"
)
# Load data from the collection
documents = chroma_reader.load_data(query_embedding=[...], limit=10)
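A reader like this has to turn Chroma's raw query results into one record per hit. Chroma's Collection.query returns parallel lists nested once per query embedding (results["ids"][0] holds the IDs for the first query). The sketch below shows that flattening step; results_to_documents and the sample dict are invented for illustration and are not ChromaReader's actual source.

```python
from typing import Any


def results_to_documents(results: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a Chroma-style query result into one record per hit."""
    documents = []
    # Outer loop: one entry per query embedding.
    for query_idx, ids in enumerate(results["ids"]):
        texts = results["documents"][query_idx]
        metadatas = results["metadatas"][query_idx]
        # Inner loop: the parallel lists for that query's hits.
        for doc_id, text, metadata in zip(ids, texts, metadatas):
            documents.append({"id": doc_id, "text": text, "metadata": metadata or {}})
    return documents


sample = {
    "ids": [["a", "b"]],
    "documents": [["first chunk", "second chunk"]],
    "metadatas": [[{"source": "x.txt"}, None]],
    "distances": [[0.12, 0.34]],
}
docs = results_to_documents(sample)
print(docs[1])  # {'id': 'b', 'text': 'second chunk', 'metadata': {}}
```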
For persistent storage, you can use the PersistentClient to manage your Chroma collections. Here is an example:
from typing import Optional
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
def get_vector_store(docset_id) -> Optional[ChromaVectorStore]:
    # Open (or create) an on-disk Chroma database at this path
    db = chromadb.PersistentClient(path="/path/to/persisted/collection")
    chroma_collection = db.get_or_create_collection(docset_id)
    # EMBEDDINGS is an embedding model defined elsewhere (not shown here)
    return ChromaVectorStore(
        chroma_collection=chroma_collection, embed_model=EMBEDDINGS
    )
With PersistentClient, collections are written to disk at the given path, so the same collection can be reopened by name in later runs [1][2][3][4].
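The useful property here is get-or-create by name against a fixed path: a later process pointing at the same path sees the same collection. The toy below mimics only that behavior with one JSON file per collection. ToyPersistentClient is a hypothetical name, and this is not how Chroma stores data; Chroma uses its own on-disk format.

```python
import json
import tempfile
from pathlib import Path


class ToyPersistentClient:
    """Hypothetical: one JSON file per collection under `path`."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.path.mkdir(parents=True, exist_ok=True)

    def _file(self, name: str) -> Path:
        return self.path / f"{name}.json"

    def get_or_create_collection(self, name: str) -> dict:
        file = self._file(name)
        if file.exists():
            # Existing collection: load it instead of starting empty.
            return json.loads(file.read_text())
        collection = {"name": name, "records": {}}
        self.save(collection)
        return collection

    def save(self, collection: dict) -> None:
        self._file(collection["name"]).write_text(json.dumps(collection))


with tempfile.TemporaryDirectory() as tmp:
    client = ToyPersistentClient(tmp)
    col = client.get_or_create_collection("docset-1")
    col["records"]["node_id_1"] = {"text": "hello", "embedding": [0.1, 0.2]}
    client.save(col)

    # A fresh client on the same path (e.g. after a restart) sees the data.
    reopened = ToyPersistentClient(tmp).get_or_create_collection("docset-1")
    print(list(reopened["records"]))  # ['node_id_1']
```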
To query and retrieve all nodes from the ChromaVectorStore, you can use the query method with an empty query. Here is an example:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core.vector_stores.types import VectorStoreQuery
# Create a Chroma client and collection
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection("example_collection")
# Set up the ChromaVectorStore
vector_store = ChromaVectorStore.from_collection(collection)
# Retrieve all nodes
query = VectorStoreQuery(query_embedding=[], similarity_top_k=0)
all_nodes_result = vector_store.query(query)
# all_nodes_result.nodes will contain the list of all nodes in the collection
print(all_nodes_result.nodes)
This code retrieves all nodes from the ChromaVectorStore by calling the query method with an empty query embedding and similarity_top_k set to 0. The query method returns a VectorStoreQueryResult object containing all nodes in the collection [1][2][3].
@dosu When will it work with Chromadb? I use LlamaIndex to have one API for all available stores. But as far as I understand, document management is not available for Chromadb. Is it available for any vector database? Thanks.
@dosu vector_store.get_nodes() is not implemented. Can you show me the query to get all nodes? Thanks.
@Rust-Ninja-Sabi this is correct. ref_doc_info relies on having a docstore, but most vector dbs disable the docstore because they store all the nodes in the vector db (this simplifies storage).
I'd recommend using an ingestion pipeline with a vector store and docstore attached:
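from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.embeddings.openai import OpenAIEmbedding
docstore = SimpleDocumentStore()  # vector_store and documents defined as above
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(), OpenAIEmbedding()],
    docstore=docstore,
    vector_store=vector_store,
)
pipeline.run(documents=documents)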
https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/?h=ingestion
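As we read the linked guide, attaching a docstore lets the pipeline use each document's ID and content hash to detect duplicates: unchanged documents are skipped and changed ones are processed again. The toy below illustrates only that hash check in plain Python. ToyDedupingPipeline and doc_hash are hypothetical names, and LlamaIndex's actual hashing and upsert logic may differ.

```python
import hashlib


def doc_hash(text: str) -> str:
    # Content fingerprint; any change in the text changes the hash.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyDedupingPipeline:
    """Hypothetical: remembers doc_id -> hash and skips unchanged docs."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}

    def run(self, documents: dict[str, str]) -> list[str]:
        processed = []
        for doc_id, text in documents.items():
            new_hash = doc_hash(text)
            if self.hashes.get(doc_id) == new_hash:
                continue  # unchanged: no splitting or embedding needed
            # New or changed document. A real pipeline would remove the old
            # nodes from the vector store, then split, embed and upsert here.
            self.hashes[doc_id] = new_hash
            processed.append(doc_id)
        return processed


toy = ToyDedupingPipeline()
first_run = toy.run({"a": "hello", "b": "world"})
second_run = toy.run({"a": "hello", "b": "world!"})
print(first_run, second_run)  # ['a', 'b'] ['b']
```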