srbhr / vectordb2

This project forked from kagisearch/vectordb

A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.

Home Page: https://vectordb.com

License: MIT License

VectorDB

VectorDB is a lightweight, low-latency Python package for storing and retrieving text using chunking, embedding, and vector search techniques. It provides an easy-to-use interface for saving, searching, and managing textual data with associated metadata and is designed for use cases where low latency is essential.

Installation

To install VectorDB, use pip (note that the package name is vectordb2):

pip install vectordb2

Usage

Here's a quick example of how to use VectorDB:

from vectordb import Memory

memory = Memory()

# text = "..."
# metadata = {...}
# query = "..."

# Save text with metadata
# This will automatically embed content
memory.save(text, metadata)

# Search for top n relevant chunks
# We will automatically use the fastest vector search backend
results = memory.search(query, top_n=3)

Methods

Memory provides the following methods:

__init__(memory_file=None, chunking_strategy={"mode":"sliding_window"}, embedding_model="sentence-transformers/all-MiniLM-L6-v2")

  • Initializes the Memory class.
  • memory_file (str): Path to the memory file (default: None). If provided, memory is persisted to disk: it is loaded from and saved to this file, and save() uses it automatically.
  • chunking_strategy (dict): Dictionary containing the chunking mode (default: {"mode": "sliding_window"})
    Options:
    {'mode':'sliding_window', 'window_size': 240, 'overlap': 8}
    {'mode':'paragraph'}
  • embedding_model (str): Either 'fast', 'normal', or 'best', or the name of a pre-trained HuggingFace model to use for embeddings (default: "sentence-transformers/all-MiniLM-L6-v2"). See Pretrained models and MTEB.
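
The two chunking modes listed above can be illustrated with a minimal sketch. This is not the package's own implementation: the function names sliding_window_chunks and paragraph_chunks are hypothetical, and it assumes that window_size and overlap are counted in whitespace-separated words and that paragraphs are separated by blank lines.

```python
def sliding_window_chunks(text, window_size=240, overlap=8):
    """Split text into word windows; consecutive windows share `overlap` words.

    Assumption: sizes are counted in whitespace-separated words.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    words = text.split()
    step = window_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def paragraph_chunks(text):
    """Split text on blank lines, normalise whitespace, drop empty paragraphs."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


words = " ".join(str(i) for i in range(10))
print(sliding_window_chunks(words, window_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

A larger overlap repeats more context between neighbouring chunks, at the cost of storing and embedding more chunks for the same text.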

save(texts, metadata, memory_file=None)

  • Saves the given texts and metadata to memory.
  • texts (str or list of str): Text or list of texts to be saved.
  • metadata (dict or list of dict): Metadata or list of metadata associated with the texts.
  • memory_file (str): Path to the memory file (default: None).
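
Since save() accepts either a single text with a single metadata dict or parallel lists of both, it can help to see how such inputs might be normalised into pairs. The helper below is a hypothetical sketch, not part of the package; in particular, how the package treats a single dict passed alongside a list of texts is not stated here, so copying that dict to every text is an assumption.

```python
def pair_texts_with_metadata(texts, metadata):
    """Return a list of (text, metadata) pairs from str-or-list inputs.

    Hypothetical helper: a single dict given with several texts is copied
    to each text (an assumption, not documented package behaviour).
    """
    if isinstance(texts, str):
        texts = [texts]
    if isinstance(metadata, dict):
        metadata = [dict(metadata) for _ in texts]
    if len(metadata) != len(texts):
        raise ValueError("texts and metadata must have the same length")
    return list(zip(texts, metadata))


pairs = pair_texts_with_metadata(
    ["first text", "second text"],
    [{"title": "A"}, {"title": "B"}],
)
for text, meta in pairs:
    print(meta["title"], "->", text)
```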

search(self, query, top_n=5)

  • Searches for the most similar chunks to the given query in memory.
  • query (str): Query text.
  • top_n (int): Number of most similar chunks to return (default: 5).
  • Returns: List of dictionaries containing the top_n most similar chunks and their associated metadata.
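
As a small illustration of working with the return value, the sketch below walks a list shaped like the search() output, where each entry has a 'chunk' string and a 'metadata' dict. The sample data is hard-coded and abbreviated so that the snippet runs without the package, and the helper unique_sources is hypothetical.

```python
# Abbreviated sample data in the documented result shape.
results = [
    {
        "chunk": "Machine learning is a method of data analysis",
        "metadata": {
            "title": "Introduction to Machine Learning",
            "url": "https://example.com/introduction-to-machine-learning",
        },
    },
    {
        "chunk": "Artificial intelligence ( AI ) is the simulation",
        "metadata": {
            "title": "Introduction to Artificial Intelligence",
            "url": "https://example.com/introduction-to-artificial-intelligence",
        },
    },
    {
        "chunk": "are also potential risks associated with AI",
        "metadata": {
            "title": "Introduction to Artificial Intelligence",
            "url": "https://example.com/introduction-to-artificial-intelligence",
        },
    },
]


def unique_sources(results):
    """Return the distinct metadata URLs in first-seen order."""
    seen = []
    for hit in results:
        url = hit["metadata"].get("url")
        if url is not None and url not in seen:
            seen.append(url)
    return seen


for rank, hit in enumerate(results, start=1):
    print(f"{rank}. {hit['metadata']['title']}: {hit['chunk'][:40]}...")

print(unique_sources(results))
```

Because several chunks can come from the same saved text, deduplicating on a metadata field such as the URL is a simple way to list the underlying sources.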

clear(self)

  • Clears the memory.

dump(self)

  • Prints the contents of the memory.

Example

from vectordb import Memory

memory = Memory(chunking_strategy={'mode':'sliding_window', 'window_size': 128, 'overlap': 16})

text = """
Machine learning is a method of data analysis that automates analytical model building.

It is a branch of artificial intelligence based on the idea that systems can learn from data,
identify patterns and make decisions with minimal human intervention.

Machine learning algorithms are trained on data sets that contain examples of the desired output. For example, a machine learning algorithm that is used to classify images might be trained on a data set that contains images of cats and dogs.
Once an algorithm is trained, it can be used to make predictions on new data. For example, the machine learning algorithm that is used to classify images could be used to predict whether a new image contains a cat or a dog.

Machine learning algorithms can be used to solve a wide variety of problems. Some common applications of machine learning include:

Classification: Categorizing data into different groups. For example, a machine learning algorithm could be used to classify emails as spam or not spam.

Regression: Predicting a continuous value. For example, a machine learning algorithm could be used to predict the price of a house.

Clustering: Finding groups of similar data points. For example, a machine learning algorithm could be used to find groups of customers with similar buying habits.

Anomaly detection: Finding data points that are different from the rest of the data. For example, a machine learning algorithm could be used to find fraudulent credit card transactions.

Machine learning is a powerful tool that can be used to solve a wide variety of problems. As the amount of data available continues to grow, machine learning is likely to become even more important in the future.

"""

metadata = {"title": "Introduction to Machine Learning", "url": "https://example.com/introduction-to-machine-learning"}

memory.save(text, metadata)

text2 = """
Artificial intelligence (AI) is the simulation of human intelligence in machines
that are programmed to think like humans and mimic their actions.

The term may also be applied to any machine that exhibits traits associated with
a human mind such as learning and problem-solving.

AI research has been highly successful in developing effective techniques for solving a wide range of problems, from game playing to medical diagnosis.

However, there is still a long way to go before AI can truly match the intelligence of humans. One of the main challenges is that human intelligence is incredibly complex and poorly understood.

Despite the challenges, AI is a rapidly growing field with the potential to revolutionize many aspects of our lives. Some of the potential benefits of AI include:

Increased productivity: AI can be used to automate tasks that are currently performed by humans, freeing up our time for more creative and fulfilling activities.

Improved decision-making: AI can be used to make more informed decisions, based on a wider range of data than humans can typically access.

Enhanced creativity: AI can be used to generate new ideas and solutions, beyond what humans can imagine on their own.
Of course, there are also potential risks associated with AI, such as:

Job displacement: As AI becomes more capable, it is possible that it will displace some human workers.

Weaponization: AI could be used to develop new weapons that are more powerful and destructive than anything we have today.

Loss of control: If AI becomes too powerful, we may lose control over it, with potentially disastrous consequences.

It is important to weigh the potential benefits and risks of AI carefully as we continue to develop this technology. With careful planning and oversight, AI has the potential to make the world a better place. However, if we are not careful, it could also lead to serious problems.
"""

metadata2 = {"title": "Introduction to Artificial Intelligence", "url": "https://example.com/introduction-to-artificial-intelligence"}

memory.save(text2, metadata2)

query = "What is the relationship between AI and machine learning?"

results = memory.search(query, top_n=3)

print(results)

Output:

[
  {
    'chunk': 'Machine learning is a method of data analysis that automates analytical model building . It is a branch of artificial intelligence based on the idea that systems can learn from data , identify patterns and make decisions with minimal human intervention . Machine learning algorithms are trained on data sets that contain examples of the desired output . For example , a machine learning algorithm that is used to classify images might be trained on a data set that contains images of cats and dogs . Once an algorithm is trained , it can be used to make predictions on new data . For example , the machine learning algorithm that is used to classify images could be used to predict whether a new image contains a cat', 
    'metadata': 
    {
      'title': 'Introduction to Machine Learning', 
      'url': 'https://example.com/introduction-to-machine-learning'
    }
  }, 
  {
    'chunk': 'Artificial intelligence ( AI ) is the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions . The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem - solving . AI research has been highly successful in developing effective techniques for solving a wide range of problems , from game playing to medical diagnosis . However , there is still a long way to go before AI can truly match the intelligence of humans . One of the main challenges is that human intelligence is incredibly complex and poorly understood . Despite the challenges , AI is a rapidly growing field with the potential to revolutionize many aspects', 
    'metadata': 
    {
      'title': 'Introduction to Artificial Intelligence', 
      'url': 'https://example.com/introduction-to-artificial-intelligence'
    }
  }, 
  {
    'chunk': 'are also potential risks associated with AI , such as : Job displacement : As AI becomes more capable , it is possible that it will displace some human workers . Weaponization : AI could be used to develop new weapons that are more powerful and destructive than anything we have today . Loss of control : If AI becomes too powerful , we may lose control over it , with potentially disastrous consequences . It is important to weigh the potential benefits and risks of AI carefully as we continue to develop this technology . With careful planning and oversight , AI has the potential to make the world a better place . However , if we are not careful , it could also lead to serious', 
    'metadata': 
    {
      'title': 'Introduction to Artificial Intelligence', 
      'url': 'https://example.com/introduction-to-artificial-intelligence'
    }
  }
]

Vector search performance

VectorDB is optimized for retrieval speed. It uses Faiss when the number of chunks is small (<4000) and mrpt when it is larger, so that search stays fast across the range of use cases.
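
The size-based choice described above can be sketched as a simple dispatch. The constant and function names are hypothetical and the function only returns a label; the sketch does not call Faiss or mrpt, and it assumes the threshold is exactly 4000 chunks as quoted.

```python
FAISS_MAX_CHUNKS = 4000  # threshold quoted in the text; the name is hypothetical


def choose_backend(num_chunks):
    """Return the backend label for a given number of stored chunks."""
    if num_chunks < 0:
        raise ValueError("num_chunks must be non-negative")
    return "faiss" if num_chunks < FAISS_MAX_CHUNKS else "mrpt"


for n in (100, 3999, 4000, 50000):
    print(n, choose_backend(n))
```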

Figure: Vector search engine comparison

License

MIT License.

Contributors

vprelovac, bkiat1123, tamamushi-iro
