Giter Site home page Giter Site logo

reverseame / apotheosis Goto Github PK

View Code? Open in Web Editor NEW
6.0 1.0 1.0 344 KB

A specialized implementation of the Hierarchical Navigable Small World (HNSW) data structure adapted for efficient nearest neighbor lookup of approximate matching hashes

License: GNU General Public License v3.0

Python 99.60% TeX 0.34% Shell 0.06%
approximate-nearest-neighbor-search hnsw python3 approximate-matching

apotheosis's Introduction

APOTHEOSIS

APOTHEOSIS (APprOximaTe searcH systEm Of Similarity dIgeSts) is a powerful system to perform similarity search, using Radix Tree and specialized implementation of the Hierarchical Navigable Small World (HNSW) data structure adapted for efficient nearest neighbor lookup of approximate matching hashes.

License: GPL v3

Features

  • Construction of APOTHEOSIS model, consisting on two data structures: Radix Tree and HNSW.
  • Insertion of nodes in the system.
  • K-nearest neighbor search based on similarity.
  • Threshold search to retrieve nodes that meet a similarity threshold.
  • Several scripts to test and benchmark the system and generate graphs.
  • Logging functionality for debugging and monitoring.
  • Ability to save to disk and restore the index structure.
  • REST API for easily adaption on existing workflows

Installation

You can install the necessary dependencies using:

pip install -r requirements.txt

System configuration parameters

In order to reach the proper balance between precission and speed, some configuration values can be modified in order to tune the performance. This configuration values have impact on HNSW data structure mainly. Values may be adjusted depending on your use case.

  • M: specifies the number of connections of a new node when inserted.
  • Mmax: specifies the maximum number of neighbors (connections) a node can have at each layer of the hierarchy higher than zero
  • Mmax0: specifies the maximum number of neighbors (connections) a node can have at each layer of the hierarchy at layer zero
  • ef: : controls the number of neighbors to explore during the construction and search phase of the HNSW graph.
  • heuristic (True/False): Perform KNN search with heuristic. May improve performance, check Algorithm 4 in MY-TPAMI-20 for more information.
  • keep_pruned_conns (True/False): Indicate whether or not to add discarded elements when performing heuristic search.
  • beer_factor: Performs random walk when exploring the neighborhood.

Usage

from apotheosis_winmodule import ApotheosisWinModule
from datalayer.node import HashNode
from datalayer.hash_algorithm.tlsh_algorithm import TLSHHashAlgorithm

# Create an APOTHEOSIS model
myApo = ApotheosisWinModule(M=4, ef=4, Mmax=8, Mmax0=16,\
                    heuristic=False,\
                    extend_candidates=False, keep_pruned_conns=False,\
                    beer_factor=0,\
                    distance_algorithm=TLSHHashAlgorithm)

# Create the nodes based on TLSH algorithm approximate matching hashes
node1 = HashNode("T12B81E2134758C0E3CA097B381202C62AC793B46686CD9E2E8F9190EC89C537B5E7AF4C", TLSHHashAlgorithm)
node2 = HashNode("T10381E956C26225F2DAD9D5C2C5C1A337FAF3708A25012B8A1EACDAC00B37D557E0E714", TLSHHashAlgorithm)
node3 = HashNode("T1DF8174A9C2A506F9C6FFC292D6816333FEF1B845C419121A0F91CF5359B5B21FA3A304", TLSHHashAlgorithm)

# Insert nodes on the APOTHEOSIS system
myApo.insert(node1)
myApo.insert(node2)
myApo.insert(node3)

# Perform k-nearest neighbor search based on TLSH approximate matching hashes
query_node = HashNode("T1BF81A292E336D1F68224D4A4C751A2B3BB353CA9C2103BA69FA4C7908761B50F22E301", TLSHHashAlgorithm)
results = myApo.knn_search(query_node, k=5)

# Perform threshold search to retrieve nodes above a similarity threshold
results = myApo.threshold_search(query_node, threshold=60)

# Dump created APOTHEOSIS structure to disk
myHNSW.dump("myAPO")

License

Licensed under the GNU GPLv3 license.

apotheosis's People

Contributors

danielhuici avatar dependabot[bot] avatar ricardojrdez avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Forkers

niceboy120

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.