Giter Site home page Giter Site logo

lshfunctions.jl's Introduction

LSHFunctions.jl

Stable docs Dev docs Build Status Codecov DOI

A Julia package for locality-sensitive hashing to accelerate similarity search.

What's LSH?

Traditionally, if you have a data point x, and want to find the most similar point(s) to x in your database, you would compute the similarity between x and all of the points in your database, and keep whichever points were the most similar. For instance, this type of approach is used by the classic k-nearest neighbors algorithm. However, it has two major problems:

  • The time to find the most similar point(s) to x is linear in the number of points in your database. This can make similarity search prohibitively expensive for even moderately large datasets.
  • In addition, the time complexity to compute the similarity between two datapoints is typically linear in the number of dimensions of those datapoints. If your data are high-dimensional (i.e. in the thousands to millions of dimensions), every similarity computation you perform can be fairly costly.

Locality-sensitive hashing (LSH) is a technique for accelerating these kinds of similarity searches. Instead of measuring how similar your query point is to every point in your database, you calculate a few hashes of the query point and only compare it against those points with which it experiences a hash collision. Locality-sensitive hash functions are randomly generated, with the fundamental property that as the similarity between x and y increases, the probability of a hash collision between x and y also increases.

Installation

You can install LSHFunctions.jl from the Julia REPL with

pkg> add LSHFunctions

Supported similarity functions

So far, there are hash functions for the similarity functions:

  • Cosine similarity (SimHash)
  • Jaccard similarity (MinHash)
  • L1 (Manhattan / "taxicab") distance: L1Hash
  • L2 (Euclidean) distance: L2Hash
  • Inner product
    • SignALSH (recommended)
    • MIPSHash
  • Function-space hashes (supports L1, L2, and cosine similarity)
    • MonteCarloHash
    • ChebHash

This package still needs a lot of work, including improvement to the documentation and API.

Examples

The easiest way to start constructing new hash functions is by calling LSHFunction with the following syntax:

hashfn = LSHFunction(similarity function,
                     number of hash functions to generate;
                     [LSH family-specific keyword arguments])

For example, the following snippet generates 10 locality-sensitive hash functions (bundled together into a single SimHash ) for cosine similarity:

julia> using LSHFunctions;

julia> hashfn = LSHFunction(cossim, 10);

julia> n_hashes(hashfn)
10

julia> similarity(hashfn)
cossim

You can hash inputs by calling hashfn():

julia> x = randn(128);

julia> x_hashes = hashfn(x);

For more details, check out the LSHFunctions.jl documentation.

lshfunctions.jl's People

Contributors

kernelmethod avatar github-actions[bot] avatar

Stargazers

KAGSA avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.