Giter Site home page Giter Site logo

kristofferc / nearestneighbors.jl Goto Github PK

View Code? Open in Web Editor NEW
407.0 15.0 65.0 1.23 MB

High performance nearest neighbor data structures (KDTree and BallTree) and algorithms for Julia.

License: Other

Julia 100.00%
nearest-neighbors datastructures knn-search range-search

nearestneighbors.jl's Introduction

NearestNeighbors.jl

Build Status codecov.io DOI

NearestNeighbors.jl is a Julia package for performing nearest neighbor searches.

Creating a Tree

There are currently three types of trees available:

  • KDTree: Recursively splits points into groups using hyper-planes.
  • BallTree: Recursively splits points into groups bounded by hyper-spheres.
  • BruteTree: Not actually a tree. It linearly searches all points in a brute force manner.

These trees can be created using the following syntax:

KDTree(data, metric; leafsize, reorder)
BallTree(data, metric; leafsize, reorder)
BruteTree(data, metric; leafsize, reorder) # leafsize and reorder are unused for BruteTree
  • data: The points to build the tree from, either as
    • A matrix of size nd ร— np where nd is the dimensionality and np is the number of points, or
    • A vector of vectors with fixed dimensionality nd, i.e., data should be a Vector{V} where V is a subtype of AbstractVector with defined length(V). For example a Vector{V} where V = SVector{3, Float64} is ok because length(V) = 3 is defined.
  • metric: The Metric (from Distances.jl) to use, defaults to Euclidean. KDTree works with axis-aligned metrics: Euclidean, Chebyshev, Minkowski, and Cityblock while for BallTree and BruteTree other pre-defined Metrics can be used as well as custom metrics (that are subtypes of Metric).
  • leafsize: Determines the number of points (default 10) at which to stop splitting the tree. There is a trade-off between tree traversal and evaluating the metric for an increasing number of points.
  • reorder: If true (default), during tree construction this rearranges points to improve cache locality during querying. This will create a copy of the original data.

All trees in NearestNeighbors.jl are static, meaning points cannot be added or removed after creation. Note that this package is not suitable for very high dimensional points due to high compilation time and inefficient queries on the trees.

Example of creating trees:

using NearestNeighbors
data = rand(3, 10^4)

# Create trees
kdtree = KDTree(data; leafsize = 10)
balltree = BallTree(data, Minkowski(3.5); reorder = false)
brutetree = BruteTree(data)

k-Nearest Neighbor (kNN) Searches

A kNN search finds the k nearest neighbors to a given point or points. This is done with the methods:

knn(tree, point[s], k, skip = always_false) -> idxs, dists
knn!(idxs, dists, tree, point, k, skip = always_false)
  • tree: The tree instance.
  • points: A vector or matrix of points to find the k nearest neighbors for. A vector of numbers represents a single point; a matrix means the k nearest neighbors for each point (column) will be computed. points can also be a vector of vectors.
  • skip (optional): A predicate to skip certain points, e.g., points already visited.

For the single closest neighbor, you can use nn:

nn(tree, points, skip = always_false) -> idxs, dists

Examples:

using NearestNeighbors
data = rand(3, 10^4)
k = 3
point = rand(3)

kdtree = KDTree(data)
idxs, dists = knn(kdtree, point, k, true)

idxs
# 3-element Array{Int64,1}:
#  4683
#  6119
#  3278

dists
# 3-element Array{Float64,1}:
#  0.039032201026256215
#  0.04134193711411951
#  0.042974090446474184

# Multiple points
points = rand(3, 4)
idxs, dists = knn(kdtree, points, k, true)

idxs
# 4-element Array{Array{Int64,1},1}:
#  [3330, 4072, 2696]
#  [1825, 7799, 8358]
#  [3497, 2169, 3737]
#  [1845, 9796, 2908]

# dists
# 4-element Array{Array{Float64,1},1}:
#  [0.0298932, 0.0327349, 0.0365979]
#  [0.0348751, 0.0498355, 0.0506802]
#  [0.0318547, 0.037291, 0.0421208]
#  [0.03321, 0.0360935, 0.0411951]

# Static vectors
using StaticArrays
v = @SVector[0.5, 0.3, 0.2];

idxs, dists = knn(kdtree, v, k, true)

idxs
# 3-element Array{Int64,1}:
#   842
#  3075
#  3046

dists
# 3-element Array{Float64,1}:
#  0.04178677766255837
#  0.04556078331418939
#  0.049967238112417205

# Preallocating input results
idxs, dists = zeros(Int32, k), zeros(Float32, k)
knn!(idxs, dists, kdtree, v, k)

Range Searches

A range search finds all neighbors within the range r of given point(s). This is done with the methods:

inrange(tree, points, r) -> idxs
inrange!(idxs, tree, point, r)

Distances are not returned.

Example:

using NearestNeighbors
data = rand(3, 10^4)
r = 0.05
point = rand(3)

balltree = BallTree(data)
idxs = inrange(balltree, point, r)

# 4-element Array{Int64,1}:
#  317
#  983
# 4577
# 8675

# Updates `idxs`
idxs = Int32[]
inrange!(idxs, balltree, point, r)

# counts points without allocating index arrays
neighborscount = inrangecount(balltree, point, r)

Using On-Disk Data Sets

By default, trees store a copy of the data provided during construction. For data sets larger than available memory, DataFreeTree can be used to strip a tree of its data field and re-link it later.

Example with a large on-disk data set:

using Mmap
ndim = 2
ndata = 10_000_000_000
data = Mmap.mmap(datafilename, Matrix{Float32}, (ndim, ndata))
data[:] = rand(Float32, ndim, ndata) # create example data
dftree = DataFreeTree(KDTree, data)

dftree stores the indexing data structures. To perform look-ups, re-link the tree to the data:

tree = injectdata(dftree, data) # yields a KDTree
knn(tree, data[:,1], 3) # perform operations as usual

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.