instant-labs / instant-distance Goto Github PK
View Code? Open in Web Editor NEWFast approximate nearest neighbor searching in Rust, based on HNSW index
License: Apache License 2.0
Fast approximate nearest neighbor searching in Rust, based on HNSW index
License: Apache License 2.0
While debugging another issue, I found a deadlock when a extend_candidates
is set to true
in the Heuristic:
map
test in tests/all.rs as below (set the Heuristic in the builder).cargo test map
diff --git a/instant-distance/tests/all.rs b/instant-distance/tests/all.rs
index b9fa973..8e916c6 100644
--- a/instant-distance/tests/all.rs
+++ b/instant-distance/tests/all.rs
@@ -4,7 +4,7 @@ use ordered_float::OrderedFloat;
use rand::rngs::{StdRng, ThreadRng};
use rand::{Rng, SeedableRng};
-use instant_distance::{Builder, Point as _, Search};
+use instant_distance::{Builder, Heuristic, Point as _, Search};
#[test]
#[allow(clippy::float_cmp, clippy::approx_constant)]
@@ -16,7 +16,13 @@ fn map() {
let seed = ThreadRng::default().gen::<u64>();
println!("map (seed = {seed})");
- let map = Builder::default().seed(seed).build(points, values);
+ let map = Builder::default()
+ .seed(seed)
+ .select_heuristic(Some(Heuristic {
+ extend_candidates: true,
+ ..Default::default()
+ }))
+ .build(points, values);
let mut search = Search::default();
for (i, item) in map.search(&Point(2.0, 2.0), &mut search).enumerate() {
By building an index ahead of time, the order of the candidates returned are lost. This makes it hard to map an embedding back onto it's originating word if we load the instant-distance instance from a dump.
Unless I'm missing something, does it make sense to add this information to the bincode dump?
there is an issue with instant-distance package:
ERROR: Could not find a version that satisfies the requirement instant-distance==0.3.1 (from versions: none)
ERROR: No matching distribution found for instant-distance==0.3.1
numpy
pytest
nltk
aiohttp==3.8.3
progress==1.6
instant-distance==0.3.1
python version: 3.9.6
pip version: 23.1.2
MacOS 13.4 (22F66)
Our Rust API is currently completely abstract over types that implement the Point
trait:
pub trait Point: Clone + Sync {
fn distance(&self, other: &Self) -> f32;
}
An important part of making instant-distance fast is making the distance()
implementation fast, which comes down to using a SIMD implementation. I wrote such an implementation which is specialized for the case of [f32; 300]
because that's what we typically use for InstantDomainSearch (since Meta's FastText vectors have 300 elements).
However, for the Python bindings, we have some other needs. There, we don't really have the opportunity to make use of compile-time generics; the vector length should be run-time property of the vector. Since we probably don't want to dereference through a pointer for every vector element access, this also means we might want to change the Hnsw
implementations to hold a Vec<f32>
instead of a Vec<[f32; 300]>
(for example), without losing the performance benefits of avoiding bounds checking where possible. For the Python API, we should then also do a SIMD distance implementation that can adjust to the size of the vector at run-time, ideally without much performance loss compared to the current, fixed-length implementation.
Hello,
I encountered an error with make test-python
. The error says
cp: cannot stat 'target/release/libinstant_distance.dylib': No such file or directory make: *** [Makefile:3: test-python] Error 1
I would greatly appreciate any help.
Thank you.
Best,
Yazhini
Hello, does this crate support incremental index construction? For example, I would like to store the constructed index on disk to quickly retrieve desired data, and incrementally update the index when the data is updated.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.