hora-search / hora
🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀.
Home Page: http://horasearch.com/
License: Apache License 2.0
Hi
The current index add method adds a single vector. Is it possible to add a 2D numpy array of vectors and use each row's position as its index?
import numpy as np
# proposal for consideration
a = np.array([[1, 2, 3], [4, 5, 6]])
index.add(a)
# currently
for i, v in enumerate(a):
    index.add(v, i)
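Until a batch method exists, a thin helper can give the proposed behaviour on top of the existing per-vector call. This is a minimal sketch: add_rows is a hypothetical helper, and it only assumes the index exposes add(vector, id), as the horapy wrapper's add(self, vs, idx=None) signature suggests.

```python
def add_rows(index, rows, start=0):
    """Add every row of a 2D array-like to `index`, using the row position as its id.

    Assumes `index` exposes add(vector, id), like the horapy wrapper's add(vs, idx).
    `rows` may be a list of lists or a 2D NumPy array; each row is passed through as-is.
    Returns the next unused id, so further batches can continue the numbering.
    """
    for offset, row in enumerate(rows):
        index.add(row, start + offset)
    return start + len(rows)
```

Returning the next free id lets several arrays be appended one after another without id collisions.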
Hi.
With the latest hora, I get a panic when using hora::core::metrics::Metric::CosineSimilarity:
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:54
I think this error is caused by partial_cmp, which returns None when the given values cannot be ordered (for example, when one of them is NaN); unwrapping that None is what panics.
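One likely trigger, offered as an assumption rather than a confirmed diagnosis: cosine similarity divides by the product of the vector norms, so a zero vector yields 0.0 / 0.0, which is NaN for IEEE 754 floats such as Rust's f32. The Python sketch below mimics that path. partial_cmp returns None for NaN, which is exactly what an unwrap() would panic on, and a key that ranks NaN last (similar in spirit to Rust's f32::total_cmp) is one way to keep sorting instead of failing. The function names are illustrative, not hora's code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity; returns NaN (as IEEE 754 0.0/0.0 would) if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return math.nan  # Python would raise ZeroDivisionError; Rust's f32 gives NaN
    return dot / norm

def partial_cmp(x, y):
    """Mimic Rust's f32::partial_cmp: -1, 0 or 1, or None if either value is NaN."""
    if math.isnan(x) or math.isnan(y):
        return None
    return (x > y) - (x < y)

def nan_last_key(d):
    """Sort key that places NaN after every ordinary value instead of failing."""
    return (math.isnan(d), 0.0 if math.isnan(d) else d)
```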
__version__ is not available.
https://www.python.org/dev/peps/pep-0396/
import horapy
dir(horapy)
['BruteForceIndex', 'HNSWIndex', 'HoraANNIndex', 'HoraBruteForceIndexStr', 'HoraBruteForceIndexUsize', 'HoraHNSWIndexStr', 'HoraHNSWIndexUsize', 'HoraIVFPQIndexStr', 'HoraIVFPQIndexUsize', 'HoraPQIndexStr', 'HoraPQIndexUsize', 'HoraSSGIndexStr', 'HoraSSGIndexUsize', 'IVFPQIndex', 'PQIndex', 'SSGIndex', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'horapy', 'numpy']
horapy.__version__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'horapy' has no attribute '__version__'
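Until the module defines __version__, the installed distribution's version can be read from package metadata with the standard library's importlib.metadata (Python 3.8+). A package can also set __version__ from the same source in its __init__.py. This is a minimal sketch; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Inside a package's __init__.py this could become:
#     __version__ = installed_version("horapy") or "unknown"
```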
This line assumes that the dot function returns a plain dot product, and negates it so that it behaves like a distance (smaller = closer):
Line 56 in a6759f8
But in reality, the dot function is already negated in the SIMDOptmized implementation:
Line 32 in ce10e42
This leads to a double negation and an incorrect use of the dot product.
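The effect is easy to see with a toy example. In this sketch, dot stands in for a kernel that already returns the negated dot product (as the report says the SIMDOptmized implementation does); negating it a second time makes the least similar vector rank first. The function names are illustrative, not hora's actual code.

```python
def dot(a, b):
    """Stand-in for a kernel that already returns the NEGATED dot product."""
    return -sum(x * y for x, y in zip(a, b))

def distance_double_negated(a, b):
    # Negating again turns it back into +similarity: larger now means closer (wrong).
    return -dot(a, b)

def distance_single_negation(a, b):
    # Negate exactly once (inside dot), so smaller means closer, as a distance should.
    return dot(a, b)

query = [1.0, 0.0]
candidates = {"same_direction": [1.0, 0.0], "opposite": [-1.0, 0.0]}
best_wrong = min(candidates, key=lambda k: distance_double_negated(query, candidates[k]))
best_right = min(candidates, key=lambda k: distance_single_negation(query, candidates[k]))
```

With the double negation, the vector pointing the opposite way is reported as the nearest neighbor.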
Hey, I just saw this on Reddit and I am very excited to try this out on my computer vision datasets.
I just created an ANN search data structure somewhat recently called HGG (https://github.com/rust-cv/hgg), and I would love to be able to add it to your collection of nearest neighbor searches (and especially over at https://github.com/hora-search/ann-benchmarks). It is based on HNSW, and I am currently using it for computer vision purposes. Let me know if you need some help integrating it.
I'm not sure whether this is my own mistake (possibly).
I am testing that all the index types work (SSG had indexing times over 24 hours, so I aborted that one).
IVFPQIndex
I am doing this with all index types, with no issues except in this case.
from horapy import IVFPQIndex
from tqdm import tqdm
index = IVFPQIndex(dimension, "usize")
for i, d in tqdm(enumerate(hd[hashtype].tolist())):
    index.add(d, i)
FYI:
print(i,d)
15746524 [248, 225, 188, 223, 199, 174, 144, 146]
Error:
lib\site-packages\horapy\__init__.py in build(self, metrics)
36
37 def build(self, metrics=""):
---> 38 self.ann_idx.build(metrics)
39
40 def add(self, vs, idx=None):
PanicException: attempt to calculate the remainder with a divisor of zero
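The panic means an integer remainder (%) with a zero divisor was evaluated during build. The trace does not show which value was zero; in a product-quantization index, one plausible but unconfirmed suspect is a subspace or cluster-count parameter that ends up as 0 for this setup. As an illustration of the split step where such a remainder typically appears, here is a minimal Python sketch that validates its inputs and raises a readable error instead; split_subvectors and n_sub are hypothetical names, not horapy parameters.

```python
def split_subvectors(vec, n_sub):
    """Split `vec` into `n_sub` equal-length sub-vectors, as product quantization does.

    Raises ValueError instead of evaluating `len(vec) % 0`.
    """
    if n_sub <= 0:
        raise ValueError(f"n_sub must be positive, got {n_sub}")
    if len(vec) % n_sub != 0:
        raise ValueError(f"dimension {len(vec)} is not divisible by n_sub={n_sub}")
    width = len(vec) // n_sub
    return [vec[i:i + width] for i in range(0, len(vec), width)]
```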
Other ANN libraries return an array with the indices of the closest vectors and, optionally, a second array with their distances.
Is that possible, or in future plans?
Thanks.
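The requested result shape is straightforward to express: one list of ids and a parallel list with the distance of each id. A minimal sketch, with an exact Euclidean search standing in for an ANN index (knn is a hypothetical helper, not part of hora):

```python
import heapq
import math

def knn(data, query, k):
    """Exact (brute-force) k-NN returning two parallel lists: ids and distances, nearest first."""
    scored = ((math.dist(row, query), i) for i, row in enumerate(data))
    best = heapq.nsmallest(k, scored)
    ids = [i for _, i in best]
    dists = [d for d, _ in best]
    return ids, dists
```

Ties in distance are broken by the smaller id, because each heap entry is a (distance, id) tuple.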
I have millions of vectors that need to be indexed, and the build speed is extremely slow.
Can I serialize the index to disk and then deserialize it back into an index?
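For a pure-Python structure, the standard library's pickle is enough to build once and reload later; the native horapy indexes are a different case, since the wrapper provides its own dump(path) and load(path) methods (they appear in a traceback further down this page). A minimal sketch of the save-once, load-later pattern for a picklable object, writing through a temporary file so an interrupted save does not replace a good index with a half-written one; save_index and load_index are hypothetical helpers.

```python
import os
import pickle

def save_index(index, path):
    """Serialize a picklable index to `path` via a temporary file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)  # the old file is swapped out only after a complete write

def load_index(path):
    """Deserialize an index previously written by save_index."""
    with open(path, "rb") as f:
        return pickle.load(f)
```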
Hello! Thanks for your wonderful library. Please tell me whether extensible indexes are supported: can I add elements to an already-built index?
I see a great marketplace in my country.
I live in Ecuador, and the country has good opportunities in the tourism market.
I have an idea to connect people, but I need some help.
I have basic development skills and some money to start a project. Thanks.
P.S.: it is a great business.
I am looking for a product that can replace Elasticsearch.
Hey guys,
This isn't an issue. You can close or delete it whenever you wish. I just wanted to share a fun fact.
I discovered "hora" at the top of a "most trending projects" list, and it caught my attention because "hora" ("хора" in Cyrillic) means "people". I thought: is it possible that someone named their trending lib after a Bulgarian word? Maybe there was a hidden meaning... I opened the repo and scrolled, and what did I see: a list of faces/people. I was right. Someone had named the library after a Bulgarian word... only to realise a few moments later that it's named after a Japanese word.
What a funny collision.
Have a great week!
Hi,
I'm using Horapy, and everything works great. But how do I save an index to disk after building it, so that I can use it elsewhere or later?
I tried pickling the index, but it fails with an error saying that usize objects can't be pickled.
For the Node version, is it possible to run comparisons on arrays of objects?
For example, when using it with map data, the query would be the coordinates of a map point as a tuple: [latitude, longitude].
The material to compare against would be an array of objects similar to:
[ { objectId: 123, someOtherObjectData: string, coordinates: [latitude, longitude] }, { next object }, { next object} ... ]
If so, would it be possible to provide a few pointers or a basic example?
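The Node binding's API is not shown on this page, but the usual pattern is binding-independent: index only the coordinates, use each object's array position as its id, and map the returned ids back to the objects. A minimal sketch in Python with an exact haversine search standing in for the index; nearest_objects and haversine_km are hypothetical helpers. Haversine is used here because a Euclidean metric over raw [latitude, longitude] pairs only approximates ground distance.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two [latitude, longitude] points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_objects(objects, query, k):
    """Return the k objects whose `coordinates` are closest to `query`.

    Only the coordinates are compared; each object's list position acts as its id,
    which is how the results are mapped back to the full objects.
    """
    ranked = sorted(range(len(objects)),
                    key=lambda i: haversine_km(objects[i]["coordinates"], query))
    return [objects[i] for i in ranked[:k]]
```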
Thanks much!
Now
Hello, where is the Wine Search source code?
Any support for Arabic, Hebrew, Persian (Farsi), Chinese and other non-ascii languages would be great.
Are there any plans to support Android in the near future?
Can this library add, remove, or modify nodes in an already-built graph?
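Whether hora supports this is not answered here. For context, one common approach in graph indexes is soft deletion: a removed node is kept in place (so traversal still works) but marked with a tombstone and filtered out of results, and modifying a vector becomes a delete plus a fresh insert under a new id. A minimal sketch of that idea on a flat, exact index; FlatIndex and its methods are hypothetical.

```python
import math

class FlatIndex:
    """Exact index supporting add, soft delete, and update-as-delete-plus-add."""

    def __init__(self):
        self.vectors = []     # position == internal id
        self.deleted = set()  # tombstones: ids excluded from results

    def add(self, vec):
        self.vectors.append(list(vec))
        return len(self.vectors) - 1

    def remove(self, idx):
        self.deleted.add(idx)  # data stays in place; a graph index would keep its edges

    def update(self, idx, vec):
        self.remove(idx)
        return self.add(vec)   # the new version gets a new id

    def search(self, query, k):
        live = (i for i in range(len(self.vectors)) if i not in self.deleted)
        return sorted(live, key=lambda i: math.dist(self.vectors[i], query))[:k]
```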
Does this work with Django?
Thanks
As the title says: when n increases, building the index becomes very slow. Is that normal?
Thanks for the very nice library! I'm interested in using hora for nearest neighbor search in single-cell genomics. The data of interest consist of very high-dimensional points (D = 30,000), but for most points, most dimensions have the value 0. Therefore, I'd like to avoid densifying the elements before indexing them (it's not really feasible). Is there some way to provide a custom implementation of the relevant distance metrics for the indexed type, so that I don't have to insert a dense representation of the points into the index?
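Whether hora accepts a user-defined metric for a custom element type is not answered here. As an illustration of the computation such a metric would need, here is a minimal Python sketch that stores only the non-zero coordinates and computes a cosine distance in time proportional to the number of non-zeros rather than to D; the helper names and the dict representation are assumptions.

```python
import math

def to_sparse(dense):
    """Keep only non-zero entries as {dimension: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(a, b):
    # Iterate over the smaller dict: cost depends on the non-zeros, not on D.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

def sparse_cosine_distance(a, b):
    na = math.sqrt(sparse_dot(a, a))
    nb = math.sqrt(sparse_dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # convention chosen here for all-zero points, to avoid NaN
    return 1.0 - sparse_dot(a, b) / (na * nb)
```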
So, I tried installing with pip install horapy, but I get this error:
ERROR: Command errored out with exit status 1:
command: 'C:\Users\acer\anaconda3\python.exe' 'C:\Users\acer\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py' prepare_metadata_for_build_wheel 'C:\Users\acer\AppData\Local\Temp\tmp6bg4brka'
cwd: C:\Users\acer\AppData\Local\Temp\pip-install-qsf3wt4q\horapy_417ccdb8395f4cf39fc2475178a42d0c
Complete output (6 lines):
Cargo, the Rust package manager, is not installed or is not on PATH.
This package requires Rust and Cargo to compile extensions. Install it through
the system's package manager or via https://rustup.rs/
Checking for Rust toolchain....
----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/e2/4e/75206eb830280af8b6e0618bd76b4e3dc98b24f9cc9c3809af14b85c4f35/horapy-0.0.1.tar.gz#sha256=f1be68b8481b348e8a615328923f893678eac03215483abf5ff04304b8f952cc (from https://pypi.org/simple/horapy/). Command errored out with exit status 1: 'C:\Users\acer\anaconda3\python.exe' 'C:\Users\acer\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py' prepare_metadata_for_build_wheel 'C:\Users\acer\AppData\Local\Temp\tmp6bg4brka' Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement horapy
ERROR: No matching distribution found for horapy
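Judging by the log, pip only found the horapy-0.0.1 source archive for this setup, so it has to compile the Rust extension, which needs Cargo on PATH (the message points to https://rustup.rs/). After installing Rust, a quick way to confirm that the current environment actually sees Cargo before retrying pip is this minimal sketch; cargo_version is a hypothetical helper.

```python
import shutil
import subprocess

def cargo_version():
    """Return the output of `cargo --version` if Cargo is on PATH, else None."""
    cargo = shutil.which("cargo")
    if cargo is None:
        return None
    result = subprocess.run([cargo, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None
```

If this returns None inside the Anaconda environment, the shell that runs pip most likely needs to be restarted so it picks up the updated PATH.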
When indexing a small number of vectors I am getting this error when specifying cosine_similarity (euclidean works fine for instance):
thread 'hora_test' panicked at 'called `Option::unwrap()` on a `None` value', /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:54
stack backtrace:
0: rust_begin_unwind
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
2: core::panicking::panic
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:48:5
3: core::option::Option<T>::unwrap
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/option.rs:752:21
4: <hora::core::neighbor::Neighbor<E,T> as core::cmp::Ord>::cmp
at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:9
5: <hora::core::neighbor::Neighbor<E,T> as core::cmp::PartialOrd>::partial_cmp
at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:38:14
6: core::cmp::PartialOrd::le
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/cmp.rs:1129:19
7: core::cmp::impls::<impl core::cmp::PartialOrd<&B> for &A>::le
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/cmp.rs:1505:13
8: alloc::collections::binary_heap::BinaryHeap<T>::sift_up
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/collections/binary_heap.rs:562:16
9: alloc::collections::binary_heap::BinaryHeap<T>::push
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/collections/binary_heap.rs:496:18
10: hora::index::hnsw_idx::HNSWIndex<E,T>::search_layer::{{closure}}
at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:363:25
11: <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/slice/iter/macros.rs:211:21
12: hora::index::hnsw_idx::HNSWIndex<E,T>::search_layer
at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:353:13
13: hora::index::hnsw_idx::HNSWIndex<E,T>::search_knn
at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:433:25
14: <hora::index::hnsw_idx::HNSWIndex<E,T> as hora::core::ann_index::ANNIndex<E,T>>::node_search_k
at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:615:55
15: hora::core::ann_index::ANNIndex::search
at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/ann_index.rs:93:9
16: hora_c::hora_test
at ./src/lib.rs:192:13
17: hora_c::hora_test::{{closure}}
at ./src/lib.rs:168:1
18: core::ops::function::FnOnce::call_once
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
19: core::ops::function::FnOnce::call_once
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
failures:
hora_test
index = HNSWIndex(dimension, "usize")
---> 16 index.load(ipath)
17
18 # Search
C:\g\vr\lw\python\lib\site-packages\horapy\__init__.py in load(self, path)
78 path: file path
79 """
---> 80 self.ann_idx = self.ann_type.load(path)
81
82 def dump(self, path):
PanicException: called `Result::unwrap()` on an `Err` value: Io(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })