Giter Site home page Giter Site logo

hora-search / hora Goto Github PK

View Code? Open in Web Editor NEW
2.6K 48.0 72.0 14.51 MB

πŸš€ efficient approximate nearest neighbor search algorithm collections library written in Rust πŸ¦€ .

Home Page: http://horasearch.com/

License: Apache License 2.0

Rust 100.00%
search-engine rust approximate-nearest-neighbor-search artificial-intelligence recommender-system image-search vector-search algorithm data-structures simd

hora's People

Contributors

btv avatar lesmiscore avatar mongkii avatar salamer avatar syndek avatar whiteworld avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

hora's Issues

Possible alternative to 'add'

The current index add method adds a single vector. Is it possible to add 2D numpy array of vectors and use the array row position as the index.

# proposal for consideration
a = np.array([[1,2,3],[4,5,6]]
index.add(a)
# currently
for i in a:
   index.add(sample, i).unwrap();

panick when use `CosineSimilarity`

Hi.

With the latest hora, I get a panic when using hora::core::metrics::Metric::CosineSimilarity.

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:54

I think this is an error caused by partial_cmp.
partial_cmp expect it to be None when the given value cannot be ordered.

Consider adding __version__ attribute to Python API.

version does is not available.

https://www.python.org/dev/peps/pep-0396/

import horapy

dir(horapy)
['BruteForceIndex', 'HNSWIndex', 'HoraANNIndex', 'HoraBruteForceIndexStr', 'HoraBruteForceIndexUsize', 'HoraHNSWIndexStr', 'HoraHNSWIndexUsize', 'HoraIVFPQIndexStr', 'HoraIVFPQIndexUsize', 'HoraPQIndexStr', 'HoraPQIndexUsize', 'HoraSSGIndexStr', 'HoraSSGIndexUsize', 'IVFPQIndex', 'PQIndex', 'SSGIndex', 'builtins', 'cached', 'doc', 'file', 'loader', 'name', 'package', 'path', 'spec', 'horapy', 'numpy']
horapy.version
Traceback (most recent call last):
File "", line 1, in
AttributeError: module 'horapy' has no attribute 'version'

Dot production is not acting as a distance

This line assumes, that dot function returns a simple dot production and negates it to make it work like a distance (smaller = closer)

dot(vec1, vec2).map(|x| -x)

But in reality, dot function is already negated in the SIMDOptmized inplementation:

Ok(-(a.iter().zip(b).map(|(p, q)| p * q).sum::<$type_id>()))

Which leads to double inverting and incorrect use of Dot Production

Including HGG

Hey, I just saw this on Reddit and I am very excited to try this out on my computer vision datasets.

I just created an ANN search data structure somewhat recently called HGG (https://github.com/rust-cv/hgg), and I would love to be able to add it to your collection of nearest neighbor searches (and especially over at https://github.com/hora-search/ann-benchmarks). It is based on HNSW, and I am currently using it for computer vision purposes. Let me know if you need some help integrating it.

IVFPQIndex ?

I'm not sure if this is my problem, (possible).
I am testing the index types all work, (SSG had extensive index times > 24 hours so I aborted that).
IVFPQIndex

I am doing this with all index types, no issues except this case.

    index = IVFPQIndex(dimension, "usize")
    for i,d in tqdm(enumerate(hd[hashtype].tolist())):
        index.add(d,i)

FYI:
print(i,d)
15746524 [248, 225, 188, 223, 199, 174, 144, 146]

Error:

lib\site-packages\horapy\__init__.py in build(self, metrics)
     36 
     37     def build(self, metrics=""):
---> 38         self.ann_idx.build(metrics)
     39 
     40     def add(self, vs, idx=None):

PanicException: attempt to calculate the remainder with a divisor of zero

Return distance metric?

Other ANN libraries return the index array of the closest vectors and optionally the distance metric in a second array.
Is that possible or in future plans?
Thanks.

Can hora's index be serialized to disk?

I have millions of vectors that need to be indexed, and the build speed is extremely slow.
Can I serialize index to disk and then deserialize it into an index?

Extensible index

Hello! Thanks for your wonderful library. Please tell me if supported extensible index. Can I adding elements to an already built index?

Hey hello!! Project turismEC!!

I watch a great marketplace on my country.
I live in Ecuador and the country have good opportunities on the turism market.
I have a idea to connect the people, but I need some help.
I have basic skills on development, and some money to start a project . Thanks.
Psdt: is a great business

A fun fact!

Hey guys,

This isn't an issue. You can close or delete it whenever you wish. I just wanted to share a fun fact.

I discovered "hora" at the top of a "most trending projects" lists and it attracted my attention as "hora" ("Ρ…ΠΎΡ€Π°" in Cyrillic) means "people". I thought, is it possible that someone will name his trending lib after a Bulgarian word? Maybe there was a hidden meaning... I opened and scrolled the repo - what can I see - a list of faces/people. I was right. Someone named the library after a Bulgarian word... only to realise a few moments later that it's named after a Japanese word.

What a funny collision.

Have a great week!

Saving Hora index to disk?

Hi,
I'm using Horapy, and everything works great. But how do I save an index to disk after building it, so that I can use it elsewhere or later?
Tried pickling the index but it gives an error that usize objects can't be pickled.

Node searches on arrays of objects

For the Node version, is it possible to run comparisons on arrays of objects?

For example, for using it with map data, the query would be a tuple of a map point coordinates: [latitude, longitude].

The material to compare against would be an array of objects similar to:

[ { objectId: 123, someOtherObjectData: string, coordinates: [latitude, longitude] }, { next object }, { next object} ... ]

If so, would it be possible to provide a few pointers or a basic example?

Thanks much!

Possible to add sparse elements?

Thanks for the very nice library! I'm interested in using hora for doing nearest neighbor finding in single-cell genomics. The data of interest consist of very high dimensional points (D = 30,000), but for most points, most dimensions have value 0. Therefore, I'd like to avoid (it's not really feasible) to densify the elements before indexing them. Is there some way to provide a custom implementation of the relevant distance metrics for the indexed type such that I don't have to actually insert a dense representation of the points into the index?

Not able to install using pip

So, I tried installing using the pip install horapy, but getting this error:

ERROR: Command errored out with exit status 1:
     command: 'C:\Users\acer\anaconda3\python.exe' 'C:\Users\acer\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py' prepare_metadata_for_build_wheel 'C:\Users\acer\AppData\Local\Temp\tmp6bg4brka'
         cwd: C:\Users\acer\AppData\Local\Temp\pip-install-qsf3wt4q\horapy_417ccdb8395f4cf39fc2475178a42d0c
    Complete output (6 lines):

    Cargo, the Rust package manager, is not installed or is not on PATH.
    This package requires Rust and Cargo to compile extensions. Install it through
    the system's package manager or via https://rustup.rs/

    Checking for Rust toolchain....
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/e2/4e/75206eb830280af8b6e0618bd76b4e3dc98b24f9cc9c3809af14b85c4f35/horapy-0.0.1.tar.gz#sha256=f1be68b8481b348e8a615328923f893678eac03215483abf5ff04304b8f952cc (from https://pypi.org/simple/horapy/). Command errored out with exit status 1: 'C:\Users\acer\anaconda3\python.exe' 'C:\Users\acer\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py' prepare_metadata_for_build_wheel 'C:\Users\acer\AppData\Local\Temp\tmp6bg4brka' Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement horapy
ERROR: No matching distribution found for horapy
```

Crash using cosine similarity when calling search

When indexing a small number of vectors I am getting this error when specifying cosine_similarity (euclidean works fine for instance):

thread 'hora_test' panicked at 'called `Option::unwrap()` on a `None` value', /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:54
stack backtrace:
   0: rust_begin_unwind
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
   2: core::panicking::panic
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:48:5
   3: core::option::Option<T>::unwrap
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/option.rs:752:21
   4: <hora::core::neighbor::Neighbor<E,T> as core::cmp::Ord>::cmp
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:9
   5: <hora::core::neighbor::Neighbor<E,T> as core::cmp::PartialOrd>::partial_cmp
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:38:14
   6: core::cmp::PartialOrd::le
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/cmp.rs:1129:19
   7: core::cmp::impls::<impl core::cmp::PartialOrd<&B> for &A>::le
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/cmp.rs:1505:13
   8: alloc::collections::binary_heap::BinaryHeap<T>::sift_up
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/collections/binary_heap.rs:562:16
   9: alloc::collections::binary_heap::BinaryHeap<T>::push
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/collections/binary_heap.rs:496:18
  10: hora::index::hnsw_idx::HNSWIndex<E,T>::search_layer::{{closure}}
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:363:25
  11: <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/slice/iter/macros.rs:211:21
  12: hora::index::hnsw_idx::HNSWIndex<E,T>::search_layer
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:353:13
  13: hora::index::hnsw_idx::HNSWIndex<E,T>::search_knn
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:433:25
  14: <hora::index::hnsw_idx::HNSWIndex<E,T> as hora::core::ann_index::ANNIndex<E,T>>::node_search_k
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:615:55
  15: hora::core::ann_index::ANNIndex::search
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/ann_index.rs:93:9
  16: hora_c::hora_test
             at ./src/lib.rs:192:13
  17: hora_c::hora_test::{{closure}}
             at ./src/lib.rs:168:1
  18: core::ops::function::FnOnce::call_once
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
  19: core::ops::function::FnOnce::call_once
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    hora_test

BruteForceIndex load index from file error.

index = HNSWIndex(dimension, "usize") 


---> 16     index.load(ipath)
     17 
     18     # Search

C:\g\vr\lw\python\lib\site-packages\horapy\__init__.py in load(self, path)
     78             path: file path
     79         """
---> 80         self.ann_idx = self.ann_type.load(path)
     81 
     82     def dump(self, path):

PanicException: called `Result::unwrap()` on an `Err` value: Io(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.