
erikbern / ann-benchmarks


Benchmarks of approximate nearest neighbor libraries in Python

Home Page: http://ann-benchmarks.com

License: MIT License

Languages: Python 88.14%, HTML 2.61%, Dockerfile 9.25%
Topics: nearest-neighbors, benchmark, docker

ann-benchmarks's People

Contributors

aaalgo, ale-f, alexklibisz, ankane, benwtrent, ceh, erikbern, etiennedi, generall, guilhemn, hhy3, jeadie, jtibshirani, khylonwong, kmrmt, kshivendu, lmcinnes, masajiro, maumueller, raoufchebri, sammymax, sdu-l, searchivarius, shikharj, stephenleo, tahle, thomasahle, toregge, www, yurymalkov

ann-benchmarks's Issues

New run May 2018

I've been running it on a c5.4xlarge for about 36h. It's still on glove-100-angular and mostly done:

[screenshot of the partial glove-100-angular results omitted]

Not sure what's up with NGT; I think it may have timed out, so we should increase the timeout a bit, especially since we're now running many searches in the same container and CPU is limited to 1. I'll double it from 2h to 4h.

The module 'ann_benchmarks.algorithms.faiss_gpu' could not be loaded. How can I solve it?

python run.py --docker-tag ann-benchmarks-faiss --dataset gist-960-euclidean --run-disabled --local --batch

ann_benchmarks.algorithms.faiss_gpu.FaissGPU([4096, 1]): warning: the module 'ann_benchmarks.algorithms.faiss_gpu' could not be loaded; skipping
ann_benchmarks.algorithms.nmslib.NmslibReuseIndex([u'euclidean', 'sw-graph', {'NN': 16}, False]): warning: the module 'ann_benchmarks.algorithms.nmslib' could not be loaded; skipping
ann_benchmarks.algorithms.panns.PANNS([u'euclidean', 10, 50]): warning: the module 'ann_benchmarks.algorithms.panns' could not be loaded; skipping
ann_benchmarks.algorithms.nmslib.NmslibReuseIndex([u'euclidean', 'vptree', {'tuneK': 10, 'desiredRecall': 0.4}, False]): warning: the module 'ann_benchmarks.algorithms.nmslib' could not be loaded; skipping
ann_benchmarks.algorithms.nearpy.NearPy([u'euclidean', 16, 20]): warning: the module 'ann_benchmarks.algorithms.nearpy' could not be loaded; skipping
ann_benchmarks.algorithms.mrpt.MRPT([u'euclidean', 25, 2]): warning: the module 'ann_benchmarks.algorithms.mrpt' could not be loaded; skipping
ann_benchmarks.algorithms.nmslib.NmslibReuseIndex([u'euclidean', 'vptree', {'tuneK': 10, 'desiredRecall': 0.9}, False]): warning: the module 'ann_benchmarks.algorithms.nmslib' could not be loaded; skipping
ann_benchmarks.algorithms.hnswlib.HnswLib([u'euclidean', {'efConstruction': 500, 'M': 48}]): warning: the module 'ann_benchmarks.algorithms.hnswlib' could not be loaded; skipping
ann_benchmarks.algorithms.pynndescent.PyNNDescent([u'euclidean', 10, 4, 20]): warning: the module 'ann_benchmarks.algorithms.pynndescent' could not be loaded; skipping
ann_benchmarks.algorithms.faiss_gpu.FaissGPU([1024, 10]): warning: the module 'ann_benchmarks.algorithms.faiss_gpu' could not be loaded; skipping
ann_benchmarks.algorithms.faiss.FaissIVF([u'euclidean', 8192]): warning: the module 'ann_benchmarks.algorithms.faiss' could not be loaded; skipping
ann_benchmarks.algorithms.hnswlib.HnswLib([u'euclidean', {'efConstruction': 500, 'M': 8}]): warning: the module 'ann_benchmarks.algorithms.hnswlib' could not be loaded; skipping
ann_benchmarks.algorithms.nmslib.NmslibReuseIndex([u'euclidean', 'vptree', {'tuneK': 10, 'desiredRecall': 0.2}, False]): warning: the module 'ann_benchmarks.algorithms.nmslib' could not be loaded; skipping
ann_benchmarks.algorithms.faiss.FaissIVF([u'euclidean', 512]): warning: the module 'ann_benchmarks.algorithms.faiss' could not be loaded; skipping
ann_benchmarks.algorithms.hnswlib.HnswLib([u'euclidean', {'efConstruction': 500, 'M': 24}]): warning: the module 'ann_benchmarks.algorithms.hnswlib' could not be loaded; skipping
ann_benchmarks.algorithms.faiss.FaissIVF([u'euclidean', 1024]): warning: the module 'ann_benchmarks.algorithms.faiss' could not be loaded; skipping
ann_benchmarks.algorithms.flann.FLANN([u'euclidean', 0.9]): warning: the module 'ann_benchmarks.algorithms.flann' could not be loaded; skipping
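
When running with --local, these warnings usually just mean that the underlying libraries (faiss, nmslib, hnswlib, ...) are not installed in the local Python environment, so the wrapper modules fail to import. A minimal diagnostic sketch to surface the real ImportError behind the "could not be loaded" message:

    import importlib
    import traceback

    # Try importing a few of the skipped modules directly; the traceback shows
    # which dependency is actually missing in the local environment.
    for name in ["ann_benchmarks.algorithms.faiss",
                 "ann_benchmarks.algorithms.faiss_gpu",
                 "ann_benchmarks.algorithms.hnswlib"]:
        try:
            importlib.import_module(name)
            print(name, "imported fine")
        except Exception:
            print(name, "failed to import:")
            traceback.print_exc()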

float32 vs float64

It appears that the benchmark switched from float32 to float64 quite a while ago, and this contributed to a substantial performance regression for KGraph, which is only optimized for float32 (although float64 also runs).

Below are my justifications for using float32 in the context of approximate K-NN search:

  • The search process is usually approximate, and the feature data are usually noisy by nature, so the precision of float64 is typically not necessary.
  • The very need for an index to speed up search presumes that the data are big, so a factor of 2x in the cost of just storing the data in memory is significant.
  • For some algorithms/implementations, float32 means more data fits into cache, which can make the code much faster.

I believe the performance of some libraries will be very different for float32 and float64 (though the relative ranking of the libraries should roughly stay the same). If only one can be chosen, I feel it should be float32, so that the benchmark represents the typical use case.

Is there a particular reason to keep float64, or is it better supported by some of the libraries in the benchmark?
(There's another reason why KGraph is slow, which I'm still investigating.)
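
As a concrete illustration of the factor-of-2 memory point above, a quick numpy sketch (not tied to any particular dataset):

    import numpy as np

    rng = np.random.default_rng(0)
    X64 = rng.standard_normal((1_000_000, 100))   # numpy defaults to float64
    X32 = X64.astype(np.float32)

    print("float64: %.0f MiB" % (X64.nbytes / 2**20))   # ~763 MiB
    print("float32: %.0f MiB" % (X32.nbytes / 2**20))   # ~381 MiB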

plot.py does not plot

I ran python plot.py --dataset=glove-100-angular to plot the results after the runs finished. The results are in results/glove-100-angular/10.
But it does not plot; instead it fails with Exception: Nothing to plot.
I traced into compute_metrics in utils.py and found that res is empty.
Can you help me out?
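
A quick way to check what the plotting code has to work with (assuming results are written under results/<dataset>/<count>/ as described above) is to list whatever files the run actually produced:

    import os

    base = "results/glove-100-angular/10"
    for root, dirs, files in os.walk(base):
        for name in files:
            path = os.path.join(root, name)
            print(path, os.path.getsize(path), "bytes")

    # if nothing is printed, the run never wrote results for this dataset/count,
    # which would explain the empty res in compute_metrics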

benchmark running for too long

I've been running the benchmark on a machine that shouldn't be much slower than a c4.2xlarge (Core i7-2600K @ 3.4 GHz). It's been 5 days and I only see 195 lines in sift.txt (the plot looks good, though). There are 321 lines in the standard glove.txt and 82 lines in the standard sift.txt, which I think is incomplete. So I estimate it will take 15-20 days for me to finish running everything. Is it supposed to take this long?

I'm attaching the SIFT plot I produced with the data I have so far, which is consistent with the GloVe plot on the project page. (KGraph bug fixed.)

[attached SIFT recall/QPS plot omitted]

Check README or datasets

The README says the GloVe datasets have around 60k queries, while the files contain only 10k.

Help: install.py raises an exception

When running install.py, this line of code raises an exception:
subprocess.check_call('docker build --rm -t ann-benchmarks -f install/Dockerfile .', shell=True)
The exception is:
subprocess.CalledProcessError: Command 'docker build --rm -t ann-benchmarks -f install/Dockerfile .' returned non-zero exit status 1.
I hope you can help me.
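
The CalledProcessError only reports the exit status; the real failure is in the Docker build output above it. A minimal sketch to re-run the same build on its own and read that output (same command install.py uses, just without the wrapper):

    import subprocess

    # Stream the full docker build output so the underlying failure
    # (network, base image, apt packages, ...) is visible.
    subprocess.run(
        ["docker", "build", "--rm", "-t", "ann-benchmarks",
         "-f", "install/Dockerfile", "."],
        check=True)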

Batch queries and single-threaded mode

There are currently two relics in the code that are no longer accessible, since the command-line arguments for them have been removed. What should we do with them?

  1. Running an algorithm with runner.py supports force_single, which avoids the thread-pooling mechanism altogether, independent of an algorithm's use_threads method. I find it useful for checking whether the ThreadPool decreases performance.

  2. Running an algorithm also supports a batch mode, and every algorithm already implements batch_query; however, only faiss_gpu.py currently provides a real batch implementation. The HNSW and faiss wrappers actually have good batch-query support as well and could easily be included in the benchmark in a separate section (see the sketch below).
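
A minimal sketch of that distinction (hypothetical class names; only the faiss-style search() call mirrors a real batched API):

    import numpy as np

    class LoopingBatch:
        """Fallback batch mode: no real batching, one query() call per row."""
        def query(self, v, n):
            raise NotImplementedError
        def batch_query(self, X, n):
            return [self.query(v, n) for v in X]

    class NativelyBatched:
        """Backend that answers the whole query matrix in a single call."""
        def __init__(self, index):
            self.index = index
        def batch_query(self, X, n):
            _, ids = self.index.search(np.ascontiguousarray(X, dtype="float32"), n)
            return ids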

Zero vectors in nytimes-256-angular

I found that nytimes-256-angular and nytimes-16-angular datasets have hundreds of zero-valued vectors.
The problem is that the cosine distance to such vectors is undefined, which makes the search an ill-defined task.
Such vectors should probably be removed. Another option is to use unnormalized inner-product distance instead of cosine.
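
A small sketch to quantify and filter the problem, assuming the usual ann-benchmarks HDF5 layout with "train" and "test" datasets:

    import h5py
    import numpy as np

    with h5py.File("nytimes-256-angular.hdf5", "r") as f:
        train = np.array(f["train"])
        test = np.array(f["test"])

    for name, X in [("train", train), ("test", test)]:
        norms = np.linalg.norm(X, axis=1)
        print(name, ":", int((norms == 0).sum()), "zero vectors out of", len(X))

    # one possible fix: drop the zero rows before recomputing the ground truth
    train_clean = train[np.linalg.norm(train, axis=1) > 0]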

Random datasets too easy

The random blobs created with sklearn seem to be far too easy to solve; see the screenshot below. I propose bringing back the old random datasets generated with https://github.com/maumueller/random-inputs/. For me, the easiest way would be to create them, put them on a website, and add a function to datasets.py that fetches and transforms them correctly.

[screenshot of results on the random dataset omitted]
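
A rough sketch of the fetch-and-transform step proposed above (the URL, file name, and train/test split are hypothetical; the real version would go through the existing helpers in datasets.py):

    import h5py
    import numpy as np
    from urllib.request import urlretrieve

    def random_hard_dataset(out_fn):
        # hypothetical location for the pre-generated hard random inputs
        url = "http://example.com/random-inputs-100d.txt"
        local_fn = "random-inputs-100d.txt"
        urlretrieve(url, local_fn)
        X = np.loadtxt(local_fn, dtype=np.float32)
        train, test = X[:-10000], X[-10000:]   # e.g. hold out 10,000 query points
        with h5py.File(out_fn, "w") as f:
            f.attrs["distance"] = "euclidean"
            f.create_dataset("train", data=train)
            f.create_dataset("test", data=test)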

Which libraries support incremental item additions?

Thanks for putting this project together. Is there any way a table or small list could be included describing which libraries support incremental item additions? I imagine for many this is an important criterion in real-time or near-real-time applications where new content is continuously added and needs to be indexed for look-ups.

Remove FALCONN from benchmarks

Please remove FALCONN from the benchmarks. It's not properly tuned for the new settings (100 neighbors instead of 10, etc.), and the results are confusing.

We might later add it back (the new, faster version), but meanwhile I don't want the results to be here.

Are numerical results available as well?

Right now I can only see plots in the repo, which are not very useful for an "external" comparison as such.

In particular, does HNSW still take hours for preprocessing? :) There does not seem to be a way to check now.

Split up building and query parameters

Currently, we spend a lot of time basically building the same data structure over and over again. What we could do instead is split up build and query parameters. For example, Annoy is currently configured by

    annoy:
      constructor: Annoy
      base-args: ["@metric"]
      run-groups:
        annoy:
          args: [[100, 200, 400], [100, 200, 400, 1000, 2000, 4000, 10000,
                20000, 40000, 100000, 200000, 400000]] # number of trees x number of points to gather

It would save a lot of time if build and query parameters were separated in the following way:

    annoy:
      constructor: Annoy
      base-args: ["@metric"]
      run-groups:
        annoy:
          build-args: [[100, 200, 400]] # number of trees 
          query-args: [[100, 200, 400, 1000, 2000, 4000, 10000,
                20000, 40000, 100000, 200000, 400000]] # number of points to gather during query

The idea would now be to build the data structure once for each build-args group and query it for all (combinations of) query-args. This should be rather straightforward to do.
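
A rough sketch of the intended control flow (hypothetical method names such as set_query_arguments; the real wiring in runner.py would differ):

    import itertools

    def run_group(constructor, base_args, build_args, query_args,
                  X_train, X_test, k):
        for build in itertools.product(*build_args):
            algo = constructor(*base_args, *build)
            algo.fit(X_train)                      # expensive: once per build setting
            for query in itertools.product(*query_args):
                algo.set_query_arguments(*query)   # cheap: reconfigure the search only
                for v in X_test:
                    algo.query(v, k)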

Should be labeled as "enhancement".

Licence?

I would much appreciate the addition of a licence file.

About SPTAG test results

I tested SPTAG with a GIST dataset. The query matrix has size (1000, 960), 100 neighbors are returned, and I used the official default parameters. Building the index took 20562.084 s, the query time was 35.082 s, recall was 0.92669, and adding 100 vectors took 22.281 s. These times feel very long; am I using it correctly? Its index construction and query times are longer than NSG's. Can you provide SPTAG reference results?
my machine:
Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
Ubuntu 18.04
python3

Request: Provide nearest neighbors for training data

For datasets like SIFT 1M, it would be great if nearest neighbors were provided for the training data as well. In terms of computational cost, the test set is very small and users can compute its ground truth themselves, but providing the NNs for the 1M training points would help those who want to use the larger datasets.
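
For reference, the training-set ground truth can be computed by brute force; a sketch with scikit-learn, assuming the standard HDF5 layout (for 1M x 128 this is expensive but embarrassingly parallel):

    import h5py
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    with h5py.File("sift-128-euclidean.hdf5", "r") as f:
        train = np.array(f["train"])

    # k+1 neighbors because every training point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=101, algorithm="brute", metric="euclidean")
    nn.fit(train)
    distances, indices = nn.kneighbors(train)
    neighbors = indices[:, 1:]   # drop the self-match, keeping the 100 true NNs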

Run ONNG for nytimes

@maumueller
I resolved the problem with the results (#93) caused by invalid zero vectors in the queries. Sorry for the trouble, but could you rebuild NGT and recreate the results for the nytimes dataset?
Thank you in advance.

index construction time plotting

I plotted index construction time using the data saved in glove.txt and see a spread of almost two orders of magnitude (hnsw (nmslib) index construction might have been parallelized, according to the code). I'm most interested in the performance of sw-graph vs hnsw. It's very hard to say which one is better, because one might be able to improve the search performance of sw-graph by building more aggressive indices. I collaborated with Princeton folks on developing a hierarchical version of the algorithm behind KGraph, and the results were negative: initially there was a noticeable speedup at the same accuracy, but after carefully controlling the index construction setup, the speedup was gone.

I also suggest that we establish a standard hardware configuration for running the benchmark, preferably some type of AWS instance that can be easily reproduced. I think something close to your current setup should be good.

Labels: build, throughput

10X performance difference between multi-thread and single thread tests

Good day!

I have measured the difference in performance when switching the use_threads parameter from True to False for hnsw (nmslib). As it turns out, the difference is really huge, ~10X on 1M SIFT at low recall (see the enclosed plots, from a 6-core Intel CPU), and it is likely caused by the Python-side overhead of the tester scripts.

[plots omitted: multi-threaded (default) vs single-threaded (use_threads=False) recall/QPS on SIFT]

This makes the current results questionable, especially when compared to FALCONN (which seems to be the only algorithm that benefits from not using multi-threaded tests). Multi-threaded tests also significantly change the shape of the recall-QPS curve (flat vs rapidly decreasing QPS at low recall).

I suggest using the single-threaded version for all algorithms. Otherwise, at relatively high QPS (above ~2k queries/s), we are just measuring the performance of the tester script.

disable nearpy, panns, dolphinnpy

They seem pretty bad. I don't mind keeping the code, but it probably makes sense to disable them by default. They add a lot of noise to the plots and also take a disproportionate amount of time during benchmarks.

Website Title Isn't Very Descriptive

Just a suggestion, I think having a more descriptive <title> tag on the website's homepage would be helpful. Right now, it's simply ANN-Benchmarks. It might be better as "ANN-Benchmarks: Approximate Nearest-Neighbors Algorithm Benchmarks" or something like that.

Noob question: OS you used and requirements

Hello @erikbern and team,

First of all, let me congratulate you on your fine work, it is amazing :D

I'm trying to run your benchmark for K = 1 (it seems like performance changes). I'm using Ubuntu 16.04 and Python 3.5.

While installing (sudo python3 install.py), I noticed it installs lots of dependencies from Bionic Beaver. Is Bionic Beaver (18.04) or any other specific OS a requirement?

Colors

@erikbern Your choice of colors makes the charts really hard to read, and I cannot fathom what it would be like for a color-blind person. It would be great if you could pick something with more contrast and possibly avoid shades of blue/green/orange. Thanks!

Nearpy distance is Cosine by default - Euclidean not applied

Hi,

I was checking some libraries and realized that in the case of NearPy, the Euclidean distance is not applied in the benchmark. As the NearPy docs show, the default is cosine, and the Euclidean distance is never set. I will send a PR with changes.
Thanks for the benchmark, it's great.
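
A minimal sketch of the fix being proposed, assuming NearPy's Engine accepts a distance argument and ships an EuclideanDistance class (as its docs suggest):

    from nearpy import Engine
    from nearpy.hashes import RandomBinaryProjections
    from nearpy.distances import EuclideanDistance

    dimension = 100
    rbp = RandomBinaryProjections("rbp", 10)
    # Without the explicit distance argument, NearPy falls back to its default
    # (cosine), even when the benchmark dataset is Euclidean.
    engine = Engine(dimension, lshashes=[rbp], distance=EuclideanDistance())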

Is there any plan to compare with Rayuela.jl?

https://github.com/una-dinosauria/Rayuela.jl
This is the code for the paper
Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little. LSQ++: Lower running time and higher recall in multi-codebook quantization. In ECCV 2018. [CVF].

Rayuela.jl is a package that implements non-orthogonal multi-codebook quantization methods (MCQ). These methods are useful for fast search of high-dimensional (d=100s-1000s) dense vectors. Rayuela is written in Julia.

What if I only use the MNIST dataset?

Hi,
since the full project can take a long time to run, I just want to try a small dataset; maybe MNIST would be okay.
How should I modify the code?
Thanks.
