zhangjun / ivf-hnsw

This project was forked from dbaranchuk/ivf-hnsw.

Code for ECCV2018 paper: Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors

License: MIT License

CMake 1.61% C++ 96.04% Python 1.96% Shell 0.39%

Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors

This is the code for the billion-scale approximate nearest neighbor search system, state of the art at the time of publication, presented in the paper:

Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors,
Dmitry Baranchuk, Artem Babenko, Yury Malkov

The code is built on top of the FAISS library.

Build

Currently we provide only a CPU implementation in C++, which requires a BLAS library.

The code requires a C++ compiler that understands:

  • the Intel intrinsics for SSE instructions
  • the GCC intrinsic for the popcount instruction
  • basic OpenMP
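
As a quick way to confirm that a toolchain meets these three requirements, the sketch below exercises each of them: SSE intrinsics from <xmmintrin.h>, the GCC/Clang builtin __builtin_popcountll, and an OpenMP reduction. It is an illustrative check, not part of the project, and the function names are hypothetical. It assumes an x86 target (on x86-64, SSE is enabled by default) and compilation with something like g++ -O2 -fopenmp; if __SSE__ is not defined it falls back to scalar code, and without -fopenmp the pragma is ignored so the loop simply runs serially.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, ...
#endif

// Horizontal sum of four floats. Uses SSE intrinsics when the compiler
// targets SSE; otherwise falls back to plain scalar code.
inline float sum4(const float* v) {
#if defined(__SSE__)
    __m128 x = _mm_loadu_ps(v);                                   // [a, b, c, d]
    __m128 shuf = _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1));  // [b, a, d, c]
    __m128 sums = _mm_add_ps(x, shuf);                            // [a+b, a+b, c+d, c+d]
    shuf = _mm_movehl_ps(shuf, sums);                             // low lanes: [c+d, c+d]
    sums = _mm_add_ss(sums, shuf);                                // lane 0: a+b+c+d
    return _mm_cvtss_f32(sums);
#else
    return v[0] + v[1] + v[2] + v[3];
#endif
}

// Hamming distance between two 64-bit binary codes via the GCC popcount builtin.
inline int hamming64(std::uint64_t a, std::uint64_t b) {
    return __builtin_popcountll(a ^ b);
}

// Sums 0..n-1 with an OpenMP reduction. Without -fopenmp the pragma is
// ignored and the loop runs serially, producing the same result.
inline long long omp_sum(int n) {
    long long total = 0;
#pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}
```

If this compiles and sum4, hamming64 and omp_sum return the expected values, the compiler is likely able to handle the corresponding constructs in the project.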

Installation instructions

  1. Clone the repository

git clone https://github.com/dbaranchuk/ivf-hnsw --recursive

  2. Configure FAISS

There are several example makefile.inc templates in the faiss/example_makefiles/ subdirectory. Copy the one that matches your system to faiss/ and adjust it to your needs. In particular, for the ivf-hnsw project you need to set the correct BLAS library paths. The troubleshooting section of the FAISS wiki also has notes on specific configurations.

  3. Replace FAISS CMakeLists.txt

Replace faiss/CMakeLists.txt with CMakeLists.txt.faiss to disable building the unnecessary tests and the GPU version:

mv CMakeLists.txt.faiss faiss/CMakeLists.txt

  4. Build the project

cmake . && make

Data

The proposed methods are evaluated on two datasets of one billion vectors each: SIFT1B and DEEP1B. To use the provided examples, all data files must be placed in data/SIFT1B and data/DEEP1B.
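
Assuming the data files use the common TEXMEX-style binary layout (SIFT1B is usually distributed as .bvecs and DEEP1B as .fvecs), each record is a 4-byte integer dimension d followed by d components: uint8 values for .bvecs, float32 values for .fvecs. The helpers below are a minimal sketch for reading and writing that layout; read_vecs and write_vecs are hypothetical names, not the project's own loaders, and they assume a little-endian machine, matching how these files are normally written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads up to max_n vectors from a .fvecs (T = float) or .bvecs (T = uint8_t)
// file into one flat row-major buffer; the common dimension is stored in dim.
// Each record: int32 dimension d, then d values of type T.
template <typename T>
std::vector<T> read_vecs(const std::string& path, std::size_t max_n, int& dim) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::vector<T> data;
    dim = 0;
    std::int32_t d = 0;
    for (std::size_t i = 0; i < max_n && in.read(reinterpret_cast<char*>(&d), sizeof(d)); ++i) {
        if (d <= 0 || (dim != 0 && d != dim)) throw std::runtime_error("bad dimension in " + path);
        dim = d;
        const std::size_t off = data.size();
        data.resize(off + static_cast<std::size_t>(d));
        if (!in.read(reinterpret_cast<char*>(data.data() + off),
                     static_cast<std::streamsize>(d * sizeof(T))))
            throw std::runtime_error("truncated record in " + path);
    }
    return data;
}

// Writes a flat row-major buffer of dim-dimensional vectors in the same layout.
template <typename T>
void write_vecs(const std::string& path, const std::vector<T>& data, int dim) {
    if (dim <= 0 || data.size() % dim != 0)
        throw std::invalid_argument("buffer size is not a multiple of dim");
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);
    const std::int32_t d = dim;
    for (std::size_t off = 0; off < data.size(); off += dim) {
        out.write(reinterpret_cast<const char*>(&d), sizeof(d));
        out.write(reinterpret_cast<const char*>(data.data() + off),
                  static_cast<std::streamsize>(dim * sizeof(T)));
    }
}
```

Because a billion-vector base file does not fit comfortably in memory, the max_n limit makes it easy to load only a prefix of a file, for example the first few million vectors for a quick experiment.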

Data files:

Note: the precomputed indices are optional; they only let you skip the assignment step, which takes about 2-3 days for 2^20 centroids.
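
The assignment step maps every base vector to its nearest coarse centroid. The brute-force sketch below only illustrates what that step computes; it is not the project's implementation, which presumably avoids exhaustive comparison (for example by searching the centroids with a graph index). Done exhaustively, 10^9 vectors against 2^20 centroids would require about 1.05 × 10^15 distance evaluations. The names assign and l2sq are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Squared L2 distance between two d-dimensional vectors.
inline float l2sq(const float* a, const float* b, std::size_t d) {
    float s = 0.0f;
    for (std::size_t j = 0; j < d; ++j) {
        const float diff = a[j] - b[j];
        s += diff * diff;
    }
    return s;
}

// Brute-force assignment: for each row-major d-dimensional vector in x,
// returns the index of the closest centroid (centroids are also row-major).
// Cost is O(n * k * d); rows are processed in parallel when OpenMP is enabled.
std::vector<std::uint32_t> assign(const std::vector<float>& x,
                                  const std::vector<float>& centroids,
                                  std::size_t d) {
    const std::size_t n = x.size() / d;
    const std::size_t k = centroids.size() / d;
    std::vector<std::uint32_t> ids(n, 0);
#pragma omp parallel for
    for (long long i = 0; i < static_cast<long long>(n); ++i) {
        const std::size_t row = static_cast<std::size_t>(i);
        float best = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < k; ++c) {
            const float dist = l2sq(&x[row * d], &centroids[c * d], d);
            if (dist < best) {
                best = dist;
                ids[row] = static_cast<std::uint32_t>(c);
            }
        }
    }
    return ids;
}
```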

Run

tests/ provides two tests for each dataset:

  • IVFADC
  • IVFADC + Grouping (+ Pruning)
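
In the standard IVFADC scheme, the vectors in each inverted list are stored as product-quantization codes of M bytes, and a query is compared against them with asymmetric distance computation (ADC): a table of distances from each query sub-vector (or from the query's residual with respect to the list's centroid) to the 256 sub-centroids of each sub-quantizer is built once, after which the distance to any code costs only M table lookups. The sketch below shows only this ADC step under those assumptions; it does not implement the Grouping or Pruning refinements, and build_table and adc_distance are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kSubCentroids = 256;  // one byte per sub-quantizer code

// Builds the ADC lookup table for one query.
// q: query (or query residual) of dimension d = M * dsub.
// codebooks: M blocks of kSubCentroids sub-centroids, each dsub floats, row-major.
// Returns M * kSubCentroids squared sub-distances.
std::vector<float> build_table(const std::vector<float>& q,
                               const std::vector<float>& codebooks,
                               std::size_t M) {
    const std::size_t dsub = q.size() / M;
    std::vector<float> table(M * kSubCentroids);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t c = 0; c < kSubCentroids; ++c) {
            const float* cent = &codebooks[(m * kSubCentroids + c) * dsub];
            float s = 0.0f;
            for (std::size_t j = 0; j < dsub; ++j) {
                const float diff = q[m * dsub + j] - cent[j];
                s += diff * diff;
            }
            table[m * kSubCentroids + c] = s;
        }
    }
    return table;
}

// Approximate squared distance to one encoded vector: M table lookups.
inline float adc_distance(const std::vector<float>& table,
                          const std::uint8_t* code, std::size_t M) {
    float s = 0.0f;
    for (std::size_t m = 0; m < M; ++m) {
        s += table[m * kSubCentroids + code[m]];
    }
    return s;
}
```

Building the table costs M × 256 sub-distance computations per query (or per visited list when residuals are used), which is amortized over all codes scanned.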

Each test takes many options, so we provide bash scripts in examples/ that run these tests. The scripts are commented, and the Parser class provides a short description of each option.

Make sure that:

  • models/SIFT1B/ and models/DEEP1B/ exist

mkdir -p models/SIFT1B models/DEEP1B

  • the data is placed in data/SIFT1B/ and data/DEEP1B/, respectively (symbolic links work as well)
  • then run, for example:

bash examples/run_deep1b_grouping.sh
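
A small preflight check can confirm the expected directory layout before launching a script. The helper below is a hypothetical sketch (missing_dirs is not part of the project) written against C++17 std::filesystem; because fs::is_directory follows symbolic links, linked data directories count as present.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns the expected directories (relative to root) that are missing.
std::vector<std::string> missing_dirs(const fs::path& root) {
    const std::vector<std::string> expected = {
        "data/SIFT1B", "data/DEEP1B", "models/SIFT1B", "models/DEEP1B"};
    std::vector<std::string> missing;
    for (const auto& rel : expected) {
        // is_directory follows symlinks, so a link to a data directory passes.
        if (!fs::is_directory(root / rel)) missing.push_back(rel);
    }
    return missing;
}
```

For example, calling missing_dirs(".") from the repository root and printing any returned entries shows which directories still need to be created or linked.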

Documentation

The Doxygen documentation provides per-class information.

Contributors

dbaranchuk, yurymalkov, grihabor