Comments (2)
I also ran into the duplicate problem (but not the search-time problem). The duplicates appeared even when I searched for about 1000 nearest neighbors (tested with both BKT and KDT on data of about 100 dimensions and around 100,000 vectors, using Cosine as the distance method). It happened not only with the Python wrapper but also when using IndexSearcher directly, so I guess the problem lies within the source code.
I'm currently reading the source code, but since I'm not very familiar with C++, I haven't found the problem yet. Does anyone have ideas or solutions for this?
My environment:
Python 2.7
Windows 10/Ubuntu 16.04
So I was focusing on testing different parameters over the past two weeks and didn't look too deeply into the source code. But I did temporarily solve the duplicate problem by adding a check when a node is added to the query result (QueryResultSet.h under SPTAGLib):
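bool AddPoint(const int index, float dist)
{
    // Added check: reject the node if its ID is already in the result set.
    for (int i = 0; i < m_results.Length(); i++) {
        if (index == m_results[i].VID) {
            return false;
        }
    }
    // Original logic: m_results[0] is the current worst result; replace it
    // if the new node is closer (ties are broken by the smaller ID).
    if (dist < m_results[0].Dist || (dist == m_results[0].Dist && index < m_results[0].VID))
    {
        m_results[0].VID = index;
        m_results[0].Dist = dist;
        Heapify(m_resultNum);
        return true;
    }
    return false;
}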
Although this change avoids the duplicates, I believe it is not the proper solution, as the same node should not be able to reach the AddPoint function more than once (maybe something goes wrong when checking the hash table; I haven't looked at the hash table yet).
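The linear scan above costs O(k) per call, which adds up at around 1000-NN. Here is a rough, standalone sketch of the same workaround using a hash set of accepted IDs instead. Note that DedupTopK, Sorted, and the member names are hypothetical and made up for illustration; this is not SPTAG's QueryResultSet, and it only masks the symptom in the same way the scan does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a k-NN result set (NOT SPTAG's QueryResultSet).
// Keeps the k closest (dist, id) pairs and rejects IDs it already holds.
class DedupTopK {
public:
    explicit DedupTopK(std::size_t k) : m_k(k) { assert(k > 0); }

    // Returns true if (id, dist) was accepted into the current top-k.
    bool AddPoint(int id, float dist) {
        // Average O(1) duplicate check instead of scanning all k slots.
        if (m_seen.count(id) != 0) return false;

        if (m_heap.size() < m_k) {  // still room: accept unconditionally
            m_heap.emplace(dist, id);
            m_seen.insert(id);
            return true;
        }

        // The heap top is the current worst result (largest dist, then id).
        const std::pair<float, int> worst = m_heap.top();
        if (dist < worst.first || (dist == worst.first && id < worst.second)) {
            m_seen.erase(worst.second);  // the evicted ID is no longer held
            m_heap.pop();
            m_heap.emplace(dist, id);
            m_seen.insert(id);
            return true;
        }
        return false;
    }

    // Results ordered from closest to farthest.
    std::vector<std::pair<float, int>> Sorted() const {
        auto copy = m_heap;
        std::vector<std::pair<float, int>> out;
        while (!copy.empty()) {
            out.push_back(copy.top());
            copy.pop();
        }
        std::reverse(out.begin(), out.end());
        return out;
    }

private:
    std::size_t m_k;
    std::priority_queue<std::pair<float, int>> m_heap;  // max-heap on (dist, id)
    std::unordered_set<int> m_seen;                     // IDs currently in the heap
};
```

Calling AddPoint(7, 0.5f) twice on a DedupTopK accepts the first call and rejects the second. The extra set costs memory proportional to k, so whether it beats the plain scan depends on k and on how often duplicates actually arrive.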
@Arctanxy Have you found a better solution? Also, did you get good results on your face dataset? I'm also testing with a face database (256 dimensions, using L2 distance), but did not get good results (I have tried tuning almost all the parameters). I do get good results on all the datasets from https://github.com/erikbern/ann-benchmarks, though.