Comments (7)
/assign @liliu-z
please take a look
/unassign
I don't quite understand it.
What is the current problem?
The distance result seems to be correct.
I think searching for an exact vector should return 0 as its distance to itself, but with the cosine metric it returns -1.
An exact match should score 1; 0 means the vectors are not related at all.
If the vectors point in opposite directions, the score is -1.
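To make that convention concrete, here is a minimal pure-Python sketch of cosine similarity. It is not Milvus code; the helper name `cosine_similarity` is only for illustration. It gives 1 for vectors pointing the same way, 0 for orthogonal vectors and -1 for opposite directions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [-1, -1, -1, -1]
print(cosine_similarity(query, [-1, -1, -1, -1]))  # same direction: 1.0
print(cosine_similarity(query, [1, 1, -1, -1]))    # orthogonal: 0.0
print(cosine_similarity(query, [1, 1, 1, 1]))      # opposite direction: -1.0
```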
Yes, that's exactly the problem. The values I get back from the search query are incorrect.
If you look more closely at the example I provided above, I added three vectors to the database ([1, 1, 1, 1], [1, 1, -1, -1], [-1, -1, -1, -1]) and then searched for the vector [-1, -1, -1, -1]. As a result, I got the following distance values:
[1, 1, 1, 1] -> 1.0
[1, 1, -1, -1] -> 0.0
[-1, -1, -1, -1] -> -1.0 (exact match)
Those values are incorrect for both cosine similarity and cosine distance. From the behaviour I'm seeing, they are cosine similarity multiplied by -1. But why?
Here is a snippet of code using scipy and scikit-learn to compute those metrics on the same vectors:
>>> from scipy.spatial import distance
>>> distance.cosine([-1, -1, -1, -1], [1, 1, 1, 1])
2.0
>>> distance.cosine([-1, -1, -1, -1], [1, 1, -1, -1])
1.0
>>> distance.cosine([-1, -1, -1, -1], [-1, -1, -1, -1])
0.0
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity([[-1, -1, -1, -1]], [[1, 1, 1, 1]])
array([[-1.]])
>>> cosine_similarity([[-1, -1, -1, -1]], [[1, 1, -1, -1]])
array([[0.]])
>>> cosine_similarity([[-1, -1, -1, -1]], [[-1, -1, -1, -1]])
array([[1.]])
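The same comparison can be sketched in plain Python, without scipy or scikit-learn. In this sketch `reported` holds the scores the Milvus search returned in the report above, and the helper name is illustrative. It checks that scipy's cosine distance equals `1 - similarity` and that each reported score equals `-similarity` rather than either standard metric:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = [-1, -1, -1, -1]
# Stored vectors mapped to the scores the search returned (values from the report above).
reported = {
    (1, 1, 1, 1): 1.0,
    (1, 1, -1, -1): 0.0,
    (-1, -1, -1, -1): -1.0,
}

for vec, score in reported.items():
    sim = cosine_similarity(query, vec)
    dist = 1.0 - sim  # cosine distance, the quantity scipy.spatial.distance.cosine returns
    print(vec, "similarity:", sim, "distance:", dist, "reported:", score,
          "matches -similarity:", score == -sim)
```

The last column is True for all three vectors, which is the pattern described above; the sketch only confirms the relationship and says nothing about why the search returns negated scores.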
/assign @liliu-z
Related Issues (20)
- [Bug]: multi-col top-k unexpected
- [Bug]: Different collection names returned across V1 and V2 of API
- [Bug]: flush timeout after upgrading from v2.3.4 to 2.4-20240517-780f3137-amd64
- [Bug]: flush failed with error `can not find session: node not found` after etcd pod failure chaos test
- [Bug]: flush timeout after datanode pod kill chaos test
- [Bug]: Lack of handling for L0 segments in binlog import
- [Bug]: When importing a sparse vector, if the format is coordinate list, it will fail.
- [Enhancement]: Add config to control whether to init the public role privilege
- [Bug]: L0 compactor are leaked in compaction executor forever
- [Bug]: milvus pulsar error too many requests to the same bookie
- can not load collection[Bug]:
- [Enhancement]: CompactionExecutor use a pool
- [Enhancement]: use proto for passing index params for cgo
- [Bug]: [CI] Delete using varchar datatype will fail
- [Bug]: Milvus can't generate traceID when set trace exporter to noop.
- [Bug]: Search return result less than limit cause E2e failed.
- [Feature]: When importing Parquet files, you can ignore some built-in index columns.
- [Bug]: Creating a collection without a vector field was successful
- [Bug]: NodeNotFound when trying to load collection after milvus update and bulk insert
- [Bug]: [CI] integration test failed due to DATA RACE detected in internal/datacoord/server.go