Comments (6)
I think you can already use a kNN classifier to evaluate classification tasks, with `method="kNN"`.

However, this is just a different way to train a classifier; there is no dedicated "kNN graph quality" metric. You also cannot use kNN classification evaluation on clustering datasets like BiorxivP2P; for that we would need to create train sets for the clustering data. I don't know whether the samples within clustering datasets would provide meaningful insights when used for classification, and it would require a clear methodology to ensure that a kNN graph quality metric correctly reflects unsupervised learning performance.

Is `method="kNN"` satisfactory, or do you think it would be useful to add dedicated evaluations for kNN graphs?
Thanks @loicmagne for your reply!
> I think you can already use knn classifier to evaluate classification tasks, with method="kNN"
Oh, that's cool, I did not realize that.
> Is method="kNN" satisfactory, or do you think it would be useful to add dedicated evaluations for kNN graphs ?
I am not sure. In principle one can compute kNN accuracy on the entire dataset without any train/test split by explicitly constructing the full kNN graph (it's implicitly a leave-one-out procedure):
```python
import numpy as np
from scipy.stats import mode
from sklearn.neighbors import NearestNeighbors

def knn_accuracy_loocv(X, y, n_neighbors=10):
    neigh = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    # Called without an argument, kneighbors() excludes each point from its
    # own neighbor list, so this is the kNN graph of X (implicit leave-one-out)
    knn = neigh.kneighbors(return_distance=False)
    yhat = mode(y[knn], axis=1).mode.flatten()  # kNN classifier predictions (majority vote)
    return np.mean(yhat == y)                   # leave-one-out kNN accuracy
```
This is nice because it evaluates kNN accuracy over the entire dataset. But in practice, running `KNeighborsClassifier` on a train/test split would yield close results, so it's not a huge difference.
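For illustration, the leave-one-out evaluation could be exercised on synthetic data like this (`make_blobs` is just a stand-in for real embeddings; the function is restated so the snippet runs on its own):

```python
import numpy as np
from scipy.stats import mode
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

def knn_accuracy_loocv(X, y, n_neighbors=10):
    neigh = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    knn = neigh.kneighbors(return_distance=False)  # kNN graph, each point excluded from its own neighbors
    yhat = mode(y[knn], axis=1).mode.flatten()     # majority vote over neighbors
    return np.mean(yhat == y)

# Well-separated blobs: leave-one-out kNN should classify nearly everything correctly
X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
acc = knn_accuracy_loocv(X, y, n_neighbors=10)
print(acc)  # typically close to 1.0 on well-separated clusters
```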
> You also cannot use knn classification evaluation on clustering datasets like BiorxivP2P, for this we would need to create train sets for clustering data.
That's perhaps the biggest problem right now. Datasets like BiorxivP2P seem to me to be very good candidates for this metric. Would it make sense to create train/test splits for all of them? Then any classification metric could be run on them, including kNN. Couldn't one simply create the train/test split at runtime and fix the random seed so that it's deterministic?
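A runtime split with a fixed seed could look like the sketch below (the `sentences` and `labels` here are toy stand-ins for a clustering dataset's actual samples, not mteb's real data loading):

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for a clustering dataset's samples and cluster labels
sentences = [f"document {i}" for i in range(100)]
labels = [i % 4 for i in range(100)]

# A fixed random_state makes the split deterministic across runs,
# so every model is evaluated on the exact same train/test partition;
# stratify keeps the label distribution balanced between the two halves
train_x, test_x, train_y, test_y = train_test_split(
    sentences, labels, test_size=0.5, random_state=42, stratify=labels
)
```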
An alternative option would be to implement the kNN graph evaluation I suggested above, without any train/test split.
Thanks for your reply, those are good options. I don't know how it should be integrated into the MTEB lib: a new task? A new evaluator? @Muennighoff, what do you think?
Really cool discussion. I think it'd be interesting to have it as an option (while the default remains as is)
> I think it'd be interesting to have it as an option (while the default remains as is)
Hi @Muennighoff, I'm not quite sure what you mean here. To have it as an option where exactly?
If you think it's better as a standalone evaluator (not an option for one of the existing ones), that's fine too, I think.