Comments (13)
Hey there :)... currently working on implementing a retrieval benchmark based on GermanQuAD. Will publish a PR soon and keep you updated here. If you want to chat about it/join the discussion, here's a link to the DiscoResearch discord: https://discord.gg/FBvnqsDS
from mteb.
Working on it here:
https://github.com/DiscoResearch/mteb/tree/germanquad-retrieval
Here are the first results for intfloat/multilingual-e5-small on the test split of deepset/germanquad.
INFO:root:MRR@1: 0.8720
INFO:root:MRR@3: 0.9091
INFO:root:MRR@5: 0.9130
INFO:root:MRR@10: 0.9139
INFO:root:MRR@100: 0.9149
INFO:root:MRR@1000: 0.9149
Are the scores on the actual HF leaderboard multiplied by 100 or why are they in the range 0-100? 🤔
And I couldn't find the code for the actual HF space. Is it not open source?
from mteb.
Great work! For running the evaluation, they have a section on their HF hub page: https://huggingface.co/intfloat/multilingual-e5-large#mteb-benchmark-evaluation
> Therefore only one matching context can be retrieved from the corpus and MRR would be the best metric to score this, correct?
I think you can still use nDCG but MRR is fine too
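To make the nDCG vs. MRR point concrete, here is a minimal sketch of both metrics for the special case of exactly one relevant context per question. It assumes binary relevance and the usual log2 discount; the function names are made up for illustration and are not taken from mteb.

```python
import math


def reciprocal_rank(rank: int, k: int) -> float:
    """Reciprocal rank of the single relevant document, 0 if it is outside the top k."""
    return 1.0 / rank if rank <= k else 0.0


def ndcg_single_relevant(rank: int, k: int) -> float:
    """nDCG@k when exactly one document is relevant (binary gain).

    The ideal ranking places that document first, so IDCG = 1 / log2(2) = 1
    and nDCG reduces to the discount at the observed rank.
    """
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


# Compare both metrics for a few positions of the relevant document.
for rank in (1, 2, 3, 10):
    print(rank, round(reciprocal_rank(rank, 10), 4), round(ndcg_single_relevant(rank, 10), 4))
```

Both metrics are 1 at rank 1, decrease monotonically with the rank, and drop to 0 outside the cutoff, so with a single relevant context they order systems similarly; nDCG just penalizes lower ranks less steeply.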
from mteb.
Given the merged PR, it seems this issue is resolved, though we might still be missing a German tab on the leaderboard. If that is the case, we can create a separate issue for it.
from mteb.
Yes, I will close this issue then, as there has been lots of good development. Thanks all!
I will open a new one for the German tab and see if and how I can support it!
from mteb.
Yeah that'd be great. I'd be happy to add an Overall German tab once we have ~30 datasets (https://huggingface.co/spaces/mteb/leaderboard)
Note that for Clustering there are some German datasets already thanks to @slvnwhrl who may also be interested in helping out with this effort.
I think we should aim to minimize machine translation (MT) and use as many human-written datasets as possible.
Some datasets from #174 may also be available in German.
from mteb.
Hi,
@Muennighoff thanks for including me. I'd be happy to help. And I agree, there should be enough German open source datasets out there, at least for some of the tasks. To give some suggestions:
Classification:
- SCARE - The Sentiment Corpus of App Reviews with Fine-grained Annotations in German
- Germeval Task 2017: Shared Task on Aspect-based Sentiment in Social Media Customer Feedback
- GermEval2021 - Toxic, Engaging, & Fact-Claiming Comments
- GermEval 2019 Task 1 -- Shared task on hierarchical classification of blurbs (note that the current German clustering benchmark uses the same data)
- Ten Thousand German News Articles Dataset (note that the current German clustering benchmark uses the same data)
Reranking:
These are some German datasets that come to my mind at the moment. I am sure there are more, although, for some tasks it might be harder to find good datasets. I also haven't checked all of the licenses of the listed datasets. This repository could also be of help: German-NLP
from mteb.
Great! Yes they are multiplied by 100 to be from 0-100 in order to make it more readable :)
Everything is open-source - Do you mean this code https://huggingface.co/spaces/mteb/leaderboard/blob/main/app.py ?
from mteb.
Ah yes thank you! :) I knew about the "Files" tab in spaces but somehow overlooked it this time 😅
I'm currently testing with intfloat/multilingual-e5-small.
As you might know these need to be prompted in a specific way for full capability:
https://huggingface.co/intfloat/multilingual-e5-large#usage
I can't find a corresponding model class in mtebscripts although the e5 embeddings are on the leaderboard.
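For reference, the prompting in question means prepending "query: " to queries and "passage: " to documents, as described on the linked model card. A rough sketch of a wrapper that does this is below. The class name `PrefixedE5Wrapper` is hypothetical, and the `encode_queries` / `encode_corpus` method names follow the BEIR-style retrieval interface; whether mteb calls a model exactly this way is an assumption. The encoder is injected as a plain function so the sketch runs without downloading a model.

```python
from typing import Callable, Dict, List, Sequence


class PrefixedE5Wrapper:
    """Hypothetical wrapper that adds the E5 prefixes before encoding."""

    def __init__(self, encode_fn: Callable[[List[str]], Sequence[Sequence[float]]]):
        # encode_fn maps a list of strings to a list of vectors. In practice this
        # could be SentenceTransformer("intfloat/multilingual-e5-small").encode
        # (an assumption, not what the leaderboard scripts necessarily do).
        self.encode_fn = encode_fn

    def encode_queries(self, queries: List[str], **kwargs):
        return self.encode_fn(["query: " + q for q in queries])

    def encode_corpus(self, corpus: List[Dict[str, str]], **kwargs):
        # BEIR-style corpus entries are dicts with an optional "title" and a "text".
        texts = [(doc.get("title", "") + " " + doc["text"]).strip() for doc in corpus]
        return self.encode_fn(["passage: " + t for t in texts])


# Stand-in encoder that records its inputs, used only to show the prefixes.
seen: List[str] = []


def fake_encode(texts: List[str]) -> List[List[float]]:
    seen.extend(texts)
    return [[float(len(t))] for t in texts]


model = PrefixedE5Wrapper(fake_encode)
model.encode_queries(["Wer schrieb Faust?"])
model.encode_corpus([{"title": "Faust", "text": "Faust ist eine Tragödie von Goethe."}])
print(seen)
```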
And how do I contribute a benchmark specifically?
Everything runs fine with my run_mteb_german.py.
As I understand the README and the other scripts, that's enough and you take care of running it for different embeddings?
If so the only things I see left to do are:
- clean/finish up my fork a bit
- host the GermanQuAD dataset in BEIR format on HF instead of generating it locally
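As a rough illustration of the BEIR format mentioned in the list above, the sketch below turns SQuAD-style records into the three BEIR structures: a corpus, a query set, and relevance judgments (qrels). The field names `context` and `question` are assumed from the SQuAD layout, and the `doc…` / `q…` id scheme is made up for this example; it is not the code from the fork.

```python
from typing import Dict, Iterable, Tuple

Corpus = Dict[str, Dict[str, str]]
Queries = Dict[str, str]
Qrels = Dict[str, Dict[str, int]]


def squad_to_beir(records: Iterable[Dict[str, str]]) -> Tuple[Corpus, Queries, Qrels]:
    """Convert SQuAD-style records into BEIR-style corpus, queries, and qrels."""
    corpus: Corpus = {}
    queries: Queries = {}
    qrels: Qrels = {}
    context_to_id: Dict[str, str] = {}

    for i, rec in enumerate(records):
        context = rec["context"]
        # Several questions can share one context, so store each context once.
        if context not in context_to_id:
            doc_id = f"doc{len(context_to_id)}"
            context_to_id[context] = doc_id
            corpus[doc_id] = {"title": "", "text": context}

        qid = f"q{i}"
        queries[qid] = rec["question"]
        # Exactly one relevant context per question, marked with relevance 1.
        qrels[qid] = {context_to_id[context]: 1}

    return corpus, queries, qrels


# Toy records, invented for illustration only.
records = [
    {"context": "Berlin ist die Hauptstadt von Deutschland.", "question": "Was ist die Hauptstadt von Deutschland?"},
    {"context": "Der Rhein fließt durch Köln.", "question": "Welcher Fluss fließt durch Köln?"},
    {"context": "Berlin ist die Hauptstadt von Deutschland.", "question": "Welche Stadt ist Hauptstadt?"},
]
corpus, queries, qrels = squad_to_beir(records)
print(len(corpus), len(queries), qrels["q2"])
```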
I will create and link a draft PR in a minute, so you can compare the changes more easily ;)
from mteb.
Draft PR: #197
from mteb.
One more note: deepset/germanquad has only one relevant context per question.
Therefore only one matching context can be retrieved from the corpus and MRR would be the best metric to score this, correct?
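For clarity, this is roughly how MRR@k works when each question has exactly one relevant context. It is a minimal sketch with made-up toy data (not GermanQuAD results), and `mrr_at_k` is an illustrative helper, not the function used in the benchmark code.

```python
from typing import Dict, List


def mrr_at_k(rankings: Dict[str, List[str]], relevant: Dict[str, str], k: int) -> float:
    """Mean reciprocal rank over all queries, counting only hits within the top k."""
    total = 0.0
    for qid, ranked_doc_ids in rankings.items():
        for position, doc_id in enumerate(ranked_doc_ids[:k], start=1):
            if doc_id == relevant[qid]:
                total += 1.0 / position
                break  # only one relevant context exists, so stop at the first hit
    return total / len(rankings)


# Toy data: ranked document ids per query and the single relevant id per query.
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d3", "d2", "d1"],
    "q3": ["d1", "d3", "d2"],
}
relevant = {"q1": "d1", "q2": "d2", "q3": "d2"}

for k in (1, 3):
    print(f"MRR@{k}: {mrr_at_k(rankings, relevant, k):.4f}")
```

Because a larger cutoff can only add hits, MRR@k never decreases with k and plateaus once every relevant context has been found, which matches the pattern in the logged scores.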
from mteb.
FYI there is this German fork already https://github.com/jina-ai/mteb-de?ref=jina-ai-gmbh.ghost.io
from mteb.
> FYI there is this German fork already https://github.com/jina-ai/mteb-de?ref=jina-ai-gmbh.ghost.io
Nice, do you plan on opening a PR? Would be happy to help 🙌
from mteb.