Hi, I see these files, but they're outdated; the average scores don't match the current average scores shown on the MTEB leaderboard.
Answered by Samoed, Oct 31, 2025
You can do this by:

import mteb
from mteb.cache import ResultCache

cache = ResultCache()
# cache.download_from_remote()

tasks = mteb.get_tasks(task_types=["Retrieval"])
results = cache.load_results(["sentence-transformers/all-MiniLM-L12-v2"], tasks)

scores = []
for model_results in results.model_results:  # multiple revisions
    for task_result in model_results.task_results:
        for split_name, split_subsets in task_result.scores.items():
            for task_subset in split_subsets:
                if "ndcg_at_10" in task_subset:
                    scores.append(task_subset["ndcg_at_10"])
print(sum(scores) / len(scores))
This can be caused by task results being duplicated between revisions, and by the fact that you're selecting all splits and subsets for these tasks (FiQA has dev, test, and train splits).
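
To get closer to the leaderboard number, here is a minimal variation of the snippet above that restricts the average to the test split and counts each task only once even when several revisions are cached. It is only a sketch under assumptions: it reuses the same ResultCache call as the accepted answer, keeps only the "test" split (dropping FiQA's dev/train), and assumes each task result exposes a task_name attribute; the leaderboard's own aggregation may differ from this.

import mteb
from mteb.cache import ResultCache

cache = ResultCache()
tasks = mteb.get_tasks(task_types=["Retrieval"])
results = cache.load_results(["sentence-transformers/all-MiniLM-L12-v2"], tasks)

scores = []
seen_tasks = set()  # count each task once across revisions
for model_results in results.model_results:
    for task_result in model_results.task_results:
        # task_name is assumed to be available on the task result
        if task_result.task_name in seen_tasks:
            continue
        seen_tasks.add(task_result.task_name)
        for split_name, split_subsets in task_result.scores.items():
            if split_name != "test":  # skip dev/train splits (e.g. FiQA)
                continue
            for task_subset in split_subsets:
                if "ndcg_at_10" in task_subset:
                    scores.append(task_subset["ndcg_at_10"])
print(sum(scores) / len(scores))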