This can be caused by task results being duplicated between revisions, combined with the fact that you're selecting all splits and subsets for these tasks (FiQA has dev, test, and train splits).

import mteb

b = mteb.get_benchmark("MTEB(eng, v2)")
retrieval = [t for t in b.tasks if t.metadata.type == "Retrieval"]
results = mteb.load_results(models=["sentence-transformers/all-MiniLM-L12-v2"], tasks=retrieval).join_revisions()

scores = []
for model_results in results.model_results: # multiple revisions
    for task_result in model_results.task_results:
        for split_name, split_subsets in task_result.scores.items():
            for task_subset in split_subsets:
                if "main_score" in task_subset:  # for …
