fix(benchmark): fix hle text only #87
| Diff line change |
|---|
| @@ -2,29 +2,31 @@ |
| # |
| # SPDX-License-Identifier: Apache-2.0 |
| import json |
| from typing import Generator, MutableMapping |
| from datasets import load_dataset |
| import requests |
| from utils.prepare_benchmark.common import Task |
| def gen_hle_text_only(hf_token: str) -> Generator[Task, None, None]: |
| dataset = load_dataset("macabdul9/hle_text_only", split="test", token=hf_token) |
| for x in dataset: |
| metadata: MutableMapping = x # type: ignore |
| task_id = metadata.pop("id") |
| question = metadata.pop("question") |
| gt = metadata.pop("answer") |
| metadata.pop("image_preview") |
| metadata.pop("rationale_image") |
| response = requests.get( |
| "https://raw.githubusercontent.com/RUC-NLPIR/WebThinker/refs/heads/main/data/HLE/test.json" |
| ) |
| dataset = json.loads(response.content) |

| dataset = json.loads(response.content) |
| dataset = response.json() |
Copilot AI, Oct 16, 2025
Converting the ID to string with an empty string default may create invalid task IDs if 'id' is missing. Consider raising an error or using a more meaningful default value to ensure task_id is always valid.
| task_id = str(metadata.pop("id", "")) | |
| try: | |
| task_id = str(metadata.pop("id")) | |
| except KeyError: | |
| raise ValueError(f"Missing 'id' field in row: {row}") |
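A minimal sketch of the validation suggested above, written as a standalone helper so it can be read in isolation. The helper name `extract_task_id` and the extra checks for `None` and blank IDs are assumptions for illustration; they are not part of this PR.

```python
from typing import Any, MutableMapping


def extract_task_id(row: MutableMapping[str, Any]) -> str:
    """Pop 'id' from a row and return it as a non-empty string.

    Hypothetical helper: the name and the None/blank checks are illustrative.
    """
    try:
        raw_id = row.pop("id")
    except KeyError:
        # `from None` suppresses the KeyError so only the clearer error is shown.
        raise ValueError(f"Missing 'id' field in row: {row!r}") from None
    if raw_id is None:
        raise ValueError(f"Null 'id' field in row: {row!r}")
    task_id = str(raw_id).strip()
    if not task_id:
        raise ValueError(f"Empty 'id' field in row: {row!r}")
    return task_id
```

Failing fast here keeps an empty string from ever becoming a task ID, which is the risk the comment points out with `metadata.pop("id", "")`.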
Missing error handling for the HTTP request. If the request fails or returns non-200 status, the code will fail silently or raise an unclear exception. Add status code checking and provide a clear error message if the request fails.
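One way to address this, sketched under assumptions: wrap the download in a helper that sets a timeout, calls `raise_for_status()`, and re-raises failures with a message naming the URL. The helper name `fetch_hle_rows`, the 30-second timeout, the choice of `RuntimeError`, and the injectable `get` parameter (included only to make the function easy to exercise without network access) are all hypothetical; only the URL and the use of `requests` come from the diff.

```python
from typing import Any, Callable

import requests

HLE_URL = (
    "https://raw.githubusercontent.com/RUC-NLPIR/WebThinker/"
    "refs/heads/main/data/HLE/test.json"
)


def fetch_hle_rows(
    url: str = HLE_URL,
    timeout: float = 30.0,  # assumed value, not taken from the PR
    get: Callable[..., Any] = requests.get,  # hypothetical hook for testing
) -> Any:
    """Download and decode the HLE JSON file, failing loudly on errors."""
    try:
        response = get(url, timeout=timeout)
        # Raises requests.HTTPError for 4xx/5xx responses.
        response.raise_for_status()
    except requests.RequestException as exc:
        raise RuntimeError(f"Failed to download HLE data from {url}: {exc}") from exc
    try:
        return response.json()
    except ValueError as exc:
        # requests' JSON decode error is a ValueError subclass.
        raise RuntimeError(f"HLE data at {url} is not valid JSON: {exc}") from exc
```

With a helper like this, the generator could call `dataset = fetch_hle_rows()` in place of the bare `requests.get(...)` followed by JSON decoding, so a network or server failure surfaces as one clear error rather than an obscure decode failure.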