Merged
2 changes: 1 addition & 1 deletion docs/mkdocs/docs/hle_text_only.md
@@ -9,7 +9,7 @@ More details: [HLE text only Dataset on HuggingFace](https://huggingface.co/data
## Dataset Overview

!!! info "HLE Dataset (text only)"
-    The dataset is a text-only subset of HLE.
+    The experiments are conducted on the **500 text-only subset** of the HLE dataset, available from [WebThinker](https://github.com/RUC-NLPIR/WebThinker/blob/main/data/HLE/test.json).

---

24 changes: 13 additions & 11 deletions utils/prepare_benchmark/gen_hle_text_only.py
@@ -2,29 +2,31 @@
 #
 # SPDX-License-Identifier: Apache-2.0


+import json
 from typing import Generator, MutableMapping

-from datasets import load_dataset
+import requests

 from utils.prepare_benchmark.common import Task


 def gen_hle_text_only(hf_token: str) -> Generator[Task, None, None]:
-    dataset = load_dataset("macabdul9/hle_text_only", split="test", token=hf_token)
-    for x in dataset:
-        metadata: MutableMapping = x  # type: ignore
-        task_id = metadata.pop("id")
-        question = metadata.pop("question")
-        gt = metadata.pop("answer")
-        metadata.pop("image_preview")
-        metadata.pop("rationale_image")
+    response = requests.get(
+        "https://raw.githubusercontent.com/RUC-NLPIR/WebThinker/refs/heads/main/data/HLE/test.json"
+    )
Copilot AI Oct 16, 2025

Missing error handling for the HTTP request. If the server returns a non-200 status, the error body is passed straight to the JSON parser, which typically fails with an unrelated decode error rather than a clear HTTP error. Check the status code and fail with a clear message when the request does not succeed.

Suggested change
-    )
+    )
+    response.raise_for_status()
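
A minimal sketch of the fail-fast fetch this comment asks for. The helper name `fetch_rows`, the `timeout` default, and the injectable `http` parameter are assumptions added for illustration and are not part of the PR; `get`, `raise_for_status()`, and `json()` are the standard `requests` calls.

```python
from typing import Any, List

# URL copied from the diff above.
HLE_URL = (
    "https://raw.githubusercontent.com/RUC-NLPIR/WebThinker/"
    "refs/heads/main/data/HLE/test.json"
)


def fetch_rows(url: str = HLE_URL, http: Any = None, timeout: float = 30.0) -> List[dict]:
    """Download the JSON file and fail loudly on an unsuccessful status."""
    if http is None:
        import requests  # imported lazily so a stub client can be injected

        http = requests
    response = http.get(url, timeout=timeout)
    # With requests, this raises HTTPError on 4xx/5xx instead of letting
    # an error page reach the JSON parser.
    response.raise_for_status()
    return response.json()
```

Passing a stub object as `http` lets the function be exercised without network access; in normal use it would be called as `fetch_rows()`.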

+    dataset = json.loads(response.content)
Copilot AI Oct 16, 2025

json.loads() accepts the bytes in response.content (since Python 3.6 it detects UTF-8, UTF-16, or UTF-32), so this line works as written. response.json() is the more idiomatic call for JSON responses and removes the need for the json import.

Suggested change
-    dataset = json.loads(response.content)
+    dataset = response.json()
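
For reference, a small stdlib-only check of how `json.loads` treats bytes input. The sample payload is made up for illustration; only the key names mirror the ones used in the diff.

```python
import json

# Hypothetical one-row payload shaped like the fields the diff reads.
payload = '[{"id": 1, "Question": "2 + 2 = ?", "answer": "4"}]'

from_str = json.loads(payload)
from_utf8 = json.loads(payload.encode("utf-8"))
# Python 3.6+ also detects UTF-16 (here with a BOM) when given bytes.
from_utf16 = json.loads(payload.encode("utf-16"))

assert from_str == from_utf8 == from_utf16
```

Either form should parse the downloaded file; the suggested `response.json()` is mainly a matter of idiom.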

+    for row in dataset:
+        metadata: MutableMapping = row
+        task_id = str(metadata.pop("id", ""))
Copilot AI Oct 16, 2025

Converting the ID to string with an empty string default may create invalid task IDs if 'id' is missing. Consider raising an error or using a more meaningful default value to ensure task_id is always valid.

Suggested change
-        task_id = str(metadata.pop("id", ""))
+        try:
+            task_id = str(metadata.pop("id"))
+        except KeyError:
+            raise ValueError(f"Missing 'id' field in row: {row}")
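
The same check could be factored into a small helper so it can be reused for other required keys. This is only a sketch: `pop_required` is a hypothetical name, not something defined in the repository.

```python
from typing import Any, MutableMapping


def pop_required(row: MutableMapping[str, Any], key: str) -> Any:
    """Pop `key` from `row`, raising a descriptive error if it is absent."""
    try:
        return row.pop(key)
    except KeyError:
        # `from None` suppresses the bare KeyError so only this message surfaces.
        raise ValueError(f"Missing {key!r} field in row: {dict(row)}") from None
```

With such a helper, the line in the diff would read `task_id = str(pop_required(metadata, "id"))`.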

+        question = metadata.pop("Question", "")
+        answer = metadata.pop("answer", "")
         task = Task(
             task_id=task_id,
             task_question=question,
-            ground_truth=gt,
+            ground_truth=answer,
             file_path=None,
             metadata=metadata,
         )
         yield task

     return
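
To see what the new row-to-task mapping produces without touching the network, the loop can be run on in-memory rows. In this sketch the `Task` dataclass is a hypothetical stand-in for `utils.prepare_benchmark.common.Task` (its field names come from the diff, its types are assumed), and `rows_to_tasks`, the sample row, and its `category` key are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass
class Task:  # hypothetical stand-in for utils.prepare_benchmark.common.Task
    task_id: str
    task_question: str
    ground_truth: str
    file_path: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def rows_to_tasks(rows: Iterable[Dict[str, Any]]) -> Iterator[Task]:
    for row in rows:
        metadata = dict(row)  # copy so the caller's rows are not mutated
        yield Task(
            task_id=str(metadata.pop("id", "")),
            task_question=metadata.pop("Question", ""),  # capital "Q", as in the diff
            ground_truth=metadata.pop("answer", ""),
            file_path=None,
            metadata=metadata,  # whatever keys remain after the pops
        )


# Made-up sample row; "category" is an invented extra key.
sample = [{"id": 3, "Question": "2 + 2 = ?", "answer": "4", "category": "Math"}]
tasks = list(rows_to_tasks(sample))
```

Unlike the diff, this sketch copies each row before popping keys, so the input list is left unchanged; that is a design choice of the example, not of the PR.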