Skip to content

Conversation

@ntudy
Copy link
Contributor

@ntudy ntudy commented Oct 16, 2025

Describe this PR

What changed?

Why?

Related issues

Checklist for PR

  • Write a descriptive PR title following the Angular commit message format: <type>(<scope>): <subject>

    • Examples: feat(agent): add pdf tool via mcp, perf: make llm client async, fix(utils): load custom config via importlib
    • Valid types: feat, fix, docs, style, refactor, perf, test, build, ci, revert
    • The check-pr-title CI job will validate your title format
    • Bad title examples and why they fail:
      • Update README ❌ Missing type and colon
      • feat add new feature ❌ Missing colon after type
      • Feature: add new tool ❌ Invalid type (should be feat)
      • feat(Agent): add tool ❌ Scope should be lowercase
      • feat(): add tool ❌ Empty scope not allowed
      • feat(my_scope): add tool ❌ Underscores not allowed in scope
      • feat(my space): add tool ❌ Space not allowed in scope
      • feat(scope):add tool ❌ Missing space after colon
      • feat(scope): ❌ Empty subject
  • Run lint and format locally:

@ntudy ntudy requested review from BinWang28 and Copilot October 16, 2025 07:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the HLE text-only benchmark data loading mechanism by switching from the HuggingFace dataset to the raw JSON data source from the WebThinker repository. The change removes the dependency on the HuggingFace datasets library for this specific benchmark and updates field names to match the new data source format.

  • Replaced HuggingFace dataset loading with direct HTTP request to WebThinker GitHub repository
  • Updated field name mappings (questionQuestion, answeranswer, etc.)
  • Updated documentation to clarify the dataset source and size

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
utils/prepare_benchmark/gen_hle_text_only.py Replaced dataset loading logic with direct JSON fetch from WebThinker repository and updated field name handling
docs/mkdocs/docs/hle_text_only.md Updated dataset description to reference WebThinker source and specify 500 text-only subset

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

metadata.pop("rationale_image")
response = requests.get(
"https://raw.githubusercontent.com/RUC-NLPIR/WebThinker/refs/heads/main/data/HLE/test.json"
)
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling for the HTTP request. If the request fails or returns non-200 status, the code will fail silently or raise an unclear exception. Add status code checking and provide a clear error message if the request fails.

Suggested change
)
)
response.raise_for_status()

Copilot uses AI. Check for mistakes.
response = requests.get(
"https://raw.githubusercontent.com/RUC-NLPIR/WebThinker/refs/heads/main/data/HLE/test.json"
)
dataset = json.loads(response.content)
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using response.content (bytes) with json.loads() may cause issues. Use response.json() instead, which handles decoding automatically and is the standard approach for JSON responses.

Suggested change
dataset = json.loads(response.content)
dataset = response.json()

Copilot uses AI. Check for mistakes.
dataset = json.loads(response.content)
for row in dataset:
metadata: MutableMapping = row
task_id = str(metadata.pop("id", ""))
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting the ID to string with an empty string default may create invalid task IDs if 'id' is missing. Consider raising an error or using a more meaningful default value to ensure task_id is always valid.

Suggested change
task_id = str(metadata.pop("id", ""))
try:
task_id = str(metadata.pop("id"))
except KeyError:
raise ValueError(f"Missing 'id' field in row: {row}")

Copilot uses AI. Check for mistakes.
@BinWang28 BinWang28 merged commit e1cbf9a into miroflow-v0.3 Oct 16, 2025
3 checks passed
@BinWang28 BinWang28 deleted the fix-hle-text branch October 16, 2025 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants