@ntudy commented Oct 14, 2025

Describe this PR

What changed?

Why?

Related issues

Checklist for PR

  • Write a descriptive PR title following the Angular commit message format: <type>(<scope>): <subject>

    • Examples: feat(agent): add pdf tool via mcp, perf: make llm client async, fix(utils): load custom config via importlib
    • Valid types: feat, fix, docs, style, refactor, perf, test, build, ci, revert
    • The check-pr-title CI job will validate your title format
    • Bad title examples and why they fail:
      • Update README ❌ Missing type and colon
      • feat add new feature ❌ Missing colon after type
      • Feature: add new tool ❌ Invalid type (should be feat)
      • feat(Agent): add tool ❌ Scope should be lowercase
      • feat(): add tool ❌ Empty scope not allowed
      • feat(my_scope): add tool ❌ Underscores not allowed in scope
      • feat(my space): add tool ❌ Space not allowed in scope
      • feat(scope):add tool ❌ Missing space after colon
      • feat(scope): ❌ Empty subject
  • Run lint and format locally:
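The title rules in the first checklist item can be sketched as a single regular expression. This is only an illustration of the rules listed above, not the actual `check-pr-title` CI job, whose implementation is not shown here. The names `VALID_TYPES`, `TITLE_RE`, and `is_valid_pr_title` are hypothetical, and the allowed scope characters (lowercase letters, digits, hyphens) are an assumption inferred from the bad-title examples.

```python
import re

# Valid types, copied from the checklist.
VALID_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "revert",
)

# <type>(<scope>): <subject>
# - type must be one of VALID_TYPES (case-sensitive)
# - scope is optional; if present it must be non-empty
#   (assumed alphabet: lowercase letters, digits, hyphens)
# - a colon and exactly one space must precede a non-empty subject
TITLE_RE = re.compile(
    r"(?P<type>" + "|".join(VALID_TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9-]+)\))?"
    r": (?P<subject>\S.*)"
)


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title matches the sketched format."""
    return TITLE_RE.fullmatch(title) is not None


if __name__ == "__main__":
    for title in (
        "feat(agent): add pdf tool via mcp",
        "perf: make llm client async",
        "feat(Agent): add tool",
        "feat(scope):add tool",
    ):
        print(f"{title!r}: {is_valid_pr_title(title)}")
```

Under these assumptions, the three example titles from the checklist pass and each of the nine bad titles is rejected.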

@ntudy requested review from BinWang28 and Copilot October 14, 2025 08:46
Copilot AI left a comment

Pull Request Overview

This PR adds support for the HLE-text-only benchmark dataset by implementing data preparation, configuration, and documentation for evaluating text-only reasoning tasks. The implementation follows the existing pattern of other benchmark integrations in the codebase.

  • Added HLE-text-only dataset preparation generator that loads and processes the text-only subset of HLE data
  • Created benchmark configuration and agent configuration files for running evaluations with Claude 3.7 Sonnet
  • Added comprehensive documentation with setup and usage instructions

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| utils/prepare_benchmark/main.py | Added import and case handling for hle-text-only dataset preparation |
| utils/prepare_benchmark/gen_hle_text_only.py | New generator module that processes HLE text-only dataset from HuggingFace |
| scripts/run_prepare_benchmark.sh | Added command to prepare hle-text-only dataset |
| docs/mkdocs/docs/hle-text-only.md | Complete documentation for HLE-text-only benchmark usage |
| config/benchmark/hle-text-only.yaml | Benchmark configuration for HLE-text-only dataset |
| config/agent_hle-text-only_claude37sonnet.yaml | Agent configuration for running HLE-text-only with Claude 3.7 Sonnet |

Comment on lines +29 to +30

return
Copilot AI Oct 14, 2025

The explicit `return` statement at the end of the generator function is unnecessary: a generator finishes on its own, raising `StopIteration` to its consumer, once execution reaches the end of the function body.

Suggested change: remove the trailing `return`.
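A minimal sketch of the point made in this review comment: a generator with a bare trailing `return` behaves the same as one without it. The function names below are hypothetical and are not taken from `gen_hle_text_only.py`, whose body is not shown in this conversation.

```python
def with_trailing_return(items):
    """Generator that ends with an explicit, redundant bare return."""
    for item in items:
        yield item
    return  # redundant: the function body ends here anyway


def without_trailing_return(items):
    """Same generator with the trailing return removed."""
    for item in items:
        yield item


if __name__ == "__main__":
    data = ["a", "b", "c"]
    # Both generators yield the same sequence and then stop.
    print(list(with_trailing_return(data)) == list(without_trailing_return(data)))
```

A bare `return` is still useful in the middle of a generator to stop early; it is only at the very end of the body that it has no effect.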

@BinWang28 merged commit 10d9ab6 into miroflow-v0.3 Oct 14, 2025
3 checks passed
@BinWang28 deleted the add-hle-text branch October 14, 2025 09:14

3 participants