-
Notifications
You must be signed in to change notification settings - Fork 155
feat(benchmark): add browsecomp_zh #88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds BrowseComp-ZH benchmark support, including docs, benchmark config, and two agent configurations (Claude via OpenRouter and MiroThinker).
- New mkdocs page and nav entry for BrowseComp-ZH
- New benchmark config browsecomp-zh.yaml
- New agent configs for Claude 3.7 Sonnet (OpenRouter) and MiroThinker
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/mkdocs/mkdocs.yml | Adds BrowseComp-ZH page to docs navigation |
| docs/mkdocs/docs/browsecomp_zh.md | New documentation page with setup and run instructions |
| config/benchmark/browsecomp-zh.yaml | New benchmark configuration (data paths, execution params, OpenAI key) |
| config/agent_browsecomp-zh_mirothinker.yaml | New agent config for MiroThinker with Chinese context |
| config/agent_browsecomp-zh_claude37sonnet.yaml | New agent config for Claude 3.7 Sonnet via OpenRouter with worker sub-agent |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| | Agent Configuration | Model | Use Case | | ||
| |-------------------|-------|----------| | ||
| | `agent_browsecomp-zh_claude37sonnet` | Claude 3.7 Sonnet | Recommended for better performance on Chinese tasks | | ||
| | `agent_browsecomp-zh_mirothinker` | MiroThinker | For local deployment | |
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table header currently starts with a double pipe on each line (||), which breaks Markdown table rendering. Use a single leading pipe (|) as shown to render correctly.
| openrouter_base_url: "${oc.env:OPENROUTER_BASE_URL,https://openrouter.ai/api/v1}" | ||
| openrouter_provider: "anthropic" | ||
| disable_cache_control: false | ||
| keep_tool_result: -1 |
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] keep_tool_result is defined both under main_agent.llm and main_agent; this duplication can cause confusion about which value is authoritative. Define it in a single place (preferably at main_agent level if the agent layer consumes it) and remove the duplicate.
| keep_tool_result: -1 |
|
|
||
| openai_api_key: "${oc.env:OPENAI_API_KEY,???}" # used for hint generation and final answer extraction | ||
| add_message_id: true | ||
| keep_tool_result: -1 |
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] keep_tool_result is defined both under main_agent.llm and main_agent; this duplication can cause confusion about which value is authoritative. Define it in a single place (preferably at main_agent level if the agent layer consumes it) and remove the duplicate.
| max_tokens: 4096 | ||
| oai_mirothinker_api_key: "${oc.env:OAI_MIROTHINKER_API_KEY,dummy_key}" | ||
| oai_mirothinker_base_url: "${oc.env:OAI_MIROTHINKER_BASE_URL,http://localhost:61005/v1}" | ||
| keep_tool_result: -1 |
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] keep_tool_result is duplicated at both llm and main_agent levels. Consolidate to a single definition to avoid ambiguity; keep it where your code reads it (commonly the agent level) and remove the other.
| keep_tool_result: -1 |
* system prompt: 提示文件嵌入格式 * 简化提示词 * 更新 system prompt, 规范解析失败时的格式 * 优化 system prompt, 防止模型认为空内容是 OCR 失败导致的 * 支持注入额外 prompt
Describe this PR
What changed?
Why?
Related issues
Checklist for PR
Write a descriptive PR title following the Angular commit message format:
<type>(<scope>): <subject>feat(agent): add pdf tool via mcp,perf: make llm client async,fix(utils): load custom config via importlibfeat,fix,docs,style,refactor,perf,test,build,ci,revertcheck-pr-titleCI job will validate your title formatUpdate README❌ Missing type and colonfeat add new feature❌ Missing colon after typeFeature: add new tool❌ Invalid type (should befeat)feat(Agent): add tool❌ Scope should be lowercasefeat(): add tool❌ Empty scope not allowedfeat(my_scope): add tool❌ Underscores not allowed in scopefeat(my space): add tool❌ Space not allowed in scopefeat(scope):add tool❌ Missing space after colonfeat(scope):❌ Empty subjectRun lint and format locally:
uv tool run [email protected] check --fix .uv tool run [email protected] format .lintenforces ruff default format/lint rules on all new codes.