
Commit d041bbc

feat(doc&benchmark): add tool-searching doc and gaia test config (#41)

* Add GAIA test configuration and documentation
* fix
* add python tool doc

1 parent d9a29ba commit d041bbc

File tree

7 files changed: +351 −2 lines changed

config/agent_gaia-test.yaml

Lines changed: 75 additions & 0 deletions
```yaml
defaults:
  - benchmark: gaia-test
  - override hydra/job_logging: none
  - _self_  # Allow defining variables at the top of this file


main_agent:
  prompt_class: MainAgentPrompt_GAIA
  llm:
    provider_class: "ClaudeOpenRouterClient"
    model_name: "anthropic/claude-3.7-sonnet"
    async_client: true
    temperature: 0.3
    top_p: 0.95
    min_p: 0.0
    top_k: -1
    max_tokens: 32000
    openrouter_api_key: "${oc.env:OPENROUTER_API_KEY,???}"
    openrouter_base_url: "${oc.env:OPENROUTER_BASE_URL,https://openrouter.ai/api/v1}"
    openrouter_provider: "anthropic"
    disable_cache_control: false
    keep_tool_result: -1
    oai_tool_thinking: false

  tool_config:
    - tool-reasoning

  max_turns: -1  # Maximum number of turns for main agent execution
  max_tool_calls_per_turn: 10  # Maximum number of tool calls per turn

  input_process:
    o3_hint: true
  output_process:
    o3_final_answer: true

  openai_api_key: "${oc.env:OPENAI_API_KEY,???}"  # used for o3 hints and final answer extraction
  add_message_id: true
  keep_tool_result: -1
  chinese_context: "${oc.env:CHINESE_CONTEXT,false}"


sub_agents:
  agent-worker:
    prompt_class: SubAgentWorkerPrompt
    llm:
      provider_class: "ClaudeOpenRouterClient"
      model_name: "anthropic/claude-3.7-sonnet"
      async_client: true
      temperature: 0.3
      top_p: 0.95
      min_p: 0.0
      top_k: -1
      max_tokens: 32000
      openrouter_api_key: "${oc.env:OPENROUTER_API_KEY,???}"
      openrouter_base_url: "${oc.env:OPENROUTER_BASE_URL,https://openrouter.ai/api/v1}"
      openrouter_provider: "anthropic"
      disable_cache_control: false
      keep_tool_result: -1
      oai_tool_thinking: false

    tool_config:
      - tool-searching
      - tool-image-video
      - tool-reading
      - tool-code
      - tool-audio

    max_turns: -1  # Maximum number of turns for sub-agent execution
    max_tool_calls_per_turn: 10  # Maximum number of tool calls per turn


# Can define some top-level or default parameters here
output_dir: logs/
data_dir: "${oc.env:DATA_DIR,data}"  # Points to where data is stored
```
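The `${oc.env:VAR,default}` entries are OmegaConf/Hydra environment-variable interpolations: the value is read from the environment when the config is resolved, falling back to the text after the comma (a `???` default effectively makes the value mandatory). A minimal sketch of this behaviour, assuming only the `omegaconf` package:

```python
# Minimal sketch of ${oc.env:VAR,default} resolution, using omegaconf's
# built-in oc.env resolver (the same machinery Hydra uses).
import os
from omegaconf import OmegaConf

os.environ["OPENROUTER_API_KEY"] = "sk-or-demo"  # pretend the key is exported

cfg = OmegaConf.create(
    "openrouter_api_key: ${oc.env:OPENROUTER_API_KEY,???}\n"
    "data_dir: ${oc.env:DATA_DIR,data}\n"
)
print(cfg.openrouter_api_key)  # "sk-or-demo" -- read from the environment
print(cfg.data_dir)            # "data" -- DATA_DIR unset, falls back to default
```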

config/benchmark/gaia-test.yaml

Lines changed: 16 additions & 0 deletions
```yaml
# config/benchmark/gaia-test.yaml
defaults:
  - default
  - _self_

name: "gaia-test"

data:
  data_dir: "${data_dir}/gaia-test"

execution:
  max_tasks: null  # null means no limit
  max_concurrent: 10
  pass_at_k: 1

openai_api_key: "${oc.env:OPENAI_API_KEY,???}"
```
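Under `execution`, `max_tasks: null` runs every task in the split, `max_concurrent: 10` caps how many tasks are in flight at once, and `pass_at_k: 1` means a single attempt per task. A hypothetical sketch of what the concurrency cap implies (`run_task` is a stand-in, not the repository's actual runner):

```python
# Hypothetical illustration of `max_concurrent: 10`: a semaphore keeps at
# most ten benchmark tasks running at any moment.
import asyncio

async def run_task(task_id: int) -> str:
    await asyncio.sleep(0.1)  # stand-in for running one GAIA task
    return f"task {task_id} done"

async def main(max_concurrent: int = 10) -> None:
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(task_id: int) -> str:
        async with sem:  # waits while ten tasks are already in flight
            return await run_task(task_id)

    results = await asyncio.gather(*(bounded(i) for i in range(25)))
    print(f"{len(results)} tasks finished")

asyncio.run(main())
```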

docs/mkdocs/docs/gaia_test.md

Lines changed: 46 additions & 1 deletion
The new content below replaces the former "# - Coming Soon -" placeholder:

# GAIA Test

This document provides step-by-step instructions for evaluating the GAIA test benchmark.

### Step 1: Prepare the GAIA Test Dataset

First, download and prepare the GAIA test dataset:

```bash
cd data
wget https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/gaia-test.zip
unzip gaia-test.zip
# The unzip passcode is: `pf4*`
```

### Step 2: Configure API Keys

Set up the required API keys for model access and tool functionality. Update the `.env` file to include the following keys:

```
# For searching and scraping
SERPER_API_KEY="xxx"
JINA_API_KEY="xxx"

# For Linux sandbox (code execution environment)
E2B_API_KEY="xxx"

# We use Claude-3.7-Sonnet with the OpenRouter backend to initialize the LLM, mainly because OpenRouter provides better response rates
OPENROUTER_API_KEY="xxx"
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"

# Used for Claude vision understanding
ANTHROPIC_API_KEY="xxx"

# Used for Gemini vision
GEMINI_API_KEY="xxx"

# Used for LLM judge, reasoning, o3 hints, etc.
OPENAI_API_KEY="xxx"
OPENAI_BASE_URL="https://api.openai.com/v1"
```
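Before launching a run, it can help to confirm the keys are actually visible to the process. A quick, optional check, assuming the `python-dotenv` package (the key names are the ones listed above):

```python
# Optional sanity check: load .env and report any missing keys.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
required = [
    "SERPER_API_KEY", "JINA_API_KEY", "E2B_API_KEY",
    "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY", "OPENAI_API_KEY",
]
missing = [key for key in required if not os.getenv(key)]
print("missing keys:", ", ".join(missing) if missing else "none")
```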
### Step 3: Run the Evaluation

Execute the following command to run a single evaluation pass on the GAIA test dataset:

```bash
uv run main.py common-benchmark --config_file_name=agent_gaia-test output_dir="logs/gaia-test/$(date +"%Y%m%d_%H%M")"
```

---

**Last Updated:** Sep 2025

docs/mkdocs/docs/gaia_validation.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -57,7 +57,7 @@ JINA_API_KEY="xxx"
 # For Linux sandbox (code execution environment)
 E2B_API_KEY="xxx"
 
-# We use Claude-3.5-Sonnet with OpenRouter backend to initialize the LLM. The main reason is that OpenRouter provides better response rates
+# We use Claude-3.7-Sonnet with OpenRouter backend to initialize the LLM. The main reason is that OpenRouter provides better response rates
 OPENROUTER_API_KEY="xxx"
 OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
```
docs/mkdocs/docs/tool_python.md

Lines changed: 86 additions & 0 deletions
# Python Tools (`python_server.py`)

The Python Execution Server provides a secure, sandboxed environment for running Python code and shell commands, backed by E2B.

### `create_sandbox()`
Creates a Linux sandbox for safely executing commands and running Python code.

**Returns:**
- `str`: The `sandbox_id` of the newly created sandbox

**Usage Notes:**
- This tool must be called before any other tool in this MCP server
- The sandbox may time out and shut down automatically
- The sandbox comes pre-installed with common packages for data science and document processing. For a detailed list and advanced usage information, see [E2B Extension](./e2b_extension.md)

### `run_command(sandbox_id: str, command: str)`
Executes shell commands in the Linux sandbox.

**Parameters:**
- `sandbox_id`: ID of an existing sandbox (must be created first)
- `command`: Shell command to execute

**Returns:**
- `str`: Command execution result (stderr, stdout, exit_code, error)

**Features:**
- Automatic retry mechanism
- Permission hints for sudo commands

### `run_python_code(sandbox_id: str, code_block: str)`
Runs Python code in the sandbox and returns the execution results.

**Parameters:**
- `sandbox_id`: ID of an existing sandbox
- `code_block`: Python code to execute

**Returns:**
- `str`: Code execution result (stderr, stdout, exit_code, error)

**Features:**
- Automatic retry mechanism

### `upload_file_from_local_to_sandbox(sandbox_id: str, local_file_path: str, sandbox_file_path: str = "/home/user")`
Uploads local files to the sandbox environment.

When a local file is provided to the agent, the agent needs to call this tool to copy the file from local storage into the sandbox for further processing.

**Parameters:**
- `sandbox_id`: ID of an existing sandbox
- `local_file_path`: Local path of the file to upload
- `sandbox_file_path`: Target directory in the sandbox (default: `/home/user`)

**Returns:**
- `str`: Path of the uploaded file in the sandbox, or an error message

### `download_file_from_internet_to_sandbox(sandbox_id: str, url: str, sandbox_file_path: str = "/home/user")`
Downloads files from the internet directly into the sandbox.

**Parameters:**
- `sandbox_id`: ID of an existing sandbox
- `url`: URL of the file to download
- `sandbox_file_path`: Target directory in the sandbox (default: `/home/user`)

**Returns:**
- `str`: Path of the downloaded file in the sandbox, or an error message

**Features:**
- Automatic retry mechanism

### `download_file_from_sandbox_to_local(sandbox_id: str, sandbox_file_path: str, local_filename: str = None)`
Downloads files from the sandbox to the local system for processing by other tools.

Other MCP tools (such as visual question answering) cannot access files inside a sandbox, so this tool should be called whenever the agent wants another tool to analyze a sandbox file.

**Parameters:**
- `sandbox_id`: ID of the sandbox
- `sandbox_file_path`: Path of the file in the sandbox
- `local_filename`: Optional local filename (uses the original name if not provided)

**Returns:**
- `str`: Local path of the downloaded file, or an error message
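Taken together, a typical session creates the sandbox once and threads its `sandbox_id` through every later call. The following is a hypothetical client-side sketch of that ordering; `call_tool` and the file paths are illustrative stand-ins, not APIs or paths from this repository:

```python
# Hypothetical sketch of the expected call order; `call_tool` stands in for
# whatever MCP client invocation the agent framework provides.
def call_tool(name: str, **kwargs) -> str:
    print(f"[tool call] {name} {kwargs}")
    return "stub-result"

# 1. A sandbox must exist before any other tool in this server is used.
sandbox_id = call_tool("create_sandbox")

# 2. Copy a local input file into the sandbox for processing.
call_tool("upload_file_from_local_to_sandbox", sandbox_id=sandbox_id,
          local_file_path="data/report.xlsx", sandbox_file_path="/home/user")

# 3. Run shell commands and Python code against the uploaded file.
call_tool("run_command", sandbox_id=sandbox_id, command="ls -l /home/user")
call_tool("run_python_code", sandbox_id=sandbox_id,
          code_block="print(open('/home/user/report.xlsx', 'rb').read(4))")

# 4. Bring an artifact back out so non-sandbox tools (e.g. VQA) can read it.
local_path = call_tool("download_file_from_sandbox_to_local",
                       sandbox_id=sandbox_id,
                       sandbox_file_path="/home/user/report.xlsx")
```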
---

**Last Updated:** Sep 2025
**Doc Contributor:** Team @ MiroMind AI

docs/mkdocs/docs/tool_searching.md

Lines changed: 125 additions & 0 deletions
# Searching Tools (`searching_mcp_server.py`)

The Searching MCP Server provides comprehensive search capabilities, including Google search, Wikipedia content retrieval, archive searching, and web scraping.

## Environment Variables Used in Tools

- `SERPER_API_KEY`: Required API key for the Serper service. Used by `google_search` and as a fallback for `scrape_website`
- `JINA_API_KEY`: Required API key for the Jina service. Default choice for scraping websites in `scrape_website`
- `REMOVE_SNIPPETS`: Set to "true" to filter snippets out of results. Used in `google_search` to filter the search results returned by Serper
- `REMOVE_KNOWLEDGE_GRAPH`: Set to "true" to remove knowledge graph data. Used in `google_search` to filter the search results returned by Serper
- `REMOVE_ANSWER_BOX`: Set to "true" to remove answer box content. Used in `google_search` to filter the search results returned by Serper (the filtering these three flags imply is sketched below)
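The filtering code itself is not shown in this commit; the following is a plausible sketch of what the three `REMOVE_*` flags do, assuming Serper's usual response keys (`organic`, `answerBox`, `knowledgeGraph`):

```python
# Plausible post-processing sketch (not the server's actual code): drop the
# response sections that the REMOVE_* environment variables ask to remove.
import json
import os

def filter_serper_results(raw: dict) -> str:
    results = dict(raw)
    if os.getenv("REMOVE_ANSWER_BOX", "").lower() == "true":
        results.pop("answerBox", None)
    if os.getenv("REMOVE_KNOWLEDGE_GRAPH", "").lower() == "true":
        results.pop("knowledgeGraph", None)
    if os.getenv("REMOVE_SNIPPETS", "").lower() == "true":
        for item in results.get("organic", []):
            item.pop("snippet", None)
    return json.dumps(results, indent=2)

print(filter_serper_results({
    "organic": [{"title": "GAIA", "snippet": "a benchmark for ..."}],
    "answerBox": {"answer": "..."},
}))
```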
### `google_search(q: str, gl: str = "us", hl: str = "en", location: str = None, num: int = 10, tbs: str = None, page: int = 1)`
Performs Google searches via the Serper API and retrieves rich search results, including organic results, "people also ask", related searches, and the knowledge graph.

**Parameters:**

- `q`: Search query string
- `gl`: Country context for the search (e.g., 'us' for United States, 'cn' for China, 'uk' for United Kingdom). Default: 'us'
- `hl`: Google interface language (e.g., 'en' for English, 'zh' for Chinese, 'es' for Spanish). Default: 'en'
- `location`: City-level location for search results (e.g., 'SoHo, New York, United States', 'California, United States')
- `num`: Number of results to return. Default: 10
- `tbs`: Time-based search filter ('qdr:h' for past hour, 'qdr:d' for past day, 'qdr:w' for past week, 'qdr:m' for past month, 'qdr:y' for past year)
- `page`: Page number of results to return. Default: 1

**Returns:**

- `str`: JSON-formatted search results with organic results and related information

**Features:**

- Automatic retry mechanism (up to 5 attempts)
- Configurable result filtering via environment variables
- Support for regional and language-specific searches

### `wiki_get_page_content(entity: str, first_sentences: int = 10)`
Gets the Wikipedia page content for an entity (person, place, concept, or event) and returns structured information.

**Parameters:**

- `entity`: The entity to search for in Wikipedia
- `first_sentences`: Number of opening sentences to return from the page. Set to 0 to return the full content. Default: 10

**Returns:**

- `str`: Formatted content containing the page title, introduction or full content, and URL

**Features:**

- Handles disambiguation pages automatically
- Provides clean, structured output
- Falls back to search suggestions when a page is not found
- Automatic content truncation for manageable output

### `search_wiki_revision(entity: str, year: int, month: int, max_revisions: int = 50)`
Searches for an entity in Wikipedia and returns its revision history for a specific month.

**Parameters:**

- `entity`: The entity to search for in Wikipedia
- `year`: The year of the revision (e.g., 2024)
- `month`: The month of the revision (1-12)
- `max_revisions`: Maximum number of revisions to return. Default: 50

**Returns:**

- `str`: Formatted revision history with timestamps, revision IDs, and URLs

**Features:**

- Automatic date validation and adjustment
- Supports dates from 2000 to the current year
- Detailed revision metadata, including timestamps and direct links
- Clear error handling for invalid dates or missing pages

### `search_archived_webpage(url: str, year: int, month: int, day: int)`
Searches the Wayback Machine (archive.org) for archived versions of a webpage on a specific date.

**Parameters:**

- `url`: The URL to search for in the Wayback Machine
- `year`: The target year (e.g., 2023)
- `month`: The target month (1-12)
- `day`: The target day (1-31)

**Returns:**

- `str`: Formatted archive information, including the archived URL, timestamp, and availability status

**Features:**

- Automatic URL protocol detection and correction
- Date validation and adjustment (1995 to present)
- Falls back to the most recent archive if no snapshot exists for the given date
- Special handling for Wikipedia URLs, with tool suggestions
- Automatic retry mechanism for reliable results

### `scrape_website(url: str)`
Scrapes website content, with support for regular websites as well as YouTube video information.

**Parameters:**

- `url`: The URL of the website to scrape

**Returns:**

- `str`: Scraped website content, including text, metadata, and structured information

**Features:**

- Support for various website types
- YouTube video information extraction (subtitles, titles, descriptions, key moments)
- Automatic content parsing and cleaning
- Integration with the Jina API for enhanced scraping

**Usage Notes:**

- Search engines are not supported by this tool
- For YouTube videos, only non-visual information is provided
- Content may be incomplete for some complex websites
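A hypothetical end-to-end lookup tying these tools together; as in the Python tools doc, `call_tool` is a stand-in for the framework's MCP client, not an API from this repository, and the URLs are placeholders:

```python
# Hypothetical usage sketch of the searching tools documented above.
def call_tool(name: str, **kwargs) -> str:
    print(f"[tool call] {name} {kwargs}")
    return "stub-result"

# Search first, optionally narrowed by region, recency, and page.
results = call_tool("google_search", q="GAIA benchmark leaderboard",
                    gl="us", hl="en", tbs="qdr:m", num=10)

# Then scrape a promising hit for full text (search engines themselves
# are not valid targets for scrape_website).
page = call_tool("scrape_website", url="https://example.com/some-result")

# Wikipedia-specific lookups have dedicated tools.
intro = call_tool("wiki_get_page_content", entity="Alan Turing",
                  first_sentences=5)
revisions = call_tool("search_wiki_revision", entity="Alan Turing",
                      year=2024, month=3)

# Historical snapshots go through the Wayback Machine helper.
archived = call_tool("search_archived_webpage",
                     url="https://example.com", year=2023, month=6, day=1)
```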
---

**Last Updated:** Sep 2025
**Doc Contributor:** Team @ MiroMind AI

docs/mkdocs/mkdocs.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -60,6 +60,8 @@ nav:
   - Tools:
     - tool-reasoning: tool_reasoning.md
     - tool-vqa: tool_vqa.md
+    - tool-searching: tool_searching.md
+    - tool-python: tool_python.md
   - Advanced Features:
     - E2B Advanced Features: e2b_advanced_features.md
     - Add New Tools: contribute_tools.md
```
