Commit 0b20ff3
feat(tool): incorporate open-source tools from MiroThinker (#60)
* upd: add futurex evaluation support
* upd: support multiple evals for futurex and add relevant docs
* upd: fix doc bugs for futurex
* debug: fix wrong calling path
* add preparation for finsearchcomp
* update a preliminary version of the finsearchcomp benchmark
* clean up redundant code from merging
* upd: modify yaml to use MiroThinker as the main agent; add check-progress file to exclude T1
* upd: check_progress function for finsearchcomp now considers Global and Greater China separately
* upd: add docs and shell script for multiple runs
* fix: check_finsearchcomp_progress not displaying results from the Greater China region
* fix: catch ContextLimitError in more observed cases
* initialize open-source tools for audio, vision, and reasoning
* upd: docs for open-source tools
* fix wrong date
1 parent 6ec4972 commit 0b20ff3

File tree: 10 files changed, +891 −0 lines


config/tool/tool-audio-os.yaml

Lines changed: 9 additions & 0 deletions

```yaml
name: "tool-audio-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.audio_mcp_server_os"
env:
  WHISPER_API_KEY: "${oc.env:WHISPER_API_KEY}"
  WHISPER_BASE_URL: "${oc.env:WHISPER_BASE_URL}"
  WHISPER_MODEL_NAME: "${oc.env:WHISPER_MODEL_NAME}"
```
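The `${oc.env:VAR}` entries are OmegaConf-style environment-variable interpolations: each value is filled in from the corresponding environment variable when the config is loaded. A toy resolver illustrating that behavior (an illustration only, not OmegaConf itself) might look like:

```python
import os
import re

def resolve_env_interpolations(value: str) -> str:
    """Replace OmegaConf-style ${oc.env:VAR} placeholders with values
    from the process environment (toy illustration, not OmegaConf)."""
    def _lookup(match: re.Match) -> str:
        var = match.group(1)
        if var not in os.environ:
            raise KeyError(f"Environment variable not set: {var}")
        return os.environ[var]
    return re.sub(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)\}", _lookup, value)
```

For example, with `WHISPER_MODEL_NAME` exported in the environment, resolving `"${oc.env:WHISPER_MODEL_NAME}"` yields the configured model name.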
config/tool/tool-image-video-os.yaml

Lines changed: 9 additions & 0 deletions

```yaml
name: "tool-image-video-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.vision_mcp_server_os"
env:
  VISION_API_KEY: "${oc.env:VISION_API_KEY}"
  VISION_BASE_URL: "${oc.env:VISION_BASE_URL}"
  VISION_MODEL_NAME: "${oc.env:VISION_MODEL_NAME}"
```

config/tool/tool-reasoning-os.yaml

Lines changed: 9 additions & 0 deletions

```yaml
name: "tool-reasoning-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.reasoning_mcp_server_os"
env:
  REASONING_API_KEY: "${oc.env:REASONING_API_KEY}"
  REASONING_BASE_URL: "${oc.env:REASONING_BASE_URL}"
  REASONING_MODEL_NAME: "${oc.env:REASONING_MODEL_NAME}"
```

docs/mkdocs/docs/tool_audio_os.md

Lines changed: 149 additions & 0 deletions
# Audio Tools - Open Source (`audio_mcp_server_os.py`)

The Audio MCP Server (Open Source) enables audio transcription using open-source Whisper models. It provides comprehensive audio-to-text conversion with support for multiple audio formats, local files, and URLs.

!!! info "Available Functions"
    This MCP server provides the following functions that agents can call:

    - **Audio Transcription**: High-quality speech-to-text conversion
    - **Multi-Format Support**: MP3, WAV, M4A, AAC, OGG, FLAC, WMA formats
    - **Flexible Input**: Local file paths and web URLs
    - **Open-Source Model Support**: Whisper-Large-v3-Turbo with automatic processing

---

## Environment Variables

!!! warning "Configuration Location"
    `audio_mcp_server_os.py` reads environment variables passed through the `tool-audio-os.yaml` configuration file, not directly from the `.env` file.

**Open-Source Model Configuration:**

- `WHISPER_API_KEY`: API key for the open-source Whisper service (required)
- `WHISPER_BASE_URL`: Base URL for the Whisper service API endpoint
- `WHISPER_MODEL_NAME`: Model name (default: `openai/whisper-large-v3-turbo`)

**Example Configuration:**

```bash
# API for Open-Source Audio Transcription Tool (for benchmark testing)
WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo"
WHISPER_API_KEY="your_whisper_key"
WHISPER_BASE_URL="https://your_whisper_base_url/v1"
```

---

## Local Deployment

### Using vLLM Server

For optimal performance with the Whisper-Large-v3-Turbo model, deploy using vLLM:

```bash
pip install vllm==0.10.0
pip install "vllm[audio]"
vllm serve /path/to/whisper \
  --served-model-name whisper-large-v3-turbo \
  --task transcription
```

### Configuration for Local Deployment

When using local deployment, configure your environment variables:

```bash
WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo"
WHISPER_API_KEY="dummy_key"  # Not required for local deployment
WHISPER_BASE_URL="http://localhost:8000/v1"
```

---

## Function Reference

The following function is provided by the `audio_mcp_server_os.py` MCP tool and can be called by agents:

### `audio_transcription(audio_path_or_url: str)`

Transcribes audio files to text using open-source Whisper models. Supports both local files and web URLs, with automatic format detection and processing.

**Parameters:**

- `audio_path_or_url`: Local file path (accessible to the server) or web URL

**Returns:**

- `str`: The transcription of the audio file

**Supported Audio Formats:**

- MP3 (`.mp3`)
- WAV (`.wav`)
- M4A (`.m4a`)
- AAC (`.aac`)
- OGG (`.ogg`)
- FLAC (`.flac`)
- WMA (`.wma`)

## Usage Examples

### Local File Transcription

```python
# Local file transcription
result = audio_transcription(
    audio_path_or_url="/path/to/audio.mp3"
)
```

### URL-based Transcription

```python
# URL transcription
result = audio_transcription(
    audio_path_or_url="https://example.com/audio.wav"
)
```

### Meeting Recording Transcription

```python
result = audio_transcription(
    audio_path_or_url="meeting_recording.m4a"
)
```

### Podcast Transcription

```python
result = audio_transcription(
    audio_path_or_url="podcast_episode.mp3"
)
```

---

## Technical Implementation

### Audio Processing Pipeline

1. **Input Validation**: Checks whether the input is a local file or a URL
2. **Format Detection**: Determines the audio format from the file extension or content type
3. **File Handling**: Downloads URL inputs to temporary storage with the proper extension
4. **API Request**: Sends the audio file to the Whisper model for transcription
5. **Cleanup**: Removes temporary files after processing
6. **Response Processing**: Returns the transcription text
131+
132+
### Error Handling
133+
134+
- **File Access Errors**: Graceful handling of inaccessible local files
135+
- **Network Errors**: Robust URL fetching with retry logic (up to 3 attempts)
136+
- **Format Errors**: Automatic format detection and validation
137+
- **API Errors**: Clear error reporting for service issues
138+
- **Sandbox Restrictions**: Prevents access to sandbox files with clear error messages
139+
140+
### Retry Logic
141+
142+
- **Maximum Retries**: 3 attempts for failed requests
143+
- **Exponential Backoff**: 5, 10, 20 second delays between retries
144+
- **Network Resilience**: Handles temporary network issues and service unavailability
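The retry schedule above (doubling from a 5-second base) can be sketched as follows — a hedged illustration of the documented behavior, not the server's actual implementation:

```python
import time

def retry_delays(max_retries: int = 3, base_delay: float = 5.0) -> list:
    """Exponential backoff schedule: 5, 10, 20 seconds for three retries."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]

def fetch_with_retries(fetch, max_retries: int = 3, sleep=time.sleep):
    """Call `fetch()`, retrying on failure with the schedule above."""
    delays = retry_delays(max_retries)
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            sleep(delays[attempt])
```

The `sleep` parameter is injected only so the schedule is easy to test; the real server would simply block between attempts.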
---

!!! info "Documentation Info"
    **Last Updated:** October 2025 · **Doc Contributor:** Team @ MiroMind AI
docs/mkdocs/docs/tool_reasoning_os.md

Lines changed: 135 additions & 0 deletions
# Reasoning Tools - Open Source (`reasoning_mcp_server_os.py`)

The Reasoning MCP Server (Open Source) provides a **pure text-based reasoning engine** using open-source models. It supports logical analysis, problem solving, and planning, with robust retry mechanisms and exponential backoff for reliability.

!!! info "Available Functions"
    This MCP server provides the following functions that agents can call:

    - **Pure Text Reasoning**: Logical analysis and problem solving using open-source LLM backends
    - **Step-by-Step Analysis**: Structured reasoning with detailed explanations
    - **Open-Source Model Support**: Qwen3-235B-A22B-Thinking-2507 with automatic fallback
    - **Robust Error Handling**: Exponential backoff retry logic (up to 10 attempts)

---

## Environment Variables

!!! warning "Configuration Location"
    `reasoning_mcp_server_os.py` reads environment variables passed through the `tool-reasoning-os.yaml` configuration file, not directly from the `.env` file.

**Open-Source Model Configuration:**

- `REASONING_API_KEY`: API key for the open-source reasoning service (required)
- `REASONING_BASE_URL`: Base URL for the reasoning service API endpoint
- `REASONING_MODEL_NAME`: Model name (default: `Qwen/Qwen3-235B-A22B-Thinking-2507`)

**Example Configuration:**

```bash
# API for Open-Source Reasoning Tool (for benchmark testing)
REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507"
REASONING_API_KEY="your_reasoning_key"
REASONING_BASE_URL="https://your_reasoning_base_url/v1/chat/completions"
```

---

## Local Deployment

### Using SGLang Server

For optimal performance with the Qwen3-235B-A22B-Thinking model, deploy using SGLang:

```bash
python3 -m sglang.launch_server \
  --model-path /path/to/Qwen3-235B-A22B-Thinking-2507 \
  --tp 8 --host 0.0.0.0 --port 1234 \
  --trust-remote-code --enable-metrics \
  --log-level debug --log-level-http debug \
  --log-requests --log-requests-level 2 \
  --show-time-cost --context-length 131072
```

### Configuration for Local Deployment

When using local deployment, configure your environment variables:

```bash
REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507"
REASONING_API_KEY="dummy_key"  # Not required for local deployment
REASONING_BASE_URL="http://localhost:1234/v1/chat/completions"
```

---

## Function Reference

The following function is provided by the `reasoning_mcp_server_os.py` MCP tool and can be called by agents:

### `reasoning(question: str)`

Performs step-by-step reasoning, analysis, and planning over a **text-only input**. This tool is specialized for **complex thinking tasks** that require deep analytical reasoning.

!!! note "Text-Only Processing"
    This tool processes only the provided text input and will not fetch external data or context. Ensure all necessary information is included in the question.

**Parameters:**

- `question`: A detailed, complex question or problem statement that includes all necessary information

**Returns:**

- `str`: A structured, step-by-step reasoned answer

**Features:**

- **Open-Source Model**: Uses Qwen3-235B-A22B-Thinking-2507 for advanced reasoning
- **Robust Retry Logic**: Exponential backoff retry mechanism (up to 10 attempts)
- **Thinking Mode Support**: Automatically extracts reasoning content from thinking blocks
- **Error Handling**: Graceful fallback with informative error messages
- **Timeout Protection**: 600-second timeout for long-running reasoning tasks
- **Jittered Backoff**: Randomized retry delays prevent thundering-herd problems
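The thinking-mode extraction can be sketched roughly as follows, assuming the model wraps its chain of thought in `<think>…</think>` tags (the tag name and response format are assumptions for illustration, not confirmed from the server code):

```python
import re

def extract_final_answer(response_text: str) -> str:
    """Strip <think>...</think> blocks (assumed format) and return the answer."""
    # Remove any thinking blocks, including multi-line content
    answer = re.sub(r"<think>.*?</think>", "", response_text, flags=re.DOTALL)
    return answer.strip()
```

With this sketch, a response like `"<think>…reasoning…</think>\nThe answer is 42."` would reduce to just the final answer text.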
91+
92+
**Retry Configuration:**
93+
- Maximum retries: 10 attempts
94+
- Initial backoff: 1.0 seconds
95+
- Maximum backoff: 30.0 seconds
96+
- Exponential backoff with jitter (0.8-1.2x multiplier)
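Under these parameters, the delay schedule can be sketched as capped exponential backoff with a jitter multiplier — a minimal illustration, not the server's actual code:

```python
import random

MAX_RETRIES = 10
INITIAL_BACKOFF = 1.0  # seconds
MAX_BACKOFF = 30.0     # seconds

def backoff_delay(attempt: int, rng=random) -> float:
    """Delay before retry `attempt` (0-based): doubling base, capped at
    MAX_BACKOFF, then scaled by a random 0.8-1.2x jitter multiplier."""
    base = min(MAX_BACKOFF, INITIAL_BACKOFF * (2 ** attempt))
    return base * rng.uniform(0.8, 1.2)
```

The jitter spreads concurrent clients' retries apart, which is what prevents the thundering-herd behavior mentioned above.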
---

## Usage Examples

### Complex Mathematical Problems

```python
question = """
Solve this complex optimization problem:
A company wants to minimize costs while maximizing production.
Given constraints: 2x + 3y ≤ 100, x + y ≤ 50, x ≥ 0, y ≥ 0
Cost function: C = 5x + 8y
Production function: P = 3x + 4y
Find the optimal values of x and y.
"""
```

### Logical Puzzles

```python
question = """
Three people are in a room: Alice, Bob, and Charlie.
- Alice says: "Bob is lying"
- Bob says: "Charlie is lying"
- Charlie says: "Alice is lying"
If exactly one person is telling the truth, who is it?
"""
```
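Puzzles like this one can also be verified mechanically. The brute-force check below enumerates all truth assignments (an illustrative sketch, separate from the reasoning tool itself); for this particular puzzle it finds no assignment consistent with the statements, so a careful reasoner should flag the premises as contradictory rather than name a truth-teller.

```python
from itertools import product

def solve_puzzle():
    """Enumerate truth assignments for (Alice, Bob, Charlie); keep those where
    each statement's truth matches its speaker's honesty and exactly one
    person is truthful."""
    solutions = []
    for alice, bob, charlie in product([True, False], repeat=3):
        statements_consistent = (
            alice == (not bob)          # Alice: "Bob is lying"
            and bob == (not charlie)    # Bob: "Charlie is lying"
            and charlie == (not alice)  # Charlie: "Alice is lying"
        )
        if statements_consistent and sum([alice, bob, charlie]) == 1:
            solutions.append((alice, bob, charlie))
    return solutions
```

Chaining the three constraints gives `alice = not bob = charlie = not alice`, a contradiction, which is why the search comes back empty.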
### Strategic Planning

```python
question = """
Design a strategy for a startup to enter a competitive market
with limited resources. Consider market analysis, competitive
positioning, resource allocation, and risk mitigation.
"""
```

!!! info "Documentation Info"
    **Last Updated:** October 2025 · **Doc Contributor:** Team @ MiroMind AI
