feat(tool): incorporate open-source tools from MiroThinker #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Changes from 19 commits

Commits (20 total)
- `56b235d` (JubSteven) upd: add futurex evaluation support.
- `287a7bc` (JubSteven) upd: support multiple eval for futurex and add relavent doc.
- `bf43b37` (JubSteven) upd: fix bugs with doc for futurex.
- `d1e1637` (JubSteven) debug: fix wrong calling path.
- `eb6f302` (JubSteven) add preparation for finsearchcomp.
- `4dabaee` (JubSteven) update a premature version of finsearchcomp benchmark.
- `5ea9b61` (JubSteven) Resolve merge conflicts in FutureX utility files
- `c086e41` (JubSteven) clean redundent code in merging.
- `d6a8715` (JubSteven) upd: modify yaml to use Mirothinker as the main agent, add check prog…
- `e7163d3` (JubSteven) upd: check_progress function for finsearchcomp now consider globe and…
- `b0e494f` (JubSteven) Merge remote-tracking branch 'upstream/miroflow-v0.3' into explorations
- `256ba2c` (JubSteven) upd: add docs and shell script for multiple runs.
- `835e590` (JubSteven) fix: check_finsearchcomp_progress not displaying results from greater…
- `5ffc269` (JubSteven) Merge remote-tracking branch 'upstream/miroflow-v0.3' into explorations
- `4918ee2` (JubSteven) Merge branch 'miroflow-v0.3' into explorations
- `72e9bb6` (JubSteven) fix: catch ContextLimitError in more observed cases.
- `e589468` (JubSteven) initialize open source tools for audio, vision and reasoning.
- `948d856` (JubSteven) Merge remote-tracking branch 'upstream/miroflow-v0.3' into explorations
- `15a7ef9` (JubSteven) upd: docs for open-source tools.
- `bf786ca` (JubSteven) fix wrong date.
New file: tool config `tool-audio-os` (+9 lines)

```yaml
name: "tool-audio-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.audio_mcp_server_os"
env:
  WHISPER_API_KEY: "${oc.env:WHISPER_API_KEY}"
  WHISPER_BASE_URL: "${oc.env:WHISPER_BASE_URL}"
  WHISPER_MODEL_NAME: "${oc.env:WHISPER_MODEL_NAME}"
```
New file: tool config `tool-image-video-os` (+9 lines)

```yaml
name: "tool-image-video-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.vision_mcp_server_os"
env:
  VISION_API_KEY: "${oc.env:VISION_API_KEY}"
  VISION_BASE_URL: "${oc.env:VISION_BASE_URL}"
  VISION_MODEL_NAME: "${oc.env:VISION_MODEL_NAME}"
```
New file: tool config `tool-reasoning-os` (+9 lines)

```yaml
name: "tool-reasoning-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.reasoning_mcp_server_os"
env:
  REASONING_API_KEY: "${oc.env:REASONING_API_KEY}"
  REASONING_BASE_URL: "${oc.env:REASONING_BASE_URL}"
  REASONING_MODEL_NAME: "${oc.env:REASONING_MODEL_NAME}"
```
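For reference, a minimal sketch of how the `${oc.env:...}` placeholders in these configs resolve. This assumes the YAML is loaded with OmegaConf (the `oc.env` resolver syntax suggests this); the file path and the environment values are illustrative, not part of this PR:

```python
# Minimal sketch, assuming OmegaConf is used to load the tool config (it provides the oc.env resolver).
import os
from omegaconf import OmegaConf

# Normally supplied via the project's .env; set here only so the example is self-contained.
os.environ.setdefault("WHISPER_API_KEY", "your_whisper_key")
os.environ.setdefault("WHISPER_BASE_URL", "https://your_whisper_base_url/v1")
os.environ.setdefault("WHISPER_MODEL_NAME", "openai/whisper-large-v3-turbo")

cfg = OmegaConf.load("tool-audio-os.yaml")             # path is illustrative
resolved = OmegaConf.to_container(cfg, resolve=True)   # substitutes ${oc.env:...} with env values
print(resolved["env"]["WHISPER_MODEL_NAME"])           # -> "openai/whisper-large-v3-turbo"
```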
New file: documentation for `audio_mcp_server_os.py` (+149 lines)

# Audio Tools - Open Source (`audio_mcp_server_os.py`)

The Audio MCP Server (Open Source) enables audio transcription using open-source Whisper models. It provides comprehensive audio-to-text conversion with support for multiple audio formats, local files, and URLs.

!!! info "Available Functions"
    This MCP server provides the following functions that agents can call:

    - **Audio Transcription**: High-quality speech-to-text conversion
    - **Multi-Format Support**: MP3, WAV, M4A, AAC, OGG, FLAC, WMA formats
    - **Flexible Input**: Local file paths and web URLs
    - **Open-Source Model Support**: Whisper-Large-v3-Turbo with automatic processing

---

## Environment Variables

!!! warning "Configuration Location"
    `audio_mcp_server_os.py` reads environment variables that are passed through the `tool-audio-os.yaml` configuration file, not directly from the `.env` file.

**Open-Source Model Configuration:**

- `WHISPER_API_KEY`: Required API key for the open-source Whisper service
- `WHISPER_BASE_URL`: Base URL for the Whisper service API endpoint
- `WHISPER_MODEL_NAME`: Model name (default: `openai/whisper-large-v3-turbo`)

**Example Configuration:**

```bash
# API for Open-Source Audio Transcription Tool (for benchmark testing)
WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo"
WHISPER_API_KEY=your_whisper_key
WHISPER_BASE_URL="https://your_whisper_base_url/v1"
```

---

## Local Deployment

### Using vLLM Server

For optimal performance with the Whisper-Large-v3-Turbo model, deploy using vLLM:

```bash
pip install vllm==0.10.0
pip install "vllm[audio]"
vllm serve /path/to/whisper \
  --served-model-name whisper-large-v3-turbo \
  --task transcription
```

### Configuration for Local Deployment

When using local deployment, configure your environment variables:

```bash
WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo"
WHISPER_API_KEY="dummy_key"  # Not required for local deployment
WHISPER_BASE_URL="http://localhost:8000/v1"
```
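To sanity-check a local vLLM deployment before wiring it into the tool config, a minimal sketch using the OpenAI Python client is shown below. The audio file path is illustrative, and the model name must match the `--served-model-name` passed to `vllm serve`:

```python
# Minimal sketch: query a local vLLM Whisper server through its OpenAI-compatible API.
# Assumptions: the server launched above is listening on localhost:8000; sample.mp3 is illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy_key")

with open("sample.mp3", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",  # matches --served-model-name above
        file=audio_file,
    )

print(result.text)  # plain-text transcription
```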
---

## Function Reference

The following function is provided by the `audio_mcp_server_os.py` MCP tool and can be called by agents:

### `audio_transcription(audio_path_or_url: str)`

Transcribe audio files to text using open-source Whisper models. Supports both local files and web URLs with automatic format detection and processing.

**Parameters:**

- `audio_path_or_url`: Local file path (accessible to the server) or web URL

**Returns:**

- `str`: The transcription of the audio file

**Supported Audio Formats:**

- MP3 (.mp3)
- WAV (.wav)
- M4A (.m4a)
- AAC (.aac)
- OGG (.ogg)
- FLAC (.flac)
- WMA (.wma)

## Usage Examples

### Local File Transcription

```python
# Local file transcription
result = audio_transcription(
    audio_path_or_url="/path/to/audio.mp3"
)
```

### URL-based Transcription

```python
# URL transcription
result = audio_transcription(
    audio_path_or_url="https://example.com/audio.wav"
)
```

### Meeting Recording Transcription

```python
result = audio_transcription(
    audio_path_or_url="meeting_recording.m4a"
)
```

### Podcast Transcription

```python
result = audio_transcription(
    audio_path_or_url="podcast_episode.mp3"
)
```

---

## Technical Implementation

### Audio Processing Pipeline

1. **Input Validation**: Checks if input is local file or URL
2. **Format Detection**: Determines audio format from extension or content type
3. **File Handling**: Downloads URL files to temporary storage with proper extensions
4. **Response Processing**: Sends audio file to Whisper model for transcription
5. **Cleanup**: Removes temporary files after processing
6. **Response Processing**: Returns transcription text
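The shipped server's internals are not reproduced in this doc, but a minimal sketch of the pipeline described above could look like the following. All names here are illustrative, not the actual implementation:

```python
# Minimal sketch of the described pipeline (illustrative only, not the shipped implementation).
import os
import tempfile
from urllib.parse import urlparse

import requests
from openai import OpenAI

SUPPORTED = {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac", ".wma"}

def transcribe(audio_path_or_url: str) -> str:
    client = OpenAI(
        base_url=os.environ["WHISPER_BASE_URL"],
        api_key=os.environ["WHISPER_API_KEY"],
    )
    tmp_path = None
    try:
        # 1-3. Input validation, format detection, and file handling.
        if urlparse(audio_path_or_url).scheme in ("http", "https"):
            ext = os.path.splitext(urlparse(audio_path_or_url).path)[1] or ".mp3"  # fallback is a guess
            resp = requests.get(audio_path_or_url, timeout=60)
            resp.raise_for_status()
            with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
                tmp.write(resp.content)
                tmp_path = tmp.name
            local_path = tmp_path
        else:
            local_path = audio_path_or_url
        if os.path.splitext(local_path)[1].lower() not in SUPPORTED:
            raise ValueError(f"Unsupported audio format: {local_path}")
        # 4. API request to the Whisper endpoint.
        with open(local_path, "rb") as f:
            result = client.audio.transcriptions.create(
                model=os.environ["WHISPER_MODEL_NAME"], file=f
            )
        # 6. Response processing.
        return result.text
    finally:
        # 5. Cleanup of temporary downloads.
        if tmp_path and os.path.exists(tmp_path):
            os.remove(tmp_path)
```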
### Error Handling

- **File Access Errors**: Graceful handling of inaccessible local files
- **Network Errors**: Robust URL fetching with retry logic (up to 3 attempts)
- **Format Errors**: Automatic format detection and validation
- **API Errors**: Clear error reporting for service issues
- **Sandbox Restrictions**: Prevents access to sandbox files with clear error messages

### Retry Logic

- **Maximum Retries**: 3 attempts for failed requests
- **Exponential Backoff**: 5, 10, 20 second delays between retries
- **Network Resilience**: Handles temporary network issues and service unavailability
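One reading of the retry schedule above (an initial attempt plus retries spaced 5, 10, and 20 seconds apart) can be sketched as follows; this is an illustration of the stated parameters, not the server's actual code:

```python
# Rough sketch of the 5/10/20 second retry schedule (illustrative, not the shipped implementation).
import time

def call_with_retries(fn, delays=(5, 10, 20)):
    # One initial attempt plus up to len(delays) retries, waiting 5s, 10s, then 20s between attempts.
    for attempt, delay in enumerate([0] + list(delays)):
        if delay:
            time.sleep(delay)
        try:
            return fn()
        except Exception:  # real code would catch narrower network/API errors
            if attempt == len(delays):
                raise

# Example (hypothetical wrapper around the documented tool call):
# text = call_with_retries(lambda: audio_transcription(audio_path_or_url="https://example.com/audio.wav"))
```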
---

!!! info "Documentation Info"
    **Last Updated:** January 2025 · **Doc Contributor:** Team @ MiroMind AI
New file: documentation for `reasoning_mcp_server_os.py` (+135 lines)

# Reasoning Tools - Open Source (`reasoning_mcp_server_os.py`)

The Reasoning MCP Server (Open Source) provides a **pure text-based reasoning engine** using open-source models. It supports logical analysis, problem solving, and planning, with robust retry mechanisms and exponential backoff for reliability.

!!! info "Available Functions"
    This MCP server provides the following functions that agents can call:

    - **Pure Text Reasoning**: Logical analysis and problem solving using open-source LLM backends
    - **Step-by-Step Analysis**: Structured reasoning with detailed explanations
    - **Open-Source Model Support**: Qwen3-235B-A22B-Thinking-2507 with automatic fallback
    - **Robust Error Handling**: Exponential backoff retry logic (up to 10 attempts)

---

## Environment Variables

!!! warning "Configuration Location"
    `reasoning_mcp_server_os.py` reads environment variables that are passed through the `tool-reasoning-os.yaml` configuration file, not directly from the `.env` file.

**Open-Source Model Configuration:**

- `REASONING_API_KEY`: Required API key for the open-source reasoning service
- `REASONING_BASE_URL`: Base URL for the reasoning service API endpoint
- `REASONING_MODEL_NAME`: Model name (default: `Qwen/Qwen3-235B-A22B-Thinking-2507`)

**Example Configuration:**

```bash
# API for Open-Source Reasoning Tool (for benchmark testing)
REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507"
REASONING_API_KEY=your_reasoning_key
REASONING_BASE_URL="https://your_reasoning_base_url/v1/chat/completions"
```

---

## Local Deployment

### Using SGLang Server

For optimal performance with the Qwen3-235B-A22B-Thinking model, deploy using SGLang:

```bash
python3 -m sglang.launch_server \
  --model-path /path/to/Qwen3-235B-A22B-Thinking-2507 \
  --tp 8 --host 0.0.0.0 --port 1234 \
  --trust-remote-code --enable-metrics \
  --log-level debug --log-level-http debug \
  --log-requests --log-requests-level 2 \
  --show-time-cost --context-length 131072
```

### Configuration for Local Deployment

When using local deployment, configure your environment variables:

```bash
REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507"
REASONING_API_KEY="dummy_key"  # Not required for local deployment
REASONING_BASE_URL="http://localhost:1234/v1/chat/completions"
```
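Before pointing the tool config at the local server, a short smoke test can confirm the endpoint responds. This sketch assumes the SGLang server above is running and that the endpoint is OpenAI-compatible, as the `/v1/chat/completions` path suggests:

```python
# Minimal sketch: smoke-test the local chat-completions endpoint (illustrative).
import os
import requests

url = os.environ.get("REASONING_BASE_URL", "http://localhost:1234/v1/chat/completions")
payload = {
    "model": os.environ.get("REASONING_MODEL_NAME", "Qwen/Qwen3-235B-A22B-Thinking-2507"),
    "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
    "max_tokens": 512,
}
headers = {"Authorization": f"Bearer {os.environ.get('REASONING_API_KEY', 'dummy_key')}"}

resp = requests.post(url, json=payload, headers=headers, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```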
---

## Function Reference

The following function is provided by the `reasoning_mcp_server_os.py` MCP tool and can be called by agents:

### `reasoning(question: str)`

Perform step-by-step reasoning, analysis, and planning over a **text-only input**. This tool is specialized for **complex thinking tasks** that require deep analytical reasoning.

!!! note "Text-Only Processing"
    This tool processes only the provided text input and will not fetch external data or context. Ensure all necessary information is included in the question.

**Parameters:**

- `question`: A detailed, complex question or problem statement that includes all necessary information

**Returns:**

- `str`: A structured, step-by-step reasoned answer

**Features:**

- **Open-Source Model**: Uses Qwen3-235B-A22B-Thinking-2507 for advanced reasoning
- **Robust Retry Logic**: Exponential backoff retry mechanism (up to 10 attempts)
- **Thinking Mode Support**: Automatically extracts reasoning content from thinking blocks
- **Error Handling**: Graceful fallback with informative error messages
- **Timeout Protection**: 600-second timeout for long-running reasoning tasks
- **Jittered Backoff**: Prevents thundering herd problems with randomized retry delays

**Retry Configuration:**

- Maximum retries: 10 attempts
- Initial backoff: 1.0 seconds
- Maximum backoff: 30.0 seconds
- Exponential backoff with jitter (0.8-1.2x multiplier)
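The retry behavior described above (exponential growth capped at 30 s, with a 0.8-1.2x jitter multiplier) might be sketched roughly as follows. This illustrates the stated parameters rather than reproducing the server's code, and the `<think>` tag handling is an assumption about how thinking content is delimited:

```python
# Rough sketch of jittered exponential backoff and thinking-block extraction (illustrative).
import random
import re
import time

MAX_RETRIES = 10
INITIAL_BACKOFF = 1.0   # seconds
MAX_BACKOFF = 30.0      # seconds

def call_with_jittered_backoff(request_fn):
    backoff = INITIAL_BACKOFF
    for attempt in range(MAX_RETRIES):
        try:
            return request_fn()
        except Exception:  # real code would catch specific network/API errors
            if attempt == MAX_RETRIES - 1:
                raise
            time.sleep(backoff * random.uniform(0.8, 1.2))  # jitter avoids thundering herd
            backoff = min(backoff * 2, MAX_BACKOFF)          # exponential growth, capped

def split_thinking(answer: str) -> tuple[str, str]:
    # Assumes the model wraps its reasoning in <think>...</think>; adjust to the actual output format.
    match = re.search(r"<think>(.*?)</think>", answer, flags=re.DOTALL)
    if not match:
        return "", answer
    return match.group(1).strip(), answer[match.end():].strip()
```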
---

## Usage Examples

### Complex Mathematical Problems

```python
question = """
Solve this complex optimization problem:
A company wants to minimize costs while maximizing production.
Given constraints: 2x + 3y ≤ 100, x + y ≤ 50, x ≥ 0, y ≥ 0
Cost function: C = 5x + 8y
Production function: P = 3x + 4y
Find the optimal values of x and y.
"""
result = reasoning(question=question)
```

### Logical Puzzles

```python
question = """
Three people are in a room: Alice, Bob, and Charlie.
- Alice says: "Bob is lying"
- Bob says: "Charlie is lying"
- Charlie says: "Alice is lying"
If exactly one person is telling the truth, who is it?
"""
result = reasoning(question=question)
```

### Strategic Planning

```python
question = """
Design a strategy for a startup to enter a competitive market
with limited resources. Consider market analysis, competitive
positioning, resource allocation, and risk mitigation.
"""
result = reasoning(question=question)
```

!!! info "Documentation Info"
    **Last Updated:** January 2025 · **Doc Contributor:** Team @ MiroMind AI
Review comment on the documentation date: should be "October 2025"