Commit 0b20ff3
feat(tool): incorporate open-source tools from MiroThinker (#60)
* upd: add futurex evaluation support
* upd: support multiple evals for futurex and add relevant docs
* upd: fix doc bugs for futurex
* debug: fix wrong calling path
* add preparation for finsearchcomp
* update a preliminary version of the finsearchcomp benchmark
* clean up redundant code from merging
* upd: modify yaml to use MiroThinker as the main agent; add check-progress file to exclude T1
* upd: check_progress function for finsearchcomp now considers Global and Greater China separately
* upd: add docs and shell script for multiple runs
* fix: check_finsearchcomp_progress not displaying results from the Greater China region
* fix: catch ContextLimitError in more observed cases
* initialize open-source tools for audio, vision, and reasoning
* upd: docs for open-source tools
* fix wrong date
1 parent 6ec4972 commit 0b20ff3

File tree: 10 files changed, +891 −0 lines


config/tool/tool-audio-os.yaml

Lines changed: 9 additions & 0 deletions

```yaml
name: "tool-audio-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.audio_mcp_server_os"
env:
  WHISPER_API_KEY: "${oc.env:WHISPER_API_KEY}"
  WHISPER_BASE_URL: "${oc.env:WHISPER_BASE_URL}"
  WHISPER_MODEL_NAME: "${oc.env:WHISPER_MODEL_NAME}"
```
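The `${oc.env:VAR}` entries are OmegaConf-style environment-variable interpolations: each value is filled in from the corresponding environment variable when the config is loaded. A toy resolver illustrating that behavior (an illustration only, not OmegaConf itself) might look like:

```python
import os
import re

def resolve_env_interpolations(value: str) -> str:
    """Replace OmegaConf-style ${oc.env:VAR} placeholders with values
    from the process environment (toy illustration, not OmegaConf)."""
    def _lookup(match: re.Match) -> str:
        var = match.group(1)
        if var not in os.environ:
            raise KeyError(f"Environment variable not set: {var}")
        return os.environ[var]
    return re.sub(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)\}", _lookup, value)
```

For example, with `WHISPER_MODEL_NAME` exported in the environment, resolving `"${oc.env:WHISPER_MODEL_NAME}"` yields the configured model name.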
config/tool/tool-image-video-os.yaml

Lines changed: 9 additions & 0 deletions

```yaml
name: "tool-image-video-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.vision_mcp_server_os"
env:
  VISION_API_KEY: "${oc.env:VISION_API_KEY}"
  VISION_BASE_URL: "${oc.env:VISION_BASE_URL}"
  VISION_MODEL_NAME: "${oc.env:VISION_MODEL_NAME}"
```

config/tool/tool-reasoning-os.yaml

Lines changed: 9 additions & 0 deletions

```yaml
name: "tool-reasoning-os"
tool_command: "python"
args:
  - "-m"
  - "src.tool.mcp_servers.reasoning_mcp_server_os"
env:
  REASONING_API_KEY: "${oc.env:REASONING_API_KEY}"
  REASONING_BASE_URL: "${oc.env:REASONING_BASE_URL}"
  REASONING_MODEL_NAME: "${oc.env:REASONING_MODEL_NAME}"
```

docs/mkdocs/docs/tool_audio_os.md

Lines changed: 149 additions & 0 deletions
# Audio Tools - Open Source (`audio_mcp_server_os.py`)

The Audio MCP Server (Open Source) enables audio transcription using open-source Whisper models. It provides comprehensive audio-to-text conversion with support for multiple audio formats, local files, and URLs.

!!! info "Available Functions"
    This MCP server provides the following functions that agents can call:

    - **Audio Transcription**: High-quality speech-to-text conversion
    - **Multi-Format Support**: MP3, WAV, M4A, AAC, OGG, FLAC, WMA formats
    - **Flexible Input**: Local file paths and web URLs
    - **Open-Source Model Support**: Whisper-Large-v3-Turbo with automatic processing

---

## Environment Variables

!!! warning "Configuration Location"
    `audio_mcp_server_os.py` reads environment variables passed through the `tool-audio-os.yaml` configuration file, not directly from the `.env` file.

**Open-Source Model Configuration:**

- `WHISPER_API_KEY`: API key for the open-source Whisper service (required)
- `WHISPER_BASE_URL`: Base URL for the Whisper service API endpoint
- `WHISPER_MODEL_NAME`: Model name (default: `openai/whisper-large-v3-turbo`)

**Example Configuration:**

```bash
# API for Open-Source Audio Transcription Tool (for benchmark testing)
WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo"
WHISPER_API_KEY="your_whisper_key"
WHISPER_BASE_URL="https://your_whisper_base_url/v1"
```

---

## Local Deployment

### Using vLLM Server

For optimal performance with the Whisper-Large-v3-Turbo model, deploy using vLLM:

```bash
pip install vllm==0.10.0
pip install "vllm[audio]"
vllm serve /path/to/whisper \
  --served-model-name whisper-large-v3-turbo \
  --task transcription
```

### Configuration for Local Deployment

When using local deployment, configure your environment variables:

```bash
WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo"
WHISPER_API_KEY="dummy_key"  # Not required for local deployment
WHISPER_BASE_URL="http://localhost:8000/v1"
```

---

## Function Reference

The following function is provided by the `audio_mcp_server_os.py` MCP tool and can be called by agents:

### `audio_transcription(audio_path_or_url: str)`

Transcribes audio files to text using open-source Whisper models. Supports both local files and web URLs, with automatic format detection and processing.

**Parameters:**

- `audio_path_or_url`: Local file path (accessible to the server) or web URL

**Returns:**

- `str`: The transcription of the audio file

**Supported Audio Formats:**

- MP3 (`.mp3`)
- WAV (`.wav`)
- M4A (`.m4a`)
- AAC (`.aac`)
- OGG (`.ogg`)
- FLAC (`.flac`)
- WMA (`.wma`)

## Usage Examples

### Local File Transcription

```python
# Local file transcription
result = audio_transcription(
    audio_path_or_url="/path/to/audio.mp3"
)
```

### URL-based Transcription

```python
# URL transcription
result = audio_transcription(
    audio_path_or_url="https://example.com/audio.wav"
)
```

### Meeting Recording Transcription

```python
result = audio_transcription(
    audio_path_or_url="meeting_recording.m4a"
)
```

### Podcast Transcription

```python
result = audio_transcription(
    audio_path_or_url="podcast_episode.mp3"
)
```

---

## Technical Implementation

### Audio Processing Pipeline

1. **Input Validation**: Checks whether the input is a local file or a URL
2. **Format Detection**: Determines the audio format from the file extension or content type
3. **File Handling**: Downloads URL inputs to temporary storage with the proper extension
4. **API Request**: Sends the audio file to the Whisper model for transcription
5. **Cleanup**: Removes temporary files after processing
6. **Response Processing**: Returns the transcription text
131+
132+
### Error Handling
133+
134+
- **File Access Errors**: Graceful handling of inaccessible local files
135+
- **Network Errors**: Robust URL fetching with retry logic (up to 3 attempts)
136+
- **Format Errors**: Automatic format detection and validation
137+
- **API Errors**: Clear error reporting for service issues
138+
- **Sandbox Restrictions**: Prevents access to sandbox files with clear error messages
139+
140+
### Retry Logic
141+
142+
- **Maximum Retries**: 3 attempts for failed requests
143+
- **Exponential Backoff**: 5, 10, 20 second delays between retries
144+
- **Network Resilience**: Handles temporary network issues and service unavailability
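The retry schedule above (doubling from a 5-second base) can be sketched as follows — a hedged illustration of the documented behavior, not the server's actual implementation:

```python
import time

def retry_delays(max_retries: int = 3, base_delay: float = 5.0) -> list:
    """Exponential backoff schedule: 5, 10, 20 seconds for three retries."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]

def fetch_with_retries(fetch, max_retries: int = 3, sleep=time.sleep):
    """Call `fetch()`, retrying on failure with the schedule above."""
    delays = retry_delays(max_retries)
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            sleep(delays[attempt])
```

The `sleep` parameter is injected only so the schedule is easy to test; the real server would simply block between attempts.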
---

!!! info "Documentation Info"
    **Last Updated:** October 2025 · **Doc Contributor:** Team @ MiroMind AI
docs/mkdocs/docs/tool_reasoning_os.md

Lines changed: 135 additions & 0 deletions
# Reasoning Tools - Open Source (`reasoning_mcp_server_os.py`)

The Reasoning MCP Server (Open Source) provides a **pure text-based reasoning engine** using open-source models. It supports logical analysis, problem solving, and planning, with robust retry mechanisms and exponential backoff for reliability.

!!! info "Available Functions"
    This MCP server provides the following functions that agents can call:

    - **Pure Text Reasoning**: Logical analysis and problem solving using open-source LLM backends
    - **Step-by-Step Analysis**: Structured reasoning with detailed explanations
    - **Open-Source Model Support**: Qwen3-235B-A22B-Thinking-2507 with automatic fallback
    - **Robust Error Handling**: Exponential backoff retry logic (up to 10 attempts)

---

## Environment Variables

!!! warning "Configuration Location"
    `reasoning_mcp_server_os.py` reads environment variables passed through the `tool-reasoning-os.yaml` configuration file, not directly from the `.env` file.

**Open-Source Model Configuration:**

- `REASONING_API_KEY`: API key for the open-source reasoning service (required)
- `REASONING_BASE_URL`: Base URL for the reasoning service API endpoint
- `REASONING_MODEL_NAME`: Model name (default: `Qwen/Qwen3-235B-A22B-Thinking-2507`)

**Example Configuration:**

```bash
# API for Open-Source Reasoning Tool (for benchmark testing)
REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507"
REASONING_API_KEY="your_reasoning_key"
REASONING_BASE_URL="https://your_reasoning_base_url/v1/chat/completions"
```

---

## Local Deployment

### Using SGLang Server

For optimal performance with the Qwen3-235B-A22B-Thinking model, deploy using SGLang:

```bash
python3 -m sglang.launch_server \
  --model-path /path/to/Qwen3-235B-A22B-Thinking-2507 \
  --tp 8 --host 0.0.0.0 --port 1234 \
  --trust-remote-code --enable-metrics \
  --log-level debug --log-level-http debug \
  --log-requests --log-requests-level 2 \
  --show-time-cost --context-length 131072
```

### Configuration for Local Deployment

When using local deployment, configure your environment variables:

```bash
REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507"
REASONING_API_KEY="dummy_key"  # Not required for local deployment
REASONING_BASE_URL="http://localhost:1234/v1/chat/completions"
```

---

## Function Reference

The following function is provided by the `reasoning_mcp_server_os.py` MCP tool and can be called by agents:

### `reasoning(question: str)`

Performs step-by-step reasoning, analysis, and planning over a **text-only input**. This tool is specialized for **complex thinking tasks** that require deep analytical reasoning.

!!! note "Text-Only Processing"
    This tool processes only the provided text input and will not fetch external data or context. Ensure all necessary information is included in the question.

**Parameters:**

- `question`: A detailed, complex question or problem statement that includes all necessary information

**Returns:**

- `str`: A structured, step-by-step reasoned answer

**Features:**

- **Open-Source Model**: Uses Qwen3-235B-A22B-Thinking-2507 for advanced reasoning
- **Robust Retry Logic**: Exponential backoff retry mechanism (up to 10 attempts)
- **Thinking Mode Support**: Automatically extracts reasoning content from thinking blocks
- **Error Handling**: Graceful fallback with informative error messages
- **Timeout Protection**: 600-second timeout for long-running reasoning tasks
- **Jittered Backoff**: Randomized retry delays prevent thundering-herd problems
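The thinking-mode extraction can be sketched roughly as follows, assuming the model wraps its chain of thought in `<think>…</think>` tags (the tag name and response format are assumptions for illustration, not confirmed from the server code):

```python
import re

def extract_final_answer(response_text: str) -> str:
    """Strip <think>...</think> blocks (assumed format) and return the answer."""
    # Remove any thinking blocks, including multi-line content
    answer = re.sub(r"<think>.*?</think>", "", response_text, flags=re.DOTALL)
    return answer.strip()
```

With this sketch, a response like `"<think>…reasoning…</think>\nThe answer is 42."` would reduce to just the final answer text.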
91+
92+
**Retry Configuration:**
93+
- Maximum retries: 10 attempts
94+
- Initial backoff: 1.0 seconds
95+
- Maximum backoff: 30.0 seconds
96+
- Exponential backoff with jitter (0.8-1.2x multiplier)
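Under these parameters, the delay schedule can be sketched as capped exponential backoff with a jitter multiplier — a minimal illustration, not the server's actual code:

```python
import random

MAX_RETRIES = 10
INITIAL_BACKOFF = 1.0  # seconds
MAX_BACKOFF = 30.0     # seconds

def backoff_delay(attempt: int, rng=random) -> float:
    """Delay before retry `attempt` (0-based): doubling base, capped at
    MAX_BACKOFF, then scaled by a random 0.8-1.2x jitter multiplier."""
    base = min(MAX_BACKOFF, INITIAL_BACKOFF * (2 ** attempt))
    return base * rng.uniform(0.8, 1.2)
```

The jitter spreads concurrent clients' retries apart, which is what prevents the thundering-herd behavior mentioned above.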
---

## Usage Examples

### Complex Mathematical Problems

```python
question = """
Solve this complex optimization problem:
A company wants to minimize costs while maximizing production.
Given constraints: 2x + 3y ≤ 100, x + y ≤ 50, x ≥ 0, y ≥ 0
Cost function: C = 5x + 8y
Production function: P = 3x + 4y
Find the optimal values of x and y.
"""
```

### Logical Puzzles

```python
question = """
Three people are in a room: Alice, Bob, and Charlie.
- Alice says: "Bob is lying"
- Bob says: "Charlie is lying"
- Charlie says: "Alice is lying"
If exactly one person is telling the truth, who is it?
"""
```
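Puzzles like this one can also be verified mechanically. The brute-force check below enumerates all truth assignments (an illustrative sketch, separate from the reasoning tool itself); for this particular puzzle it finds no assignment consistent with the statements, so a careful reasoner should flag the premises as contradictory rather than name a truth-teller.

```python
from itertools import product

def solve_puzzle():
    """Enumerate truth assignments for (Alice, Bob, Charlie); keep those where
    each statement's truth matches its speaker's honesty and exactly one
    person is truthful."""
    solutions = []
    for alice, bob, charlie in product([True, False], repeat=3):
        statements_consistent = (
            alice == (not bob)          # Alice: "Bob is lying"
            and bob == (not charlie)    # Bob: "Charlie is lying"
            and charlie == (not alice)  # Charlie: "Alice is lying"
        )
        if statements_consistent and sum([alice, bob, charlie]) == 1:
            solutions.append((alice, bob, charlie))
    return solutions
```

Chaining the three constraints gives `alice = not bob = charlie = not alice`, a contradiction, which is why the search comes back empty.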
### Strategic Planning

```python
question = """
Design a strategy for a startup to enter a competitive market
with limited resources. Consider market analysis, competitive
positioning, resource allocation, and risk mitigation.
"""
```

!!! info "Documentation Info"
    **Last Updated:** October 2025 · **Doc Contributor:** Team @ MiroMind AI
