
Commit f2b0ca4

Author: Yue Deng

add prepare benchmark and remove unused config

Parent: 57486a0

2 files changed: +99, -8 lines

config/benchmark/default.yaml

Lines changed: 0 additions & 5 deletions
```diff
@@ -4,11 +4,6 @@ name: "default"
 
 data:
   metadata_file: "standardized_data.jsonl"
-  field_mapping:
-    task_id_field: "task_id"
-    task_question_field: "task_question"
-    ground_truth_field: "ground_truth"
-    file_name_field: "file_name"
   whitelist: []
 
 execution:
```
Lines changed: 99 additions & 3 deletions
# 🧪 Adding New Benchmarks to MiroFlow

This guide provides a comprehensive walkthrough for adding new benchmarks to the MiroFlow framework. MiroFlow uses a modular benchmark architecture that allows for easy integration of new evaluation datasets.

---

## 🚀 Step-by-Step Implementation Guide

### Step 1: Prepare Your Dataset

Your benchmark dataset should follow this structure:

```
your-benchmark/
├── standardized_data.jsonl    # Metadata file (required)
├── file1.pdf                  # Optional: binary files referenced by tasks
├── file2.png
└── ...
```

#### Metadata Format (JSONL)

Each line in `standardized_data.jsonl` should be a JSON object with these fields:

```json
{
  "task_id": "unique_task_identifier",
  "task_question": "The question or instruction for the task",
  "ground_truth": "The expected answer or solution",
  "file_path": "path/to/file.pdf",   // Optional, can be null
  "metadata": {                      // Optional, can be empty
    "difficulty": "hard",
    "category": "reasoning",
    "source": "original_dataset_name"
  }
}
```

(The `//` comments above are explanatory only; each line of the actual JSONL file must be plain JSON without comments.)

**Example:**

```json
{
  "task_id": "math_001",
  "task_question": "What is the integral of x^2 from 0 to 2?",
  "ground_truth": "8/3",
  "file_path": null,
  "metadata": {
    "difficulty": "medium",
    "category": "calculus"
  }
}
```
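Before wiring the dataset into a config, it can help to sanity-check the JSONL file. The sketch below is a minimal standalone validator, not part of MiroFlow; the script name and the `validate_metadata` helper are just illustrative. It checks that each line parses as JSON and carries the required fields described above:

```python
# validate_metadata.py -- standalone sketch, not a MiroFlow utility.
# Checks that every line of standardized_data.jsonl is valid JSON and
# contains the required fields described above.
import json
import sys
from pathlib import Path

REQUIRED_FIELDS = {"task_id", "task_question", "ground_truth"}


def validate_metadata(path: Path) -> int:
    """Return the number of problems found (0 means the file looks OK)."""
    problems = 0
    seen_ids = set()
    for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"line {lineno}: not valid JSON ({exc})")
            problems += 1
            continue
        if not isinstance(record, dict):
            print(f"line {lineno}: expected a JSON object, got {type(record).__name__}")
            problems += 1
            continue
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            print(f"line {lineno}: missing fields {sorted(missing)}")
            problems += 1
        task_id = record.get("task_id")
        if task_id is not None:
            if task_id in seen_ids:
                print(f"line {lineno}: duplicate task_id {task_id!r}")
                problems += 1
            seen_ids.add(task_id)
    return problems


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("standardized_data.jsonl")
    sys.exit(1 if validate_metadata(target) else 0)
```

Run it as `python validate_metadata.py your-benchmark/standardized_data.jsonl` before moving on to the configuration step.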
### Step 2: Create Configuration File

Create a new configuration file in `config/benchmark/your-benchmark.yaml`:

```yaml
# config/benchmark/your-benchmark.yaml
defaults:
  - default
  - _self_

name: "your-benchmark"

data:
  data_dir: "${data_dir}/your-benchmark"      # Path to your dataset
  metadata_file: "standardized_data.jsonl"    # Metadata filename
  whitelist: []                               # Optional: list of specific task_ids to run

execution:
  max_tasks: null        # null = no limit, or specify a number
  max_concurrent: 5      # Number of parallel tasks
  pass_at_k: 1           # Number of attempts per task

openai_api_key: "${oc.env:OPENAI_API_KEY,???}"
```
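MiroFlow presumably composes this file with `default.yaml` at run time (the `defaults:` list is Hydra-style). If you want a rough offline preview of the effective settings, the sketch below simply merges the two files with OmegaConf. It is only an approximation of what the framework does (it does not reproduce Hydra's full `defaults` handling), and the `preview_config.py` name is hypothetical:

```python
# preview_config.py -- rough sketch for previewing the effective benchmark config.
# Merges default.yaml with your override file using OmegaConf and prints the result.
from omegaconf import OmegaConf

base = OmegaConf.load("config/benchmark/default.yaml")
override = OmegaConf.load("config/benchmark/your-benchmark.yaml")

# Drop the Hydra-specific `defaults` key before merging; OmegaConf itself
# does not interpret it.
override.pop("defaults", None)

merged = OmegaConf.merge(base, override)

# resolve=False keeps interpolations such as ${data_dir} and
# ${oc.env:OPENAI_API_KEY,???} unresolved instead of raising.
print(OmegaConf.to_yaml(merged, resolve=False))
```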
### Step 3: Set Up Data Directory

Place your dataset in the appropriate data directory:

```bash
# Create the benchmark data directory
mkdir -p data/your-benchmark

# Copy your dataset files
cp your-dataset/* data/your-benchmark/
```
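After copying the files, it is worth confirming that every `file_path` referenced in the metadata actually resolves inside `data/your-benchmark`. The sketch below is a standalone check, not a MiroFlow utility, and it assumes `file_path` values are stored relative to the benchmark data directory:

```python
# check_files.py -- standalone sketch; assumes file_path values are relative
# to the benchmark data directory (adjust if your paths differ).
import json
from pathlib import Path

data_dir = Path("data/your-benchmark")
missing = []

with (data_dir / "standardized_data.jsonl").open(encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        record = json.loads(line)
        file_path = record.get("file_path")
        if file_path and not (data_dir / file_path).exists():
            missing.append((record.get("task_id"), file_path))

if missing:
    for task_id, file_path in missing:
        print(f"task {task_id}: missing file {file_path}")
else:
    print("all referenced files are present")
```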
### Step 4: Test Your Benchmark

Run your benchmark using the MiroFlow CLI:

```bash
# Test with a small subset
uv run main.py common-benchmark \
    --config_file_name=agent_quickstart_1 \
    benchmark=your-benchmark \
    benchmark.execution.max_tasks=5 \
    output_dir=logs/test-your-benchmark
```

---

**Last Updated:** Sep 2025
**Doc Contributor:** Team @ MiroMind AI
