docs/mkdocs/docs/futurex.md
Lines changed: 4 additions & 19 deletions
@@ -137,33 +137,18 @@ Expected Format: \boxed{A}, \boxed{B}, or \boxed{C}
 
 ### Step 1: Run Multiple Evaluations
 
-Use the multiple runs script to execute several independent evaluations:
+Set `num_runs` in relevant config to the desired number of runs to run multiple evaluations and automatically enable parallel thinking for enhanced performance.
 
-```bash title="Run Multiple Evaluations"
-./scripts/run_evaluate_multiple_runs_futurex.sh
-```
-
-This script will:
+It will:
 
-- Run 3 independent evaluations by default (configurable with `NUM_RUNS`)
+- Run multiple independent evaluations by default (configurable with `num_runs`)
 - Execute all tasks in parallel for efficiency
 - Generate separate result files for each run in `run_1/`, `run_2/`, etc.
-- Create a consolidated `futurex_submission.jsonl` file with voting results
 
 ### Step 2: Customize Multiple Runs
 
-You can customize the evaluation parameters:
-
-```bash title="Custom Multiple Runs"
-# Run 5 evaluations with limited tasks for testing
This script runs 3 evaluations in parallel and calculates average scores. You can modify `NUM_RUNS` in the script to change the number of runs.
+Set `num_runs` in relevant config to the desired number of runs to run multiple evaluations and automatically enable parallel thinking for enhanced performance.
Set `num_runs` in relevant config to the desired number of runs to run multiple xbench-DeepSearch evaluations and automatically enable parallel thinking for enhanced performance.
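For readers of this change, the behavior the documentation lines describe (several independent runs executed in parallel, one result directory per run named `run_1/`, `run_2/`, and so on, and the voting step mentioned in the removed wording) can be illustrated with a minimal Python sketch. This is not the repository's implementation: `run_dir_names`, `run_all`, `majority_vote`, and the `evaluate_once` callback are hypothetical names introduced only for illustration, and simple majority voting is an assumption about how answers might be consolidated.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_dir_names(num_runs: int) -> list[str]:
    """Directory names following the `run_1/`, `run_2/`, ... layout from the docs."""
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    return [f"run_{i}" for i in range(1, num_runs + 1)]


def run_all(num_runs: int, output_root: Path, evaluate_once) -> list[Path]:
    """Call the hypothetical `evaluate_once(run_dir)` once per run, in parallel."""
    dirs = [output_root / name for name in run_dir_names(num_runs)]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=num_runs) as pool:
        # list() waits for every run and re-raises any exception from a run.
        list(pool.map(evaluate_once, dirs))
    return dirs


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; on a tie, the answer seen first wins."""
    if not answers:
        raise ValueError("no answers to vote on")
    return Counter(answers).most_common(1)[0][0]
```

In this sketch, `run_all(3, Path("results"), evaluate_once)` would create `results/run_1`, `results/run_2`, and `results/run_3`, and `majority_vote` could then be applied to the per-task answers collected from those directories.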