1 file changed, +4 -4 lines changed
@@ -56,14 +56,14 @@ OPENAI_BASE_URL="https://api.openai.com/v1"
 ### Step 3: Run the Evaluation

 ```bash title="Run HLE Evaluation"
-uv run main.py common-benchmark --config_file_name=agent_hle_claude37sonnet benchmark=hle output_dir="logs/hle/$(date +"%Y%m%d_%H%M")"
+uv run main.py common-benchmark --config_file_name=agent_hle_claude37sonnet output_dir="logs/hle/$(date +"%Y%m%d_%H%M")"
 ```

 !!! tip "Resume Interrupted Evaluation"
     Specify the same output directory to continue from where you left off:

     ```bash
-    uv run main.py common-benchmark --config_file_name=agent_hle_claude37sonnet benchmark=hle output_dir="logs/hle/20251014_1504"
+    uv run main.py common-benchmark --config_file_name=agent_hle_claude37sonnet output_dir="logs/hle/20251014_1504"
     ```

 ### Step 4: Review Results
@@ -83,13 +83,13 @@ cat logs/hle/*/benchmark_results.jsonl
 ### Test with Limited Tasks

 ```bash
-uv run main.py common-benchmark --config_file_name=agent_hle_claude37sonnet benchmark=hle benchmark.execution.max_tasks=10 output_dir="logs/hle/$(date +"%Y%m%d_%H%M")"
+uv run main.py common-benchmark --config_file_name=agent_hle_claude37sonnet benchmark.execution.max_tasks=10 output_dir="logs/hle/$(date +"%Y%m%d_%H%M")"
 ```

 ### Adjust Concurrency

 ```bash
-uv run main.py common-benchmark --config_file_name=agent_hle_claude37sonnet benchmark=hle benchmark.execution.max_concurrent=5 output_dir="logs/hle/$(date +"%Y%m%d_%H%M")"
+uv run main.py common-benchmark --config_file_name=agent_hle_claude37sonnet benchmark.execution.max_concurrent=5 output_dir="logs/hle/$(date +"%Y%m%d_%H%M")"
 ```

 ---
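
All four changed lines follow the same pattern: the `benchmark=hle` override is dropped and everything else stays as it was. Since the resume tip depends on passing the exact same `output_dir` again, it can help to build the directory name once and reuse it. The following is only a sketch under that assumption; `new_run_dir`, `run_hle`, and `RUN_DIR` are hypothetical names introduced here for illustration and are not part of the project.

```shell
# Sketch only: the helper names below are hypothetical, not part of the project.

# Build a timestamped directory name in the same format the docs use.
new_run_dir() {
  printf 'logs/hle/%s\n' "$(date +"%Y%m%d_%H%M")"
}

# Wrap the documented command so a fresh run and a resume share one code path.
# $1 is the output directory; remaining arguments are passed through as overrides.
run_hle() {
  out_dir=$1
  shift
  uv run main.py common-benchmark \
    --config_file_name=agent_hle_claude37sonnet \
    "$@" \
    output_dir="$out_dir"
}

# Compute the directory once so later commands can refer to the same value.
RUN_DIR="$(new_run_dir)"
```

With these definitions, `run_hle "$RUN_DIR"` would start a fresh run, `run_hle "logs/hle/20251014_1504"` would resume the run from the tip, and `run_hle "$RUN_DIR" benchmark.execution.max_tasks=10` would pass the task limit from the updated docs.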