Commit 303c0c5

add eval config for Qwen3-235B-A22B-Thinking-2507-FP8

Signed-off-by: Huamin Li <[email protected]>
1 parent 99722d5
File tree: 6 files changed (+26, -3 lines)
Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+model_name: "Qwen/Qwen3-235B-A22B-Thinking-2507-FP8"
+backend: "vllm"
+tasks:
+- name: "mmlu_pro"
+  metrics:
+  - name: "exact_match,custom-extract"
+    value: 0.77
+num_fewshot: 5
+limit: 250 # will run on 250 * 14 subjects = 3500 samples
+max_model_len: 8096
+gen_kwargs: "top_p=1,top_k=0,max_gen_toks=1536"
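The `gen_kwargs` line above packs the generation parameters into a single comma-separated `key=value` string that the harness later splits apart. As a rough illustration of that format (a minimal sketch; `parse_gen_kwargs` is a hypothetical helper, not lm-eval's actual parser):

```python
def parse_gen_kwargs(spec: str) -> dict:
    """Split a comma-separated "key=value" string (the shape used by
    gen_kwargs above) into a dict, coercing numeric values."""
    out = {}
    for pair in spec.split(","):
        key, _, raw = pair.partition("=")
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw  # leave non-numeric values as strings
        out[key.strip()] = value
    return out

print(parse_gen_kwargs("top_p=1,top_k=0,max_gen_toks=1536"))
# -> {'top_p': 1, 'top_k': 0, 'max_gen_toks': 1536}
```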
Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+model_name: "Qwen/Qwen3-8B"
+backend: "vllm"
+tasks:
+- name: "mmlu_pro"
+  metrics:
+  - name: "exact_match,custom-extract"
+    value: 0.60
+num_fewshot: 5
+limit: 250 # will run on 250 * 14 subjects = 3500 samples
+max_model_len: 8096
+gen_kwargs: "top_p=1,top_k=0,max_gen_toks=1536"
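The `limit` comment in both configs relies on `mmlu_pro` being split into 14 subject subtasks, with the cap applied per subject. A quick check of that arithmetic (a sketch; the subject count comes from the comment above, not from querying the harness):

```python
limit = 250        # per-subject sample cap, as set in the configs above
num_subjects = 14  # mmlu_pro subject splits, per the comment above

total_samples = limit * num_subjects
print(total_samples)  # -> 3500
```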
Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml
+Qwen3-235B-A22B-Thinking-2507-FP8.yaml
Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
-Qwen2.5-1.5B-Instruct.yaml
 Meta-Llama-3.2-1B-Instruct-INT8-compressed-tensors.yaml
 Meta-Llama-3-8B-Instruct-INT8-compressed-tensors-asym.yaml
 Meta-Llama-3-8B-Instruct-nonuniform-compressed-tensors.yaml
 Qwen2.5-VL-3B-Instruct-FP8-dynamic.yaml
 Qwen1.5-MoE-W4A16-compressed-tensors.yaml
+Qwen3-8B.yaml

.buildkite/lm-eval-harness/test_lm_eval_correctness.py

Lines changed: 1 addition & 0 deletions

@@ -40,6 +40,7 @@ def launch_lm_eval(eval_config, tp_size):
         # existing text models in CI, so only apply it for mm.
         apply_chat_template=backend == "vllm-vlm",
         batch_size=batch_size,
+        gen_kwargs=eval_config.get("gen_kwargs", None),
     )
     return results
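The new kwarg is threaded in via `eval_config.get("gen_kwargs", None)`, so existing configs without the key keep their old behavior. Downstream, the recorded `value` field is compared against the measured score within a tolerance; a minimal sketch of both points (the `RTOL` value and measured score here are hypothetical, and the real comparison lives in `test_lm_eval_correctness.py`):

```python
import math

RTOL = 0.08  # hypothetical tolerance; the CI test defines its own

eval_config = {
    "tasks": [{
        "name": "mmlu_pro",
        "metrics": [{"name": "exact_match,custom-extract", "value": 0.77}],
    }],
    # no "gen_kwargs" key here, so .get() falls back to None
}

# Configs written before this commit are unaffected by the new kwarg.
gen_kwargs = eval_config.get("gen_kwargs", None)
assert gen_kwargs is None

measured = 0.78  # hypothetical score reported by lm-eval
expected = eval_config["tasks"][0]["metrics"][0]["value"]

# relative-tolerance comparison, in the spirit of the correctness test
ok = math.isclose(measured, expected, rel_tol=RTOL)
print(ok)  # -> True
```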

.buildkite/test-pipeline.yaml

Lines changed: 1 addition & 1 deletion

@@ -1084,7 +1084,7 @@ steps:
   - tests/weight_loading
   commands:
   - bash weight_loading/run_model_weight_loading_test.sh -c weight_loading/models-large.txt
-
+
 - label: NixlConnector PD accuracy tests (Distributed) # 30min
   timeout_in_minutes: 30
   working_dir: "/vllm-workspace/tests"

0 commit comments