Commit bb9a649

add eval config for Qwen3-235B-A22B-Thinking-2507-FP8

Signed-off-by: Huamin Li <[email protected]>
1 parent: 7c57254
File tree

6 files changed (+37, -3 lines)
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+model_name: "Qwen/Qwen3-235B-A22B-Thinking-2507-FP8"
+backend: "vllm"
+tasks:
+- name: "mmlu_pro"
+  metrics:
+  - name: "exact_match,custom-extract"
+    value: 0.77
+num_fewshot: 5
+limit: 250 # will run on 250 * 14 subjects = 3500 samples
+max_model_len: 8096
+gen_kwargs: "top_p=1,top_k=0,max_gen_toks=1536"
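Configs like the one above pair each task metric with an expected `value` that the CI run is checked against. A minimal sketch of such a comparison, assuming a relative-tolerance check (`RTOL` and `metrics_within_tolerance` are illustrative names, not the exact code in `test_lm_eval_correctness.py`):

```python
# Hedged sketch: compare measured scores against the expected `value` fields
# of an eval config. The 8% relative tolerance is an assumption for
# illustration, not the harness's actual threshold.
RTOL = 0.08

def metrics_within_tolerance(eval_config: dict, measured: dict) -> bool:
    """True if every expected metric is within RTOL of the measured score."""
    for task in eval_config["tasks"]:
        for metric in task["metrics"]:
            got = measured[task["name"]][metric["name"]]
            want = metric["value"]
            if abs(got - want) > RTOL * want:
                return False
    return True

config = {
    "model_name": "Qwen/Qwen3-235B-A22B-Thinking-2507-FP8",
    "tasks": [{"name": "mmlu_pro",
               "metrics": [{"name": "exact_match,custom-extract",
                            "value": 0.77}]}],
}
# 0.75 is within 8% of the expected 0.77, so this passes.
print(metrics_within_tolerance(
    config, {"mmlu_pro": {"exact_match,custom-extract": 0.75}}))
# -> True
```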
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+model_name: "Qwen/Qwen3-8B"
+backend: "vllm"
+tasks:
+- name: "mmlu_pro"
+  metrics:
+  - name: "exact_match,custom-extract"
+    value: 0.60
+num_fewshot: 5
+limit: 250 # will run on 250 * 14 subjects = 3500 samples
+max_model_len: 8096
+gen_kwargs: "top_p=1,top_k=0,max_gen_toks=1536"
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml
+Qwen3-235B-A22B-Thinking-2507-FP8.yaml
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
-Qwen2.5-1.5B-Instruct.yaml
 Meta-Llama-3.2-1B-Instruct-INT8-compressed-tensors.yaml
 Meta-Llama-3-8B-Instruct-INT8-compressed-tensors-asym.yaml
 Meta-Llama-3-8B-Instruct-nonuniform-compressed-tensors.yaml
 Qwen2.5-VL-3B-Instruct-FP8-dynamic.yaml
 Qwen1.5-MoE-W4A16-compressed-tensors.yaml
+Qwen3-8B.yaml

.buildkite/lm-eval-harness/test_lm_eval_correctness.py

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ def launch_lm_eval(eval_config, tp_size):
         # existing text models in CI, so only apply it for mm.
         apply_chat_template=backend == "vllm-vlm",
         batch_size=batch_size,
+        gen_kwargs=eval_config.get("gen_kwargs", None),
     )
     return results
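The config's `gen_kwargs` value is a comma-separated `key=value` string ("top_p=1,top_k=0,max_gen_toks=1536") passed through to lm-eval. A minimal sketch of how such a string can be turned into a kwargs dict (this helper and its type-coercion rules are ours, not lm-eval's exact parsing code):

```python
def parse_gen_kwargs(spec):
    """Parse a comma-separated "key=value" string, e.g. the config's
    "top_p=1,top_k=0,max_gen_toks=1536", into a kwargs dict.
    Values that look numeric are converted; everything else stays a string."""
    if not spec:
        return {}
    kwargs = {}
    for pair in spec.split(","):
        key, _, raw = pair.partition("=")
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw
        kwargs[key.strip()] = value
    return kwargs

print(parse_gen_kwargs("top_p=1,top_k=0,max_gen_toks=1536"))
# -> {'top_p': 1, 'top_k': 0, 'max_gen_toks': 1536}
```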

.buildkite/test-pipeline.yaml

Lines changed: 12 additions & 1 deletion
@@ -1084,7 +1084,7 @@ steps:
   - tests/weight_loading
   commands:
   - bash weight_loading/run_model_weight_loading_test.sh -c weight_loading/models-large.txt
-
+
 - label: NixlConnector PD accuracy tests (Distributed) # 30min
   timeout_in_minutes: 30
   working_dir: "/vllm-workspace/tests"
@@ -1140,6 +1140,17 @@ steps:
   - pytest -v -s tests/distributed/test_context_parallel.py
   - CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
 
+- label: LM Eval Large Models (H200) # optional
+  gpu: h200
+  optional: true
+  num_gpus: 4
+  working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"
+  source_file_dependencies:
+  - csrc/
+  - vllm/model_executor/layers/quantization
+  commands:
+  - pytest -s -v test_lm_eval_correctness.py --config-list-file=configs/models-large-h100.txt --tp-size=4
+
 ##### B200 test #####
 - label: Distributed Tests (B200) # optional
   gpu: b200
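The new H200 step points pytest at `configs/models-large-h100.txt` via `--config-list-file`. A plausible sketch of expanding such a list file (one YAML filename per line) into config paths; the comment-skipping behavior and the `expand_config_list` name are assumptions, not vLLM's actual helper:

```python
def expand_config_list(list_text, config_dir="configs"):
    """Turn the contents of a models-large-h100.txt-style file into config
    paths: one YAML filename per line, blanks and '#' comments ignored."""
    paths = []
    for line in list_text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            paths.append(f"{config_dir}/{name}")
    return paths

listing = "# large-model configs\nQwen3-235B-A22B-Thinking-2507-FP8.yaml\n"
print(expand_config_list(listing))
# -> ['configs/Qwen3-235B-A22B-Thinking-2507-FP8.yaml']
```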
