@hl475 (Contributor) commented Oct 17, 2025

Purpose

  • This PR adds an eval config for Qwen3-235B-A22B-Instruct-2507-FP8.
  • Renames .buildkite/lm-eval-harness/configs/models-large-h100.txt to .buildkite/lm-eval-harness/configs/models-large-hopper.txt, per review suggestion.
  • Adds a new test label, LM Eval Large Models (H100) # optional, to run configs/models-large-hopper.txt.
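
For reference, the eval configs under .buildkite/lm-eval-harness/configs/ are small YAML files. A minimal sketch of what the new config might look like (the task list and threshold values here are illustrative placeholders, not the merged numbers):

```yaml
# Illustrative lm-eval-harness config sketch; metric values are placeholders.
model_name: "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match,strict-match"
    value: 0.90   # placeholder threshold
  - name: "exact_match,flexible-extract"
    value: 0.90   # placeholder threshold
limit: 1000
num_fewshot: 5
```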

Test Plan

pytest -s -v .buildkite/lm-eval-harness/test_lm_eval_correctness.py \
   --config-list-file .buildkite/lm-eval-harness/configs/models-large-hopper.txt \
   --tp-size 4

Test Result

Results from the Buildkite CI runs:

VLLM_USE_DEEP_GEMM=1 (run 1)

1 passed, 104 warnings in 3062.67s (0:51:02)
https://buildkite.com/vllm/ci/builds/36443/steps/canvas?sid=019a272c-b17c-4404-96e9-0e83356d6bb8

VLLM_USE_DEEP_GEMM=1 (run 2)

1 passed, 104 warnings in 3041.13s (0:50:41)
https://buildkite.com/vllm/ci/builds/36457/steps/canvas?sid=019a27b5-4ec8-4166-8bba-091034e14805

VLLM_USE_DEEP_GEMM=0

1 passed, 104 warnings in 1900.52s (0:31:40)
https://buildkite.com/vllm/ci/builds/36470/steps/canvas?sid=019a2826-8d7a-4453-925b-472e24acf179

VLLM_USE_DEEP_GEMM=0 + kv_cache_dtype=fp8

1 passed, 104 warnings in 1445.30s (0:24:05)
https://buildkite.com/vllm/ci/builds/36607/steps/canvas?sid=019a2b72-d030-4a51-aefa-ff0838ffc8fb
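
The four runs above give a rough sense of the DeepGEMM cost on this workload; a quick sanity calculation over the reported wall-clock times:

```python
# Wall-clock times (seconds) copied from the CI runs reported above.
times = {
    "deep_gemm_on_run1": 3062.67,
    "deep_gemm_on_run2": 3041.13,
    "deep_gemm_off": 1900.52,
    "deep_gemm_off_fp8_kv": 1445.30,
}

# Slowdown of DeepGEMM-on relative to DeepGEMM-off (same kv cache dtype).
slowdown = times["deep_gemm_on_run1"] / times["deep_gemm_off"]
# Additional speedup from the fp8 kv cache on top of DeepGEMM-off.
kv_speedup = times["deep_gemm_off"] / times["deep_gemm_off_fp8_kv"]

print(f"DeepGEMM on/off slowdown: {slowdown:.2f}x")  # ~1.61x
print(f"fp8 kv cache speedup: {kv_speedup:.2f}x")    # ~1.31x
```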


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added ci/build qwen Related to Qwen models labels Oct 17, 2025
@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from ca0144d to ed452ff Compare October 17, 2025 17:38
@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch 4 times, most recently from 8720e75 to 303c0c5 Compare October 17, 2025 23:51
@hl475 hl475 changed the title [CI/Build][WIP] Add eval config for Qwen3-235B-A22B-Thinking-2507-FP8 [CI/Build]Add eval config for Qwen3-235B-A22B-Thinking-2507-FP8 and Qwen3-8B Oct 18, 2025
@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 418d612 to bb9a649 Compare October 18, 2025 05:41
@hl475 hl475 marked this pull request as ready for review October 20, 2025 16:58
@hl475 hl475 requested review from mgoin and simon-mo as code owners October 20, 2025 16:58
@hl475 (Author) commented Oct 20, 2025

cc @yeqcharlotte @zhewenl

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review: automated review suggestions were posted for this pull request.

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 1c7f37a to 30f2bc0 Compare October 20, 2025 17:18
@zhewenl (Collaborator) commented Oct 20, 2025

> cc @yeqcharlotte @zhewenl

LGTM, have we verified by triggering an example CI on buildkite?

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 30f2bc0 to 99d95b6 Compare October 20, 2025 20:46
@hl475 (Author) commented Oct 20, 2025

> > cc @yeqcharlotte @zhewenl
>
> LGTM, have we verified by triggering an example CI on buildkite?

Thanks! I did it once over the weekend. Let me schedule a new run: https://buildkite.com/vllm/ci/builds/35613/steps/canvas

LM Eval Small Models https://buildkite.com/vllm/ci/builds/35613/steps/canvas?sid=019a0362-abbd-407f-8d10-243f063b0792

LM Eval Large Models (H200) https://buildkite.com/vllm/ci/builds/35613/steps/canvas?jid=019a04d4-029e-4ed2-90e6-ccc726f215d1

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 99d95b6 to 6003d66 Compare October 21, 2025 06:42
@@ -1 +1,2 @@
Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml
A collaborator commented on this diff:
Do we also want to test llama4? Or we can just start with Qwen?

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from ef047d9 to b09ee61 Compare October 27, 2025 07:13
@hl475 hl475 changed the title [CI/Build]Add eval config for Qwen3-235B-A22B-Thinking-2507-FP8 and Qwen3-8B [CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 Oct 27, 2025
@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch 6 times, most recently from bfcf20c to 69ac35f Compare October 28, 2025 00:07
@yeqcharlotte (Collaborator) commented

How long does it take with and without VLLM_USE_DEEP_GEMM? It sounds like the Qwen model is not doing well with DeepGEMM? cc: @minosfuture @houseroad

@hl475 (Author) commented Oct 28, 2025

> how long does it take with and without VLLM_USE_DEEP_GEMM? sounds like qwen model is not doing good with deepgemm? cc: @minosfuture @houseroad

There seems to be a build issue at the moment (sample failure, error msg: Failed to fetch http://security.ubuntu.com/ubuntu). Will have to retry later.

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch 2 times, most recently from 01794c3 to 33c059c Compare October 28, 2025 15:29
@hl475 (Author) commented Oct 28, 2025

@yeqcharlotte - updated the test result in the Test Result section.

The fastest I can get on the H100 CI is about 24 minutes. For comparison, the same setting takes about 15 minutes on my local H100. One obvious difference between the two is memory: my local H100 has 96 GB, while the CI H100 has 80 GB.

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 33c059c to 0222140 Compare October 29, 2025 16:50
@hl475 hl475 requested a review from yeqcharlotte October 29, 2025 16:50
@yeqcharlotte (Collaborator) commented

let's get it started!

@yeqcharlotte yeqcharlotte enabled auto-merge (squash) October 30, 2025 06:51
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 30, 2025
@yeqcharlotte yeqcharlotte merged commit 5be1bed into vllm-project:main Oct 30, 2025
18 checks passed
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Oct 30, 2025
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Nov 12, 2025
