@hl475 (Contributor) commented Oct 17, 2025

Purpose

  • This PR adds an eval config for Qwen3-235B-A22B-Instruct-2507-FP8.
  • Renames .buildkite/lm-eval-harness/configs/models-large-h100.txt to .buildkite/lm-eval-harness/configs/models-large-hopper.txt, per review suggestion.
  • Adds a new test label, LM Eval Large Models (H100) # optional, to run configs/models-large-hopper.txt.
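
For reference, the eval configs under .buildkite/lm-eval-harness/configs/ are small YAML files. A minimal sketch of what the new config might look like (the task list and threshold values here are illustrative placeholders, not the merged numbers):

```yaml
# Illustrative lm-eval-harness config sketch; metric values are placeholders.
model_name: "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match,strict-match"
    value: 0.90   # placeholder threshold
  - name: "exact_match,flexible-extract"
    value: 0.90   # placeholder threshold
limit: 1000
num_fewshot: 5
```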

Test Plan

pytest -s -v .buildkite/lm-eval-harness/test_lm_eval_correctness.py \
   --config-list-file .buildkite/lm-eval-harness/configs/models-large-hopper.txt \
   --tp-size 4

Test Result

Results from the Buildkite CI runs:

VLLM_USE_DEEP_GEMM=1 (run 1)

1 passed, 104 warnings in 3062.67s (0:51:02)
https://buildkite.com/vllm/ci/builds/36443/steps/canvas?sid=019a272c-b17c-4404-96e9-0e83356d6bb8

VLLM_USE_DEEP_GEMM=1 (run 2)

1 passed, 104 warnings in 3041.13s (0:50:41)
https://buildkite.com/vllm/ci/builds/36457/steps/canvas?sid=019a27b5-4ec8-4166-8bba-091034e14805

VLLM_USE_DEEP_GEMM=0

1 passed, 104 warnings in 1900.52s (0:31:40)
https://buildkite.com/vllm/ci/builds/36470/steps/canvas?sid=019a2826-8d7a-4453-925b-472e24acf179

VLLM_USE_DEEP_GEMM=0 + kv_cache_dtype=fp8

1 passed, 104 warnings in 1445.30s (0:24:05)
https://buildkite.com/vllm/ci/builds/36607/steps/canvas?sid=019a2b72-d030-4a51-aefa-ff0838ffc8fb
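
The four runs above give a rough sense of the DeepGEMM cost on this workload; a quick sanity calculation over the reported wall-clock times:

```python
# Wall-clock times (seconds) copied from the CI runs reported above.
times = {
    "deep_gemm_on_run1": 3062.67,
    "deep_gemm_on_run2": 3041.13,
    "deep_gemm_off": 1900.52,
    "deep_gemm_off_fp8_kv": 1445.30,
}

# Slowdown of DeepGEMM-on relative to DeepGEMM-off (same kv cache dtype).
slowdown = times["deep_gemm_on_run1"] / times["deep_gemm_off"]
# Additional speedup from the fp8 kv cache on top of DeepGEMM-off.
kv_speedup = times["deep_gemm_off"] / times["deep_gemm_off_fp8_kv"]

print(f"DeepGEMM on/off slowdown: {slowdown:.2f}x")  # ~1.61x
print(f"fp8 kv cache speedup: {kv_speedup:.2f}x")    # ~1.31x
```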


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added ci/build qwen Related to Qwen models labels Oct 17, 2025
@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from ca0144d to ed452ff Compare October 17, 2025 17:38
@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch 4 times, most recently from 8720e75 to 303c0c5 Compare October 17, 2025 23:51
@hl475 hl475 changed the title [CI/Build][WIP] Add eval config for Qwen3-235B-A22B-Thinking-2507-FP8 [CI/Build]Add eval config for Qwen3-235B-A22B-Thinking-2507-FP8 and Qwen3-8B Oct 18, 2025
@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 418d612 to bb9a649 Compare October 18, 2025 05:41
@hl475 hl475 marked this pull request as ready for review October 20, 2025 16:58
@hl475 hl475 requested review from mgoin and simon-mo as code owners October 20, 2025 16:58
@hl475 (Author) commented Oct 20, 2025

cc @yeqcharlotte @zhewenl

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review: automated review suggestions were posted for this pull request.

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 1c7f37a to 30f2bc0 Compare October 20, 2025 17:18
@zhewenl (Collaborator) commented Oct 20, 2025

> cc @yeqcharlotte @zhewenl

LGTM, have we verified by triggering an example CI on buildkite?

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 30f2bc0 to 99d95b6 Compare October 20, 2025 20:46
@hl475 (Author) commented Oct 20, 2025

> > cc @yeqcharlotte @zhewenl
>
> LGTM, have we verified by triggering an example CI on buildkite?

Thanks! I did it once over the weekend. Let me schedule a new run: https://buildkite.com/vllm/ci/builds/35613/steps/canvas

LM Eval Small Models https://buildkite.com/vllm/ci/builds/35613/steps/canvas?sid=019a0362-abbd-407f-8d10-243f063b0792

LM Eval Large Models (H200) https://buildkite.com/vllm/ci/builds/35613/steps/canvas?jid=019a04d4-029e-4ed2-90e6-ccc726f215d1

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 99d95b6 to 6003d66 Compare October 21, 2025 06:42
@@ -1 +1,2 @@
Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml
A collaborator commented on this diff:
Do we also want to test llama4? Or we can just start with Qwen?

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from ef047d9 to b09ee61 Compare October 27, 2025 07:13
@hl475 hl475 changed the title [CI/Build]Add eval config for Qwen3-235B-A22B-Thinking-2507-FP8 and Qwen3-8B [CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 Oct 27, 2025
@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch 6 times, most recently from bfcf20c to 69ac35f Compare October 28, 2025 00:07
@yeqcharlotte (Collaborator) commented

How long does it take with and without VLLM_USE_DEEP_GEMM? It sounds like the Qwen model is not doing well with DeepGEMM? cc: @minosfuture @houseroad

@hl475 (Author) commented Oct 28, 2025

> how long does it take with and without VLLM_USE_DEEP_GEMM? sounds like qwen model is not doing good with deepgemm? cc: @minosfuture @houseroad

There seems to be a build issue at the moment (sample failure, error msg: Failed to fetch http://security.ubuntu.com/ubuntu). Will have to retry later.

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch 2 times, most recently from 01794c3 to 33c059c Compare October 28, 2025 15:29
@hl475 (Author) commented Oct 28, 2025

@yeqcharlotte - updated the test result in the Test Result section.

The fastest I can get on the H100 CI is about 24 minutes. For comparison, the same setting takes about 15 minutes on my local H100. One obvious difference between the two is memory: my local H100 has 96 GB, while the CI H100 has 80 GB.

@hl475 hl475 force-pushed the Qwen3-235B-A22B-Thinking-2507-FP8 branch from 33c059c to 0222140 Compare October 29, 2025 16:50
@hl475 hl475 requested a review from yeqcharlotte October 29, 2025 16:50
@yeqcharlotte (Collaborator) commented

let's get it started!

@yeqcharlotte yeqcharlotte enabled auto-merge (squash) October 30, 2025 06:51
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 30, 2025
@yeqcharlotte yeqcharlotte merged commit 5be1bed into vllm-project:main Oct 30, 2025
18 checks passed
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Oct 30, 2025
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Nov 12, 2025
