Conversation

@Lucaskabela (Contributor) commented Oct 31, 2025

## Purpose

We want to speed up inference for mllama4 by applying torch.compile to the most compute-intensive workload, similar to what was done in #23207. We start by experimenting with the VisionEncoderLayer + PixelShuffle.
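For context, the idea is simply to wrap the hot vision modules in torch.compile so their forward passes run through fused, optimized kernels. Below is a minimal sketch of the pattern, not the actual vLLM integration; `ToyVisionEncoderLayer` is a hypothetical stand-in for the real VisionEncoderLayer.

```python
import torch
import torch.nn as nn


class ToyVisionEncoderLayer(nn.Module):
    """Hypothetical stand-in for the Llama4 vision encoder layer."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


layer = ToyVisionEncoderLayer()
# The first call traces and compiles the layer; subsequent calls reuse the
# compiled kernels instead of re-running eager-mode dispatch per op.
compiled_layer = torch.compile(layer)
out = compiled_layer(torch.randn(2, 16, 64))  # (batch, image tokens, dim)
```

In the PR itself the wrapping happens inside vLLM's model code; the sketch only illustrates the mechanism.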

## Test Plan

```bash
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size=8 --gpu_memory_utilization=.8 --max_model_len=8192

vllm bench serve \
    --backend openai-chat \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --endpoint /v1/chat/completions \
    --dataset-name hf \
    --dataset-path lmarena-ai/VisionArena-Chat \
    --hf-split train \
    --num-prompts 1000
```

## Test Result

| Metric | Baseline (main) | This PR |
|---|---|---|
| Successful requests | 998 | 998 |
| Benchmark duration (s) | 72.52 | 62.15 |
| Total generated tokens | 117376 | 117504 |
| Request throughput (req/s) | 13.76 | 16.06 |
| Output token throughput (tok/s) | 1618.52 | 1890.73 |
| Mean TTFT (ms) | 35483.34 | 28623.5 |
| Mean TPOT (ms) | 264.74 | 233.7 |
| Mean ITL (ms) | 256.56 | 227.07 |



Signed-off-by: Lucas Kabela <[email protected]>
@mergify bot added the llama (Related to Llama models) label on Oct 31, 2025
@Lucaskabela changed the title from "[Misc][LLaMa4] Compile LLaMa Vision Encoder layers" to "[Draft][DO NOT MERGE][Misc][LLaMa4] Compile LLaMa Vision Encoder layers" on Oct 31, 2025
@Lucaskabela (Contributor, Author) commented Oct 31, 2025

Updated to use dynamic dims; it seems we still get a very good speedup here!
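For context, "dynamic dims" here refers to compiling with symbolic shapes so one compiled graph serves varying image-token counts instead of recompiling per shape. A minimal sketch of that idea, assuming the `torch._dynamo.mark_dynamic` API (the PR's exact mechanism may differ):

```python
import torch


def double(t: torch.Tensor) -> torch.Tensor:
    return t * 2


compiled = torch.compile(double)

x = torch.randn(2, 16, 64)
# Mark dim 1 (the image-token dimension) as dynamic so the compiled graph
# uses a symbolic size there rather than specializing on 16.
torch._dynamo.mark_dynamic(x, 1)
_ = compiled(x)

# A later call with a different token count reuses the same compiled graph
# instead of triggering a recompile.
y = torch.randn(2, 32, 64)
_ = compiled(y)
```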

@Lucaskabela force-pushed the lucaskabela/compile_llama4 branch from 29dec46 to d807a34 on November 4, 2025 17:56
@Lucaskabela marked this pull request as ready for review on November 4, 2025 18:17
@Lucaskabela (Contributor, Author) commented:
cc @zou3519 @ProExpertProg @ywang96

@Lucaskabela changed the title from "[Draft][DO NOT MERGE][Misc][LLaMa4] Compile LLaMa Vision Encoder layers" to "[Misc][LLaMa4] Compile LLaMa Vision Encoder layers" on Nov 4, 2025
@ywang96 added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Nov 7, 2025