vllm/model_executor/models/transformers/base.py (7 additions, 1 deletion)

```diff
@@ -28,6 +28,7 @@
 from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS
 
 from vllm.attention import Attention, AttentionType
+from vllm.attention.layers.encoder_only_attention import EncoderOnlyAttention
 from vllm.config.utils import getattr_iter
 from vllm.distributed import get_pp_group, get_tp_group
 from vllm.distributed.utils import get_pp_indices
```
```diff
@@ -336,7 +337,12 @@ def create_attention_instances(self) -> dict[int, Attention]:
         ):
             per_layer_sliding_window = self.config.sliding_window
 
-        attention_instances[i] = Attention(
+        attn_cls = (
```

@NickLucche (Collaborator) commented on Nov 4, 2025:

@heheda12345 do you want to handle it inside the Attention class init to signal deprecation with a warning?

@heheda12345 (Collaborator) replied on Nov 4, 2025:

We need to handle it like this now. For the deprecation warning, we could just add one line of warning in the Attention class (not necessarily in this PR).
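
For illustration, a minimal sketch of what that one-line warning could look like if placed in Attention.__init__. This is not part of the PR; the real vLLM Attention constructor takes many more arguments, so stub classes stand in here to keep the snippet self-contained and runnable.

```python
# Hypothetical sketch only: the deprecation warning suggested above,
# placed in Attention.__init__. Stub constants/classes stand in for the
# real vLLM ones so the snippet runs on its own.
import warnings


class AttentionType:
    DECODER = "decoder"
    ENCODER_ONLY = "encoder_only"


class Attention:
    def __init__(
        self,
        num_heads: int,
        head_size: int,
        attn_type: str = AttentionType.DECODER,
    ) -> None:
        if attn_type == AttentionType.ENCODER_ONLY:
            # Nudge callers toward the dedicated subclass instead of
            # selecting encoder-only behavior via a keyword argument.
            warnings.warn(
                "attn_type=AttentionType.ENCODER_ONLY is deprecated; "
                "construct EncoderOnlyAttention directly.",
                DeprecationWarning,
                stacklevel=2,
            )
        self.num_heads = num_heads
        self.head_size = head_size
        self.attn_type = attn_type
```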

```diff
+            EncoderOnlyAttention
+            if attn_type == AttentionType.ENCODER_ONLY
+            else Attention
+        )
+        attention_instances[i] = attn_cls(
```

A Member commented:

Why does passing attn_type not work? Are the two not equivalent?
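
To make the question concrete, here are the two forms being compared, as a fragment (it assumes num_heads, head_size, scale, and attn_type are in scope, as inside create_attention_instances, and omits the remaining constructor arguments):

```python
# Illustrative fragment; assumes num_heads, head_size, scale, and
# attn_type are already in scope, and abbreviates the argument lists.
from vllm.attention import Attention, AttentionType
from vllm.attention.layers.encoder_only_attention import EncoderOnlyAttention

# Previous style: one class, encoder-only behavior selected via keyword.
attn = Attention(num_heads, head_size, scale,
                 attn_type=AttentionType.ENCODER_ONLY)

# Style in this diff: choose the class up front, then construct it.
attn_cls = (EncoderOnlyAttention
            if attn_type == AttentionType.ENCODER_ONLY
            else Attention)
attn = attn_cls(num_heads, head_size, scale)
```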

```diff
             num_heads=num_heads,
             head_size=head_size,
             # NOTE: We use Llama scale as default, if it's set by
```