
Commit d7db679

GDzhu01 and MengqingCao authored
[Bugfix] Support for mlapo in deepseekv3.1 w4a8 (#4828)
### What this PR does / why we need it?
Support mlapo in DeepSeek-V3.1 W4A8. The csrc implementation of mlapo requires the input args `enable_inner_out` and `inner_out`, so we pass dummy values for them here.

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: GDzhu01 <[email protected]>
Co-authored-by: Mengqing Cao <[email protected]>
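As a rough illustration of the pattern (not the actual vllm-ascend kernel binding), the idea is to satisfy arguments the kernel's binding requires even when the caller does not consume the inner output; `fused_mla_preprocess_stub` below is a hypothetical stand-in for the csrc mlapo op:

```python
import torch


def fused_mla_preprocess_stub(hidden_states: torch.Tensor,
                              *,
                              enable_inner_out: bool,
                              inner_out: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the csrc mlapo kernel, whose binding
    requires `enable_inner_out` and `inner_out` on every call."""
    if enable_inner_out:
        # Only touched when the caller actually wants the inner output.
        inner_out.resize_(hidden_states.shape).copy_(hidden_states)
    return hidden_states


hidden_states = torch.randn(2, 8)
# Caller side: pass inert dummy values so the required args are present.
out = fused_mla_preprocess_stub(
    hidden_states,
    enable_inner_out=False,
    inner_out=torch.tensor([], device=hidden_states.device),
)
```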
1 parent 8bb0284 commit d7db679

File tree: 1 file changed, +4 −2 lines


vllm_ascend/attention/mla_v1.py

Lines changed: 4 additions & 2 deletions
@@ -1064,7 +1064,8 @@ def _process_weights_for_fused_mlapo(self, act_dtype: torch.dtype):

         device = self.q_proj.weight.device
         self.gamma1 = self.q_a_layernorm.weight.data
-        self.beta1 = self.q_a_layernorm.bias.data
+        self.beta1 = torch.zeros_like(self.gamma1) if (
+            _bias := self.q_a_layernorm.bias) is None else _bias.data
         self.gamma2 = self.kv_a_layernorm.weight.data
         self.quant_scale0 = self.fused_qkv_a_proj.input_scale.data
         self.quant_offset0 = self.fused_qkv_a_proj.input_offset.data
@@ -1460,7 +1461,8 @@ def _mla_decode_preprocess(self, hidden_states, kv_cache, attn_metadata):
             kv_cache_out0=decode_k_nope,
             q_out1=decode_q_pe,
             kv_cache_out1=decode_k_pe,
-        )
+            enable_inner_out=False,
+            inner_out=torch.tensor([], device=hidden_states.device))
         decode_q_nope = decode_q_nope.view(bsz, self.num_heads,
                                            self.kv_lora_rank)
         decode_q_pe = decode_q_pe.view(bsz, self.num_heads, -1)
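The `beta1` change in the first hunk handles the case where `q_a_layernorm` carries no bias in this quantization path. A minimal sketch of the same fallback pattern, using a plain `nn.LayerNorm` as a stand-in for the model's layer (assuming PyTorch ≥ 2.1, where `LayerNorm` accepts `bias=False`):

```python
import torch
import torch.nn as nn

# Stand-in layer with no bias; the real module is the model's q_a_layernorm.
q_a_layernorm = nn.LayerNorm(4, bias=False)

gamma1 = q_a_layernorm.weight.data
# Fall back to a zero tensor shaped like the weight when the bias is absent.
beta1 = torch.zeros_like(gamma1) if (
    _bias := q_a_layernorm.bias) is None else _bias.data

print(beta1)  # tensor([0., 0., 0., 0.])
```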
