Commit e0c5073
[Bugfix] fix bmm_transpose ops for CANN version (#4653)
### What this PR does / why we need it?

Due to a CANN version upgrade, this custom op cannot be used on newer CANN versions. On those versions the op launches redundant vector cores even though it only uses cube cores, which causes misaligned data when copying from UB memory to global memory. This change restricts the op to cube cores only.

### Does this PR introduce _any_ user-facing change?

No

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: hust17yixuan <[email protected]>
Co-authored-by: wangxiyuan <[email protected]>
1 parent a78f49e commit e0c5073

File tree

1 file changed: +1, -0

1 file changed

+1
-0
lines changed

csrc/batch_matmul_transpose/op_kernel/batch_matmul_transpose_kernel.cpp

Lines changed: 1 addition & 0 deletions
```diff
@@ -658,6 +658,7 @@ class PpMatmulEinSum
 extern "C" __global__ __aicore__ void batch_matmul_transpose(GM_ADDR gm_a, GM_ADDR gm_b, GM_ADDR gm_c,
                                                              GM_ADDR gm_tiling_data)
 {
+    KERNEL_TASK_TYPE_DEFAULT(KERNEL_TYPE_AIC_ONLY);
     PpMatmulEinSum<0, false, false, half, half, DataFormat::ND>
         einsum_0_n_fp16_nd; // swizzleDir[0] transA[0] transB[0] DtypeA[001] DtypeB[001] DtypeC[001] DataFormatA[0]
                             // DataFormatB[0]
```

0 commit comments