Fix 310P issues in main #3779
base: main
Code Review
This pull request introduces fixes for the 310P platform. The changes include correcting the model used in a vision-language test, adding a dedicated KV cache initialization path for 310P, and skipping ATB warmup on this platform.
My review identifies a couple of issues in the new _initialize_kv_cache_tensors_310p method. There's a critical bug where the KV cache tensors are created with a hardcoded torch.float16 dtype, which should be corrected to use the dtype from the kv_cache_spec. Additionally, there's a misleading error message when KV cache transfer is used, which could make debugging difficult. I've provided suggestions to fix both of these issues.
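A minimal sketch of the dtype point raised above, under stated assumptions: `SketchKVCacheSpec`, its field names, `DTYPE_BYTES`, and `allocate_kv_cache` are hypothetical stand-ins, not the actual `kv_cache_spec` class or the body of `_initialize_kv_cache_tensors_310p`. To keep it runnable anywhere, plain byte buffers stand in for torch tensors. The idea is only that the allocation should follow the dtype carried by the spec instead of a fixed `float16`.

```python
from dataclasses import dataclass
from math import prod

# Bytes per element for the dtypes used in this sketch (illustrative subset).
DTYPE_BYTES = {"float16": 2, "bfloat16": 2, "float32": 4}


@dataclass(frozen=True)
class SketchKVCacheSpec:
    """Hypothetical stand-in for kv_cache_spec; field names are assumptions."""
    num_blocks: int
    block_size: int
    num_kv_heads: int
    head_size: int
    dtype: str


def allocate_kv_cache(spec: SketchKVCacheSpec) -> dict:
    """Allocate zeroed key/value buffers whose size follows spec.dtype."""
    shape = (spec.num_blocks, spec.block_size, spec.num_kv_heads, spec.head_size)
    # The hardcoded variant would use DTYPE_BYTES["float16"] here regardless
    # of the spec, which under-allocates for wider dtypes such as float32.
    nbytes = prod(shape) * DTYPE_BYTES[spec.dtype]
    return {
        "dtype": spec.dtype,
        "shape": shape,
        "key": bytearray(nbytes),
        "value": bytearray(nbytes),
    }


spec = SketchKVCacheSpec(
    num_blocks=4, block_size=128, num_kv_heads=2, head_size=64, dtype="float32"
)
cache = allocate_kv_cache(spec)
print(cache["dtype"], len(cache["key"]))
```

In a torch-based version the same idea would presumably amount to passing the spec's dtype to the tensor constructor rather than `torch.float16`.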
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
4267e55 to 588c472
Signed-off-by: leo-pony <[email protected]>
588c472 to eba8d34
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@leo-pony Hello, when do you plan to merge this code? My current feature development depends on this modification.
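This change seems to only get qwen3-vl to start. After startup, text inference works, but as soon as image inference runs it crashes. The error details are as follows: [rank1]:[E1211 14:16:37.931188062 compiler_depend.ts:444] SelfAttentionOperation setup failed!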

What this PR does / why we need it?
Fix 310P issues in main.
Tested on a 310P NPU host with the following models:

Qwen/Qwen3-0.6B
Qwen/Qwen2.5-7B-Instruct
Qwen2.5-VL-3B-Instruct
All test cases passed.
Known issue:
The pangu-pro-moe-model model fails with the error:
ValueError: Unsupported soc_version: AscendSocVersion.UNDEFINED
Details can be found in PR #2676.
Does this PR introduce any user-facing change?
How was this patch tested?