[Fix] Fix FIA query and query_start_loc shape mismatch error
#4518
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to fix a validation error in the FIA operator by adjusting the padding logic for queries in graph mode and multi-DP scenarios. The changes correctly generalize the padding logic and fix a bug in how sequence lengths and token locations were padded. However, I've identified a critical issue where the padding for block_table is inconsistent with the padding for seq_lens, which will likely lead to a shape mismatch and runtime errors. The proposed fix addresses this inconsistency.
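The consistency concern above can be made concrete with a small sketch. This is a hypothetical check, not the actual vLLM Ascend code: the names `block_table_rows`, `seq_lens`, and `query_start_loc_last` are illustrative stand-ins for the metadata the kernel receives. The point is that after padding, every per-request tensor must agree on the number of requests, and the query length must match the final cumulative sequence length.

```python
def check_padded_shapes(block_table_rows, seq_lens, query_len,
                        query_start_loc_last):
    """Illustrative invariants the padded attention metadata must satisfy.

    Hypothetical helper for explanation only; the real vLLM Ascend
    metadata classes and field names differ.
    """
    # FIA tiling validation: the padded query length must equal the last
    # entry of the cumulative sequence lengths.
    assert query_len == query_start_loc_last, (
        "FIA requires query.shape[0] == actual_seq_len[-1]")
    # If seq_lens gained an entry for the padded pseudo-request,
    # block_table needs a matching row, or the kernel indexes out of range.
    assert block_table_rows == len(seq_lens), (
        "block_table must be padded consistently with seq_lens")
    return True
```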
Please check whether padding block_table is necessary and whether it causes potential excessive memory consumption. cc @yiz-liu
…m-project#4518)

### What this PR does / why we need it?

Due to the requirement of the FIA operator that **query.shape[0]** must match **actual_seq_len[-1]**, in graph mode and multi-DP scenarios, the query is padded to the size of **num_input_token**. This leads to validation errors during tiling in the operator. However, since the padding is applied at the end of the query, it does not affect the actual execution result of the operator, and the precision remains unaffected.

<img width="2434" height="49" alt="image" src="https://github.com/user-attachments/assets/63520816-fbc3-4382-82b9-89dbb1492f6c" />

Our modification pads both **actual_seq_len** and **actual_seq_len_kv** to resolve the validation issue in the operator.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: wangxiaoxin-sherie <[email protected]>
Co-authored-by: wangxiaoxin-sherie <[email protected]>
Signed-off-by: Che Ruan <[email protected]>
What this PR does / why we need it?
Due to the requirement of the FIA operator that the query.shape[0] must match actual_seq_len[-1], in graph mode and multi-DP scenarios, the query is padded to the size of num_input_token. This leads to validation errors during tiling in the operator. However, since the padding is applied at the end of the query, it does not affect the actual execution result of the operator, and the precision remains unaffected.

Our modification pads both actual_seq_len and actual_seq_len_kv to resolve the validation issue in the operator.
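The fix described above can be sketched in plain Python. This is a simplified illustration, not the actual patch: the function name and the choice of pad value for the KV lengths are assumptions; the real code operates on torch tensors inside the attention metadata builder. The idea is that when the query is padded from `num_actual_tokens` to `num_input_tokens`, the trailing pad tokens are accounted for as one extra pseudo-request so that `actual_seq_len[-1]` again equals `query.shape[0]`.

```python
def pad_actual_seq_len(actual_seq_len, actual_seq_len_kv,
                       num_actual_tokens, num_input_tokens):
    """Extend cumulative sequence lengths to cover query padding.

    Hypothetical sketch: `actual_seq_len` is the cumulative per-request
    query lengths, `actual_seq_len_kv` the cumulative KV lengths. After
    padding, actual_seq_len[-1] must equal the padded query length so
    the FIA operator's tiling validation passes.
    """
    pad = num_input_tokens - num_actual_tokens
    if pad > 0:
        # Treat the trailing pad tokens as one extra pseudo-request whose
        # query span ends at num_input_tokens.
        actual_seq_len = actual_seq_len + [num_input_tokens]
        # Give the pseudo-request zero additional KV tokens by repeating
        # the last cumulative KV length (assumed choice; the padding does
        # not affect real results because it sits at the end).
        actual_seq_len_kv = actual_seq_len_kv + [actual_seq_len_kv[-1]]
    return actual_seq_len, actual_seq_len_kv
```

For example, with 5 real tokens padded to 8, cumulative query lengths `[2, 5]` become `[2, 5, 8]`, restoring the invariant against the padded query.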
Does this PR introduce any user-facing change?
How was this patch tested?