
Conversation

@momo609 (Collaborator) commented Nov 28, 2025

What this PR does / why we need it?

Due to the FIA operator's requirement that query.shape[0] match actual_seq_len[-1], in graph mode and multi-DP scenarios the query is padded to the size of num_input_token. This triggers validation errors during tiling in the operator. However, since the padding is appended at the end of the query, it does not affect the operator's actual execution result, and precision remains unaffected.
Our modification pads both actual_seq_len and actual_seq_len_kv to resolve the validation issue in the operator.
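Conceptually, the fix can be sketched as follows. This is a minimal illustration, not the actual vLLM Ascend code; the function name `pad_actual_seq_len` and the concrete numbers are hypothetical:

```python
def pad_actual_seq_len(actual_seq_len, num_input_tokens):
    """Extend a cumulative sequence-length array so its last entry
    matches the padded query length (num_input_tokens).

    The FIA operator validates query.shape[0] == actual_seq_len[-1],
    so when the query is padded at the end, the cumulative array must
    be extended too. The extra tokens form a dummy trailing "sequence"
    that does not affect the real results.
    """
    actual_seq_len = list(actual_seq_len)
    if actual_seq_len[-1] < num_input_tokens:
        actual_seq_len.append(num_input_tokens)
    return actual_seq_len

# Example: 3 real sequences totaling 10 tokens, query padded to 16.
print(pad_actual_seq_len([3, 7, 10], 16))  # [3, 7, 10, 16]
```

The same treatment applies to actual_seq_len_kv, so both arrays stay consistent with the padded query shape.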

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions commented:

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist (Contributor) commented:

Code Review

This pull request aims to fix a validation error in the FIA operator by adjusting the padding logic for queries in graph mode and multi-DP scenarios. The changes correctly generalize the padding logic and fix a bug in how sequence lengths and token locations were padded. However, I've identified a critical issue where the padding for block_table is inconsistent with the padding for seq_lens, which will likely lead to a shape mismatch and runtime errors. The proposed fix addresses this inconsistency.
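The consistency concern raised here can be illustrated with a sketch (shapes and the helper name `pad_batch` are hypothetical, not real vLLM Ascend code): if seq_lens is padded to a batch of N entries, block_table must also have N rows, otherwise the operator sees mismatched batch dimensions.

```python
import numpy as np

def pad_batch(seq_lens, block_table, padded_batch):
    """Pad per-sequence metadata to a common batch size.

    seq_lens:    (num_seqs,) length per sequence
    block_table: (num_seqs, max_blocks) block IDs per sequence
    Both must share the same leading (batch) dimension after padding,
    or shape validation in the attention op fails.
    """
    num_seqs = seq_lens.shape[0]
    pad = padded_batch - num_seqs
    seq_lens = np.pad(seq_lens, (0, pad))                  # dummy length-0 seqs
    block_table = np.pad(block_table, ((0, pad), (0, 0)))  # dummy zero rows
    assert seq_lens.shape[0] == block_table.shape[0]
    return seq_lens, block_table

# Two real sequences, batch padded to 4.
sl, bt = pad_batch(np.array([5, 3]), np.zeros((2, 4), dtype=np.int32), 4)
print(sl.shape, bt.shape)  # (4,) (4, 4)
```

The memory-consumption question discussed below is about whether these extra block_table rows are worth allocating at all, or whether the padding can simply be dropped.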

@Angazenn (Collaborator) commented:

Please check whether padding block_table is necessary and whether it causes potential excessive memory consumption. cc @yiz-liu

@yiz-liu (Collaborator) commented Nov 28, 2025

> Please check whether padding block_table is necessary and whether it causes potential excessive memory consumption. cc @yiz-liu

Why did we pad this in the first place? @Angazenn

@yiz-liu yiz-liu changed the title fix fia error. [Fix] Fix FIA query and query_start_loc shape mismatch error Nov 28, 2025
@yiz-liu (Collaborator) commented Nov 28, 2025

> Please check whether padding block_table is necessary and whether it causes potential excessive memory consumption. cc @yiz-liu
>
> Why did we pad this in the first place? @Angazenn

I think it's OK to just remove it, better check it with op developer.

@momo609 momo609 force-pushed the adde2e branch 4 times, most recently from 98841f5 to abcb13f Compare December 1, 2025 08:32
@MengqingCao MengqingCao added ready read for review ready-for-test start test by label for PR labels Dec 2, 2025
@momo609 momo609 force-pushed the adde2e branch 3 times, most recently from 8e02166 to 5b61368 Compare December 2, 2025 03:24
Signed-off-by: wangxiaoxin-sherie <[email protected]>
@weijinqian0 weijinqian0 merged commit 15dc01f into vllm-project:main Dec 3, 2025
22 checks passed
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Dec 4, 2025
…m-project#4518)

### What this PR does / why we need it?
Due to the FIA operator's requirement that **query.shape[0]** match **actual_seq_len[-1]**, in graph mode and multi-DP scenarios the query is padded to the size of **num_input_token**. This triggers validation errors during tiling in the operator. However, since the padding is appended at the end of the query, it does not affect the operator's actual execution result, and precision remains unaffected.
<img width="2434" height="49" alt="image"
src="https://github.com/user-attachments/assets/63520816-fbc3-4382-82b9-89dbb1492f6c"
/>
Our modification pads both **actual_seq_len** and **actual_seq_len_kv** to resolve the validation issue in the operator.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: wangxiaoxin-sherie <[email protected]>
Co-authored-by: wangxiaoxin-sherie <[email protected]>
Signed-off-by: Che Ruan <[email protected]>

Labels

ready (read for review), ready-for-test (start test by label for PR)

5 participants