
Conversation

@YangKai0616 (Contributor)

What does this PR do?

1. [General] When running the test case tests/generation/test_continuous_batching.py::ContinuousBatchingTest::test_continuous_batching_parity_qwen_flash, it reports the following error:

                flash_attn_func = getattr(kernel, "flash_attn_func", None)
                flash_attn_varlen_func = getattr(kernel, "flash_attn_varlen_func", None)
                if flash_attn_varlen_func is None:
>                   raise ValueError(
                        f"Could not find the currently requested flash attention implementation at `{implementation}`."
                        "Make sure that you request a valid kernel from the hub, e.g. `kernels-community/flash-attn2`."
                    )
E                   ValueError: Could not find the currently requested flash attention implementation at `flash_attention_2`.Make sure that you request a valid kernel from the hub, e.g. `kernels-community/flash-attn2`.

src/transformers/modeling_flash_attention_utils.py:115: ValueError

The root cause is a failure to load the paged|flash_attention_2 kernel. This PR fixes the loading logic.

2. [XPU] The above test case still reports E AssertionError: Test request_id = 'req_1' failed, no expected output was provided..... after the run completes on both XPU and CUDA. On the XPU side, however, the test passes stably when require_deterministic_for_xpu is used (a minimal usage sketch is shown below).

To enable the related tests on XPU, please refer to PR #42536.
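
For illustration, a minimal sketch of how such a decorator is typically applied in the test suite; the import path and the decorator's exact behaviour on non-XPU devices are assumptions, not taken from this PR's diff:

    # Illustrative only: the import location and the decorator's semantics on
    # non-XPU devices are assumed, not part of this PR.
    import unittest

    from transformers.testing_utils import require_deterministic_for_xpu  # assumed location


    class ContinuousBatchingTest(unittest.TestCase):
        @require_deterministic_for_xpu
        def test_continuous_batching_parity_qwen_flash(self):
            # On XPU the decorator pins the run to deterministic kernels so the
            # generated outputs match the expected ones; elsewhere it should be a no-op.
            ...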

@YangKai0616 (Contributor, Author)

@vasqu, please help review, thanks!

@YangKai0616 changed the title from "Fixed UT and kernel loading logic." to "Fixed paged|FA2 kernel loading logic and UT." on Dec 2, 2025
@vasqu (Contributor) left a comment

I would like to extend this a bit to allow general CB support for kernels without loading the kernel twice.

The idea is to modify the fallback to include the paged| prefix directly and to load the kernel properly via lazy_import_paged_flash_attention. This should also allow us to use the prefixed version directly, e.g. attn_implementation="paged|kernels-community/flash-attn2".
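
A minimal sketch of that fallback, assuming a hypothetical resolver; only the paged| prefix and lazy_import_paged_flash_attention come from the comment above, the rest is illustrative:

    # Sketch of the fallback described above -- not the PR's actual diff. The resolver
    # name and signature are hypothetical; `load_paged_kernel` stands in for the
    # `lazy_import_paged_flash_attention` helper mentioned above.
    def resolve_cb_attention(attn_implementation: str, load_paged_kernel):
        # Normalize so the `paged|` prefix appears exactly once; this avoids loading
        # the same kernel twice under two different names.
        if not attn_implementation.startswith("paged|"):
            attn_implementation = f"paged|{attn_implementation}"

        # Strip the prefix for the actual kernel lookup, e.g.
        # "paged|kernels-community/flash-attn2" -> "kernels-community/flash-attn2".
        kernel_ref = attn_implementation.removeprefix("paged|")

        return load_paged_kernel(kernel_ref)

With that normalization, the plain name used inside continuous batching and an explicitly prefixed hub kernel such as attn_implementation="paged|kernels-community/flash-attn2" would resolve through the same single load path.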

@vasqu (Contributor) left a comment

Let's simplify a bit; on second thought, introducing another variable would be even more confusing. We can avoid that.

@YangKai0616 (Contributor, Author)

Strange, the failing CI example tests/models/mvp/test_modeling_mvp.py::MvpHeadTests::test_generate_beam_search passes on both CUDA (A100) and XPU for me (with torch 2.9).

@vasqu (Contributor) commented on Dec 3, 2025

Yup, not sure what's going on. It's an annoying flaky test. I added a new commit to simplify something; let's see if CI passes this time, otherwise I'll try to get a core maintainer to merge if necessary.

@vasqu (Contributor) left a comment

Forgot to approve, but LGTM now.
