Conversation

@ILikeIneine ILikeIneine commented Oct 22, 2025

Purpose

There is a problem when supporting fla on a plugin platform: importing fla/ops/utils crashes here.

On a plugin platform the device may have its own name (in vllm-metax it is maca), while device_torch_lib still needs to resolve to the matching torch library (in vllm-metax that is torch.cuda).

So I use current_platform.is_cuda_alike() and pass a default of None to getattr to handle these corner cases. The semantics stay consistent with the original code.
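
For reference, a minimal sketch of the approach (assuming the usual `current_platform` helpers from `vllm.platforms`; the exact names and structure in `fla/ops/utils.py` may differ):

```python
import torch

from vllm.platforms import current_platform

# Plugin platforms such as vllm-metax report their own device name ("maca"),
# but their torch backend is still exposed as torch.cuda, so any CUDA-alike
# platform maps to "cuda".
if current_platform.is_cuda_alike():
    device = "cuda"
else:
    device = current_platform.device_type

# Default to None instead of raising AttributeError when torch has no
# sub-module for this device (e.g. a CPU-only setup).
device_torch_lib = getattr(torch, device, None)
```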

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@ILikeIneine ILikeIneine changed the title fix fla crash on plugin [Bugfix][plugin] fla crash on plugin Oct 22, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a crash in Flash-Linear-Attention (FLA) operations when used with plugins. The changes in vllm/model_executor/layers/fla/ops/utils.py are well-reasoned and effective. By leveraging current_platform.is_cuda_alike(), the code now correctly identifies CUDA-compatible platforms (including plugins) and sets the device library appropriately. Adding a None default to getattr is a good defensive measure that prevents crashes on other platforms like CPU, making the utility more robust. The fix is correct and improves the overall stability of FLA operations in diverse environments.

@NickLucche NickLucche left a comment

I think this looks fine but I don't have the context on fla.
Perhaps @youkaichao can take a quick look at it.

@ILikeIneine ILikeIneine force-pushed the fix-fla-crash-on-plugin branch from 873d0a9 to b483cc9 Compare October 24, 2025 02:23
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 1, 2025
@mgoin mgoin left a comment

Looks simple enough to me. I believe the logic is kept the same for Nvidia and AMD, so nothing else changes for Intel or CPU.
