[AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault #23692
Conversation
Signed-off-by: Randall Smith <[email protected]>
…libi Signed-off-by: Randall Smith <[email protected]>
Code Review
This pull request correctly addresses a critical integer overflow in the _fwd_kernel for AMD GPUs by casting the bn tensor to tl.int64, which prevents a potential GPU segfault. However, as noted in the pull request description, similar vulnerabilities exist in _fwd_kernel_flash_attn_v2 and _fwd_kernel_alibi. The fixes for these functions are currently missing from the patch. It is crucial to include these changes to ensure the bug is fully resolved across all relevant kernels.
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
… segfault (vllm-project#23692) Signed-off-by: Randall Smith <[email protected]>
The tensor loaded into `bn` is multiplied by `stride_k_cache_bs` in the `_fwd_kernel` in `prefix_prefill.py`, producing an integer overflow that yields negative offsets and a GPU segfault. Changing `stride_k_cache_bs` to `tl.int64` in the function signature did not work. Casting the `bn` tensor to `tl.int64` fixes the problem. I added some additional casts into `_fwd_kernel_flash_attn_v2` and `_fwd_kernel_alibi` as well.
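The overflow mechanism can be illustrated outside Triton with NumPy int32 arithmetic. This is a minimal sketch: the values of `bn` and `stride_k_cache_bs` below are hypothetical, chosen only so that their product exceeds `2**31 - 1`, mirroring what happens in the kernel with a large KV cache.

```python
import numpy as np

# Hypothetical values: a block index and a per-block stride whose product
# exceeds the int32 range (2**31 - 1 = 2147483647).
bn = np.array([70000], dtype=np.int32)       # offsets tensor loaded as int32
stride_k_cache_bs = np.int32(40000)          # stride, also 32-bit

# int32 * int32 wraps around, producing a negative offset -- the kernel
# then dereferences a bogus address and the GPU segfaults.
overflowed = bn * stride_k_cache_bs

# The fix: widen bn to int64 before the multiply so the product stays in range.
fixed = bn.astype(np.int64) * np.int64(stride_k_cache_bs)
```

In the actual Triton kernel the equivalent widening is `bn.to(tl.int64)` applied to the loaded offsets tensor; as the PR notes, widening only the stride argument in the function signature was not sufficient.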