
Conversation

@rasmith (Contributor) commented on Aug 26, 2025

The tensor loaded into bn is multiplied by stride_k_cache_bs in the _fwd_kernel in prefix_prefill.py, which produces an integer overflow; the resulting negative offsets cause a GPU segfault. Changing stride_k_cache_bs to tl.int64 in the function signature did not fix it. Casting the bn tensor to tl.int64 fixes the problem. I added the same casts to _fwd_kernel_flash_attn_v2 and _fwd_kernel_alibi as well.
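
For context, here is a minimal Triton sketch of the failure mode, not the actual prefix_prefill.py kernel (the kernel name, pointer arguments, and BLOCK_N below are illustrative). It shows how a 32-bit product of a loaded block number and a large stride can wrap negative, and how casting the loaded tensor to tl.int64 first keeps the whole offset computation in 64 bits:

```python
# Minimal sketch, not the vLLM kernel: `bn` loads as int32, so
# bn * stride_k_cache_bs is computed in 32 bits and can wrap to a
# negative offset once the product exceeds 2**31 - 1.
import triton
import triton.language as tl


@triton.jit
def _gather_kernel(block_table_ptr, k_cache_ptr, out_ptr,
                   stride_k_cache_bs,  # large for big KV caches
                   BLOCK_N: tl.constexpr):
    offs_n = tl.arange(0, BLOCK_N)
    # Block numbers load as int32 by default.
    bn = tl.load(block_table_ptr + offs_n)
    # Casting bn first promotes the multiply (and the rest of the
    # offset arithmetic) to int64, so it can no longer overflow.
    off_k = bn.to(tl.int64) * stride_k_cache_bs + offs_n
    vals = tl.load(k_cache_ptr + off_k)
    tl.store(out_ptr + offs_n, vals)
```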

The mergify bot added the rocm (Related to AMD ROCm) label on Aug 26, 2025

@gemini-code-assist bot left a comment:

Code Review

This pull request correctly addresses a critical integer overflow in the _fwd_kernel for AMD GPUs by casting the bn tensor to tl.int64, which prevents a potential GPU segfault. However, as noted in the pull request description, similar vulnerabilities exist in _fwd_kernel_flash_attn_v2 and _fwd_kernel_alibi. The fixes for these functions are currently missing from the patch. It is crucial to include these changes to ensure the bug is fully resolved across all relevant kernels.

@gshtras added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Sep 2, 2025
@gshtras enabled auto-merge (squash) on September 2, 2025 at 16:51
@gshtras merged commit 457e471 into vllm-project:main on Sep 2, 2025
39 checks passed
845473182 pushed a commit to 845473182/vllm that referenced this pull request Sep 3, 2025
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025