SDPA and FA2 produce different outputs #42550

@SimJeg

System Info

Hi,

We noticed a new failure in the CI/CD of kvpress which is related to differences between SDPA and FA2.

Here is my system info:

  • transformers version: 4.57.3
  • Platform: Linux-6.1.123+-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 0.36.0
  • Safetensors version: 0.7.0
  • Accelerate version: 1.12.0
  • PyTorch version (accelerator?): 2.9.1+cu128 (CUDA)
  • GPU type: NVIDIA H100 80GB HBM3

Who can help?

@ArthurZucker @Cyrilvallez @Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The following code produces different outputs for SDPA and FA2, even though do_sample is set to False:

from transformers import pipeline

model_name = "meta-llama/Llama-3.2-1B-Instruct"
prompt = "Hello, how are you?"

# Same model and settings; only the attention implementation differs.
pipe_fa2 = pipeline("text-generation", model=model_name, device_map="auto", dtype="auto", model_kwargs={"attn_implementation": "flash_attention_2"})
pipe_sdpa = pipeline("text-generation", model=model_name, device_map="auto", dtype="auto", model_kwargs={"attn_implementation": "sdpa"})

# Greedy decoding (do_sample=False), repeated to check that each backend is self-consistent.
for _ in range(3):
    print(pipe_fa2(prompt, max_new_tokens=15, do_sample=False)[0]["generated_text"])
    print(pipe_sdpa(prompt, max_new_tokens=15, do_sample=False)[0]["generated_text"])

Output:
Hello, how are you? I'm excited to be here today to talk about a very important topic that
Hello, how are you? I'm excited to be here today to talk to you about something that I
Hello, how are you? I'm excited to be here today to talk about a very important topic that
Hello, how are you? I'm excited to be here today to talk to you about something that I
Hello, how are you? I'm excited to be here today to talk about a very important topic that
Hello, how are you? I'm excited to be here today to talk to you about something that I
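
To show where the two generations split, here is a small sketch that is not part of the original script. It finds the first whitespace-separated word at which two outputs differ. The helper name `first_divergence` is hypothetical, and splitting on whitespace is only a rough stand-in for comparing the actual token IDs. `out_first` and `out_second` are the first and second lines printed in each iteration above.

```python
from itertools import zip_longest


def first_divergence(a: str, b: str):
    """Return (index, word_a, word_b) for the first differing word, or None if equal."""
    # zip_longest pads the shorter text with None, so a length mismatch also counts.
    for i, (word_a, word_b) in enumerate(zip_longest(a.split(), b.split())):
        if word_a != word_b:
            return i, word_a, word_b
    return None


# The two distinct outputs reported above, copied verbatim.
out_first = "Hello, how are you? I'm excited to be here today to talk about a very important topic that"
out_second = "Hello, how are you? I'm excited to be here today to talk to you about something that I"

print(first_divergence(out_first, out_second))  # (12, 'about', 'to')
```

The two texts agree for the first 12 words and differ at a single position. Everything after that point differs as well, since each later token is conditioned on a different prefix.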

Expected behavior

Is this behavior expected?
