System Info
Hi,
We noticed a new failure in the CI/CD of kvpress, which is caused by differences between the SDPA and FlashAttention-2 (FA2) attention implementations.
Here is my system info:
- transformers version: 4.57.3
- Platform: Linux-6.1.123+-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.36.0
- Safetensors version: 0.7.0
- Accelerate version: 1.12.0
- PyTorch version (accelerator?): 2.9.1+cu128 (CUDA)
- GPU type: NVIDIA H100 80GB HBM3
Who can help?
@ArthurZucker @Cyrilvallez @Rocketknight1
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The following code produces different outputs for SDPA and FA2, even though do_sample is set to False:
from transformers import pipeline

model_name = "meta-llama/Llama-3.2-1B-Instruct"
prompt = "Hello, how are you?"

# Two otherwise identical pipelines that differ only in the attention backend
pipe_fa2 = pipeline("text-generation", model=model_name, device_map="auto", dtype="auto", model_kwargs={"attn_implementation": "flash_attention_2"})
pipe_sdpa = pipeline("text-generation", model=model_name, device_map="auto", dtype="auto", model_kwargs={"attn_implementation": "sdpa"})

# Greedy decoding (do_sample=False), repeated to show that each backend is self-consistent
for _ in range(3):
    print(pipe_fa2(prompt, max_new_tokens=15, do_sample=False)[0]["generated_text"])
    print(pipe_sdpa(prompt, max_new_tokens=15, do_sample=False)[0]["generated_text"])

Hello, how are you? I'm excited to be here today to talk about a very important topic that
Hello, how are you? I'm excited to be here today to talk to you about something that I
Hello, how are you? I'm excited to be here today to talk about a very important topic that
Hello, how are you? I'm excited to be here today to talk to you about something that I
Hello, how are you? I'm excited to be here today to talk about a very important topic that
Hello, how are you? I'm excited to be here today to talk to you about something that I
Expected behavior
Is this behavior expected?
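To make the discrepancy easier to pin down, here is a minimal sketch that locates the first word at which the two printed outputs diverge. The helper name `first_divergence` is hypothetical (it is not part of transformers), and it compares whitespace-separated words rather than model tokens, so it only approximates the position of the first differing token. The two strings are copied from the output above.

```python
from typing import Optional


def first_divergence(text_a: str, text_b: str) -> Optional[int]:
    """Return the index of the first differing whitespace-separated word.

    Returns None when both texts contain exactly the same words.
    Hypothetical helper for illustration, not a transformers API.
    """
    words_a, words_b = text_a.split(), text_b.split()
    for i, (a, b) in enumerate(zip(words_a, words_b)):
        if a != b:
            return i
    # One text is a strict prefix of the other
    if len(words_a) != len(words_b):
        return min(len(words_a), len(words_b))
    return None


# The two alternating outputs reported above
first_output = "Hello, how are you? I'm excited to be here today to talk about a very important topic that"
second_output = "Hello, how are you? I'm excited to be here today to talk to you about something that I"

idx = first_divergence(first_output, second_output)
print(idx, first_output.split()[idx], second_output.split()[idx])
```

Both outputs share the prompt and the first few generated words, then split at a single position and never rejoin, which suggests the two backends disagree on one greedy choice and the rest of the text follows from that.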