Describe the bug
Hello! I set device_map='balanced' and image generation takes about 2.5 minutes (I expected 12-20 seconds). pipe.hf_device_map shows the modules distributed like this:
{
"transformer": "cuda:0",
"text_encoder_2": "cuda:2",
"text_encoder": "cuda:0",
"vae": "cuda:1"
}
I have three RTX 3090 Ti 24 GB cards and I cannot get it to run properly on them.
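For reference, the 2.5-minute figure comes from a simple test along these lines (the prompt, step count, and timing code are illustrative placeholders, not my exact script):

import time
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    path_chkpt,                  # local Flux checkpoint path
    torch_dtype=torch.bfloat16,
    device_map='balanced',
)
print(pipe.hf_device_map)        # prints the mapping shown above

start = time.time()
image = pipe(
    'a placeholder prompt',      # illustrative prompt
    num_inference_steps=28,      # illustrative step count
).images[0]
print(f'generation took {time.time() - start:.1f} s')   # ~150 s instead of the expected 12-20 s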
I also tried assigning the modules to devices manually (a fuller sketch of this attempt follows these lines):
pipe.transformer.to('cuda:2')
pipe.text_encoder.to('cuda:2')
pipe.text_encoder_2.to('cuda:1')
pipe.vae.to('cuda:0')
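Spelled out, that manual attempt looks roughly like this (the final generation call is only to show how the pipeline is invoked; it is not my exact script):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(path_chkpt, torch_dtype=torch.bfloat16)

# manually spread the modules across the three cards
pipe.transformer.to('cuda:2')
pipe.text_encoder.to('cuda:2')
pipe.text_encoder_2.to('cuda:1')
pipe.vae.to('cuda:0')

# illustrative call; with the modules on different devices I am not sure
# this layout is even supported, which is part of my question
image = pipe('a placeholder prompt', num_inference_steps=28).images[0]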
What is the best way to launch the pipeline so that generation runs on the GPUs and is fast?
Reproduction
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    path_chkpt,
    torch_dtype=torch.bfloat16,
    device_map='balanced',
)
Logs
No response
System Info
Ubuntu 22.04, 3 GPUs: NVIDIA RTX 3090 Ti 24 GB
accelerate==0.30.1
addict==2.4.0
apscheduler==3.9.1
autocorrect==2.5.0
chardet==4.0.0
cryptography==37.0.2
curl_cffi
diffusers==0.30.0
beautifulsoup4==4.11.2
einops
facexlib>=0.2.5
fastapi==0.92.0
hidiffusion==0.1.6
invisible-watermark>=0.2.0
numpy==1.24.3
opencv-python==4.8.0.74
pandas==2.0.3
pycocotools==2.0.6
pymystem3==0.2.0
pyyaml==6.0
pyjwt==2.6.0
python-multipart==0.0.5
pytrends==4.9.1
psycopg2-binary
realesrgan==0.3.0
redis==4.5.1
sacremoses==0.0.53
selenium==4.2.0
sentencepiece==0.1.97
scipy==1.10.1
scikit-learn==0.24.1
supervision==0.16.0
tb-nightly==2.14.0a20230629
tensorboard>=2.13.0
tomesd
transformers==4.40.1
timm==0.9.16
yapf==0.32.0
uvicorn==0.20.0
spacy==3.7.2
nest_asyncio==1.5.8
httpx==0.25.0
torchvision==0.15.2
insightface==0.7.3
psutil==5.9.6
tk==0.1.0
customtkinter==5.2.1
tensorflow==2.13.0
opennsfw2==0.10.2
protobuf==4.24.4
gfpgan==1.3.8
Who can help?
No response