Attachment: decode_whisper_large_linear_vicuna_7b.txt
Hello everybody, I am attempting to perform ASR using Vicuna-7B and Whisper Large V3 on a system with two NVIDIA GPUs (24GB each) connected via NVLink. However, I consistently encounter CUDA out of memory (OOM) errors, and only one GPU is utilized during execution, despite specifying multiple GPUs.
So far, I've tried:
- Reducing model size: downscaled Whisper to a smaller, lower-precision checkpoint (whisper base).
- Mixed precision: enabled `mixed_precision=true`, but lowering precision further raised errors.
- FSDP & DeepSpeed: enabled both `enable_fsdp=true` and `enable_deepspeed=true` to optimize memory usage.
- Multi-GPU configuration: set `CUDA_VISIBLE_DEVICES=0,1`, but Vicuna and Whisper both load onto the same GPU, leaving the second GPU idle.
- Gradient accumulation & batch size adjustments: lowered the batch size (`val_batch_size=1`) and the accumulation steps (`gradient_accumulation_steps=1`) without success.
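Before debugging the FSDP/DeepSpeed side, it may be worth confirming that the process can actually see both devices; with `CUDA_VISIBLE_DEVICES=0,1` set correctly, PyTorch should report two GPUs, and if it reports one, the variable is not reaching the launched process. A quick check:

```python
import torch

# Sanity check: with CUDA_VISIBLE_DEVICES=0,1 this should report 2 devices.
# If it reports 1, the env var is not set in the environment of this process
# (e.g. it was exported in a different shell, or overridden by the launcher).
n_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {n_gpus}")
for i in range(n_gpus):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")
```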
So the issue is: I consistently hit CUDA OOM errors and only one GPU is used while the second remains idle. I would like to load Vicuna-7B on one GPU and Whisper Large V3 on the other to distribute the memory usage, if possible. Is there a way to ensure that DeepSpeed or FSDP correctly distributes the models across both GPUs?
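One way to get the split described above, without relying on FSDP or DeepSpeed sharding, is to pin each model to its own device and move the intermediate features across explicitly. A minimal sketch of the pattern, with small `nn.Linear` layers standing in for Whisper and Vicuna and a CPU fallback so it runs anywhere (the dimensions here are illustrative, not the real model sizes):

```python
import torch
import torch.nn as nn

# Pick two devices if available, otherwise fall back to CPU for illustration.
two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device("cuda:0") if two_gpus else torch.device("cpu")
dev1 = torch.device("cuda:1") if two_gpus else torch.device("cpu")

encoder = nn.Linear(80, 4096).to(dev0)    # stand-in for the Whisper encoder
llm = nn.Linear(4096, 32000).to(dev1)     # stand-in for the Vicuna LLM

x = torch.randn(1, 80, device=dev0)
features = encoder(x)                     # computed on GPU 0
logits = llm(features.to(dev1))           # hop to GPU 1 before the LLM
print(tuple(logits.shape))
```

With Hugging Face models, a similar effect can come from passing `device_map` to `from_pretrained` (via accelerate), which spreads a single model's layers across the available GPUs, though whether that plays well with this repo's FSDP/DeepSpeed flags is something the maintainers would need to confirm.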
I've attached my configuration file.
Thank you in advance.