[ASR] - CUDA OOM with Vicuna-7B & Whisper Large V3 #214

@ritaaadr

Description

decode_whisper_large_linear_vicuna_7b.txt

Hello everybody, I am attempting to perform ASR using Vicuna-7B and Whisper Large V3 on a system with two NVIDIA GPUs (24GB each) connected via NVLink. However, I consistently hit CUDA out of memory (OOM) errors, and only one GPU is utilized during execution, despite specifying multiple GPUs.

So far, I've tried:

  • Reducing Model Size: swapped Whisper Large V3 for a smaller checkpoint (whisper base) and tried lower-precision settings.
  • Mixed Precision: enabled mixed_precision=true, but lowering the precision further raised errors.
  • FSDP & DeepSpeed: enabled both enable_fsdp=true and enable_deepspeed=true to reduce memory usage.
  • Multi-GPU Configuration: set CUDA_VISIBLE_DEVICES=0,1, but Vicuna and Whisper both load onto the same GPU, leaving the second one idle.
  • Batch Size & Gradient Accumulation: lowered the batch size (val_batch_size=1) and the accumulation steps (gradient_accumulation_steps=1) without success.
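For reference, the manual placement I have in mind would look roughly like the sketch below. This is only an illustration on my side, not the repo's actual API: `whisper_model` and `vicuna_model` are placeholders for however the models are constructed from my config.

```python
import torch

def pick_devices():
    """Put the speech encoder and the LLM on separate GPUs when two
    are visible; otherwise fall back to a single device."""
    n = torch.cuda.device_count()
    if n >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    if n == 1:
        return torch.device("cuda:0"), torch.device("cuda:0")
    return torch.device("cpu"), torch.device("cpu")

# Placeholder usage -- the real model objects come from the training
# script driven by the attached config:
# encoder_device, llm_device = pick_devices()
# whisper_model = whisper_model.to(encoder_device)
# vicuna_model = vicuna_model.to(llm_device)
#
# The encoder's output features would then need an explicit
# .to(llm_device) before being fed into the LLM.
```

As I understand it, splitting the two models like this avoids sharding either one, but requires moving the intermediate activations across the NVLink between the GPUs.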

So the issue is: I constantly get CUDA OOM, and only one GPU is used while the second remains idle. If possible, I would like to load Vicuna-7B on one GPU and Whisper Large V3 on the other to distribute the memory usage. Is there a way to ensure that DeepSpeed or FSDP correctly distributes the models across both GPUs?
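For context, this is the kind of DeepSpeed config I have been experimenting with: a minimal ZeRO stage 3 sketch with CPU parameter offload. The values here are my own guesses, not the repo's defaults.

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
```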

I've attached my configuration file.

Thank you in advance!
