[ASR] - CUDA OOM with Vicuna-7B & Whisper Large V3 #214

@ritaaadr

Description

decode_whisper_large_linear_vicuna_7b.txt

Hello everybody, I am attempting to perform ASR using Vicuna-7B and Whisper Large V3 on a system with two NVIDIA GPUs (24GB each) connected via NVLink. However, I consistently hit CUDA out of memory (OOM) errors, and only one GPU is utilized during execution, despite specifying multiple GPUs.

So far, I've tried:

  • Reducing Model Size: swapped Whisper Large V3 for a smaller checkpoint (whisper base) and tried lower-precision settings.
  • Mixed Precision: enabled mixed_precision=true, but lowering the precision further raised errors.
  • FSDP & DeepSpeed: enabled both enable_fsdp=true and enable_deepspeed=true to reduce memory usage.
  • Multi-GPU Configuration: set CUDA_VISIBLE_DEVICES=0,1, but Vicuna and Whisper both load onto the same GPU, leaving the second one idle.
  • Batch Size & Gradient Accumulation: lowered the batch size (val_batch_size=1) and the accumulation steps (gradient_accumulation_steps=1) without success.
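For reference, the manual placement I have in mind would look roughly like the sketch below. This is only an illustration on my side, not the repo's actual API: `whisper_model` and `vicuna_model` are placeholders for however the models are constructed from my config.

```python
import torch

def pick_devices():
    """Put the speech encoder and the LLM on separate GPUs when two
    are visible; otherwise fall back to a single device."""
    n = torch.cuda.device_count()
    if n >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    if n == 1:
        return torch.device("cuda:0"), torch.device("cuda:0")
    return torch.device("cpu"), torch.device("cpu")

# Placeholder usage -- the real model objects come from the training
# script driven by the attached config:
# encoder_device, llm_device = pick_devices()
# whisper_model = whisper_model.to(encoder_device)
# vicuna_model = vicuna_model.to(llm_device)
#
# The encoder's output features would then need an explicit
# .to(llm_device) before being fed into the LLM.
```

As I understand it, splitting the two models like this avoids sharding either one, but requires moving the intermediate activations across the NVLink between the GPUs.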

So the issue is: I constantly get CUDA OOM, and only one GPU is used while the second remains idle. If possible, I would like to load Vicuna-7B on one GPU and Whisper Large V3 on the other to distribute the memory usage. Is there a way to ensure that DeepSpeed or FSDP correctly distributes the models across both GPUs?
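For context, this is the kind of DeepSpeed config I have been experimenting with: a minimal ZeRO stage 3 sketch with CPU parameter offload. The values here are my own guesses, not the repo's defaults.

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
```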

I've attached my configuration file.

Thank you in advance!
