Hi,
Thank you for your excellent work on this project!
I'm currently trying to train on the BlendedMVS dataset using 8x H100 GPUs (each with 80 GB of memory). I ran the training script via `bash bash_scripts/train/examples/mapa_curri_4v_bmvs_48ipg_8g.sh 8`, but I'm encountering an Out-of-Memory (OOM) error despite several optimizations.
To mitigate the issue, I've already tried the following (a sketch of the resulting launch command is included after the list):
- Reduced the number of views to 2.
- Set `accum_iter` to 12.
- Enabled gradient checkpointing by setting `model.info_sharing.module_args.gradient_checkpointing=true` and `model.pred_head.gradient_checkpointing=true`.
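For reference, this is roughly how I'm combining these settings into one launch. It's only a sketch: I'm assuming the wrapper script forwards extra command-line arguments to the trainer as Hydra-style overrides, and the key names for the view count and accumulation steps are my guesses rather than names I've verified in the config (the two gradient-checkpointing keys are the ones quoted above).

```bash
# Sketch of my launch command (assumptions noted below):
# - assumes extra arguments are forwarded as Hydra-style config overrides;
# - "dataset.num_views" and "train_params.accum_iter" are placeholder key names
#   that I have NOT verified against the actual config;
# - the two gradient_checkpointing keys are the ones quoted above.
bash bash_scripts/train/examples/mapa_curri_4v_bmvs_48ipg_8g.sh 8 \
    dataset.num_views=2 \
    train_params.accum_iter=12 \
    model.info_sharing.module_args.gradient_checkpointing=true \
    model.pred_head.gradient_checkpointing=true
```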
However, the OOM error persists. What confuses me is that the model only has around 500M parameters; by my rough estimate, the weights, gradients, and AdamW optimizer states together should only come to about 8-10 GB, so I don't understand why training needs anywhere near the full 80 GB per GPU.
Could you please provide some insights or suggestions on how to resolve this? Thank you in advance for your help!