[TabArena] OOM Prevention during Bagging #135

@Innixma

Description

Need to implement logic that avoids the following scenario:

Bagging a model which trains iteratively and increases in memory usage each iteration.

If all fold models train to full iterations and then save at the same time, they consume more total memory than estimated.

Because each fold model decides whether to early stop by comparing its own memory usage against the system's available memory, it can individually use more peak memory than it should.

Because LightGBM and XGBoost peak in memory usage during save / train finish, the following occurs (see the sketch after this list):

  1. All folds train until ~10k estimators (total mem: 24 GB, per-child: 2 GB, remaining: 8 GB).
  2. The first to finish saves and spikes to 5 GB used: remaining = 5 GB.
  3. The second to finish saves and spikes to 5 GB used: remaining = 2 GB.
  4. Early stopping due to low memory triggers for the remaining 6 models, so they all spike to 5 GB by saving at the same time: remaining = -16 GB -> OOM.
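A minimal sketch of the failure mode, reusing the numbers from the scenario above. It assumes, for illustration only, that each fold independently compares its usage against the shared free system memory; this is not the exact TabArena/AutoGluon implementation.

```python
# Minimal sketch of the failure mode, mirroring the numbers above.
TOTAL_MEM_GB = 24
N_FOLDS = 8
STEADY_MEM_GB = 2   # per-child memory while training (~10k estimators)
SAVE_SPIKE_GB = 5   # per-child peak while saving / finishing training

free = TOTAL_MEM_GB - N_FOLDS * STEADY_MEM_GB  # 8 GB remaining

# Folds 1 and 2 finish and save one after another; each spike adds 3 GB.
for _ in range(2):
    free -= SAVE_SPIKE_GB - STEADY_MEM_GB
print(free)  # 2 GB left

# The remaining 6 folds each see low *system* memory, early stop,
# and all save (spike) at the same time.
free -= 6 * (SAVE_SPIKE_GB - STEADY_MEM_GB)
print(free)  # -16 GB -> OOM
```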

Logic to avoid this (see the sketch after this list):

  1. Max mem per child = 2.4 GB (give 20% overhead) -> pass as a fit argument -> mem_limit.
  2. Know that peak mem = 2.5x model size.
  3. Once a child reaches 1 GB model size at 5000 estimators, early stopping triggers -> spike to 2.5 GB.
  4. 2.5 GB x 8 = 20 GB, still safe, doesn't go OOM; 4 GB remaining to spare on the machine.
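A hedged sketch of the proposed per-child budget, assuming the 2.5x peak factor and 20% overhead from the list above. The names `mem_limit_gb`, `PEAK_FACTOR`, and `should_early_stop` are illustrative, not an existing fit-argument API.

```python
# Illustrative per-child budget; names are hypothetical, not the final API.
PEAK_FACTOR = 2.5          # assumption: peak mem during save ~= 2.5x model size
mem_limit_gb = 2.0 * 1.2   # estimated per-child usage (2 GB) + 20% overhead = 2.4 GB


def should_early_stop(model_size_gb: float) -> bool:
    """Stop training once the projected save spike would exceed the child's budget."""
    return model_size_gb * PEAK_FACTOR > mem_limit_gb


# A child reaching ~1 GB of model size (around 5000 estimators) triggers
# early stopping: projected spike = 2.5 GB per child.
assert should_early_stop(1.0)

# Worst case: all 8 folds spike together -> 2.5 GB x 8 = 20 GB,
# which still fits on the 24 GB machine with 4 GB to spare.
worst_case_gb = 8 * 1.0 * PEAK_FACTOR
assert worst_case_gb <= 24 - 4
```

Even in the worst case where every child spikes at the same time, the total stays under the machine's memory, which is the property the naive per-fold check fails to guarantee.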
