This repository was archived by the owner on Jun 4, 2025. It is now read-only.

NM Transformers v0.10

@markurtz released this 24 Jan 19:37
c7b33f0
Fix incorrect steps calculation when gradient acc. (#31)

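When gradient accumulation is used, the effective batch size is `gradient_accumulation_steps` times larger than the per-step batch size, so the number of optimizer steps per epoch is correspondingly smaller.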
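
As a rough illustration of the arithmetic involved, the sketch below computes the effective batch size and the number of optimizer steps per epoch. It is not the code from this release: the function names and parameters are hypothetical, and the choice to floor the step count and clamp it to at least 1 is an assumption about how partial accumulation windows are handled.

```python
def effective_batch_size(batch_size: int, gradient_accumulation_steps: int = 1) -> int:
    """Samples contributing to a single optimizer update (hypothetical helper)."""
    if gradient_accumulation_steps < 1:
        raise ValueError("gradient_accumulation_steps must be >= 1")
    return batch_size * gradient_accumulation_steps


def optimizer_steps_per_epoch(num_batches: int, gradient_accumulation_steps: int = 1) -> int:
    """Optimizer updates per epoch (hypothetical helper).

    Assumption: an incomplete trailing accumulation window is dropped
    (floor division), and at least one step is always reported.
    """
    if gradient_accumulation_steps < 1:
        raise ValueError("gradient_accumulation_steps must be >= 1")
    return max(num_batches // gradient_accumulation_steps, 1)


if __name__ == "__main__":
    # 100 batches of 8 samples, accumulating gradients over 4 batches.
    print(effective_batch_size(8, 4))         # 32
    print(optimizer_steps_per_epoch(100, 4))  # 25
```

Any schedule expressed in steps (for example, a learning-rate or sparsification schedule) would need to use the optimizer-step count rather than the raw batch count, otherwise it would finish after only a fraction of the intended training.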