Make gradient-checkpoint enabling tolerant of models without get_input_embeddings #42558
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Added a test as well, but I can't find a clean way to handle the models for which a getter method is not relevant without causing as many side effects. WDYT @zucchini-nlp? Kind of stumped (try/excepting at a higher level would always work, but it hides a lot).
zucchini-nlp left a comment:
models for which it is not relevant to have a getter method and not causing as many side-effects
Would this also mean that we can't correctly support PEFT and GC with these models, or do they have a custom way to set grads on the inputs? We could raise an error with a better message saying that the model doesn't support this unless it has a way to get its input embeddings, wdyt?
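A rough sketch of the kind of early, explicit error being suggested here; the helper name and wording are hypothetical, not the PR's actual code:

```python
# Illustrative only: surface a clear error up front instead of failing deep
# inside gradient-checkpointing / PEFT setup.
def _require_input_embeddings(model):
    try:
        embeddings = model.get_input_embeddings()
    except NotImplementedError:
        embeddings = None
    if embeddings is None:
        raise ValueError(
            f"{model.__class__.__name__} does not expose input embeddings, so gradient "
            "checkpointing with PEFT cannot set requires_grad on the inputs. "
            "Override `get_input_embeddings` to support this."
        )
    return embeddings
```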
base_model = getattr(self, "base_model_prefix", None)
if base_model is not None:
    base_model = getattr(self, base_model, None)
nit: self.base_model property has the same functionality
true!
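For context, the existing `base_model` property on `PreTrainedModel` already resolves the prefix, roughly like this (paraphrased from memory, not copied from the file; note it falls back to `self` rather than `None`):

```python
@property
def base_model(self) -> nn.Module:
    # Resolve the submodule named by base_model_prefix, falling back to the model itself.
    return getattr(self, self.base_model_prefix, self)
```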
src/transformers/modeling_utils.py
Outdated
_input_embed_layer = "embed_tokens"  # default layer that holds input embeddings.

def get_input_embeddings(self) -> nn.Module:
def _get_input_embeddings_no_raise(self) -> Optional[nn.Module]:
Oh interesting, I was assuming the base `get_input_embeddings` already returns None.
Well, I ended up in so many little edge cases lol.
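For readers following along, a non-raising getter along the lines of the diff above might look like this (a sketch, not the exact implementation in the PR):

```python
from typing import Optional

import torch.nn as nn

def _get_input_embeddings_no_raise(self) -> Optional[nn.Module]:
    # Like get_input_embeddings(), but returns None instead of raising
    # NotImplementedError when the model does not define an embedding getter.
    try:
        return self.get_input_embeddings()
    except NotImplementedError:
        return None
```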
Yes, it's a good idea to raise/inform for downstream users. I reverted a couple of things and will update the test so it actually checks that enabling GC works (probably by adding another test).
This is mostly to fix a broken-env situation that can occur around timm_wrapper (or timm_backbone?); it protects a few imports.
I reverted a few models to inner positional embedding calls, as mentioned in #38913. Modified a few other models as the test I added (…). Hopefully that helps VLMs + GC and does not break adapters.
try:
    input_embeddings = module.get_input_embeddings()
except NotImplementedError:
    continue
no simple way around this unfortunately
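In context, that try/except sits in a loop over submodules roughly like the following (a simplified sketch of the enabling path, following the existing `enable_input_require_grads` hook pattern, not the file's exact code):

```python
def make_inputs_require_grads(module, inputs, output):
    # Forward hook: mark the embedding output as requiring grad so gradients
    # can flow through checkpointed blocks back to the inputs.
    output.requires_grad_(True)

found_embeddings = False
for module in self.modules():
    # Plain nn.Module children don't define get_input_embeddings at all.
    if not hasattr(module, "get_input_embeddings"):
        continue
    try:
        input_embeddings = module.get_input_embeddings()
    except NotImplementedError:
        # Some wrappers (e.g. backbone-only models) never expose an embedding layer.
        continue
    if input_embeddings is not None:
        input_embeddings.register_forward_hook(make_inputs_require_grads)
        found_embeddings = True
```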
if not found_embeddings:
    logger.warning_once(
        f"{self.__class__.__name__} does not expose input embeddings. Gradients cannot flow back to the token "
        "embeddings when using adapters or gradient checkpointing. Override `get_input_embeddings` to fully "
        "support those features."
    )
at least we can warn users!
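For downstream model authors who hit this warning, the usual fix is to point `get_input_embeddings` at whatever layer holds the token embeddings. A hypothetical example (the class, config fields, and `embed_tokens` attribute are placeholders):

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel

class MyCustomModel(PreTrainedModel):  # hypothetical downstream model
    config_class = PretrainedConfig

    def __init__(self, config):
        super().__init__(config)
        # Assumes the config carries vocab_size / hidden_size.
        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size)

    def get_input_embeddings(self) -> nn.Module:
        # Exposing the embedding layer lets adapters / gradient checkpointing
        # hook it so gradients can flow back to the inputs.
        return self.embed_tokens

    def set_input_embeddings(self, value: nn.Module):
        self.embed_tokens = value
```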
[For maintainers] Suggested jobs to run (before merge)
run-slow: align, altclip, clvp, falcon_mamba, fast_vlm, internvl, layoutlm, layoutlmv3, lilt, mamba, mlcd, poolformer, siglip, siglip2, splinter, switch_transformers
What does this PR do?
As the title indicates, #42542 and likely a few other models are broken by the merged #41993. This adds an embedding getter and attempts to test the feature with more coverage.
Basically, what it does should help GC with PEFT adapters for many VLMs (and normal models too).
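For reference, the downstream combination this is meant to keep working looks roughly like this (the checkpoint name and LoRA settings are placeholders):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some/causal-lm")  # placeholder checkpoint
model.gradient_checkpointing_enable()
# Needed with GC + adapters so the checkpointed blocks see inputs that require grad;
# internally this hooks model.get_input_embeddings(), which is what this PR makes robust.
model.enable_input_require_grads()

peft_model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))
```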