@yonigozlan (Member) commented on Dec 2, 2025

This PR introduces a decorator-based system that automatically updates hardcoded values in integration tests, making them easier to maintain when we switch to different hardware or when other changes affect a large number of integration tests.

For context, another PR I'm working on, which loads fast image processors by default (#41388), breaks a lot of integration tests. That is expected, but really annoying to fix manually. I could instead force all of those tests to use the slow image processors, but that would just postpone the problem, as we might fully deprecate slow image processors in the future.

The new @record_expectations decorator automatically captures the actual values produced by a test and updates the hardcoded expected values directly in the source file when the test is run with UPDATE_EXPECTATIONS=1.

How to use it in the test files

Basic usage:

@record_expectations(pairs=[("actual_logits", "expected_logits")])
def test_inference(self):
    actual_logits = model(**inputs).logits
    expected_logits = torch.tensor([[24.5701, 19.3049]])  # Auto-updated
    torch.testing.assert_close(actual_logits, expected_logits)
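
In case it helps to see the mechanics, here is a minimal sketch of how the capture step could work (illustrative only, not the actual implementation in this PR, which also rewrites the expected value's assignment in the source file rather than just printing it):

import functools
import os
import sys

def record_expectations(pairs):
    # Sketch of the capture step: when UPDATE_EXPECTATIONS=1 is set, grab the
    # named locals from the test so they can be written back afterwards.
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            if os.environ.get("UPDATE_EXPECTATIONS") != "1":
                return test_fn(*args, **kwargs)

            captured = {}

            def tracer(frame, event, arg):
                # Only attach a local tracer to the test function's own frame
                if event == "call" and frame.f_code is test_fn.__code__:
                    def local_tracer(frame, event, arg):
                        if event == "return":  # fires even if the test raises
                            captured.update(frame.f_locals)
                        return local_tracer
                    return local_tracer
                return None

            sys.settrace(tracer)
            try:
                return test_fn(*args, **kwargs)
            finally:
                sys.settrace(None)
                for actual_name, expected_name in pairs:
                    if actual_name in captured:
                        # The real decorator would rewrite the assignment to
                        # `expected_name` in the test's source file here; this
                        # sketch only reports the captured value.
                        print(f"{expected_name} <- {captured[actual_name]!r}")

        return wrapper
    return decorator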

This also works with the recently introduced Expectations object, where values will be updated only for the current hardware configuration. Cc @ydshieh

@record_expectations(pairs=[("decoded_text", "expected_decoded_text")])
def test_generation(self):
    decoded_text = processor.decode(model.generate(**inputs)[0])
    expected_decoded_text = Expectations({
        ("cuda", (7, None)): "Output on T4...",
        ("cuda", (8, None)): "Output on A10...",
    }).get_expectation()  # Auto-updated per hardware
    self.assertEqual(decoded_text, expected_decoded_text)

We could also add flags to control what gets updated: should ("cuda", None) be updated by default, or a specific hardware key such as ("cuda", (8, 6))? A hypothetical version of such a flag is sketched below. I'm not fully sure what would be best here, so I'd love to hear your thoughts.
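
To make that concrete, a hypothetical signature could look like the following (the update_key argument is made up for illustration and is not part of this PR):

# Hypothetical flag: let each test choose which hardware key of the
# Expectations dict gets overwritten when UPDATE_EXPECTATIONS=1 is set.
@record_expectations(
    pairs=[("decoded_text", "expected_decoded_text")],
    update_key=("cuda", (8, None)),  # or ("cuda", None) as a catch-all default
)
def test_generation(self):
    ...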

How to update hardcoded values

To update expectations, we can just set the UPDATE_EXPECTATIONS=1 environment variable when running the usual pytest commands:

UPDATE_EXPECTATIONS=1 RUN_SLOW=1 python -m pytest tests/models/clip/test_modeling_clip.py -k CLIPModelIntegrationTest
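
Concretely, the test then runs as usual and the hardcoded literal is rewritten in place with the values captured during that run. For the basic example above, the change would look roughly like this (the "after" numbers are placeholders standing in for whatever the current hardware actually produces, not real outputs):

# Before running with UPDATE_EXPECTATIONS=1
expected_logits = torch.tensor([[24.5701, 19.3049]])  # Auto-updated
# After: the literal now holds the values captured from actual_logits
# (placeholder numbers, purely for illustration)
expected_logits = torch.tensor([[24.5698, 19.3051]])  # Auto-updated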

For now I only added the decorator and the other necessary modifications to a few modeling tests, but I'm happy to extend this to other tests once I get some feedback.

I think this could make our lives easier, but I'd love to have your thoughts on this!
Cc @ydshieh @ArthurZucker @Cyrilvallez @molbap @zucchini-nlp


github-actions bot commented Dec 2, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: clip, kosmos2_5, llava_next_video

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
