@yonigozlan (Member) commented on Dec 2, 2025

This PR introduces a decorator-based system that automatically updates hardcoded values in integration tests, making them easier to maintain when we switch to different hardware or when other changes affect a large number of integration tests.

For context, another PR I'm working on, which loads fast image processors by default (#41388), breaks a lot of integration tests. That is expected, but really annoying to fix manually. I could instead force all of those tests to use the slow image processors, but that would just postpone the problem, as we might fully deprecate slow image processors in the future.

The new @record_expectations decorator automatically captures the actual values produced by a test and updates the hardcoded expected values directly in the source file when the test is run with UPDATE_EXPECTATIONS=1.

How to use it in the test files

Basic usage:

@record_expectations(pairs=[("actual_logits", "expected_logits")])
def test_inference(self):
    actual_logits = model(**inputs).logits
    expected_logits = torch.tensor([[24.5701, 19.3049]])  # Auto-updated
    torch.testing.assert_close(actual_logits, expected_logits)
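
In case it helps to see the mechanics, here is a minimal sketch of how the capture step could work (illustrative only, not the actual implementation in this PR, which also rewrites the expected value's assignment in the source file rather than just printing it):

import functools
import os
import sys

def record_expectations(pairs):
    # Sketch of the capture step: when UPDATE_EXPECTATIONS=1 is set, grab the
    # named locals from the test so they can be written back afterwards.
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            if os.environ.get("UPDATE_EXPECTATIONS") != "1":
                return test_fn(*args, **kwargs)

            captured = {}

            def tracer(frame, event, arg):
                # Only attach a local tracer to the test function's own frame
                if event == "call" and frame.f_code is test_fn.__code__:
                    def local_tracer(frame, event, arg):
                        if event == "return":  # fires even if the test raises
                            captured.update(frame.f_locals)
                        return local_tracer
                    return local_tracer
                return None

            sys.settrace(tracer)
            try:
                return test_fn(*args, **kwargs)
            finally:
                sys.settrace(None)
                for actual_name, expected_name in pairs:
                    if actual_name in captured:
                        # The real decorator would rewrite the assignment to
                        # `expected_name` in the test's source file here; this
                        # sketch only reports the captured value.
                        print(f"{expected_name} <- {captured[actual_name]!r}")

        return wrapper
    return decorator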

This also works with the recently introduced Expectations object, where values will be updated only for the current hardware configuration. Cc @ydshieh

@record_expectations(pairs=[("decoded_text", "expected_decoded_text")])
def test_generation(self):
    decoded_text = processor.decode(model.generate(**inputs)[0])
    expected_decoded_text = Expectations({
        ("cuda", (7, None)): "Output on T4...",
        ("cuda", (8, None)): "Output on A10...",
    }).get_expectation()  # Auto-updated per hardware
    self.assertEqual(decoded_text, expected_decoded_text)

We could also add flags to control what gets updated: should ("cuda", None) be updated by default, or a specific hardware key such as ("cuda", (8, 6))? A hypothetical version of such a flag is sketched below. I'm not fully sure what would be best here, so I'd love to hear your thoughts.
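
To make that concrete, a hypothetical signature could look like the following (the update_key argument is made up for illustration and is not part of this PR):

# Hypothetical flag: let each test choose which hardware key of the
# Expectations dict gets overwritten when UPDATE_EXPECTATIONS=1 is set.
@record_expectations(
    pairs=[("decoded_text", "expected_decoded_text")],
    update_key=("cuda", (8, None)),  # or ("cuda", None) as a catch-all default
)
def test_generation(self):
    ...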

How to update hardcoded values

To update expectations, we can just set the UPDATE_EXPECTATIONS=1 environment variable when running the usual pytest commands:

UPDATE_EXPECTATIONS=1 RUN_SLOW=1 python -m pytest tests/models/clip/test_modeling_clip.py -k CLIPModelIntegrationTest
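
Concretely, the test then runs as usual and the hardcoded literal is rewritten in place with the values captured during that run. For the basic example above, the change would look roughly like this (the "after" numbers are placeholders standing in for whatever the current hardware actually produces, not real outputs):

# Before running with UPDATE_EXPECTATIONS=1
expected_logits = torch.tensor([[24.5701, 19.3049]])  # Auto-updated
# After: the literal now holds the values captured from actual_logits
# (placeholder numbers, purely for illustration)
expected_logits = torch.tensor([[24.5698, 19.3051]])  # Auto-updated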

For now I only added the decorator and the other necessary modifications to a few modeling tests, but I'm happy to extend this to other tests once I get some feedback.

I think this could make our lives easier, but I'd love to have your thoughts on this!
Cc @ydshieh @ArthurZucker @Cyrilvallez @molbap @zucchini-nlp


github-actions bot commented Dec 2, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: clip, kosmos2_5, llava_next_video

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
