
Conversation

@f14-bertolotti
Contributor

Overview

This PR migrates the custom SampleCache system to the more robust and well-maintained diskcache library.
It follows up on issue #1053.

What’s Changed

The legacy SampleCache class and its associated cached decorator have been fully removed and replaced with a new cached decorator built on top of diskcache.

The new decorator:

  • Uses model.config.cache_dir to instantiate a shared cache via diskcache.Cache(model.config.cache_dir).
  • Detects which documents are already cached, and calls the underlying sampling method only on the subset of uncached documents.
  • Stores all newly returned samples back into the cache after each sampling method finishes.
  • Supports both synchronous and asynchronous sampling methods, enabling seamless use with both async and sync vLLM.

Most of the logic lives in src/lighteval/utils/cache_management.py, where the decorator was rewritten from scratch. All remaining changes simply remove references to the previous caching implementation.

Currently, cache writes occur only after a full sampling method completes (which can take hours for certain tasks). In the future, we can decorate inner per-request methods to enable more granular incremental caching.

Additional Improvements

  • Added a progress bar for async vLLM to improve user feedback during long-running evaluations.

Testing

The implementation has been tested with:

uv run lighteval vllm examples/model_configs/vllm_model_config.yaml aime25   # async + sync vLLM
uv run lighteval vllm examples/model_configs/transformers_model.yaml aime25
uv run pytest

Co-authored-by: Francesco Bertolotti <[email protected]>
@f14-bertolotti changed the title from "graceful shutdown of vllm async" to "diskcache for caching" on Nov 19, 2025
@NathanHB
Member

oh great!! will test it today and come back to you for review :)
cc @clefourrier

@HuggingFaceDocBuilderDev
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@clefourrier clefourrier left a comment


Looking nice overall, a couple of nits - could you also try testing it with the transformers and inference provider backends?

Comment on lines 623 to 646
with rich.progress.Progress(
    "[progress.description]{task.description}",
    rich.progress.BarColumn(),
    "[progress.completed]{task.completed}/{task.total}",
    "•",
    rich.progress.TimeElapsedColumn(),
    "•",
    rich.progress.TimeRemainingColumn(),
) as pbar:
    task_id = pbar.add_task("[green]Sending Requests...", total=len(docs))

    async def track(coro):
        """Wraps a coroutine to update the progress bar when done."""
        result = await coro
        pbar.update(task_id, advance=1)
        return result

    wrapped = [
        track(self._async_one_item(index=index, doc=doc, generative=generative))
        for index, doc in enumerate(docs)
    ]

    result = await asyncio.gather(*wrapped)
    return result
Member

unrelated to current PR, would make sense to have it in its standalone PR

Contributor Author

I will make a separate PR for it

# Save updated dataset
dataset = Dataset.from_list(all_samples)
dataset.to_parquet(str(cache_file))


def default_json_encoder(obj):
Member

We already have a json encoder in the utils iirc

Contributor Author

I am having trouble finding it. Could you point it out for me?

Member

EnhancedJSONEncoder in src/lighteval/logging/evaluation_tracker.py

Member

@NathanHB wdyt of moving it to the utils? it's not logging specific

Contributor Author

That encoder should work perfectly. I only need to add .model_dump() for pydantic objects.

Member

perfect, ty

Contributor Author

A quick note about the encoder: I noticed that the method is defined as an instance method, which means it expects a self parameter. I can work around this by forcing self to None:

functools.partial(EnhancedJSONEncoder.default, None)

However, it might be cleaner to define it as a @staticmethod within EnhancedJSONEncoder instead.

What do you think?

Member

I don't think you need this - it's supposed to be used with the cls arg in json.dumps, so instead of doing default=your_func you do cls=the_class
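A minimal sketch of that pattern. The encoder body here is an illustrative stand-in for the one in `src/lighteval/logging/evaluation_tracker.py` (the real implementation may differ), with the pydantic handling from the earlier comment folded in via a duck-typed check:

```python
import dataclasses
import json


class EnhancedJSONEncoder(json.JSONEncoder):
    """Illustrative stand-in for the encoder in evaluation_tracker.py."""

    def default(self, o):
        if hasattr(o, "model_dump"):  # pydantic v2 models
            return o.model_dump()
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return super().default(o)


@dataclasses.dataclass
class Sample:
    doc_id: int
    text: str


# Pass the class via `cls`: json.dumps instantiates the encoder itself,
# so `default` is called as a bound method and `self` needs no workaround.
serialized = json.dumps(Sample(doc_id=1, text="hi"), cls=EnhancedJSONEncoder)
# serialized == '{"doc_id": 1, "text": "hi"}'
```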

Member

Unless I'm misunderstanding your issue, so really feel free to extend a bit on what you need if relevant :)

Contributor Author

mb, I didn't even know that you could pass a class to json.dumps.

Comment on lines 76 to 77
if isinstance(docs, Doc):
docs = [docs]
Member

you could keep as_list
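For reference, a sketch of what such an as_list helper typically does (not necessarily lighteval's exact implementation):

```python
def as_list(item):
    """Wrap a bare item in a list; pass lists and tuples through as lists."""
    if isinstance(item, (list, tuple)):
        return list(item)
    return [item]
```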

    doc, sampling_method=sampling_method, config=self.config, args=args, kwargs=kwargs
)
if key in cache:
    logger.info("Cache hit")
Member

You're going to get too many logs with this, especially on datasets like mmlu with 10K+ samples - maybe just store how many times the cache was hit and log it at the end?
Cache was hit for x documents out of y.
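A sketch of that aggregation, where `cache` and `build_key` are illustrative stand-ins for the real cache object and key-building helper:

```python
import logging

logger = logging.getLogger(__name__)


def lookup_with_summary(docs, cache, build_key):
    """Look up every doc in the cache, logging one summary line at the end
    instead of one line per document."""
    results, misses = {}, []
    for idx, doc in enumerate(docs):
        key = build_key(doc)
        if key in cache:
            results[idx] = cache[key]
        else:
            misses.append((idx, doc))
    logger.info("Cache was hit for %d documents out of %d.", len(results), len(docs))
    return results, misses
```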

    logger.info("Cache hit")
    results[idx] = cache[key]["response"]
else:
    logger.info("Cache miss")
Member
The reason will be displayed to describe this comment to others. Learn more.

idem


return wrapper


def decorator(sampler: Callable):  # noqa: C901
    @functools.wraps(sampler)
Member

sampler is a bit misleading here - maybe model_call would be clearer?

model_outputs = await self.model.loglikelihood(docs)
outputs[sampling_method] = model_outputs

self.model.cleanup()
Member

nice catch

Member

@NathanHB NathanHB left a comment


Just gave it a go locally, it's great! Thanks for the contrib.
Will merge once the nits above are addressed and the tests are fixed :)

@f14-bertolotti
Contributor Author

Ok, I think I have addressed all the issues. I am opening another PR for the progress bar.

@f14-bertolotti
Contributor Author

I have added a last-minute commit. I forgot to remove two lines related to this comment: #1068 (comment)

Comment on lines 105 to 107
if isinstance(docs, Doc):
docs = [docs]

Member

you should use as_list, not remove it entirely ^^"

Contributor Author

mb, I misunderstood your comment. The new commit uses as_list.
