Description
Issue Encountered
Currently, the caching mechanism stores the entire evaluation split but does not consider sampling parameters (e.g., temperature). These parameters significantly influence model behavior, meaning that cached responses may not align with new configurations if the sampling settings change.
In particular, when the LiteLLM sampling parameters are modified, stale cache hits can occur, producing results that are inconsistent or incoherent with respect to the new settings.
Proposed Solution / Feature
Instead of the current caching approach, a request-based caching mechanism could be implemented using a library like diskcache. This method would cache based on the full request payload, ensuring that changes to sampling parameters (or any other request field) generate distinct cache keys.
Below is an illustrative example using an OpenAI-style request:
import hashlib, json, requests, diskcache

# Prepare request parameters
request_params = {
    "url": f"{self.config.base_url}/chat/completions",
    "headers": {
        "Authorization": f"Bearer {self.config.api_key}",
        "Content-Type": "application/json",
    },
    "json": {
        "model": self.config.model_name,
        "messages": [{"role": "user", "content": doc.query}],
        "n": doc.num_samples,
        "max_tokens": self.config.max_tokens,
        "temperature": self.config.temperature,
        "top_p": self.config.top_p,
        "min_p": self.config.min_p,
        "seed": self.config.seed,
        **self.config.extra_body,
    },
    "timeout": self.config.timeout,
}
# Cache lookup and update
with diskcache.Cache(self.config.cache_dir or "/tmp/vllm_cache") as cache:
    key = hashlib.sha256(json.dumps(request_params, sort_keys=True).encode()).hexdigest()
    if key not in cache:
        cache[key] = {
            "response": requests.post(**request_params).json(),
            "request": request_params,
        }
    response = cache[key]["response"]

This approach allows multiple concurrent processes or threads to safely access the same cache, enabling consistent reuse across evaluations on a shared filesystem.
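As a minimal, self-contained sketch of that concurrency property (the cache directory, the payloads, and the expensive_call stand-in below are hypothetical, not part of the current code base), several workers can share one cache directory, and repeated payloads reuse the stored result:

import hashlib
import json
import time
from concurrent.futures import ThreadPoolExecutor

import diskcache

CACHE_DIR = "/tmp/vllm_cache_demo"  # hypothetical shared cache directory

def expensive_call(payload: dict) -> dict:
    # Stand-in for requests.post(**request_params).json()
    time.sleep(1)
    return {"echo": payload["json"]["temperature"]}

def cached_completion(payload: dict) -> dict:
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    # Each worker opens its own handle on the shared directory; diskcache
    # coordinates reads and writes across threads and processes.
    with diskcache.Cache(CACHE_DIR) as cache:
        hit = cache.get(key)
        if hit is None:
            hit = {"response": expensive_call(payload), "request": payload}
            cache.set(key, hit)
        return hit["response"]

# Two distinct temperatures -> two distinct cache keys; once a payload's
# result has been stored, later lookups for that payload are cache hits.
payloads = [{"json": {"temperature": t}} for t in (0.0, 0.0, 0.7, 0.7)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cached_completion, payloads))
print(results)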
Adopting diskcache would also reduce maintenance complexity by replacing the custom caching logic with a robust, well-tested framework.
If this proposal is accepted, I’d be happy to open a PR implementing it.