
Conversation

ggerganov (Member) commented on Nov 7, 2025

fix #17060 #17118
cont #16391

With unified caches, restoring an old prompt from the host-memory cache is not guaranteed to succeed, because there might not be enough free space in the context memory to fit it. Handle this case gracefully by reprocessing the prompt from scratch.
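
For illustration, here is a minimal C++ sketch of that fallback logic. The helpers (`try_restore_from_host_cache`, `clear_slot_memory`) and the `slot_state` struct are hypothetical stand-ins, not the actual llama-server code; the point is only the control flow: attempt the restore, and if it cannot fit, discard the partial state and reprocess the full prompt.

```cpp
// Hypothetical sketch of the graceful fallback described above.
// These helpers are illustrative stand-ins, not the real llama-server functions.
#include <cstddef>
#include <cstdio>
#include <vector>

struct slot_state {
    std::vector<int> prompt_tokens; // tokens of the incoming prompt
    size_t           n_past = 0;    // tokens already present in the context memory
};

// stand-in: attempt to restore the saved sequence state from the host-memory
// cache; return the number of tokens restored, or 0 if the context memory
// does not have enough free space to fit it
static size_t try_restore_from_host_cache(slot_state & /*slot*/) {
    return 0; // pretend the restore failed
}

// stand-in: drop any partially restored state for this slot
static void clear_slot_memory(slot_state & slot) {
    slot.n_past = 0;
}

static void prepare_slot(slot_state & slot) {
    const size_t n_restored = try_restore_from_host_cache(slot);

    if (n_restored == 0) {
        // restore failed (e.g. not enough free space in the unified cache):
        // discard partial state and reprocess the prompt from scratch
        clear_slot_memory(slot);
        std::fprintf(stderr, "cache restore failed - reprocessing prompt\n");
    } else {
        slot.n_past = n_restored;
    }

    // continue by evaluating prompt_tokens[slot.n_past ..] as usual
}

int main() {
    slot_state slot;
    slot.prompt_tokens = {1, 2, 3};
    prepare_slot(slot);
    return 0;
}
```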

@ggerganov ggerganov marked this pull request as ready for review November 8, 2025 08:45
@ggerganov ggerganov requested a review from ngxson as a code owner November 8, 2025 08:45
@github-actions github-actions bot added the python (python script changes) label Nov 8, 2025
@ggerganov ggerganov merged commit cb1adf8 into master Nov 9, 2025
81 of 83 checks passed
@ggerganov ggerganov deleted the gg/server-cache-failures branch November 9, 2025 12:27

Labels

examples, python (python script changes), server

Successfully merging this pull request may close these issues:

Misc. bug: docker for llama server crashing with gpt-oss-20b