Fix: usage from earlier stream chunks when later chunks have none #2126
Resolved: #2122
When using `xai/grok-4-1-fast-reasoning`, the LiteLLM streaming output includes `usage` in a non-final chunk rather than in the last one. However, the current SDK logic overwrites `usage` with `None` when later chunks do not include it, so valid usage information is lost in the final response.

Repro
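The original repro script is not reproduced here; as a minimal sketch, the same failure mode can be illustrated with litellm directly (assuming an XAI_API_KEY is configured):

```python
# Minimal sketch, not the SDK's actual repro: stream a completion and mimic
# the buggy accumulation, where usage is read from every chunk unconditionally.
import litellm

usage = None
for chunk in litellm.completion(
    model="xai/grok-4-1-fast-reasoning",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
):
    # Overwrites on every chunk, so a final chunk without usage wipes out
    # the value delivered by an earlier chunk.
    usage = getattr(chunk, "usage", None)

print(usage)
```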
Output:
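The exact printout depends on the SDK version, but schematically the final result reports zeroed usage along these lines (field names illustrative):

```
Usage(input_tokens=0, output_tokens=0, total_tokens=0)
```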
As shown above, all usage values are reported as 0. This happens because the final streaming chunk does not include usage data, which causes valid usage from earlier chunks to be overwritten.
Root cause
Here is an example of streaming chunks from LiteLLM output (`xai/grok-4-1-fast-reasoning`):
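The chunks from the report are not reproduced verbatim; schematically, with illustrative token counts, the tail of the stream looks like this:

```
# second-to-last chunk: carries the usage data
{"choices": [{"delta": {}}], "usage": {"prompt_tokens": 10, "completion_tokens": 42, "total_tokens": 52}}

# final chunk: no usage attached
{"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": null}
```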
As shown above, `usage` appears in the second-to-last chunk, not in the final chunk.

Solution
This PR updates the stream handler to:
- only update `usage` when the current chunk actually includes usage data
- preserve the last seen `usage` instead of overwriting it with `None`

This ensures correct token accounting even when providers (e.g. LiteLLM) do not attach `usage` to the final chunk; see the sketch below.
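A minimal sketch of the handling described above (names are illustrative, not the SDK's actual internals):

```python
from typing import Any, Iterable, Optional

def accumulate_usage(chunks: Iterable[Any]) -> Optional[Any]:
    """Keep the last non-None usage seen across all streaming chunks."""
    last_usage = None
    for chunk in chunks:
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            # Only overwrite when this chunk actually carries usage data,
            # so a trailing chunk without usage no longer wipes the value out.
            last_usage = chunk_usage
    return last_usage
```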
Note
This behavior is likely a LiteLLM issue. I have reported it here: BerriAI/litellm#17136
That said, adding this defensive handling in the SDK is harmless and simple, and it lets us handle this case gracefully right away without waiting for an upstream fix.