Open · Labels: bug (Something isn't working)
Description
What happened?
When using xai/grok-4-1-fast-non-reasoning with streaming and include_usage=true, usage is not on the last chunk.
Instead:
- The second-to-last chunk has valid usage
- The last chunk has empty choices
Repro
```python
from litellm import completion

result = completion(
    model="xai/grok-4-1-fast-non-reasoning",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in result:
    print(chunk)
```
Output:
```
...
ModelResponseStream(id='923d4779-8674-4a1e-a509-b9ecc58263a9', created=1764152777, model='grok-4-1-fast-non-reasoning', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(provider_specific_fields=None, content='😊', role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None, citations=None)
ModelResponseStream(id='923d4779-8674-4a1e-a509-b9ecc58263a9', created=1764152777, model='grok-4-1-fast-non-reasoning', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason='stop', index=0, delta=Delta(provider_specific_fields=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None)
ModelResponseStream(id='923d4779-8674-4a1e-a509-b9ecc58263a9', created=1764152777, model='grok-4-1-fast-non-reasoning', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(provider_specific_fields=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None, usage=Usage(completion_tokens=11, prompt_tokens=19, total_tokens=30, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None))
ModelResponseStream(id='923d4779-8674-4a1e-a509-b9ecc58263a9', created=1764152777, model='grok-4-1-fast-non-reasoning', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(provider_specific_fields=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None)
```
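Until this is fixed, a consumer-side workaround is to scan every chunk for usage instead of relying on chunk position. A minimal sketch, assuming the same setup as the repro above:

```python
from litellm import completion

usage = None
result = completion(
    model="xai/grok-4-1-fast-non-reasoning",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in result:
    # getattr guards against chunks that carry no usage field at all
    if getattr(chunk, "usage", None) is not None:
        usage = chunk.usage
print(usage)  # populated even though usage was not on the last chunk
```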
Expected
usage should be on the last chunk
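Concretely, this invariant should hold for a stream opened with include_usage=true (a hypothetical check against the repro above; today it fails because usage sits on chunks[-2]):

```python
chunks = list(result)  # result from the Repro section, collected instead of printed
assert chunks[-1].usage is not None
```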
Possible Cause
I tested the Grok API directly with curl:
```bash
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -m 3600 \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Just say hi"
      }
    ],
    "model": "grok-4-1-fast-non-reasoning",
    "stream": true,
    "stream_options": { "include_usage": true }
  }'
```
Output:
```
...
data: {"id":"263b380c-e12f-d0a8-f4ce-5ede255a5cd0","object":"chat.completion.chunk","created":1764155221,"model":"grok-4-1-fast-non-reasoning","choices":[{"index":0,"delta":{"content":"!"}}],"system_fingerprint":"fp_ed3b9934bf"}
data: {"id":"263b380c-e12f-d0a8-f4ce-5ede255a5cd0","object":"chat.completion.chunk","created":1764155221,"model":"grok-4-1-fast-non-reasoning","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"system_fingerprint":"fp_ed3b9934bf"}
data: {"id":"263b380c-e12f-d0a8-f4ce-5ede255a5cd0","object":"chat.completion.chunk","created":1764155221,"model":"grok-4-1-fast-non-reasoning","choices":[],"usage":{"prompt_tokens":171,"completion_tokens":2,"total_tokens":173,"prompt_tokens_details":{"text_tokens":171,"audio_tokens":0,"image_tokens":0,"cached_tokens":161},"completion_tokens_details":{"reasoning_tokens":0,"audio_tokens":0,"accepted_prediction_tokens":0,"rejected_prediction_tokens":0},"num_sources_used":0},"system_fingerprint":"fp_ed3b9934bf"}
data: [DONE]
```
From the raw stream output: xAI itself sends usage on the final data chunk (before [DONE]), together with an empty choices array. LiteLLM may be mishandling that empty-choices chunk, which would explain why it emits usage second-to-last and then appends an extra empty chunk.
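If that is the cause, one possible direction for a fix is for LiteLLM's stream wrapper to hold back the usage-bearing chunk and re-emit it last. A rough sketch of the idea (hypothetical helper, not LiteLLM's actual internals):

```python
from typing import Iterable, Iterator

def reorder_usage_last(chunks: Iterable) -> Iterator:
    """Yield chunks in order, but defer any chunk carrying usage
    so that usage always lands on the last emitted chunk."""
    usage_chunk = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage_chunk = chunk  # hold back the usage-bearing chunk
            continue
        yield chunk
    if usage_chunk is not None:
        yield usage_chunk
```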
What LiteLLM version are you on?
v1.80.5