
[Bug]: Grok streaming returns usage in the wrong chunk (extra empty final chunk) #17136

@ihower

Description

What happened?

When using xai/grok-4-1-fast-non-reasoning with streaming and include_usage=true, the usage is not in the last chunk.

Instead:

  • The second-to-last chunk has valid usage
  • The last chunk has empty choices

Repro

from litellm import completion

result = completion(
    model="xai/grok-4-1-fast-non-reasoning",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in result:
    print(chunk)

Output

...
ModelResponseStream(id='923d4779-8674-4a1e-a509-b9ecc58263a9', created=1764152777, model='grok-4-1-fast-non-reasoning', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(provider_specific_fields=None, content='😊', role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None, citations=None)

ModelResponseStream(id='923d4779-8674-4a1e-a509-b9ecc58263a9', created=1764152777, model='grok-4-1-fast-non-reasoning', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason='stop', index=0, delta=Delta(provider_specific_fields=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None)

ModelResponseStream(id='923d4779-8674-4a1e-a509-b9ecc58263a9', created=1764152777, model='grok-4-1-fast-non-reasoning', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(provider_specific_fields=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None, usage=Usage(completion_tokens=11, prompt_tokens=19, total_tokens=30, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None))

ModelResponseStream(id='923d4779-8674-4a1e-a509-b9ecc58263a9', created=1764152777, model='grok-4-1-fast-non-reasoning', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(provider_specific_fields=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None)

Expected

usage should be delivered on the last chunk of the stream, with no extra empty chunk after it.
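
Until this is fixed, a caller-side workaround is to capture usage from whichever chunk carries it instead of relying on its position. A minimal sketch (my own code, not part of LiteLLM, assuming the same litellm API as the repro above):

from litellm import completion

usage = None
parts = []

stream = completion(
    model="xai/grok-4-1-fast-non-reasoning",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Accumulate content deltas; on xAI the usage chunk may arrive
    # before a trailing empty chunk, so don't assume it is last.
    if chunk.choices and chunk.choices[0].delta.content:
        parts.append(chunk.choices[0].delta.content)
    if getattr(chunk, "usage", None) is not None:
        usage = chunk.usage  # capture usage wherever it shows up

print("".join(parts))
print(usage)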

Possible Cause

I tested the Grok API directly with curl:

curl https://api.x.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $XAI_API_KEY" \
-m 3600 \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "Just say hi"
        }
    ],
    "model": "grok-4-1-fast-non-reasoning",
    "stream": true,
    "stream_options": { "include_usage": true }
}'

Output:

...
data: {"id":"263b380c-e12f-d0a8-f4ce-5ede255a5cd0","object":"chat.completion.chunk","created":1764155221,"model":"grok-4-1-fast-non-reasoning","choices":[{"index":0,"delta":{"content":"!"}}],"system_fingerprint":"fp_ed3b9934bf"}

data: {"id":"263b380c-e12f-d0a8-f4ce-5ede255a5cd0","object":"chat.completion.chunk","created":1764155221,"model":"grok-4-1-fast-non-reasoning","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"system_fingerprint":"fp_ed3b9934bf"}

data: {"id":"263b380c-e12f-d0a8-f4ce-5ede255a5cd0","object":"chat.completion.chunk","created":1764155221,"model":"grok-4-1-fast-non-reasoning","choices":[],"usage":{"prompt_tokens":171,"completion_tokens":2,"total_tokens":173,"prompt_tokens_details":{"text_tokens":171,"audio_tokens":0,"image_tokens":0,"cached_tokens":161},"completion_tokens_details":{"reasoning_tokens":0,"audio_tokens":0,"accepted_prediction_tokens":0,"rejected_prediction_tokens":0},"num_sources_used":0},"system_fingerprint":"fp_ed3b9934bf"}

data: [DONE]

From the raw stream output:

The final data chunk before [DONE] carries usage and has an empty choices array. LiteLLM may be mishandling this usage-only chunk, which would explain the extra empty chunk it emits after the usage.
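
A usage-only terminal chunk can be recognized from the raw payload. This is only an illustrative sketch of the check (the function name is hypothetical, not LiteLLM's actual stream handler):

def is_final_usage_chunk(raw: dict) -> bool:
    # Per the OpenAI streaming spec, when include_usage is set the last
    # data chunk before [DONE] carries "usage" and an empty "choices" list.
    return raw.get("choices") == [] and raw.get("usage") is not None

# Checked against the shapes in the curl output above:
assert is_final_usage_chunk({"choices": [], "usage": {"total_tokens": 173}})
assert not is_final_usage_chunk({"choices": [{"index": 0, "delta": {}}]})

A wrapper that recognizes this chunk could attach usage to the final emitted chunk instead of producing an extra empty chunk after it.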

What LiteLLM version are you on?

v1.80.5
