OpenAI Compatible stream api send one more empty chunk after finish chunk. #4955
Replies: 2 comments 2 replies
Hi @joneepenk - could you please share a full response from the Qwen API and the corresponding stream from TensorZero, so we can debug what's causing this more easily? I don't have the Qwen API set up on my end, so I can't reproduce it. I believe this last chunk is because of usage: TensorZero always requests usage for observability, but you didn't request it, so it arrives as a nominal (empty) chunk. More broadly, does this present an issue for your application? How? Thanks
We decided to keep the extra chunk to make it easier to normalize across providers. In the future we'll offer transparent proxies, at which point this should be solved.
Calling the Qwen API on Alibaba Cloud (https://dashscope.aliyuncs.com/compatible-mode/v1), proxied through TensorZero (t0).
Calling with the following code:
After the Qwen API returned a finish chunk and t0 forwarded it, t0 sent an extra empty chunk to the client.
The forwarded finish chunk from Qwen:

```json
{"id":"chatcmpl-f95c8cd3-fb95-445e-9440-3d48640609d7","choices":[{"delta":{"content":"","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1764762469,"model":"qwen3-max","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
```

The extra empty chunk t0 appended afterwards:

```json
{"id":"019ae40c-ea36-70b0-aa98-c9f80ff54c03","choices":[{"delta":{"content":null,"function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1764762644,"model":"tensorzero::model_name::qwen3-max","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":"","usage":null,"episode_id":"019ae40c-ea37-7f12-a95c-0ab9805f67da"}
```
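If the trailing chunk causes trouble downstream, a client can defensively drop chunks that carry neither delta payload nor a `finish_reason`. A minimal sketch (not part of TensorZero; `filter_stream` and the sample dicts below are hypothetical, mirroring the JSON chunks above):

```python
def filter_stream(chunks):
    """Yield parsed chat.completion.chunk dicts, skipping nominal chunks
    whose delta carries no payload and whose finish_reason is null,
    such as the extra empty chunk appended after the finish chunk."""
    for chunk in chunks:
        choices = chunk.get("choices", [])
        if not choices:
            # usage-only chunks (empty choices) are also skipped here
            continue
        choice = choices[0]
        delta = choice.get("delta") or {}
        has_payload = any(
            delta.get(key)
            for key in ("content", "tool_calls", "function_call", "role")
        )
        if has_payload or choice.get("finish_reason"):
            yield chunk

# Simplified versions of the two chunks above:
finish = {"choices": [{"delta": {"content": ""}, "finish_reason": "stop", "index": 0}]}
extra = {"choices": [{"delta": {"content": None}, "finish_reason": None, "index": 0}]}

# The finish chunk survives; the empty trailing chunk is dropped.
print([c["choices"][0]["finish_reason"] for c in filter_stream([finish, extra])])
# → ['stop']
```

Note that this filter would also hide legitimate empty keep-alive chunks, so it is only appropriate for clients that key strictly off content and `finish_reason`.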