Skip to content

fix: handle empty AI Gateway streams and model routing#26005

Draft
ibetitsmike wants to merge 3 commits into
mainfrom
mike/ai-gateway-jw3c
Draft

fix: handle empty AI Gateway streams and model routing#26005
ibetitsmike wants to merge 3 commits into
mainfrom
mike/ai-gateway-jw3c

Conversation

@ibetitsmike
Copy link
Copy Markdown
Collaborator

OpenRouter can return comment-only SSE heartbeats with no data frames, which previously surfaced as a generic JSON EOF error.

This adds SSE stream accounting for chat completions so empty or malformed upstream streams become clear 502 responses, and rejects OpenRouter-like providers typed as native openai when slash-namespaced models would otherwise be stripped. Valid openrouter and openai-compat routes continue to preserve full model IDs.

Mux created this PR on behalf of Mike.

@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

@coder-agents-review
Copy link
Copy Markdown
Contributor

coder-agents-review Bot commented Jun 2, 2026

Chat: Review posted | View chat
Requested: 2026-06-03 02:02 UTC by @ibetitsmike
Spend: $84.06 / $100.00

Review history
  • R1 (2026-06-02): 20 reviewers, 8 Nit, 1 Note, 3 P2, 10 P3, 1 P4, COMMENT. Review
  • R2 (2026-06-03), 8 Nit, 1 Note, 3 P2, 10 P3, 1 P4, COMMENT. Review
  • R3 (2026-06-03): 8 reviewers, 8 Nit, 1 Note, 3 P2, 11 P3, 1 P4, COMMENT. Review
  • R4 (2026-06-03), 8 Nit, 1 Note, 3 P2, 11 P3, 1 P4, COMMENT. Review

deep-review v0.6.1 | Round 4 | efbaede..3f79eb5

Last posted: Round 4, 24 findings (3 P2, 11 P3, 1 P4, 8 Nit, 1 Note), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 Nit Author contested; panel closed R3 (8/8 accept) sse_filter.go:73 isEventStreamContentType duplicates isSSEContentType in chatdebug/transport.go R1 Netero Yes
CRF-2 P3 Author fixed (e991ec6) exp_chats.go:7049 updateChatModelConfig model-only path validates provider pre-tx but not in-tx (TOCTOU) R1 Netero Yes
CRF-3 Note Author contested; panel closed R3 (7/7 accept) sse_filter.go:1 sse_filter.go (193 lines, new file) has no dedicated unit test file R1 Netero Yes
CRF-4 P2 Author fixed (e991ec6) model_routing_aibridge.go:150 Runtime validation error not classified as ChatErrorKindConfig; pre-existing misconfigs produce generic chat error R1 Chopper P3, Knov P3, Mafuuu P3, Ryosuke P2 Yes
CRF-5 P2 Author fixed (e991ec6) exp_chats.go:7049 Update validation blocks all edits on pre-existing misconfigured model configs, not just the misconfigured fields R1 Hisoka Yes
CRF-6 P3 Author fixed (e991ec6) streaming.go:567 isJSONDecodeStreamError io.EOF arm misclassifies transport failures as malformed JSON in operator logs R1 Hisoka P3, Meruem P3 Yes
CRF-7 P3 Author fixed (e991ec6) model_routing_aibridge.go:97 Validation heuristic is OpenRouter-specific but slash-model stripping affects all non-gateway providers typed as openai R1 Pariston P3, Luffy P3 Yes
CRF-8 P3 Author fixed (e991ec6) sse_filter.go:66 DONE-only streams escape empty-stream detection, producing silent empty response R1 Meruem P3 Yes
CRF-9 P3 Author fixed (e991ec6) streaming_internal_test.go:131 No test for DONE-only upstream stream R1 Bisky Yes
CRF-10 P3 Author fixed (e991ec6) streaming.go:499 User-facing error messages use backend jargon (upstream stream, data events) R1 Leorio Yes
CRF-11 P2 Author fixed (e991ec6) streaming.go:491 mapStreamError doc comment stale after adding empty-stream detection R1 Gon Yes
CRF-12 P3 Author fixed (e991ec6) sse_filter.go:116 Magic shift <<9 (32MB) with no named constant or comment R1 Gon Yes
CRF-13 P3 Author fixed (e991ec6) model_routing_aibridge.go:116 Base URL hostname extraction duplicates openai_compat_patches.go with scheme fallback divergence R1 Robin Yes
CRF-14 P3 Author fixed (e991ec6) exp_chats_test.go:3850 No integration test for updating AIProviderID to misconfigured provider when model already has slash R1 Bisky Yes
CRF-15 Nit Author fixed (e991ec6) streaming.go:555 Log field saw_data_chunk diverges from struct field dataEvents and method hasDataEvents() R1 Gon Yes
CRF-16 Nit Author fixed (e991ec6) exp_chats.go:6771 validateChatModelConfigAIProvider swaps parameter order relative to ValidateAIGatewayProviderModel R1 Gon Yes
CRF-17 Nit Author fixed (e991ec6) streaming.go:570 errors.AsType available since Go 1.26; codebase already uses it R1 Ging-Go Yes
CRF-18 Nit Author fixed (e991ec6) sse_filter.go:81 sync.OnceValue available since Go 1.21; would eliminate two struct fields R1 Ging-Go Yes
CRF-19 Nit Author fixed (e991ec6) sse_filter.go:72 strings.Cut instead of strings.Split; same file uses bytes.Cut R1 Ging-Go Yes
CRF-20 Nit Author fixed (e991ec6) exp_chats.go:6944 aiProviderValidationResp nil guard is unreachable dead code R1 Zoro Yes
CRF-21 P3 Author fixed (e991ec6) model_routing_aibridge.go:91 Doc comment understates failure mode: also causes provider change and request format mismatch R1 Razor P3, Leorio Nit Yes
CRF-22 P4 Author fixed (e991ec6) sse_filter.go:163 emptyEvents counter inflated by comment-only segments; diagnostic signal misleading R1 Razor P4, Ryosuke P3 Yes
CRF-23 P4 Dropped by orchestrator (function is 3 lines; existing tests cover important cases) exp_chats_internal_test.go:12 Boundary inputs for model string validation untested R1 Bisky No
CRF-24 P4 Dropped by orchestrator (error message matches current heuristic scope) model_routing_aibridge.go:100 Error message hardcodes OpenRouter R1 Luffy No
CRF-25 Nit Author fixed (e991ec6) exp_chats.go:6774 Error message says OpenRouter model when issue is provider type configuration R1 Mafuuu Yes
CRF-26 P3 Author fixed (3f79eb5) chatprovider.go:191 ProviderBaseURLHostname scheme-fallback branch (no-scheme URL) has no test coverage R3 Bisky Yes

Contested and acknowledged

CRF-1 (Nit, sse_filter.go:73) - isEventStreamContentType duplicates isSSEContentType

  • Finding: isEventStreamContentType in aibridge/intercept/chatcompletions/sse_filter.go duplicates isSSEContentType from coderd/x/chatd/chatdebug/transport.go:404. A shared helper would eliminate the duplication.
  • Author defense: The chatdebug helper is unexported in a chatd package. Importing chatd from aibridge would create the wrong dependency direction. The new helper is kept package-local.
  • Panel closure (R3, 8/8): All reviewers verified the dependency direction constraint. aibridge/ has zero imports from coderd/x/chatd/. Extracting a shared package for a 5-line function is disproportionate. Acceptable duplication.

CRF-3 (Note, sse_filter.go:1) - No dedicated unit test file for sse_filter.go

  • Finding: sse_filter.go (193 lines, new file) has no dedicated unit test file. Internal functions are exercised only through integration tests.
  • Author defense: Coverage is kept at the interceptor seam. Tests pin 8 distinct scenarios: empty streams, comment-only, DONE-only, malformed data, large SSE lines, missing DONE after valid data, and comments before valid chunks.
  • Panel closure (R3, 7/7): All evaluating reviewers verified the test coverage. The interceptor boundary is the correct test seam: it proves the filter delivers correct behavior at the contract level. Unit-testing internal parsing would be testing implementation details, not outcomes.

Round log

Round 1

Panel. Netero first pass: 1 P3, 1 Nit, 1 Note. Panel (20 reviewers): 3 P2, 9 P3, 1 P4, 8 Nit new. 2 dropped. Reviewed against efbaede..deb4344.

Round 2

BLOCKED. 21 addressed, 2 silent (CRF-1, CRF-3). No review.

Round 3

Panel. CRF-1 panel closed (8/8). CRF-3 panel closed (7/7). 1 new P3 (CRF-26). Reviewed against efbaede..e991ec6.

Round 4

CRF-26 addressed (3f79eb5). All findings resolved. Reviewed against efbaede..3f79eb5.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good PR. The SSE filter is well-designed: correct pipe lifecycle, clean goroutine shutdown, proper SSE spec compliance. The defense-in-depth validation (config time + runtime) is the right pattern. The logUpstreamStreamFailure structured logging is exemplary operator tooling. Test density is healthy at 53% with genuine assertions that verify both new behavior and absence of old failure modes.

Severity count: 3 P2, 9 P3, 1 P4, 8 Nit.

The three P2s cluster around the pre-existing misconfiguration experience: the runtime error lacks chat error classification (CRF-4), the update endpoint blocks unrelated edits on already-broken configs (CRF-5), and the mapStreamError doc comment no longer describes what the function does (CRF-11). The P3s are a mix of operator signal quality (CRF-6, CRF-10), test gaps (CRF-9, CRF-14), and structural observations about the heuristic scope (CRF-7).

Highlight quote from Pariston: "The SSE filter exists because the OpenAI SDK's stream decoder doesn't strip SSE comment lines per the SSE spec. If the SDK handled comments correctly, the filter becomes overhead with no behavioral effect. Worth tracking."

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/model_routing_aibridge.go
Comment thread coderd/exp_chats.go Outdated
Comment thread aibridge/intercept/chatcompletions/streaming.go
Comment thread aibridge/intercept/chatcompletions/streaming.go Outdated
Comment thread coderd/x/chatd/model_routing_aibridge.go
Comment thread aibridge/intercept/chatcompletions/sse_filter.go Outdated
Comment thread aibridge/intercept/chatcompletions/sse_filter.go
Comment thread coderd/exp_chats.go Outdated
Comment thread coderd/exp_chats.go Outdated
Comment thread aibridge/intercept/chatcompletions/sse_filter.go
@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

21 of 23 findings addressed in e991ec6. Excellent responsiveness across the board: all three P2s fixed, all P3s resolved, and the nits cleaned up.

Two findings remain without a response:

  • CRF-1 (Nit) sse_filter.go:73: isEventStreamContentType duplicates isSSEContentType from chatdebug/transport.go:404. Both parse media type and compare against text/event-stream. Thread was resolved without a comment.
  • CRF-3 (Note) sse_filter.go:1: No dedicated unit test file for the 193-line sse_filter.go. Internal functions (stats methods, filter goroutine lifecycle, pipe error propagation) are exercised only through integration tests. Thread was resolved without a comment.

Further review is blocked until these are addressed, acknowledged ("won't fix" is fine), or deferred with a ticket. These are low-severity items; a brief response on each thread is sufficient to unblock.

🤖 This review was automatically generated with Coder Agents.

@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 21 fixes verified across multiple reviewers. No regressions found. CI green.

Contested findings resolved:

  • CRF-1 (Nit, code duplication): panel closed 8/8. Dependency direction prevents sharing; two 5-line functions in separate packages is acceptable.
  • CRF-3 (Note, no unit test file): panel closed 7/7. Integration tests at the interceptor seam cover the behavioral contract across 8 scenarios.

One new finding (1 P3). The CRF-13 fix extracted ProviderBaseURLHostname as a shared function but the scheme-fallback branch (no-scheme URLs) has no test coverage.

Mafu-san: "21 of 23 non-dropped findings addressed in one commit. Both contested findings now have documented defenses. No findings silently ignored. The fix commit touches exactly 11 files, the same 11 as the original commit. No scope drift."

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chatprovider/chatprovider.go
@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRF-26 fixed in 3f79eb5 with TestProviderBaseURLHostname covering full URL, bare host (scheme-fallback), host with port, and invalid URL.

All 26 findings across 4 rounds are resolved: 22 fixed, 2 panel-closed (contested), 2 dropped. CI green. Test density 59.5%.

🤖 This review was automatically generated with Coder Agents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant