Skip to content

feat: record and expose terminal upstream interception errors#26961

Open
dannykopping wants to merge 2 commits into
dk/aigov137-dbfrom
dk/aigov137-backend
Open

feat: record and expose terminal upstream interception errors#26961
dannykopping wants to merge 2 commits into
dk/aigov137-dbfrom
dk/aigov137-backend

Conversation

@dannykopping

@dannykopping dannykopping commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Categorises the terminal error of a failed interception and persists it
on the interception record, then surfaces it on the AI Gateway API.

  • Categorise into an enum (bad_request, unauthorized,
    rate_limited, overloaded, server_error, unknown), unwrapping
    the ResponseError envelope, the upstream Anthropic/OpenAI SDK errors,
    and key-pool exhaustion so blocking and streaming paths agree.
  • Thread the type and raw message through the recorder dRPC into the
    aibridge_interceptions row (optional proto fields; NULL on success).
  • Expose the error on the AI Gateway thread API from the root
    interception.

This PR was produced by opencode (agent) using the anthropic/claude-opus-4-8 model, under human direction and review.

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Docs preview

📖 View docs preview for docs/reference/api/aigateway.md

Copy link
Copy Markdown
Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@dannykopping dannykopping changed the title feat(aibridge): record and expose terminal upstream interception errors feat: record and expose terminal upstream interception errors Jul 2, 2026
@dannykopping dannykopping force-pushed the dk/aigov137-backend branch from aa06e78 to 1b0a895 Compare July 2, 2026 09:44
@dannykopping dannykopping changed the title feat: record and expose terminal upstream interception errors feat(aibridge): record and expose terminal upstream interception errors Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Chat: Review posted | View chat
Requested: 2026-07-02 09:52 UTC by @dannykopping
Spend: $43.67 / $100.00

Review history
  • R1 (2026-07-02): 18 reviewers, 4 Nit, 2 Note, 2 P2, 5 P3, COMMENT. Review

deep-review v0.9.0 | Round 1 | a66498e..1b0a895

Last posted: Round 1, 13 findings (2 P2, 5 P3, 4 Nit, 2 Note), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 P3 Open enterprise/aibridged_integration_test.go:626 Dead struct field model in test case struct R1 Netero Yes
CRF-2 P2 Open aibridge/bridge.go:360 Streaming key-pool exhaustion categorized as server_error instead of unauthorized R1 Mafuuu P2, Pariston P2, Hisoka P3 Yes
CRF-3 P2 Open aibridge/provider/copilot.go:125 Copilot /v1/messages route errors always categorized as unknown R1 Razor P2, Melody P3 Yes
CRF-4 P3 Open coderd/aibridgedserver/aibridgedserver.go:819 interceptionErrorType maps unrecognized non-empty types to NULL instead of unknown R1 Knov P3, Chopper P3 Yes
CRF-5 P3 Open coderd/database/db2sdk/db2sdk.go:1285 ErrorType and ErrorMessage checked independently; partial error possible on API R1 Ryosuke P3, Hisoka P3, Chopper Note, Knuckle Note, Kite Note Yes
CRF-6 P3 Open enterprise/aibridged_integration_test.go:642 Integration test only covers blocking; streaming error path untested R1 Chopper P3 Yes
CRF-7 P3 Open coderd/aibridgedserver/aibridgedserver.go:258 No server-side truncation on error_message before DB write R1 Ryosuke P3, Knuckle Note Yes
CRF-8 Nit Open aibridge/bridge.go:358 Comment documents DB NULL behavior 3 layers from the DB R1 Gon Yes
CRF-9 Nit Open enterprise/aibridged_integration_test.go:642 Test loop without t.Run subtest scoping R1 Chopper, Ryosuke Yes
CRF-10 Nit Open aibridge/interception_error.go:45 errors.AsType available since Go 1.26; 3 instances of old pattern R1 Ging-go Yes
CRF-11 Nit Open codersdk/aibridge.go:83 Doc comment hardcodes enum value list R1 Leorio Yes
CRF-12 Note Open aibridge/interception_error_internal_test.go:92 Truncation test only exercises ASCII; multi-byte boundary behavior untested R1 Bisky, Chopper, Hisoka, Kite Yes
CRF-13 Note Open aibridge/recorder/types.go:83 ErrorTypeFromStatus doesn't cover 408/413/422 R1 Hisoka, Ryosuke Yes

Contested and acknowledged

(none)

Round log

Round 1

Panel. 0 P0, 2 P2, 5 P3, 4 Nit, 2 Note. Reviewed against a66498e..1b0a895.
Panel: Bisky, Chopper, Ging-go, Gon, Hisoka, Kite, Knov, Knuckle, Komugi, Leorio, Mafu-san, Mafuuu, Melody, Meruem, Pariston, Ryosuke. Wildcards: Razor, Killua.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Komugi flake/determinism
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@dannykopping dannykopping changed the title feat(aibridge): record and expose terminal upstream interception errors feat: record and expose terminal upstream interception errors Jul 2, 2026
@dannykopping dannykopping force-pushed the dk/aigov137-backend branch from 1b0a895 to 66f667d Compare July 2, 2026 10:06

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error categorization architecture is well-designed. The three-layer dispatch (domain in categorizeInterceptionError, provider-specific via CategorizeError, shared ErrorTypeFromStatus) is the right decomposition, and the narrow errorCategorizer interface keeps provider imports out of the bridge package. Test density at 64% covers every layer from unit through integration. The migration is safe (nullable columns, no rewrite), the proto optional fields are correct, and the enum/constant alignment is clean.

2 P2, 5 P3, 4 Nit.

The two P2s are related: the streaming interceptors lose the original key-pool error cause (returning the client-facing 502 ResponseError instead of wrapping the keyPoolErr), and the Copilot provider's CategorizeError only recognizes OpenAI-shaped errors, missing its own /v1/messages route's Anthropic-shaped errors. Both produce incorrect categorization for real failure modes.

"Same root cause, different recorded type. The test expectations document this intentionally, but the asymmetry is observable by API consumers filtering on error_type." (Knov)

Notes for the body (not inline): the truncation test at interception_error_internal_test.go:92 only exercises ASCII, so the multi-byte UTF-8 boundary behavior of strings.ToValidUTF8 is untested (the code is correct, the test just doesn't prove the interesting case). ErrorTypeFromStatus maps 408/413/422 to unknown; worth noting if the enum expands.

🤖 This review was automatically generated with Coder Agents.

Comment thread aibridge/bridge.go
Comment thread aibridge/provider/copilot.go
Comment thread coderd/aibridgedserver/aibridgedserver.go
Comment thread coderd/database/db2sdk/db2sdk.go
Comment thread enterprise/aibridged_integration_test.go
Comment thread enterprise/aibridged_integration_test.go
Comment thread aibridge/interception_error.go
Comment thread codersdk/aibridge.go
Comment thread aibridge/interception_error_internal_test.go
Comment thread aibridge/recorder/types.go
Categorises the terminal error of a failed interception and persists it
on the interception record, then surfaces it on the AI Gateway API.

- Categorise into an enum (`bad_request`, `unauthorized`,
  `rate_limited`, `overloaded`, `server_error`, `unknown`), unwrapping
  the ResponseError envelope, the upstream Anthropic/OpenAI SDK errors,
  and key-pool exhaustion so blocking and streaming paths agree.
- Thread the type and raw message through the recorder dRPC into the
  `aibridge_interceptions` row (optional proto fields; NULL on success).
- Expose the error on the AI Gateway thread API from the root
  interception.

*This PR was produced by opencode (agent) using the `anthropic/claude-opus-4-8` model, under human direction and review.*
- record streaming key-pool exhaustion as unauthorized (not server_error)
  by preserving the *keypool.Error on the messages/chatcompletions
  streaming paths; the client still receives the masked envelope
- categorize Copilot's Anthropic-style /v1/messages errors
- store an unrecognized error type as unknown instead of NULL, and cap
  the error message at the dRPC write boundary
- nest the API error message under error_type to avoid a half-populated
  error on the response
- add a timeout error type (HTTP 408 and context deadline; 413/422 map
  to bad_request)
- integration test covers streaming; drop dead field
@dannykopping dannykopping force-pushed the dk/aigov137-backend branch from 761c8c8 to d8e72b3 Compare July 2, 2026 14:30
@dannykopping dannykopping marked this pull request as ready for review July 2, 2026 15:59
@dannykopping dannykopping requested a review from ssncferreira July 2, 2026 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant