Skip to content

feat: add server-validated structured output for chat turns#26954

Draft
ibetitsmike wants to merge 4 commits into
mainfrom
mike/structured-output
Draft

feat: add server-validated structured output for chat turns#26954
ibetitsmike wants to merge 4 commits into
mainfrom
mike/structured-output

Conversation

@ibetitsmike

@ibetitsmike ibetitsmike commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Adds first-class API and chatd support for requesting a server-validated structured final output from an agent turn, so SDK clients can get a typed result without defining their own final-output tool.

Summary

Callers set response_format (type json_schema) on CreateChatRequest or CreateChatMessageRequest. The HTTP layer validates the schema (16 KiB cap, compilable, root object, fragment-local $ref only, strict=false rejected) and persists it as a server-created response-format message part on the user turn, so it survives queueing, requires_action resume, and compaction. Omitting the field preserves current behavior exactly; plan-mode turns reject it because plan mode has its own stop-after completion semantics.

chatd keeps the normal agent loop (builtin/MCP/provider/dynamic tools, subagents, compaction, retries) and injects a server-owned coder_structured_output finalizer tool: builtin (name protected against dynamic and MCP collisions), exclusive (must be called alone), and stop-after (a successful result finishes the turn). Every structured step sets fantasy.ToolChoiceRequired, plain-text steps regenerate instead of completing (bounded: 5 consecutive text-only completions fail the turn fast instead of storming the provider), validation failures return retryable tool errors, and step-budget exhaustion fails terminally with structured_output_not_produced instead of succeeding.

The result contract for clients: the success signal is a non-error tool-result part with tool_name == "coder_structured_output" whose result carries the canonical validated JSON. Successful finalizer results are exempt from chatloop's tool-result truncation so the canonical payload is never corrupted (validation-error results still truncate). ExtractStructuredOutputValue pins this contract in tests, and the finalizer result never re-enters the provider prompt, so the existing chain-mode safeguard covers follow-up turns.

Generated client types now represent arbitrary-JSON chat fields correctly: schema, tool-call args, and input_schema generate as Record<string, unknown>, tool result and dynamic tool output as unknown, and the OpenAPI spec renders them as objects.

No DB migration; github.com/kaptinlin/jsonschema promoted from indirect to direct dependency for schema compilation. Architecture documented in coderd/x/chatd/ARCHITECTURE.md.

This PR was authored by Mux (an AI coding agent) working on behalf of Mike.

Callers can request a JSON-schema-shaped final output for an assistant
turn via response_format on CreateChatRequest/CreateChatMessageRequest.
chatd keeps the normal agent loop but injects a server-owned
coder_structured_output finalizer tool (builtin, exclusive,
stop-after) and sets required tool choice on every structured step.
The turn only succeeds once the finalizer validates the output against
the caller schema; step-budget exhaustion fails terminally instead of
succeeding. Omitting the field preserves current behavior exactly.
@ibetitsmike

Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Chat: Review posted | View chat
Requested: 2026-07-02 11:19 UTC by @ibetitsmike
Spend: $104.91 / $100.00

Review history
  • R1 (2026-07-02): 24 reviewers, 11 Nit, 7 P3, COMMENT. Review
  • R2 (2026-07-02): 8 reviewers, 12 Nit, 7 P3, APPROVE. Review

deep-review v0.9.0 | Round 2 | cf75dd0..ab89f44

Last posted: Round 2, 19 findings (7 P3, 12 Nit), APPROVE. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 P3 Author fixed (ab89f44) structuredoutput.go:246 Info() returns shared mutable reference to schemaMap R1 Hisoka P3, Kite P3, Knov P3, Ryosuke P3, Mafuuu Note Yes
CRF-2 Nit Author fixed (ab89f44) structuredoutput_test.go:346 Test comment claims Info returns independent copy; false R1 Hisoka, Kite, Knov, Ryosuke Yes
CRF-3 P3 Author fixed (ab89f44) chatprompt.go:1805 encodeNulInParts does not process ResponseFormat field R1 Knuckle Yes
CRF-4 P3 Author fixed (ab89f44) generation.go:189 errStructuredOutputNotProduced lacks actionable guidance and uses ChatErrorKindGeneric R1 Leorio P3, Chopper P3 Yes
CRF-5 P3 Author fixed (ab89f44) generation.go:663 Empty-response path reuses step-limit error for a different failure mode R1 Leorio Yes
CRF-6 P3 Author fixed (ab89f44) structuredoutput.go:48 ValidationError duplicates codersdk.ValidationError R1 Robin P3, Zoro Nit Yes
CRF-7 P3 Author fixed (ab89f44) generation_preparer.go:485 MCP shadow warning omits server identity R1 Chopper Yes
CRF-8 P3 Author fixed (ab89f44) structuredoutput_test.go:222 No test for Info() with non-empty Description R1 Bisky Yes
CRF-9 Nit Author fixed (ab89f44) structured_output_internal_test.go:55 soAssistantMessage is cryptic for a generic text helper R1 Gon Yes
CRF-10 Nit Author fixed (ab89f44) exp_chats_test.go:14457 InvalidFormats table loop without t.Run subtests R1 Bisky, Chopper Yes
CRF-11 Nit Author fixed (ab89f44) chats.go:473 ResponseFormat field comment 7 lines; 2 deliver the contract R1 Gon Yes
CRF-12 Nit Author fixed (ab89f44) structured_output.go:59 activeTurnResponseFormat doc 8 lines; 3 deliver the invariants R1 Gon Yes
CRF-13 Nit Author fixed (ab89f44) structuredoutput.go:179 Delete-loop clears map; clear(m) since Go 1.21 R1 Ging-Go Yes
CRF-14 Nit Author fixed (ab89f44) structuredoutput.go:320 Collect-keys-and-sort is slices.Sorted(maps.Keys()) since 1.23 R1 Ging-Go Yes
CRF-15 Nit Author fixed (ab89f44) structuredoutput.go:316 Empty-string fallback is cmp.Or since Go 1.22 R1 Ging-Go Yes
CRF-16 Nit Author fixed (ab89f44) chats.go:480 ResponseFormat JSON tag missing omitempty R1 Knuckle Nit, Ryosuke Note Yes
CRF-17 Nit Author fixed (ab89f44) chats.go:344 Description field missing doc comment R1 Leorio Yes
CRF-18 Nit Author fixed (ab89f44) ARCHITECTURE.md:1247 Doc cites structuredoutput.Validate but code uses NewRequest R1 Leorio Yes
CRF-19 Nit Open structuredoutput_test.go:97 Name validation and schema mismatch loops lack t.Run subtests R2 Bisky Yes

Law analysis

Effective LOC: 2248 (812 production, 1436 test, 592 generated). Head SHA: fa98d8c. Verdict: Don't split. Enforcement: Advisory. Single vertical slice, one feature, one risk domain. 812 production lines across 7 files.

Contested and acknowledged

Round log

Round 1

Netero: no findings. Law: don't split (advisory). Panel: 22 triggered + 2 wildcards. 7 P3, 11 Nit. Reviewed against cf75dd0..fa98d8c.

Round 2

Churn guard: PROCEED. 18/18 findings addressed in ab89f44. Netero: no findings. Panel: 5 always-on + 2 triggered + 1 wildcard. 0 P3, 1 Nit new. All R1 fixes verified. Reviewed against cf75dd0..ab89f44.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Komugi flake/determinism
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-designed feature. The finalizer tool approach correctly reuses the existing agent loop machinery (exclusive tools, stop-after, step budget) rather than inventing new mechanisms. Layered security boundaries for schema validation (HTTP rejection, loader stripping, reserved name at three layers) are strong. Test density is 63.9% with genuine behavioral assertions at four levels (unit, internal decision logic, E2E, HTTP integration). The SchemaNormalizeRoundTrip test specifically pins a real integration risk.

7 P3, 11 Nit.

Observations worth noting but not filed as findings:

  • partsToMessageParts has no explicit case for ChatMessagePartTypeResponseFormat; the part is dropped by implicit switch fallthrough. Five reviewers noted this. It follows the pre-existing pattern for the skill part type and is tested by TestConvertMessagesWithFiles_SkipsResponseFormatPart. An explicit case with continue would document the intent and guard against a future default branch.

  • activeTurnResponseFormat and activeTurnStart use different visibility filters for "active turn user message." Four reviewers noted this. The first filters by isUserVisibleChatMessage; the second accepts any non-deleted, non-compressed user message. The inconsistency has no current trigger path (model-only user messages are always compressed) and is inherited from the pre-existing inline logic.

  • formatValidationErrors recursively walks list.Details with no output size bound. The 16 KiB schema cap limits the blast radius. A maximally adversarial schema could produce a large error string that consumes model context, degrading retry quality, but the practical impact is bounded.

"A reader encountering soAssistantMessage(t, 2, "done") would expect a structured-output-flavored message." (Gon)


coderd/x/chatd/chatprompt/chatprompt.go:1805

P3 [CRF-3] encodeNulInParts processes Text, Content, Args, ArgsDelta, Result, ResultDelta but does not touch ResponseFormat. The ChatResponseFormatJSONSchema.Description field is a free-form string from the API caller with no NUL validation. If it contains a NUL byte, json.Marshal encodes it as \u0000 and PostgreSQL rejects the jsonb insert.

"The consequence is a 500 error on a pathological API request. The caller gets no useful error message; the server log shows a PostgreSQL constraint violation." (Knuckle)

The Schema field (json.RawMessage) has the same gap. The Name field is safe because the regex rejects NUL. Fix: add ResponseFormat processing to encodeNulInParts/decodeNulInParts.

(Knuckle)

🤖

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/structuredoutput/structuredoutput.go Outdated
Comment thread coderd/x/chatd/structuredoutput/structuredoutput_test.go
Comment thread coderd/x/chatd/generation.go Outdated
Comment thread coderd/x/chatd/generation.go
Comment thread coderd/x/chatd/structuredoutput/structuredoutput.go Outdated
Comment thread coderd/x/chatd/structuredoutput/structuredoutput.go Outdated
Comment thread coderd/x/chatd/structuredoutput/structuredoutput.go Outdated
Comment thread codersdk/chats.go Outdated
Comment thread codersdk/chats.go
Comment thread coderd/x/chatd/ARCHITECTURE.md
- NUL-encode response-format part fields before jsonb persistence (CRF-3)
- deep-copy the finalizer parameter map per Info call (CRF-1, CRF-2)
- dedicated structured_output error kind with actionable message (CRF-4, CRF-5)
- reuse codersdk.ValidationError instead of a parallel type (CRF-6)
- include MCP server config ID in the shadow-drop warning (CRF-7)
- test the finalizer description branch and Info copy semantics (CRF-8)
- explicit response-format case in partsToMessageParts (review note)
- test/comment/doc cleanups (CRF-9 through CRF-18)
@ibetitsmike

Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 18 R1 findings addressed in ab89f44. Verified by 8 reviewers across both the fix code and the original feature. The fixes are proportional and address root causes: copyJSONObject for the shared mutable schemaMap, mapNulInResponseFormat with DB round-trip test for the NUL encoding gap, ChatErrorKindStructuredOutput with an actionable message, direct use of codersdk.ValidationError, and MCP server config ID in shadow warnings.

1 new Nit.

"I tried to build a case against this change and could not." (Pariston)

🤖 This review was automatically generated with Coder Agents.

format := jsonSchemaFormat(validSchema)
format.JSONSchema.Name = name
err := structuredoutput.Validate(format)
require.NotNil(t, err, "name %q should be rejected", name)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-19] The rejected-name loop (lines 97-105), accepted-name loop, RootNotObject loop (lines 135-146), FragmentLocalRefs rejection loop (lines 159-166), and RunSchemaMismatch loop (lines 253-265) all iterate over multiple inputs without t.Run. Same pattern CRF-10 fixed in exp_chats_test.go. Each require call includes the input in its message, so diagnostics are adequate, but t.Run would give subtest isolation and go test -run addressability.

(Bisky)

🤖

Fixes three issues found in dogfood testing of server-validated
structured output:

- Exempt successful finalizer results from tool-result truncation.
  Truncation corrupted the canonical validated JSON while still
  persisting it as a success, breaking the recovery contract. Error
  results (validation failures) are still truncated. New chatloop
  ResultTruncationExempter interface, implemented by the finalizer.
- Fail structured turns fast after 5 consecutive text-only assistant
  completions instead of regenerating up to the full 1200-step budget
  against a model or proxy that ignores required tool choice. Any
  tool call (including failed finalizer attempts) resets the streak.
- Fix generated types for arbitrary-JSON chat fields: response_format
  schema, tool-call args, dynamic tool input_schema now generate as
  Record<string, unknown> (was Record<string, string>), tool result
  and dynamic tool output as unknown; swagger now renders them as
  objects (was array of integer) via swaggertype tags.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant