feat: add server-validated structured output for chat turns by ibetitsmike · Pull Request #26954 · coder/coder

ibetitsmike · 2026-07-02T05:19:50Z

Adds first-class API and chatd support for requesting a server-validated structured final output from an agent turn, so SDK clients can get a typed result without defining their own final-output tool.

Summary

Callers set response_format (type json_schema) on CreateChatRequest or CreateChatMessageRequest. The HTTP layer validates the schema (16 KiB cap, compilable, root object, fragment-local $ref only, strict=false rejected) and persists it as a server-created response-format message part on the user turn, so it survives queueing, requires_action resume, and compaction. Omitting the field preserves current behavior exactly; plan-mode turns reject it because plan mode has its own stop-after completion semantics.

chatd keeps the normal agent loop (builtin/MCP/provider/dynamic tools, subagents, compaction, retries) and injects a server-owned coder_structured_output finalizer tool: builtin (name protected against dynamic and MCP collisions), exclusive (must be called alone), and stop-after (a successful result finishes the turn). Every structured step sets fantasy.ToolChoiceRequired, plain-text steps regenerate instead of completing (bounded: 5 consecutive text-only completions fail the turn fast instead of storming the provider), validation failures return retryable tool errors, and step-budget exhaustion fails terminally with structured_output_not_produced instead of succeeding.

The result contract for clients: the success signal is a non-error tool-result part with tool_name == "coder_structured_output" whose result carries the canonical validated JSON. Successful finalizer results are exempt from chatloop's tool-result truncation so the canonical payload is never corrupted (validation-error results still truncate). ExtractStructuredOutputValue pins this contract in tests, and the finalizer result never re-enters the provider prompt, so the existing chain-mode safeguard covers follow-up turns.

Generated client types now represent arbitrary-JSON chat fields correctly: schema, tool-call args, and input_schema generate as Record<string, unknown>, tool result and dynamic tool output as unknown, and the OpenAPI spec renders them as objects.

No DB migration; github.com/kaptinlin/jsonschema promoted from indirect to direct dependency for schema compilation. Architecture documented in coderd/x/chatd/ARCHITECTURE.md.

This PR was authored by Mux (an AI coding agent) working on behalf of Mike.

Callers can request a JSON-schema-shaped final output for an assistant turn via response_format on CreateChatRequest/CreateChatMessageRequest. chatd keeps the normal agent loop but injects a server-owned coder_structured_output finalizer tool (builtin, exclusive, stop-after) and sets required tool choice on every structured step. The turn only succeeds once the finalizer validates the output against the caller schema; step-budget exhaustion fails terminally instead of succeeding. Omitting the field preserves current behavior exactly.

ibetitsmike · 2026-07-02T05:20:20Z

/coder-agents-review

coder-agents-review · 2026-07-02T05:20:23Z

Chat: Review posted | View chat
Requested: 2026-07-02 11:19 UTC by @ibetitsmike
Spend: $104.91 / $100.00

Review history

R1 (2026-07-02): 24 reviewers, 11 Nit, 7 P3, COMMENT. Review
R2 (2026-07-02): 8 reviewers, 12 Nit, 7 P3, APPROVE. Review

deep-review v0.9.0 | Round 2 | cf75dd0..ab89f44

Last posted: Round 2, 19 findings (7 P3, 12 Nit), APPROVE. Review

Finding inventory

Findings

#	Sev	Status	Location	Summary	Round	Reviewer	Posted
CRF-1	P3	Author fixed (`ab89f44`)	structuredoutput.go:246	Info() returns shared mutable reference to schemaMap	R1	Hisoka P3, Kite P3, Knov P3, Ryosuke P3, Mafuuu Note	Yes
CRF-2	Nit	Author fixed (`ab89f44`)	structuredoutput_test.go:346	Test comment claims Info returns independent copy; false	R1	Hisoka, Kite, Knov, Ryosuke	Yes
CRF-3	P3	Author fixed (`ab89f44`)	chatprompt.go:1805	encodeNulInParts does not process ResponseFormat field	R1	Knuckle	Yes
CRF-4	P3	Author fixed (`ab89f44`)	generation.go:189	errStructuredOutputNotProduced lacks actionable guidance and uses ChatErrorKindGeneric	R1	Leorio P3, Chopper P3	Yes
CRF-5	P3	Author fixed (`ab89f44`)	generation.go:663	Empty-response path reuses step-limit error for a different failure mode	R1	Leorio	Yes
CRF-6	P3	Author fixed (`ab89f44`)	structuredoutput.go:48	ValidationError duplicates codersdk.ValidationError	R1	Robin P3, Zoro Nit	Yes
CRF-7	P3	Author fixed (`ab89f44`)	generation_preparer.go:485	MCP shadow warning omits server identity	R1	Chopper	Yes
CRF-8	P3	Author fixed (`ab89f44`)	structuredoutput_test.go:222	No test for Info() with non-empty Description	R1	Bisky	Yes
CRF-9	Nit	Author fixed (`ab89f44`)	structured_output_internal_test.go:55	soAssistantMessage is cryptic for a generic text helper	R1	Gon	Yes
CRF-10	Nit	Author fixed (`ab89f44`)	exp_chats_test.go:14457	InvalidFormats table loop without t.Run subtests	R1	Bisky, Chopper	Yes
CRF-11	Nit	Author fixed (`ab89f44`)	chats.go:473	ResponseFormat field comment 7 lines; 2 deliver the contract	R1	Gon	Yes
CRF-12	Nit	Author fixed (`ab89f44`)	structured_output.go:59	activeTurnResponseFormat doc 8 lines; 3 deliver the invariants	R1	Gon	Yes
CRF-13	Nit	Author fixed (`ab89f44`)	structuredoutput.go:179	Delete-loop clears map; clear(m) since Go 1.21	R1	Ging-Go	Yes
CRF-14	Nit	Author fixed (`ab89f44`)	structuredoutput.go:320	Collect-keys-and-sort is slices.Sorted(maps.Keys()) since 1.23	R1	Ging-Go	Yes
CRF-15	Nit	Author fixed (`ab89f44`)	structuredoutput.go:316	Empty-string fallback is cmp.Or since Go 1.22	R1	Ging-Go	Yes
CRF-16	Nit	Author fixed (`ab89f44`)	chats.go:480	ResponseFormat JSON tag missing omitempty	R1	Knuckle Nit, Ryosuke Note	Yes
CRF-17	Nit	Author fixed (`ab89f44`)	chats.go:344	Description field missing doc comment	R1	Leorio	Yes
CRF-18	Nit	Author fixed (`ab89f44`)	ARCHITECTURE.md:1247	Doc cites structuredoutput.Validate but code uses NewRequest	R1	Leorio	Yes
CRF-19	Nit	Open	structuredoutput_test.go:97	Name validation and schema mismatch loops lack t.Run subtests	R2	Bisky	Yes

Law analysis

Effective LOC: 2248 (812 production, 1436 test, 592 generated). Head SHA: fa98d8c. Verdict: Don't split. Enforcement: Advisory. Single vertical slice, one feature, one risk domain. 812 production lines across 7 files.

Contested and acknowledged

Round log

Round 1

Netero: no findings. Law: don't split (advisory). Panel: 22 triggered + 2 wildcards. 7 P3, 11 Nit. Reviewed against cf75dd0..fa98d8c.

Round 2

Churn guard: PROCEED. 18/18 findings addressed in ab89f44. Netero: no findings. Panel: 5 always-on + 2 triggered + 1 wildcard. 0 P3, 1 Nit new. All R1 fixes verified. Reviewed against cf75dd0..ab89f44.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer	Focus
Bisky	tests
Chopper	ops/errors
Churn-guard	change verification
Ging	language modernization
Gon	naming
Hisoka	edge cases
Killua	perf
Kite	change integrity
Knov	contracts
Knuckle	SQL
Komugi	flake/determinism
Kurapika	security
Law	decomposition
Leorio	docs
Luffy	product
Mafu-san	process
Mafuuu	contracts
Melody	dispatch/pairing
Meruem	structural
Nami	frontend
Netero	mechanical checks
Pariston	premise testing
Pen-botter	product gaps
Razor	verification
Robin	duplication
Ryosuke	Go arch
Takumi	concurrency
Zoro	shape

🤖 Managed by Coder Agents.

coder-agents-review

Well-designed feature. The finalizer tool approach correctly reuses the existing agent loop machinery (exclusive tools, stop-after, step budget) rather than inventing new mechanisms. Layered security boundaries for schema validation (HTTP rejection, loader stripping, reserved name at three layers) are strong. Test density is 63.9% with genuine behavioral assertions at four levels (unit, internal decision logic, E2E, HTTP integration). The SchemaNormalizeRoundTrip test specifically pins a real integration risk.

7 P3, 11 Nit.

Observations worth noting but not filed as findings:

partsToMessageParts has no explicit case for ChatMessagePartTypeResponseFormat; the part is dropped by implicit switch fallthrough. Five reviewers noted this. It follows the pre-existing pattern for the skill part type and is tested by TestConvertMessagesWithFiles_SkipsResponseFormatPart. An explicit case with continue would document the intent and guard against a future default branch.
activeTurnResponseFormat and activeTurnStart use different visibility filters for "active turn user message." Four reviewers noted this. The first filters by isUserVisibleChatMessage; the second accepts any non-deleted, non-compressed user message. The inconsistency has no current trigger path (model-only user messages are always compressed) and is inherited from the pre-existing inline logic.
formatValidationErrors recursively walks list.Details with no output size bound. The 16 KiB schema cap limits the blast radius. A maximally adversarial schema could produce a large error string that consumes model context, degrading retry quality, but the practical impact is bounded.

"A reader encountering soAssistantMessage(t, 2, "done") would expect a structured-output-flavored message." (Gon)

coderd/x/chatd/chatprompt/chatprompt.go:1805

P3 [CRF-3] encodeNulInParts processes Text, Content, Args, ArgsDelta, Result, ResultDelta but does not touch ResponseFormat. The ChatResponseFormatJSONSchema.Description field is a free-form string from the API caller with no NUL validation. If it contains a NUL byte, json.Marshal encodes it as \u0000 and PostgreSQL rejects the jsonb insert.

"The consequence is a 500 error on a pathological API request. The caller gets no useful error message; the server log shows a PostgreSQL constraint violation." (Knuckle)

The Schema field (json.RawMessage) has the same gap. The Name field is safe because the regex rejects NUL. Fix: add ResponseFormat processing to encodeNulInParts/decodeNulInParts.

(Knuckle)

🤖

🤖 This review was automatically generated with Coder Agents.

- NUL-encode response-format part fields before jsonb persistence (CRF-3) - deep-copy the finalizer parameter map per Info call (CRF-1, CRF-2) - dedicated structured_output error kind with actionable message (CRF-4, CRF-5) - reuse codersdk.ValidationError instead of a parallel type (CRF-6) - include MCP server config ID in the shadow-drop warning (CRF-7) - test the finalizer description branch and Info copy semantics (CRF-8) - explicit response-format case in partsToMessageParts (review note) - test/comment/doc cleanups (CRF-9 through CRF-18)

ibetitsmike · 2026-07-02T11:19:33Z

/coder-agents-review

coder-agents-review

All 18 R1 findings addressed in ab89f44. Verified by 8 reviewers across both the fix code and the original feature. The fixes are proportional and address root causes: copyJSONObject for the shared mutable schemaMap, mapNulInResponseFormat with DB round-trip test for the NUL encoding gap, ChatErrorKindStructuredOutput with an actionable message, direct use of codersdk.ValidationError, and MCP server config ID in shadow warnings.

1 new Nit.

"I tried to build a case against this change and could not." (Pariston)

🤖 This review was automatically generated with Coder Agents.

coder-agents-review · 2026-07-02T11:51:12Z

+			format := jsonSchemaFormat(validSchema)
+			format.JSONSchema.Name = name
+			err := structuredoutput.Validate(format)
+			require.NotNil(t, err, "name %q should be rejected", name)


Nit [CRF-19] The rejected-name loop (lines 97-105), accepted-name loop, RootNotObject loop (lines 135-146), FragmentLocalRefs rejection loop (lines 159-166), and RunSchemaMismatch loop (lines 253-265) all iterate over multiple inputs without t.Run. Same pattern CRF-10 fixed in exp_chats_test.go. Each require call includes the input in its message, so diagnostics are adequate, but t.Run would give subtest isolation and go test -run addressability.

(Bisky)

🤖

Fixes three issues found in dogfood testing of server-validated structured output: - Exempt successful finalizer results from tool-result truncation. Truncation corrupted the canonical validated JSON while still persisting it as a success, breaking the recovery contract. Error results (validation failures) are still truncated. New chatloop ResultTruncationExempter interface, implemented by the finalizer. - Fail structured turns fast after 5 consecutive text-only assistant completions instead of regenerating up to the full 1200-step budget against a model or proxy that ignores required tool choice. Any tool call (including failed finalizer attempts) resets the streak. - Fix generated types for arbitrary-JSON chat fields: response_format schema, tool-call args, dynamic tool input_schema now generate as Record<string, unknown> (was Record<string, string>), tool result and dynamic tool output as unknown; swagger now renders them as objects (was array of integer) via swaggertype tags.

ibetitsmike added 2 commits July 1, 2026 22:45

refactor(coderd/x/chatd): simplify structured output implementation

fa98d8c

github-actions Bot assigned ibetitsmike Jul 2, 2026

coder-agents-review Bot reviewed Jul 2, 2026

View reviewed changes

coder-agents-review Bot approved these changes Jul 2, 2026

View reviewed changes

Uh oh!

Conversation

ibetitsmike commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

ibetitsmike commented Jul 2, 2026

Uh oh!

coder-agents-review Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Findings

Law analysis

Contested and acknowledged

Round log

Round 1

Round 2

Uh oh!

coder-agents-review Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ibetitsmike commented Jul 2, 2026

Uh oh!

coder-agents-review Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coder-agents-review Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ibetitsmike commented Jul 2, 2026 •

edited

Loading

coder-agents-review Bot commented Jul 2, 2026 •

edited

Loading