feat: add server-validated structured output for chat turns#26954
feat: add server-validated structured output for chat turns#26954ibetitsmike wants to merge 4 commits into
Conversation
Callers can request a JSON-schema-shaped final output for an assistant turn via response_format on CreateChatRequest/CreateChatMessageRequest. chatd keeps the normal agent loop but injects a server-owned coder_structured_output finalizer tool (builtin, exclusive, stop-after) and sets required tool choice on every structured step. The turn only succeeds once the finalizer validates the output against the caller schema; step-budget exhaustion fails terminally instead of succeeding. Omitting the field preserves current behavior exactly.
|
/coder-agents-review |
|
Chat: Review posted | View chat Review historydeep-review v0.9.0 | Round 2 | Last posted: Round 2, 19 findings (7 P3, 12 Nit), APPROVE. Review Finding inventoryFindings
Law analysisEffective LOC: 2248 (812 production, 1436 test, 592 generated). Head SHA: fa98d8c. Verdict: Don't split. Enforcement: Advisory. Single vertical slice, one feature, one risk domain. 812 production lines across 7 files. Contested and acknowledgedRound logRound 1Netero: no findings. Law: don't split (advisory). Panel: 22 triggered + 2 wildcards. 7 P3, 11 Nit. Reviewed against cf75dd0..fa98d8c. Round 2Churn guard: PROCEED. 18/18 findings addressed in ab89f44. Netero: no findings. Panel: 5 always-on + 2 triggered + 1 wildcard. 0 P3, 1 Nit new. All R1 fixes verified. Reviewed against cf75dd0..ab89f44. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Well-designed feature. The finalizer tool approach correctly reuses the existing agent loop machinery (exclusive tools, stop-after, step budget) rather than inventing new mechanisms. Layered security boundaries for schema validation (HTTP rejection, loader stripping, reserved name at three layers) are strong. Test density is 63.9% with genuine behavioral assertions at four levels (unit, internal decision logic, E2E, HTTP integration). The SchemaNormalizeRoundTrip test specifically pins a real integration risk.
7 P3, 11 Nit.
Observations worth noting but not filed as findings:
-
partsToMessagePartshas no explicitcaseforChatMessagePartTypeResponseFormat; the part is dropped by implicit switch fallthrough. Five reviewers noted this. It follows the pre-existing pattern for theskillpart type and is tested byTestConvertMessagesWithFiles_SkipsResponseFormatPart. An explicitcasewithcontinuewould document the intent and guard against a futuredefaultbranch. -
activeTurnResponseFormatandactiveTurnStartuse different visibility filters for "active turn user message." Four reviewers noted this. The first filters byisUserVisibleChatMessage; the second accepts any non-deleted, non-compressed user message. The inconsistency has no current trigger path (model-only user messages are always compressed) and is inherited from the pre-existing inline logic. -
formatValidationErrorsrecursively walkslist.Detailswith no output size bound. The 16 KiB schema cap limits the blast radius. A maximally adversarial schema could produce a large error string that consumes model context, degrading retry quality, but the practical impact is bounded.
"A reader encountering
soAssistantMessage(t, 2, "done")would expect a structured-output-flavored message." (Gon)
coderd/x/chatd/chatprompt/chatprompt.go:1805
P3 [CRF-3] encodeNulInParts processes Text, Content, Args, ArgsDelta, Result, ResultDelta but does not touch ResponseFormat. The ChatResponseFormatJSONSchema.Description field is a free-form string from the API caller with no NUL validation. If it contains a NUL byte, json.Marshal encodes it as \u0000 and PostgreSQL rejects the jsonb insert.
"The consequence is a 500 error on a pathological API request. The caller gets no useful error message; the server log shows a PostgreSQL constraint violation." (Knuckle)
The Schema field (json.RawMessage) has the same gap. The Name field is safe because the regex rejects NUL. Fix: add ResponseFormat processing to encodeNulInParts/decodeNulInParts.
(Knuckle)
🤖
🤖 This review was automatically generated with Coder Agents.
- NUL-encode response-format part fields before jsonb persistence (CRF-3) - deep-copy the finalizer parameter map per Info call (CRF-1, CRF-2) - dedicated structured_output error kind with actionable message (CRF-4, CRF-5) - reuse codersdk.ValidationError instead of a parallel type (CRF-6) - include MCP server config ID in the shadow-drop warning (CRF-7) - test the finalizer description branch and Info copy semantics (CRF-8) - explicit response-format case in partsToMessageParts (review note) - test/comment/doc cleanups (CRF-9 through CRF-18)
|
/coder-agents-review |
There was a problem hiding this comment.
All 18 R1 findings addressed in ab89f44. Verified by 8 reviewers across both the fix code and the original feature. The fixes are proportional and address root causes: copyJSONObject for the shared mutable schemaMap, mapNulInResponseFormat with DB round-trip test for the NUL encoding gap, ChatErrorKindStructuredOutput with an actionable message, direct use of codersdk.ValidationError, and MCP server config ID in shadow warnings.
1 new Nit.
"I tried to build a case against this change and could not." (Pariston)
🤖 This review was automatically generated with Coder Agents.
| format := jsonSchemaFormat(validSchema) | ||
| format.JSONSchema.Name = name | ||
| err := structuredoutput.Validate(format) | ||
| require.NotNil(t, err, "name %q should be rejected", name) |
There was a problem hiding this comment.
Nit [CRF-19] The rejected-name loop (lines 97-105), accepted-name loop, RootNotObject loop (lines 135-146), FragmentLocalRefs rejection loop (lines 159-166), and RunSchemaMismatch loop (lines 253-265) all iterate over multiple inputs without t.Run. Same pattern CRF-10 fixed in exp_chats_test.go. Each require call includes the input in its message, so diagnostics are adequate, but t.Run would give subtest isolation and go test -run addressability.
(Bisky)
🤖
Fixes three issues found in dogfood testing of server-validated structured output: - Exempt successful finalizer results from tool-result truncation. Truncation corrupted the canonical validated JSON while still persisting it as a success, breaking the recovery contract. Error results (validation failures) are still truncated. New chatloop ResultTruncationExempter interface, implemented by the finalizer. - Fail structured turns fast after 5 consecutive text-only assistant completions instead of regenerating up to the full 1200-step budget against a model or proxy that ignores required tool choice. Any tool call (including failed finalizer attempts) resets the streak. - Fix generated types for arbitrary-JSON chat fields: response_format schema, tool-call args, dynamic tool input_schema now generate as Record<string, unknown> (was Record<string, string>), tool result and dynamic tool output as unknown; swagger now renders them as objects (was array of integer) via swaggertype tags.
Adds first-class API and chatd support for requesting a server-validated structured final output from an agent turn, so SDK clients can get a typed result without defining their own final-output tool.
Summary
Callers set
response_format(typejson_schema) onCreateChatRequestorCreateChatMessageRequest. The HTTP layer validates the schema (16 KiB cap, compilable, root object, fragment-local$refonly,strict=falserejected) and persists it as a server-createdresponse-formatmessage part on the user turn, so it survives queueing, requires_action resume, and compaction. Omitting the field preserves current behavior exactly; plan-mode turns reject it because plan mode has its own stop-after completion semantics.chatd keeps the normal agent loop (builtin/MCP/provider/dynamic tools, subagents, compaction, retries) and injects a server-owned
coder_structured_outputfinalizer tool: builtin (name protected against dynamic and MCP collisions), exclusive (must be called alone), and stop-after (a successful result finishes the turn). Every structured step setsfantasy.ToolChoiceRequired, plain-text steps regenerate instead of completing (bounded: 5 consecutive text-only completions fail the turn fast instead of storming the provider), validation failures return retryable tool errors, and step-budget exhaustion fails terminally withstructured_output_not_producedinstead of succeeding.The result contract for clients: the success signal is a non-error
tool-resultpart withtool_name == "coder_structured_output"whoseresultcarries the canonical validated JSON. Successful finalizer results are exempt from chatloop's tool-result truncation so the canonical payload is never corrupted (validation-error results still truncate).ExtractStructuredOutputValuepins this contract in tests, and the finalizer result never re-enters the provider prompt, so the existing chain-mode safeguard covers follow-up turns.Generated client types now represent arbitrary-JSON chat fields correctly:
schema, tool-callargs, andinput_schemagenerate asRecord<string, unknown>, toolresultand dynamic tooloutputasunknown, and the OpenAPI spec renders them as objects.No DB migration;
github.com/kaptinlin/jsonschemapromoted from indirect to direct dependency for schema compilation. Architecture documented incoderd/x/chatd/ARCHITECTURE.md.