feat: surface model content-filter refusals as a blocked chat error#26294
feat: surface model content-filter refusals as a blocked chat error#26294jscottmiller wants to merge 5 commits into
Conversation
When a provider blocks a turn with a content-filter stop reason and no
content (e.g. Anthropic's "refusal"), the turn previously ended silently
("Thinking" then nothing). Detect this in the chat loop and end the turn
as a terminal content_filter error carrying the provider's explanation,
so the UI shows a clear "Response blocked" message.
- codersdk: add ChatErrorKindContentFilter
- chatloop: return a classified ErrContentFiltered on empty
content-filtered turns, surfacing Anthropic refusal category/explanation
- chaterror: default message for the new kind
- site: render "Response blocked" title; regenerate types
- go.mod: point fantasy fork at the refusal mapping change
Coder Agents generated.
Run make gen to add the content_filter enum value to swagger and the generated API reference docs. Coder Agents generated.
Docs preview📖 View docs preview for |
… recording step state
|
Two known limitations of the empty-content check, noted during review:
Coder Agents generated (on behalf of @jscottmiller). |
|
/coder-agents-review |
|
Chat: Review in progress | View chat deep-review v0.7.1 | Round 2 | Last posted: Round 2, 11 findings (1 P2, 4 P3, 6 Nit), COMMENT. Review Finding inventoryFindings
Contested and acknowledgedCRF-5 (P3, chatloop.go:57) - Comment verbosity
CRF-6 (Nit, chaterror/message.go:116) - retryMessage missing content_filter case
Round logRound 1Panel. 1 P2, 2 P3, 4 Nit, 2 Note. Reviewed against a4c867f..2514264. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Chopper, Ging-Go, Ging-TS, Gon, Leorio, Kite, Nami, Meruem (wildcard), Knov (wildcard). Ging-Go, Ging-TS, Mafu-san, Pariston, Kite: no findings. Round 2Panel. CRF-1 through CRF-4, CRF-7 through CRF-9 verified fixed. CRF-5 closed by panel (2/3). CRF-6 closed by panel (3/3). 1 new P3 (CRF-10). 1 Nit dropped (follows file convention). Reviewed against a4c867f..3343fac. Panel: Bisky, Mafuuu, Gon, Razor (wildcard). About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Clean, well-scoped change that solves the right problem at the right layer. The error classification pipeline, the WithClassification/Classify round-trip, and the retry-loop exclusion all hold up under inspection. The sanitizer cannot create false positives (tool blocks require FinishReasonToolCalls, mutually exclusive with FinishReasonContentFilter), and the double classification through chatd.go is harmless because WithProvider short-circuits when the provider matches and Message is non-empty. Dependency management is explicit.
Pariston: "I tried to build a case against this change and could not find a material premise failure. The problem is correctly understood, the solution is proportional, and the fix is at the right causal level."
1 P2, 2 P3, 4 Nit. The P2 is an integration test gap: the only codepath that converts a silent empty turn into a visible error has no test through Run. The test infrastructure already supports this pattern (TestRun_* with chattest.FakeModel), and 50+ sibling tests exercise analogous paths. Five reviewers independently flagged it.
Notes (not posted inline): Refusal metadata extraction is Anthropic-only; Google and OpenAI content filters degrade to a generic message with no category or explanation. Partial content with a content-filter finish is silently persisted as normal, which the PR description acknowledges as intentional.
coderd/x/chatd/chaterror/message.go:116
Nit [CRF-6] retryMessage has no ChatErrorKindContentFilter case.
terminalMessage handles all eleven kinds including content_filter. retryMessage handles ten, with content_filter falling through to default ("returned an unexpected error"). Unreachable today because Retryable is false (zero value), but every sibling kind that has a terminalMessage case also has a retryMessage case (except usage_limit, which is a pre-existing gap). Adding a case keeps the two switches symmetric.
(Bisky, Mafuuu, Chopper, Leorio)
🤖
🤖 This review was automatically generated with Coder Agents.
Adds a Run-level integration test for the empty-turn content-filter branch, exact-string and Detail assertions in the unit tests, terminal message table coverage, an explicit Retryable: false, and a Storybook story for the Response blocked error state.
|
Addressed in 3343fac: CRF-4 (Run-level test Intentionally skipped: CRF-5 (the comments explain why the behavior matters, per repo comment guidance) and CRF-6 ( Coder Agents generated (on behalf of @jscottmiller). |
|
/coder-agents-review |
|
Correction to the note above: CRF-5 was subsequently addressed in 234d2ad (doc comment trims per the suggested wording). Only CRF-6 remains intentionally skipped. Coder Agents generated (on behalf of @jscottmiller). |
There was a problem hiding this comment.
All seven R1 fixes verified. The integration test (TestRun_ContentFilterEmptyTurn) properly exercises the Run branch end-to-end, confirms the error sentinel and classification fields, and checks that no step is persisted. Test assertions are now exact-string. The Storybook story has 5 assertions covering heading, message, detail, and terminal behavior. Good turnaround.
Contested findings disposed:
CRF-5 (comment verbosity): Panel closed, 2/3 accept. Mafuuu and Razor independently verified the comments describe behavioral contracts per repo guidance. Gon narrowed the re-raise to contentFilterError (line 68) but was overridden by the panel majority.
CRF-6 (retryMessage missing case): Panel closed, 3/3 accept. All three reviewers independently traced the 4-step call chain confirming the path is unreachable, and confirmed usage_limit establishes the same precedent.
Mafuuu: "Contract fidelity checks performed: lifecycle, semantic honesty, adversarial path. All aligned."
1 new P3. The complement boundary (non-empty content with a ContentFilter finish reason should persist normally) is explicitly called out in the PR description as intentional behavior but has no test.
🤖 This review was automatically generated with Coder Agents.
| // TestRun_ContentFilterEmptyTurn exercises the branch in Run that converts | ||
| // a content-filter finish with no content into a terminal classified error | ||
| // instead of a silent empty turn. Nothing is persisted for the blocked step. | ||
| func TestRun_ContentFilterEmptyTurn(t *testing.T) { |
There was a problem hiding this comment.
P3 [CRF-10] No test for the complement boundary: non-empty content with FinishReasonContentFilter should persist normally.
The fork at chatloop.go:601-607 is the core design decision: empty content + ContentFilter = error; non-empty content + ContentFilter = normal persist. Only the error half is tested. The complement is the behavior that protects prose refusals from being misclassified as blocks, explicitly called out in the PR description: "A normal text refusal is unaffected, since content is non-empty."
Sketch: copy TestRun_ContentFilterEmptyTurn, add a TextPart before the finish part, assert err == nil and the step is persisted.
(Bisky)
🤖
| }) | ||
| } | ||
|
|
||
| // TestRun_ContentFilterEmptyTurn exercises the branch in Run that converts |
There was a problem hiding this comment.
Nit [CRF-11] Test doc comment restates what the function name and body show.
The name TestRun_ContentFilterEmptyTurn already says what's tested. The body's assertions carry the invariants. The convention in chatloop_run_internal_test.go is to omit doc comments on test functions unless they carry a non-obvious trap. Consider deleting the comment.
(Gon)
🤖
Summary
When a provider blocks a turn with a content-filter stop reason and produces no
content, the chat turn previously ended silently: the user saw the "Thinking"
spinner and then nothing. This adds detection and surfacing so the turn ends as
a terminal
content_filtererror with the provider's explanation, which the UIrenders as a clear "Response blocked" message.
Observed with Anthropic's real-time safety classifiers (
stop_reason: "refusal",empty content), e.g. the Fable model on security-policy-violating prompts.
Changes
ChatErrorKindContentFilter(+AllChatErrorKinds, generated TS).FinishReasonContentFilterand nocontent, return a classified
ErrContentFilteredinstead of breakingsilently, surfacing the Anthropic refusal
category/explanation.Dependency
Depends on coder/fantasy#41. The
go.modreplace currently points at that PR'sbranch commit (
805f9dae1e20). Repoint it to the merged fantasy commit beforethis merges.
Testing
go test ./coderd/x/chatd/chatloop/ ./coderd/x/chatd/chaterror/ ./codersdk/(new
contentfilter_internal_test.go).develop.shinstance routed through thedev.coder.com AI gateway: a Fable refusal now ends the turn as
status=error,last_error.kind=content_filterwith Anthropic's explanationand
detail=cyber. A normal text refusal (e.g. Opus 4.8 declining in prose)is unaffected, since content is non-empty.
Decision log
returns HTTP 200 with
stop_reason: "refusal"+stop_detailsand emptycontent; the fantasy SDK normalized it to
FinishReasonUnknownand droppedthe details, and chatloop's empty-content branch discarded it.
ChatStatusError+ a newChatErrorKindto reusethe existing classified-error flow and frontend rendering, rather than a new
terminal status or message-part type.
explanation/categorythroughProviderMetadata) over a generic static message; the genericunknownsignal is ambiguous and risks false positives on benign empty completions.
FinishReasonContentFilterforrefusalrather than adding a newfinish reason.
Coder Agents generated.