feat: surface model content-filter refusals as a blocked chat error by jscottmiller · Pull Request #26294 · coder/coder

jscottmiller · 2026-06-11T19:31:44Z

Summary

When a provider blocks a turn with a content-filter stop reason and produces no
content, the chat turn previously ended silently: the user saw the "Thinking"
spinner and then nothing. This adds detection and surfacing so the turn ends as
a terminal content_filter error with the provider's explanation, which the UI
renders as a clear "Response blocked" message.

Observed with Anthropic's real-time safety classifiers (stop_reason: "refusal",
empty content), e.g. the Fable model on security-policy-violating prompts.

Changes

- codersdk: add ChatErrorKindContentFilter (+ AllChatErrorKinds, generated TS).
- chatloop: when a step finishes with FinishReasonContentFilter and no
  content, return a classified ErrContentFiltered instead of breaking
  silently, surfacing the Anthropic refusal category/explanation.
- chaterror: default user-facing message for the new kind.
- site: render "Response blocked" as the error title.
- go.mod: point the fantasy fork at the refusal mapping change.

Dependency

Depends on coder/fantasy#41. The go.mod replace currently points at that PR's
branch commit (805f9dae1e20). Repoint it to the merged fantasy commit before
this merges.

Testing

- go test ./coderd/x/chatd/chatloop/ ./coderd/x/chatd/chaterror/ ./codersdk/
  (new contentfilter_internal_test.go).
- End-to-end against a local develop.sh instance routed through the
  dev.coder.com AI gateway: a Fable refusal now ends the turn as
  status=error, last_error.kind=content_filter with Anthropic's explanation
  and detail=cyber. A normal text refusal (e.g. Opus 4.8 declining in prose)
  is unaffected, since content is non-empty.

Decision log

- Reproduced the silent turn and confirmed via the chat debug-run API that Fable
  returns HTTP 200 with stop_reason: "refusal" + stop_details and empty
  content; the fantasy SDK normalized it to FinishReasonUnknown and dropped
  the details, and chatloop's empty-content branch discarded it.
- Chose to model the block as ChatStatusError + a new ChatErrorKind to reuse
  the existing classified-error flow and frontend rendering, rather than a new
  terminal status or message-part type.
- Chose the precise path (carry Anthropic's explanation/category through
  ProviderMetadata) over a generic static message; the generic unknown
  signal is ambiguous and risks false positives on benign empty completions.
- Reused FinishReasonContentFilter for refusal rather than adding a new
  finish reason.
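The first, third, and fourth decisions can be illustrated together. This is a sketch under stated assumptions: `mapStopReason`, `ClassifiedError`, the metadata keys `explanation` and `category`, and the default message text are all hypothetical; the real mapping lives in the fantasy fork and the real fields travel through ProviderMetadata.

```go
package main

import "fmt"

// FinishReason is a hypothetical stand-in for the SDK's finish-reason type.
type FinishReason string

const (
	FinishReasonStop          FinishReason = "stop"
	FinishReasonContentFilter FinishReason = "content_filter"
	FinishReasonUnknown       FinishReason = "unknown"
)

// mapStopReason is a hypothetical normalizer. Before the fix described
// above, "refusal" fell through to the unknown case.
func mapStopReason(stopReason string) FinishReason {
	switch stopReason {
	case "end_turn":
		return FinishReasonStop
	case "refusal":
		return FinishReasonContentFilter
	default:
		return FinishReasonUnknown
	}
}

// ClassifiedError is a hypothetical shape for the classified error.
type ClassifiedError struct {
	Kind      string
	Message   string
	Detail    string
	Retryable bool
}

// defaultBlockedMessage is placeholder wording, not the string in chaterror.
const defaultBlockedMessage = "The provider blocked this response."

// contentFilterError prefers the provider's explanation and falls back to
// the default message when no metadata was carried through.
func contentFilterError(meta map[string]string) ClassifiedError {
	msg := meta["explanation"] // reading a nil map yields ""
	if msg == "" {
		msg = defaultBlockedMessage
	}
	return ClassifiedError{
		Kind:      "content_filter",
		Message:   msg,
		Detail:    meta["category"],
		Retryable: false,
	}
}

func main() {
	fmt.Println(mapStopReason("refusal"))
	fmt.Printf("%+v\n", contentFilterError(map[string]string{
		"explanation": "Blocked by safety classifier.",
		"category":    "cyber",
	}))
	fmt.Printf("%+v\n", contentFilterError(nil))
}
```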

Coder Agents generated.

When a provider blocks a turn with a content-filter stop reason and no content (e.g. Anthropic's "refusal"), the turn previously ended silently ("Thinking" then nothing). Detect this in the chat loop and end the turn as a terminal content_filter error carrying the provider's explanation, so the UI shows a clear "Response blocked" message.

- codersdk: add ChatErrorKindContentFilter
- chatloop: return a classified ErrContentFiltered on empty content-filtered turns, surfacing Anthropic refusal category/explanation
- chaterror: default message for the new kind
- site: render "Response blocked" title; regenerate types
- go.mod: point fantasy fork at the refusal mapping change

Coder Agents generated.

github-actions · 2026-06-11T19:45:17Z

Docs preview

📖 View docs preview for docs/reference/api/chats.md

jscottmiller · 2026-06-11T20:03:13Z

Two known limitations of the empty-content check, noted during review:

- Blocked steps drop token usage. The turn ends before PersistStep, and usage is stored as columns on inserted assistant message rows, so a content-filter turn with no content has nowhere to record the refused step's input tokens (Anthropic bills input tokens on refusals). This matches every other terminal-error path today (stream errors, auth failures mid-turn). If usage accounting for failed turns matters, persisting usage-only steps needs its own design; follow-up issue candidate.
- Partial content + content-filter finish still ends silently. The new check only fires when the step produced no content. A mid-stream classifier block after partial output persists the partial text and ends as a normal stop with no error. Surfacing it would mean returning the classified error after PersistStep succeeds, which would also change UX for OpenAI/Google content-filter truncations that currently end silently; that is a product decision, so it is intentionally out of scope here.

Coder Agents generated (on behalf of @jscottmiller).

jscottmiller · 2026-06-11T20:17:42Z

/coder-agents-review

coder-agents-review · 2026-06-11T20:17:46Z

Chat: Review in progress | View chat
Requested: 2026-06-11 21:37 UTC by @jscottmiller
Spend: $33.81 / $100.00

deep-review v0.7.1 | Round 2 | a4c867f..3343fac

Last posted: Round 2, 11 findings (1 P2, 4 P3, 6 Nit), COMMENT. Review

Finding inventory

Findings

| # | Sev | Status | Location | Summary | Round | Reviewer | Posted |
|---|---|---|---|---|---|---|---|
| CRF-1 | P3 | Author fixed (`3343fac`) | contentfilter_internal_test.go:41 | WithoutMetadataUsesDefault only asserts non-empty Message, tautological for content_filter branch | R1 | Netero, Bisky P3 | Yes |
| CRF-2 | P3 | Author fixed (`3343fac`) | contentfilter_internal_test.go:29 | WithRefusalMetadata does not assert classified.Detail | R1 | Netero, Bisky P3 | Yes |
| CRF-3 | Nit | Author fixed (`3343fac`) | chaterror/message.go:67 | TestTerminalMessage table test missing content_filter case | R1 | Netero | Yes |
| CRF-4 | P2 | Author fixed (`3343fac`) | contentfilter_internal_test.go:13 | No integration test exercises the content-filter branch in Run | R1 | Bisky P2, Hisoka P3, Mafuuu P3, Chopper P3, Meruem P3 | Yes |
| CRF-5 | P3 | Author contested; panel closed R2 (2/3 accept) | chatloop.go:57 | Comment verbosity pattern: doc and inline comments restate what code shows | R1 | Gon P2 (downgraded by orchestrator) | Yes |
| CRF-6 | Nit | Author contested; panel closed R2 (3/3 accept) | chaterror/message.go:116 | retryMessage has no content_filter case | R1 | Bisky, Mafuuu, Chopper, Leorio | Yes |
| CRF-7 | Nit | Author fixed (`3343fac`) | chatStatusHelpers.ts:49 | No Storybook story for content_filter error state | R1 | Nami | Yes |
| CRF-8 | Nit | Author fixed (`3343fac`) | chatloop.go:72 | contentFilterError omits explicit Retryable: false | R1 | Knov | Yes |
| CRF-9 | Nit | Author fixed (`3343fac`) | contentfilter_internal_test.go:30 | Test errorf says "want explanation" instead of expected value | R1 | Gon | Yes |
| CRF-10 | P3 | Open | contentfilter_internal_test.go:57 | No test for complement boundary: non-empty content with ContentFilter should persist normally | R2 | Bisky P3 | Yes |
| CRF-11 | Nit | Open | contentfilter_internal_test.go:54 | Test doc comment restates what function name and body show | R2 | Gon P2 (downgraded by orchestrator) | Yes |

Contested and acknowledged

CRF-5 (P3, chatloop.go:57) - Comment verbosity

Finding: Doc and inline comments on new code (ErrContentFiltered, contentFilterError, inline at line 601) restate what the code already shows. Suggested trimming to one-line comments.
Author defense: Comments explain why the behavior matters, per repo comment guidance.
Panel closure (R2, 2/3): Mafuuu traced each comment to the repo rule ("Describe the behaviour of the code, not the reasoning the agent used to produce the change") and confirmed all three follow it. Razor verified all three describe behavioral contracts or design intent, not code mechanics. Gon narrowed the re-raise to line 68 specifically but was overridden by the majority.

CRF-6 (Nit, chaterror/message.go:116) - retryMessage missing content_filter case

Finding: retryMessage has no ChatErrorKindContentFilter case, unlike every other kind in terminalMessage.
Author defense: The case is unreachable because content_filter errors are never retryable. usage_limit already breaks the switch symmetry.
Panel closure (R2, 3/3): Bisky verified the unreachability and usage_limit precedent. Mafuuu traced the 4-step call chain from contentFilterError through chatretry.Retry confirming the path cannot execute. Razor independently verified the same chain with step-by-step evidence.

Round log

Round 1

Panel. 1 P2, 2 P3, 4 Nit, 2 Note. Reviewed against a4c867f..2514264. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Chopper, Ging-Go, Ging-TS, Gon, Leorio, Kite, Nami, Meruem (wildcard), Knov (wildcard). Ging-Go, Ging-TS, Mafu-san, Pariston, Kite: no findings.

Round 2

Panel. CRF-1 through CRF-4, CRF-7 through CRF-9 verified fixed. CRF-5 closed by panel (2/3). CRF-6 closed by panel (3/3). 1 new P3 (CRF-10). 1 Nit dropped (follows file convention). Reviewed against a4c867f..3343fac. Panel: Bisky, Mafuuu, Gon, Razor (wildcard).

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

| Reviewer | Focus |
|---|---|
| Bisky | tests |
| Chopper | ops/errors |
| Churn-guard | change verification |
| Ging | language modernization |
| Gon | naming |
| Hisoka | edge cases |
| Killua | perf |
| Kite | change integrity |
| Knov | contracts |
| Knuckle | SQL |
| Kurapika | security |
| Law | decomposition |
| Leorio | docs |
| Luffy | product |
| Mafu-san | process |
| Mafuuu | contracts |
| Melody | dispatch/pairing |
| Meruem | structural |
| Nami | frontend |
| Netero | mechanical checks |
| Pariston | premise testing |
| Pen-botter | product gaps |
| Razor | verification |
| Robin | duplication |
| Ryosuke | Go arch |
| Takumi | concurrency |
| Zoro | shape |

🤖 Managed by Coder Agents.

coder-agents-review

Clean, well-scoped change that solves the right problem at the right layer. The error classification pipeline, the WithClassification/Classify round-trip, and the retry-loop exclusion all hold up under inspection. The sanitizer cannot create false positives (tool blocks require FinishReasonToolCalls, mutually exclusive with FinishReasonContentFilter), and the double classification through chatd.go is harmless because WithProvider short-circuits when the provider matches and Message is non-empty. Dependency management is explicit.

Pariston: "I tried to build a case against this change and could not find a material premise failure. The problem is correctly understood, the solution is proportional, and the fix is at the right causal level."

1 P2, 2 P3, 4 Nit. The P2 is an integration test gap: the only codepath that converts a silent empty turn into a visible error has no test through Run. The test infrastructure already supports this pattern (TestRun_* with chattest.FakeModel), and 50+ sibling tests exercise analogous paths. Five reviewers independently flagged it.

Notes (not posted inline): Refusal metadata extraction is Anthropic-only; Google and OpenAI content filters degrade to a generic message with no category or explanation. Partial content with a content-filter finish is silently persisted as normal, which the PR description acknowledges as intentional.

coderd/x/chatd/chaterror/message.go:116

Nit [CRF-6] retryMessage has no ChatErrorKindContentFilter case.

terminalMessage handles all eleven kinds including content_filter. retryMessage handles ten, with content_filter falling through to default ("returned an unexpected error"). Unreachable today because Retryable is false (zero value), but every sibling kind that has a terminalMessage case also has a retryMessage case (except usage_limit, which is a pre-existing gap). Adding a case keeps the two switches symmetric.

(Bisky, Mafuuu, Chopper, Leorio)
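The asymmetry in this finding can be shown with two reduced switches. This is a sketch, not the contents of chaterror/message.go: the kind constants, the message strings, and the `userMessage` dispatcher are hypothetical, and only two of the kinds are modeled.

```go
package main

import "fmt"

// ChatErrorKind is a hypothetical stand-in for the codersdk type.
type ChatErrorKind string

const (
	KindContentFilter ChatErrorKind = "content_filter"
	KindRateLimit     ChatErrorKind = "rate_limit" // hypothetical retryable kind
)

// terminalMessage covers every kind, including content_filter.
func terminalMessage(kind ChatErrorKind) string {
	switch kind {
	case KindContentFilter:
		return "The provider blocked this response."
	case KindRateLimit:
		return "The provider is rate limiting requests."
	default:
		return "The provider returned an unexpected error."
	}
}

// retryMessage has no content_filter case, so that kind would hit default.
func retryMessage(kind ChatErrorKind) string {
	switch kind {
	case KindRateLimit:
		return "The provider is rate limiting requests. Retrying."
	default:
		return "The provider returned an unexpected error. Retrying."
	}
}

// userMessage picks a switch based on retryability. With Retryable always
// false for content_filter, the retryMessage default is never reached for it.
func userMessage(kind ChatErrorKind, retryable bool) string {
	if retryable {
		return retryMessage(kind)
	}
	return terminalMessage(kind)
}

func main() {
	fmt.Println(userMessage(KindContentFilter, false))
	fmt.Println(userMessage(KindRateLimit, true))
}
```

Whether to add the unreachable case anyway is the symmetry-versus-dead-code trade-off the panel later closed in the author's favor.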

🤖

🤖 This review was automatically generated with Coder Agents.

linear-code · 2026-06-11T20:52:00Z

CODAGT-611

Adds a Run-level integration test for the empty-turn content-filter branch, exact-string and Detail assertions in the unit tests, terminal message table coverage, an explicit Retryable: false, and a Storybook story for the Response blocked error state.

jscottmiller · 2026-06-11T21:37:35Z

Addressed in 3343fac: CRF-4 (Run-level test TestRun_ContentFilterEmptyTurn, also asserts no step persists), CRF-1/CRF-2/CRF-9 (exact-string, Detail, and failure-output assertions), CRF-3 (terminal message table entries), CRF-7 (TerminalContentFilterError story, 16/16 passing), CRF-8 (explicit Retryable: false).

Intentionally skipped: CRF-5 (the comments explain why the behavior matters, per repo comment guidance) and CRF-6 (retryMessage case is unreachable; the error never enters the retry path, and usage_limit already breaks the switch symmetry).

Coder Agents generated (on behalf of @jscottmiller).

jscottmiller · 2026-06-11T21:37:37Z

/coder-agents-review

jscottmiller · 2026-06-11T21:43:09Z

Correction to the note above: CRF-5 was subsequently addressed in 234d2ad (doc comment trims per the suggested wording). Only CRF-6 remains intentionally skipped.

Coder Agents generated (on behalf of @jscottmiller).

coder-agents-review

All seven R1 fixes verified. The integration test (TestRun_ContentFilterEmptyTurn) properly exercises the Run branch end-to-end, confirms the error sentinel and classification fields, and checks that no step is persisted. Test assertions are now exact-string. The Storybook story has 5 assertions covering heading, message, detail, and terminal behavior. Good turnaround.

Contested findings disposed:

CRF-5 (comment verbosity): Panel closed, 2/3 accept. Mafuuu and Razor independently verified the comments describe behavioral contracts per repo guidance. Gon narrowed the re-raise to contentFilterError (line 68) but was overridden by the panel majority.

CRF-6 (retryMessage missing case): Panel closed, 3/3 accept. All three reviewers independently traced the 4-step call chain confirming the path is unreachable, and confirmed usage_limit establishes the same precedent.

Mafuuu: "Contract fidelity checks performed: lifecycle, semantic honesty, adversarial path. All aligned."

1 new P3. The complement boundary (non-empty content with a ContentFilter finish reason should persist normally) is explicitly called out in the PR description as intentional behavior but has no test.

🤖 This review was automatically generated with Coder Agents.

coder-agents-review · 2026-06-11T22:01:31Z

+// TestRun_ContentFilterEmptyTurn exercises the branch in Run that converts
+// a content-filter finish with no content into a terminal classified error
+// instead of a silent empty turn. Nothing is persisted for the blocked step.
+func TestRun_ContentFilterEmptyTurn(t *testing.T) {


P3 [CRF-10] No test for the complement boundary: non-empty content with FinishReasonContentFilter should persist normally.

The fork at chatloop.go:601-607 is the core design decision: empty content + ContentFilter = error; non-empty content + ContentFilter = normal persist. Only the error half is tested. The complement is the behavior that protects prose refusals from being misclassified as blocks, explicitly called out in the PR description: "A normal text refusal is unaffected, since content is non-empty."

Sketch: copy TestRun_ContentFilterEmptyTurn, add a TextPart before the finish part, assert err == nil and the step is persisted.

(Bisky)

🤖

coder-agents-review · 2026-06-11T22:01:31Z

+	})
+}
+
+// TestRun_ContentFilterEmptyTurn exercises the branch in Run that converts


Nit [CRF-11] Test doc comment restates what the function name and body show.

The name TestRun_ContentFilterEmptyTurn already says what's tested. The body's assertions carry the invariants. The convention in chatloop_run_internal_test.go is to omit doc comments on test functions unless they carry a non-obvious trap. Consider deleting the comment.

(Gon)

🤖

github-actions Bot assigned jscottmiller Jun 11, 2026

chore: regenerate API docs for ChatErrorKindContentFilter

26fbf8d

Run make gen to add the content_filter enum value to swagger and the generated API reference docs. Coder Agents generated.

refactor(coderd/x/chatd/chatloop): check content-filter finish before…

2514264

… recording step state

docs(coderd/x/chatd/chatloop): trim content-filter doc comments

234d2ad

