feat(coderd/x/chatd): preflight Anthropic and Bedrock PDF and request-size limits#26236
feat(coderd/x/chatd): preflight Anthropic and Bedrock PDF and request-size limits#26236ethanndickson wants to merge 14 commits into
Conversation
Own error classification inside chatpdf instead of rebuilding a ClassifiedError at the chatloop call site, relocate the PDF caps from chatprovider into chatpdf as unexported policy, and drop the weak /U and /O encryption markers in favor of scanning for /Encrypt only.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a1c428e48
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Bedrock's InvokeModel transport limits requests to 20 MB, lower than Anthropic's 32 MB Messages API cap. Sharing the 32 MB cap let Bedrock PDF payloads between 20 and 32 MB pass preflight only to be rejected by the provider, which is the failure this check exists to prevent.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7120264098
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ubstring
IsEncryptedPDF scanned the whole file for the literal "/Encrypt", so an
otherwise valid, unlocked PDF whose text or metadata mentioned /Encrypt
(API specs, security docs) was rejected as password-protected before it
reached Claude.
Match the key-value shape instead: an /Encrypt entry followed by an
indirect reference ("12 0 R") or an inline dictionary ("<<"). Every
encrypted PDF carries this in its trailer or XRef-stream dictionary,
which is never compressed, so recall is preserved while incidental prose
no longer hard-fails a valid attachment.
…l request size Move the Anthropic/Bedrock PDF preflight out of its own chatpdf package and into chatsanitize, which already owns pre-request, provider-specific prompt hygiene (ApplyAnthropicProviderToolGuard runs on the immediately preceding line, at the same lifecycle stage). The public entry is now ValidatePromptLimits. Account for non-PDF bytes in the request-size cap. The previous check summed only base64 PDF bytes, so a request whose full body exceeded the provider limit (large prompt, chat history, tool content alongside a near-cap PDF) passed preflight and was still rejected by the provider. The walker now estimates every part's serialized size (text, reasoning, tool-call input, tool-result output, and every file part at base64 size) and enforces the cap after the full walk. The estimate omits JSON framing so it is a strict lower bound: any rejection here is a guaranteed true positive. The cap now applies to any oversized Anthropic/Bedrock request, not only those containing PDFs, since the limit is a provider transport property rather than a PDF property. Structural cleanup within the new package: fold the user-facing display name into the limits struct so limitsFor is the single per-provider switch, reuse the package's nil-safe tool-part casts in the size estimator so typed-nil parts cannot panic, and keep the preflight logic in one file. Because chatsanitize sits below chatprovider in the import graph, it cannot call NormalizeProvider/ProviderDisplayName without an import cycle. provider is already the canonical fantasy name here (it comes from opts.Model.Provider()), so limitsFor switches on the exact name and fails open for anything else, matching the existing ApplyAnthropicProviderToolGuard contract.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5e7cdb33c4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The file name survived from the deleted chatpreflight package. Within chatsanitize every file runs pre-request, so "preflight" names the lifecycle stage shared by the whole package rather than this file's topic. Name it after what it validates, prompt limits, matching the package's topic-file convention and the ValidatePromptLimits entry point.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d3fef8b0b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5cf4603a0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 96389747c3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Breezy! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…ialization Counting every file part at base64-encoded size broke the estimator's lower-bound invariant: fantasy's anthropic provider sends text files as plain-text document sources at raw size and drops unsupported file media types from the request entirely, so text-heavy prompts could be falsely rejected against the payload cap. Estimate file parts the way the provider serializes them instead. Also remove dead code: the always-true positive-cap guards in the validator, and the safeMessageToolCallPart/safeMessageToolResultPart wrappers, whose call sites now use safeMessagePart directly.
The chatfiles PDF helpers had exactly one consumer, the preflight validator, so the package boundary was not pulling its weight. Move isPDF and approxPDFPageCount into chatsanitize as unexported helpers and fold their unique test coverage into the limits test. Also tighten doc comments and merge the provider-gating subtests. No behavior change.
|
/coder-agents-review |
|
Chat: Review in progress | View chat deep-review v0.7.1 | Round 1 | Last posted: Round 1, 8 findings (1 P2, 4 P3, 3 Nit), COMMENT. Review Finding inventoryFindings
Contested and acknowledged(none) Round logRound 1Panel. Netero: 1 P3. Panel (Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Ging-Go, Chopper, Gon, Leorio, Robin, Ryosuke, Zoro, Meruem): 1 P2, 4 P3, 3 Nit, plus 1 Note dropped (scope). Reviewed against 7326480..2e2fe8f. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Well-designed preflight validation. The fail-open philosophy is consistent: no false rejections from unrecognized PDF structure, unknown providers, or unparseable pages. The safeMessagePart[T] generics refactoring is clean, and the iterative self-correction visible in the commit history (encryption check added/dropped, Bedrock cap corrected, file estimation fixed) shows genuine care. Test density is solid at ~1:1 with good boundary probes.
Severity breakdown: 1 P2, 4 P3, 3 Nit.
"But a few are wearing outfits they didn't earn." (Bisky, on the test that claims to cover every part type)
Notes from the panel, not posted as inline findings:
-
chatprompt/chatprompt.go:39-61still carries two concretesafeAsToolCallPart/safeAsToolResultPartduplicates that the new generic could replace. Import direction supports it (chatpromptalready importschatsanitize), but exporting the generic is an API decision outside this PR's scope. (Robin) -
Five call sites outside
chatsanitizeuse rawfantasy.AsMessagePartwithout typed-nil guards (chatd.go:6783,chatd.go:9270,chatdebug/summary.go:55,chatadvisor/handoff.go:138,154). Same panic class this PR just fixed within its own package. Low practical risk today (those paths handle fresh responses, not replayed history), but the class is documented now. (Mafuuu, Robin) -
The MiB vs MB gap (~1.55 MB for Anthropic, ~971 KB for Bedrock) is acknowledged in the Bedrock constant comment with sound fail-open reasoning. The Anthropic constant is silent on the same choice. (Hisoka)
-
Every aggregation test uses a single-message prompt via
promptWithParts. The struct-level accumulator makes cross-message bugs unlikely, but the multi-message walk path is unexercised. (Bisky, Hisoka) -
estimatePartBytesandchatloop/metrics.go:ContentPartSizeboth walkfantasy.Messageparts for different purposes (serialized vs raw size). Both must be updated if fantasy adds a new data-carrying part type, with no cross-reference between them. (Zoro)
🤖 This review was automatically generated with Coder Agents.
| // Every part type must contribute to the lower-bound estimate. Each | ||
| // part alone fits under the injected cap; two together exceed it. | ||
| caps := limits{requestPayloadBytes: 40, pageCap: 100} | ||
| parts := []fantasy.MessagePart{ |
There was a problem hiding this comment.
P2 [CRF-2] The test is named "Counts every part type toward the request estimate" but the parts slice omits ReasoningPart and ToolResultOutputContentMedia. Both branches in production code (estimatePartBytes line 203, toolResultOutputBytes line 235) have zero byte-counting coverage. Delete either branch and every test still passes.
"Reasoning text can dominate request size in extended conversations (Claude's chain-of-thought output runs to thousands of tokens). An undercounting regression here silently defeats the payload cap for reasoning-heavy prompts, which is exactly the class of request most likely to hit the 32 MiB limit." (Bisky)
The typed-nil test at line 176 exercises (*fantasy.ReasoningPart)(nil), which proves nil-safety but not size contribution.
Fix: add to the parts slice:
fantasy.ReasoningPart{Text: strings.Repeat("r", 30)},
fantasy.ToolResultPart{Output: fantasy.ToolResultOutputContentMedia{Data: strings.Repeat("m", 30)}},(Bisky P2, Chopper P3, Meruem P3)
🤖
| ) | ||
| } | ||
| return fmt.Sprintf( | ||
| "PDF attachments include about %d pages, but %s accepts at most %d pages for this model. Split the PDF and retry.", |
There was a problem hiding this comment.
P3 [CRF-1] When multiple small PDFs together exceed the cap (e.g., three 40-page PDFs with a 100-page cap), this branch returns "Split the PDF and retry." The user has no single large PDF to split. The single-file branch (line 160) correctly names the oversized file; this aggregate branch gives inapplicable guidance.
Something like "Remove some PDF attachments or reduce total pages and retry" would match the actual situation.
(Netero, Meruem)
🤖
| // estimate is a lower bound, so exceeding the cap guarantees rejection. | ||
| if v.estimatedRequestBytes > v.caps.requestPayloadBytes { | ||
| return v.reject( | ||
| fmt.Sprintf("This request is too large for %s's request size limit. Remove or shrink attachments and retry.", v.caps.displayName), |
There was a problem hiding this comment.
P3 [CRF-3] "Remove or shrink attachments" assumes the request is large because of user-uploaded files. In an extended agentic session, tool results (file reads, grep output) accumulate across the conversation; 320 tool results averaging 100 KB each reach 32 MB with zero user-uploaded files. The user sees "Remove or shrink attachments," sees none, and has no idea what to do.
"The PDF-specific errors are good. They name the file, state the violation, and prescribe an applicable action. This message should match that standard." (Leorio)
Consider: "Remove attachments or start a shorter conversation and retry," or detect whether the prompt actually contains file parts and branch the advice, the way pageCapMessage already branches on single-file vs. aggregate.
(Chopper P3, Leorio P3)
🤖
|
|
||
| // approxPDFPageCount counts /Type /Page markers in data. ok is false when none | ||
| // are found, so PDFs with unrecognized structure are not falsely rejected. | ||
| func approxPDFPageCount(data []byte) (count int, ok bool) { |
There was a problem hiding this comment.
P3 [CRF-4] The regex counts every physical /Type /Page marker in the file, including superseded objects from incremental saves. A 5-page PDF that went through 20 incremental-save cycles accumulates ~105 physical markers, tripping the 100-page cap despite having 5 logical pages. Enterprise documents cycling through approval workflows are realistic targets.
"The heuristic is acknowledged as approximate, and a full parser has its own risks. The question is whether the approx label is doing enough work when the overcount can be 20x." (Hisoka)
The fail-open design (zero matches returns ok=false, skipping the check) prevents one class of false positives, but the overcount class remains. One mitigation without a full parser: scope the regex scan to after the final startxref, which points to the live object graph.
(Hisoka)
🤖
| return fmt.Sprintf("PDF attachment %q", name) | ||
| } | ||
|
|
||
| // isPDF reports whether data begins with the standard PDF header. |
There was a problem hiding this comment.
Nit [CRF-5] This comment restates bytes.HasPrefix(data, pdfHeader). The same pattern appears on pageCapMessage (line 156), the limits struct (line 45), and reject (line 247). Per project standard: comments should be substantive, not restatements of the code they annotate.
(Gon)
🤖
|
|
||
| pdfDefaultPageCap = 100 | ||
| pdfLargeContextPageCap = 600 | ||
| pdfLargeContextMinToken = 200_000 |
There was a problem hiding this comment.
Nit [CRF-6] pdfLargeContextMinToken reads as an inclusive lower bound ("min"), but the check at line 64 uses strict >, so a 200k-token model does NOT qualify for the larger page cap. pdfLargeContextThreshold would make the boundary behavior unambiguous without needing the comment.
(Gon)
🤖
| // ValidatePromptLimits returns an already-classified configuration error, | ||
| // so a context wrap is enough; chaterror.Classify unwraps to find it. | ||
| if err = chatsanitize.ValidatePromptLimits(provider, opts.ContextLimitFallback, prompt); err != nil { | ||
| return canonical, nil, xerrors.Errorf("preflight prompt: %w", err) |
There was a problem hiding this comment.
Nit [CRF-7] The full .Error() chain reads: prepare prompt: preflight prompt: prompt preflight failed: reason=invalid_pdf .... Three layers, each a paraphrase of "the prompt preparation step." The structured Detail and Message fields are clean (chaterror.Classify recovers them), but if anyone logs err.Error(), they get three referrals before the actual diagnostic.
Consider "validate prompt limits: %w" (names the specific substep) or dropping the text entirely and relying on the stack frame.
(Leorio)
🤖
| require.Contains(t, err.Error(), "reason=invalid_pdf") | ||
| }) | ||
|
|
||
| t.Run("Gates on canonical provider names", func(t *testing.T) { |
There was a problem hiding this comment.
P3 [CRF-8] This test proves Bedrock triggers validation and uses the Bedrock provider name. No test verifies Bedrock's distinct 20 MiB payload cap vs. Anthropic's 32 MiB, or that user-facing messages say "AWS Bedrock." The validateWithCaps helper always hardcodes fantasyanthropic.Name (line 210), so walker-level tests never exercise Bedrock-specific behavior. If limitsFor gave Bedrock Anthropic's 32 MiB cap, every test would still pass.
Sketch: call ValidatePromptLimits(fantasybedrock.Name, 200_000, oversizedPrompt) where the prompt is between 20 MiB and 32 MiB, and assert rejection with "AWS Bedrock" in the classified message.
(Bisky)
🤖
Adds server-side preflight validation for Anthropic and Bedrock-hosted Claude requests so configuration problems surface as clear, non-retryable errors before the provider call instead of opaque provider-side rejections. Coder's per-upload size cap does not catch these cases because several base64-expanded PDFs can blow the aggregate request limits, and the file resolver de-duplicates file IDs so it cannot see repeated or inline attachments.
Validation runs at the prompt-preparation layer in
chatloop, after the provider-ready[]fantasy.Messageis assembled and immediately after the existing Anthropic tool-history guard. It lives inchatsanitize, which already owns pre-request, provider-specific prompt hygiene; the public entry point ischatsanitize.ValidatePromptLimits.The walker rejects:
Bedrock shares Anthropic's PDF page caps because fantasy's bedrock provider wraps the anthropic client, but uses the lower request payload cap to match its transport limit. For uncapped or unrecognized providers, validation is a no-op. Failures carry an actionable, user-facing message naming the offending attachment.
Because the walker can see arbitrary replayed history,
chatsanitize's nil-safe part casts are generalized intosafeMessagePartandsafeToolResultOutputgenerics that tolerate typed-nil pointer parts, whichfantasy.AsMessagePartandfantasy.AsToolResultOutputTypewould otherwise dereference and panic on. All existing casts in the package now go through this path.An earlier revision also rejected encrypted PDFs via an
/Encryptbyte heuristic; that check was dropped as too fragile, since reliably distinguishing a trailer/Encryptentry from incidental document content requires real PDF parsing, and a false positive would hard-fail a valid attachment.Closes CODAGT-544