Skip to content

feat(coderd/x/chatd): preflight Anthropic and Bedrock PDF and request-size limits#26236

Open
ethanndickson wants to merge 14 commits into
mainfrom
pdf-anthropic-preflight-validation
Open

feat(coderd/x/chatd): preflight Anthropic and Bedrock PDF and request-size limits#26236
ethanndickson wants to merge 14 commits into
mainfrom
pdf-anthropic-preflight-validation

Conversation

@ethanndickson

@ethanndickson ethanndickson commented Jun 11, 2026

Copy link
Copy Markdown
Member

Adds server-side preflight validation for Anthropic and Bedrock-hosted Claude requests so configuration problems surface as clear, non-retryable errors before the provider call instead of opaque provider-side rejections. Coder's per-upload size cap does not catch these cases because several base64-expanded PDFs can blow the aggregate request limits, and the file resolver de-duplicates file IDs so it cannot see repeated or inline attachments.

Validation runs at the prompt-preparation layer in chatloop, after the provider-ready []fantasy.Message is assembled and immediately after the existing Anthropic tool-history guard. It lives in chatsanitize, which already owns pre-request, provider-specific prompt hygiene; the public entry point is chatsanitize.ValidatePromptLimits.

The walker rejects:

  • PDF attachments that are not valid PDFs, or whose aggregate page count exceeds the per-model cap (100 pages, or 600 for models with more than a 200k-token context). Page counts are estimated with a fast byte-level heuristic, not a full PDF parser.
  • Requests whose estimated body size exceeds the provider's documented request limit: Anthropic's 32 MB Messages API limit, or the lower 20 MB limit Anthropic documents for Bedrock-hosted requests. The estimate sums every message part (text, reasoning, tool-call input, and tool-result output including error text) and counts file parts the way fantasy's anthropic provider serializes them: images and PDFs at base64-encoded size, text files at raw size, and unsupported file media types at zero because the provider drops them from the request. JSON framing is omitted, so the estimate is a strict lower bound and any rejection is a guaranteed true positive. This cap applies to any oversized request, not only those containing PDFs.

Bedrock shares Anthropic's PDF page caps because fantasy's bedrock provider wraps the anthropic client, but uses the lower request payload cap to match its transport limit. For uncapped or unrecognized providers, validation is a no-op. Failures carry an actionable, user-facing message naming the offending attachment.

Because the walker can see arbitrary replayed history, chatsanitize's nil-safe part casts are generalized into safeMessagePart and safeToolResultOutput generics that tolerate typed-nil pointer parts, which fantasy.AsMessagePart and fantasy.AsToolResultOutputType would otherwise dereference and panic on. All existing casts in the package now go through this path.

An earlier revision also rejected encrypted PDFs via an /Encrypt byte heuristic; that check was dropped as too fragile, since reliably distinguishing a trailer /Encrypt entry from incidental document content requires real PDF parsing, and a false positive would hard-fail a valid attachment.

Closes CODAGT-544

Own error classification inside chatpdf instead of rebuilding a
ClassifiedError at the chatloop call site, relocate the PDF caps from
chatprovider into chatpdf as unexported policy, and drop the weak /U and
/O encryption markers in favor of scanning for /Encrypt only.
@linear-code

linear-code Bot commented Jun 11, 2026

Copy link
Copy Markdown

CODAGT-544

@ethanndickson ethanndickson marked this pull request as ready for review June 11, 2026 06:22

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a1c428e48

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatpdf/pdf.go Outdated
Bedrock's InvokeModel transport limits requests to 20 MB, lower than
Anthropic's 32 MB Messages API cap. Sharing the 32 MB cap let Bedrock
PDF payloads between 20 and 32 MB pass preflight only to be rejected by
the provider, which is the failure this check exists to prevent.
@ethanndickson

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7120264098

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatpdf/pdf.go Outdated
Comment thread coderd/x/chatfiles/pdf.go Outdated
…ubstring

IsEncryptedPDF scanned the whole file for the literal "/Encrypt", so an
otherwise valid, unlocked PDF whose text or metadata mentioned /Encrypt
(API specs, security docs) was rejected as password-protected before it
reached Claude.

Match the key-value shape instead: an /Encrypt entry followed by an
indirect reference ("12 0 R") or an inline dictionary ("<<"). Every
encrypted PDF carries this in its trailer or XRef-stream dictionary,
which is never compressed, so recall is preserved while incidental prose
no longer hard-fails a valid attachment.
…l request size

Move the Anthropic/Bedrock PDF preflight out of its own chatpdf package
and into chatsanitize, which already owns pre-request, provider-specific
prompt hygiene (ApplyAnthropicProviderToolGuard runs on the immediately
preceding line, at the same lifecycle stage). The public entry is now
ValidatePromptLimits.

Account for non-PDF bytes in the request-size cap. The previous check
summed only base64 PDF bytes, so a request whose full body exceeded the
provider limit (large prompt, chat history, tool content alongside a
near-cap PDF) passed preflight and was still rejected by the provider.
The walker now estimates every part's serialized size (text, reasoning,
tool-call input, tool-result output, and every file part at base64 size)
and enforces the cap after the full walk. The estimate omits JSON framing
so it is a strict lower bound: any rejection here is a guaranteed true
positive. The cap now applies to any oversized Anthropic/Bedrock request,
not only those containing PDFs, since the limit is a provider transport
property rather than a PDF property.

Structural cleanup within the new package: fold the user-facing display
name into the limits struct so limitsFor is the single per-provider
switch, reuse the package's nil-safe tool-part casts in the size
estimator so typed-nil parts cannot panic, and keep the preflight logic
in one file.

Because chatsanitize sits below chatprovider in the import graph, it
cannot call NormalizeProvider/ProviderDisplayName without an import
cycle. provider is already the canonical fantasy name here (it comes from
opts.Model.Provider()), so limitsFor switches on the exact name and fails
open for anything else, matching the existing ApplyAnthropicProviderToolGuard
contract.
@ethanndickson

Copy link
Copy Markdown
Member Author

@codex review

@ethanndickson ethanndickson changed the title feat(coderd/x): preflight Anthropic PDF attachment limits feat(coderd/x): preflight Anthropic and Bedrock PDF and request-size limits Jun 11, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e7cdb33c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatsanitize/limits.go
The file name survived from the deleted chatpreflight package. Within
chatsanitize every file runs pre-request, so "preflight" names the
lifecycle stage shared by the whole package rather than this file's
topic. Name it after what it validates, prompt limits, matching the
package's topic-file convention and the ValidatePromptLimits entry point.
@ethanndickson

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d3fef8b0b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatsanitize/limits.go Outdated
Comment thread coderd/x/chatd/chatsanitize/limits.go Outdated
@ethanndickson

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5cf4603a0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatsanitize/limits.go
@ethanndickson

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96389747c3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatfiles/pdf.go Outdated
@ethanndickson

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…ialization

Counting every file part at base64-encoded size broke the estimator's
lower-bound invariant: fantasy's anthropic provider sends text files as
plain-text document sources at raw size and drops unsupported file media
types from the request entirely, so text-heavy prompts could be falsely
rejected against the payload cap. Estimate file parts the way the
provider serializes them instead.

Also remove dead code: the always-true positive-cap guards in the
validator, and the safeMessageToolCallPart/safeMessageToolResultPart
wrappers, whose call sites now use safeMessagePart directly.
The chatfiles PDF helpers had exactly one consumer, the preflight
validator, so the package boundary was not pulling its weight. Move
isPDF and approxPDFPageCount into chatsanitize as unexported helpers
and fold their unique test coverage into the limits test. Also tighten
doc comments and merge the provider-gating subtests. No behavior
change.
@ethanndickson ethanndickson changed the title feat(coderd/x): preflight Anthropic and Bedrock PDF and request-size limits feat(coderd/x/chatd): preflight Anthropic and Bedrock PDF and request-size limits Jun 11, 2026
@ethanndickson

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Chat: Review in progress | View chat
Requested: 2026-06-12 02:20 UTC by @ethanndickson

deep-review v0.7.1 | Round 1 | 7326480..2e2fe8f

Last posted: Round 1, 8 findings (1 P2, 4 P3, 3 Nit), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 P3 Open limits.go:166 Aggregate page-cap message advises "Split the PDF" for multi-file aggregate R1 Netero, Meruem Nit Yes
CRF-2 P2 Open limits_internal_test.go:117 "Counts every part type" omits ReasoningPart and ToolResultOutputContentMedia R1 Bisky P2, Chopper P3, Meruem P3 Yes
CRF-3 P3 Open limits.go:126 Payload cap message says "attachments" when bloat may be tool-result history R1 Chopper P3, Leorio P3 Yes
CRF-4 P3 Open limits.go:185 approxPDFPageCount overcounts incrementally-saved PDFs (superseded objects) R1 Hisoka Yes
CRF-5 Nit Open limits.go:178 Redundant doc comments restate code (isPDF, pageCapMessage, limits struct, reject) R1 Gon Yes
CRF-6 Nit Open limits.go:35 pdfLargeContextMinToken name implies >= but check uses strict > R1 Gon Yes
CRF-7 Nit Open chatloop.go:783 Triple-wrapping error context in error chain R1 Leorio Yes
CRF-8 P3 Open limits_internal_test.go:37 Bedrock's 20 MiB cap and display name untested R1 Bisky Yes
CRF-9 Note Dropped by orchestrator (scope creep: chatprompt outside PR scope) chatprompt.go:39-61 chatprompt duplicate safe-cast functions not consolidated R1 Robin No

Contested and acknowledged

(none)

Round log

Round 1

Panel. Netero: 1 P3. Panel (Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Ging-Go, Chopper, Gon, Leorio, Robin, Ryosuke, Zoro, Meruem): 1 P2, 4 P3, 3 Nit, plus 1 Note dropped (scope). Reviewed against 7326480..2e2fe8f.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-designed preflight validation. The fail-open philosophy is consistent: no false rejections from unrecognized PDF structure, unknown providers, or unparseable pages. The safeMessagePart[T] generics refactoring is clean, and the iterative self-correction visible in the commit history (encryption check added/dropped, Bedrock cap corrected, file estimation fixed) shows genuine care. Test density is solid at ~1:1 with good boundary probes.

Severity breakdown: 1 P2, 4 P3, 3 Nit.

"But a few are wearing outfits they didn't earn." (Bisky, on the test that claims to cover every part type)

Notes from the panel, not posted as inline findings:

  • chatprompt/chatprompt.go:39-61 still carries two concrete safeAsToolCallPart/safeAsToolResultPart duplicates that the new generic could replace. Import direction supports it (chatprompt already imports chatsanitize), but exporting the generic is an API decision outside this PR's scope. (Robin)

  • Five call sites outside chatsanitize use raw fantasy.AsMessagePart without typed-nil guards (chatd.go:6783, chatd.go:9270, chatdebug/summary.go:55, chatadvisor/handoff.go:138,154). Same panic class this PR just fixed within its own package. Low practical risk today (those paths handle fresh responses, not replayed history), but the class is documented now. (Mafuuu, Robin)

  • The MiB vs MB gap (~1.55 MB for Anthropic, ~971 KB for Bedrock) is acknowledged in the Bedrock constant comment with sound fail-open reasoning. The Anthropic constant is silent on the same choice. (Hisoka)

  • Every aggregation test uses a single-message prompt via promptWithParts. The struct-level accumulator makes cross-message bugs unlikely, but the multi-message walk path is unexercised. (Bisky, Hisoka)

  • estimatePartBytes and chatloop/metrics.go:ContentPartSize both walk fantasy.Message parts for different purposes (serialized vs raw size). Both must be updated if fantasy adds a new data-carrying part type, with no cross-reference between them. (Zoro)

🤖 This review was automatically generated with Coder Agents.

// Every part type must contribute to the lower-bound estimate. Each
// part alone fits under the injected cap; two together exceed it.
caps := limits{requestPayloadBytes: 40, pageCap: 100}
parts := []fantasy.MessagePart{

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 [CRF-2] The test is named "Counts every part type toward the request estimate" but the parts slice omits ReasoningPart and ToolResultOutputContentMedia. Both branches in production code (estimatePartBytes line 203, toolResultOutputBytes line 235) have zero byte-counting coverage. Delete either branch and every test still passes.

"Reasoning text can dominate request size in extended conversations (Claude's chain-of-thought output runs to thousands of tokens). An undercounting regression here silently defeats the payload cap for reasoning-heavy prompts, which is exactly the class of request most likely to hit the 32 MiB limit." (Bisky)

The typed-nil test at line 176 exercises (*fantasy.ReasoningPart)(nil), which proves nil-safety but not size contribution.

Fix: add to the parts slice:

fantasy.ReasoningPart{Text: strings.Repeat("r", 30)},
fantasy.ToolResultPart{Output: fantasy.ToolResultOutputContentMedia{Data: strings.Repeat("m", 30)}},

(Bisky P2, Chopper P3, Meruem P3)

🤖

)
}
return fmt.Sprintf(
"PDF attachments include about %d pages, but %s accepts at most %d pages for this model. Split the PDF and retry.",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-1] When multiple small PDFs together exceed the cap (e.g., three 40-page PDFs with a 100-page cap), this branch returns "Split the PDF and retry." The user has no single large PDF to split. The single-file branch (line 160) correctly names the oversized file; this aggregate branch gives inapplicable guidance.

Something like "Remove some PDF attachments or reduce total pages and retry" would match the actual situation.

(Netero, Meruem)

🤖

// estimate is a lower bound, so exceeding the cap guarantees rejection.
if v.estimatedRequestBytes > v.caps.requestPayloadBytes {
return v.reject(
fmt.Sprintf("This request is too large for %s's request size limit. Remove or shrink attachments and retry.", v.caps.displayName),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-3] "Remove or shrink attachments" assumes the request is large because of user-uploaded files. In an extended agentic session, tool results (file reads, grep output) accumulate across the conversation; 320 tool results averaging 100 KB each reach 32 MB with zero user-uploaded files. The user sees "Remove or shrink attachments," sees none, and has no idea what to do.

"The PDF-specific errors are good. They name the file, state the violation, and prescribe an applicable action. This message should match that standard." (Leorio)

Consider: "Remove attachments or start a shorter conversation and retry," or detect whether the prompt actually contains file parts and branch the advice, the way pageCapMessage already branches on single-file vs. aggregate.

(Chopper P3, Leorio P3)

🤖


// approxPDFPageCount counts /Type /Page markers in data. ok is false when none
// are found, so PDFs with unrecognized structure are not falsely rejected.
func approxPDFPageCount(data []byte) (count int, ok bool) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-4] The regex counts every physical /Type /Page marker in the file, including superseded objects from incremental saves. A 5-page PDF that went through 20 incremental-save cycles accumulates ~105 physical markers, tripping the 100-page cap despite having 5 logical pages. Enterprise documents cycling through approval workflows are realistic targets.

"The heuristic is acknowledged as approximate, and a full parser has its own risks. The question is whether the approx label is doing enough work when the overcount can be 20x." (Hisoka)

The fail-open design (zero matches returns ok=false, skipping the check) prevents one class of false positives, but the overcount class remains. One mitigation without a full parser: scope the regex scan to after the final startxref, which points to the live object graph.

(Hisoka)

🤖

return fmt.Sprintf("PDF attachment %q", name)
}

// isPDF reports whether data begins with the standard PDF header.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-5] This comment restates bytes.HasPrefix(data, pdfHeader). The same pattern appears on pageCapMessage (line 156), the limits struct (line 45), and reject (line 247). Per project standard: comments should be substantive, not restatements of the code they annotate.

(Gon)

🤖


pdfDefaultPageCap = 100
pdfLargeContextPageCap = 600
pdfLargeContextMinToken = 200_000

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-6] pdfLargeContextMinToken reads as an inclusive lower bound ("min"), but the check at line 64 uses strict >, so a 200k-token model does NOT qualify for the larger page cap. pdfLargeContextThreshold would make the boundary behavior unambiguous without needing the comment.

(Gon)

🤖

// ValidatePromptLimits returns an already-classified configuration error,
// so a context wrap is enough; chaterror.Classify unwraps to find it.
if err = chatsanitize.ValidatePromptLimits(provider, opts.ContextLimitFallback, prompt); err != nil {
return canonical, nil, xerrors.Errorf("preflight prompt: %w", err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-7] The full .Error() chain reads: prepare prompt: preflight prompt: prompt preflight failed: reason=invalid_pdf .... Three layers, each a paraphrase of "the prompt preparation step." The structured Detail and Message fields are clean (chaterror.Classify recovers them), but if anyone logs err.Error(), they get three referrals before the actual diagnostic.

Consider "validate prompt limits: %w" (names the specific substep) or dropping the text entirely and relying on the stack frame.

(Leorio)

🤖

require.Contains(t, err.Error(), "reason=invalid_pdf")
})

t.Run("Gates on canonical provider names", func(t *testing.T) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-8] This test proves Bedrock triggers validation and uses the Bedrock provider name. No test verifies Bedrock's distinct 20 MiB payload cap vs. Anthropic's 32 MiB, or that user-facing messages say "AWS Bedrock." The validateWithCaps helper always hardcodes fantasyanthropic.Name (line 210), so walker-level tests never exercise Bedrock-specific behavior. If limitsFor gave Bedrock Anthropic's 32 MiB cap, every test would still pass.

Sketch: call ValidatePromptLimits(fantasybedrock.Name, 200_000, oversizedPrompt) where the prompt is between 20 MiB and 32 MiB, and assert rejection with "AWS Bedrock" in the classified message.

(Bisky)

🤖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant