SEP-2356: File input support for tools and elicitation #2356
Proposes Tool.inputFiles and ElicitRequestFormParams.requestedFiles to let servers declaratively mark which arguments expect user-selected files. Clients declaring the fileInputs capability render native file pickers and encode selections as data URIs (with a standardized name= media-type parameter for filenames).
Key design points:
- inputFiles is a sibling of inputSchema (not an annotation hint)
- Per-argument accept[] MIME filters and maxSize limits
- Supports single-file and array-of-file arguments
- Adds StringArraySchema to PrimitiveSchemaDefinition for elicitation
- Capability gates advertising, not acceptance: servers always accept well-formed data URIs regardless of negotiation
- Error convention: -32602 with data.reason for size/type violations
- Cites OpenAI Apps SDK openai/fileParams as prior art
https://claude.ai/code/session_01UE8PfZW3WmKXvoqtamBbtp
- elicitation.form.fileInputs nests under existing client cap
- Tool-side stays top-level (no ClientCapabilities.tools exists)
- Independent gating: clients can support one surface without the other
- Add Open Questions section debating alt placements: new ClientCapabilities.tools namespace vs single unified flag
https://claude.ai/code/session_01UE8PfZW3WmKXvoqtamBbtp
One top-level ClientCapabilities.fileInputs flag instead of the split approach.
Rationale captured:
- Underlying capability (file picker + data URI encoding) is singular
- Elicitation is already gated by the elicitation capability itself
- Simpler server check
- No ClientCapabilities.tools exists to nest under anyway
Removes Open Questions section; the placement debate is resolved.
https://claude.ai/code/session_01UE8PfZW3WmKXvoqtamBbtp
… to schema
Implements the file upload SEP with declarative file input metadata:
- FileInputDescriptor: { accept?: string[], maxSize?: number } - advisory MIME filter + byte limit
- Tool.inputFiles: maps argument names to FileInputDescriptor for native file pickers
- ElicitRequestFormParams.requestedFiles: symmetric support for elicitation forms
- StringArraySchema: new PrimitiveSchemaDefinition member for multi-file inputs
- ClientCapabilities.fileInputs: capability gate (server MUST NOT send inputFiles without it)
Files are transmitted as RFC 2397 data URIs: data:<mediatype>;name=<filename>;base64,<data>
https://claude.ai/code/session_01JxhHWiXrXgE4JWC27dznRN
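As a rough illustration of the wire format named in the commit message above, the sketch below builds a data URI with a percent-encoded `name=` parameter placed ahead of `;base64`. The helper name `encode_file_as_data_uri` is hypothetical and not part of the SEP or any SDK. Later revisions in this thread move away from carrying `name=` inside the URI, so this reflects the format at this commit only.

```python
import base64
from urllib.parse import quote


def encode_file_as_data_uri(data: bytes, media_type: str, filename: str) -> str:
    """Build data:<mediatype>;name=<filename>;base64,<data> (hypothetical helper)."""
    # Percent-encode the filename so spaces, ';' and ',' cannot break the header.
    encoded_name = quote(filename, safe="")
    payload = base64.b64encode(data).decode("ascii")
    # name= is emitted before ;base64, which stays the last item before the comma.
    return f"data:{media_type};name={encoded_name};base64,{payload}"


uri = encode_file_as_data_uri(b"hi", "text/plain", "my notes.txt")
print(uri)
```

The filename is encoded with `safe=""` so that a `/` in the name is escaped as well; whether a client should strip path components first is left open here.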
Introduces an Overview section in the SEP that walks reviewers through the complete round trip on both surfaces before the formal spec:
- Tool: `describe_image` definition with `inputFiles`, paired with the matching `tools/call` request carrying a data-URI argument.
- Elicitation: `elicitation/create` request with `requestedFiles`, paired with the matching `ElicitResult` response carrying the data-URI content.
All examples use the same real (non-truncated) 1x1 PNG so the wire encoding is concrete and copy-pasteable. The same examples are added as validated JSON files under schema/draft/examples/ and wired into schema.ts via @includecode so they appear in the generated reference docs.
https://claude.ai/code/session_0168Rxur9BGcHnAzo3zpZkEH
Rename placeholder SEP file to 2356 (the PR number), fill in the header title and PR link, and regenerate the SEP docs so the new page appears in the community SEP index and navigation. https://claude.ai/code/session_0168Rxur9BGcHnAzo3zpZkEH
Replace 'future SEP' framing in three places with direct guidance: inputFiles covers the inline case, URL-mode elicitation covers files too large to embed. No new transport machinery needed. :house: Remote-Dev: homespace
… handling by surface
- Gate StringArraySchema behind fileInputs capability (SEP prose + schema.ts docstring) so existing form-mode clients aren't broken by an unrecognized PrimitiveSchemaDefinition member
- Clarify maxSize applies per-file for array-typed arguments
- Reword schema-shape constraints to permit extra properties; add SHOULD-ignore rule for malformed inputFiles/requestedFiles entries
- Require format:"uri" on StringArraySchema.items for file fields
- Add percent-encoding test vector; require name= before ;base64 per RFC 2397
- Define accept matching: type/subtype only, case-insensitive, params stripped
- Scope file_uri_malformed to broken data: URIs; non-data URIs are server-defined
- Split error handling into tool-call (-32602) vs elicitation-result (re-elicit or fail enclosing op) subsections
:house: Remote-Dev: homespace
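The accept-matching rule in this commit (type/subtype only, case-insensitive, parameters stripped) can be sketched as below. The function names are hypothetical. Treating an absent or empty `accept` list as "no filter" is an assumption, and wildcard patterns are deliberately not handled because the commit message does not describe them.

```python
from typing import Optional, Sequence


def normalize_media_type(value: str) -> str:
    """Keep only type/subtype, lowercased; drop parameters such as charset."""
    return value.split(";", 1)[0].strip().lower()


def matches_accept(media_type: str, accept: Optional[Sequence[str]]) -> bool:
    """Return True when media_type matches an entry in accept."""
    if not accept:
        # Assumption: no accept list means the descriptor does not filter by type.
        return True
    candidate = normalize_media_type(media_type)
    return any(candidate == normalize_media_type(entry) for entry in accept)


print(matches_accept("Image/PNG; charset=binary", ["image/png", "image/jpeg"]))
print(matches_accept("image/gif", ["image/png", "image/jpeg"]))
```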
…p-sdks-8Rub5
# Conflicts:
#   docs/community/seps/index.mdx
#   docs/docs.json
#   docs/seps/2356-declarative-file-inputs-for-tools-and-elicitation.mdx
The committed mdx was out of sync with its source .md (admonition line break). Main never tripped this because the workflow only runs when seps/**/*.md changes; adding 2356.md surfaces the drift.
Hey, great feature. Any timeline on this?
The reason-code table conflicted with outputSchema conformance and would have been the first place the protocol dictates a CallToolResult error body. Validation failures now follow SEP-1303 with descriptive text only; a general structured-error vocabulary is recorded as deferred work. :house: Remote-Dev: homespace
…p-sdks-8Rub5
# Conflicts:
#   docs/docs.json
#   docs/seps/index.mdx
:house: Remote-Dev: homespace
Fills the preamble (author ochafik, sponsor localden), defines the absent semantic for maxSize, grounds the Motivation section in a concrete deployed workaround from the PR thread, adds an honest Drawbacks section, and adds two negative test vectors (oversized rejection in schema examples, misplaced-keyword snippet in prose). :house: Remote-Dev: homespace
Relaxes the client-side MUST to RFC 2397 conformance with a SHOULD-base64 recommendation for binary content, and confirms servers MUST accept either form. Percent-encoded data URIs (e.g., data:text/plain,hello%20world) are valid per the RFC and natural for short textual payloads. :house: Remote-Dev: homespace
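Since servers are expected to accept both base64 and percent-encoded payloads, a server-side decoder might look roughly like the sketch below. `parse_data_uri` is a hypothetical helper, not an SDK function. It ignores media-type parameters other than the trailing `;base64` marker and falls back to `text/plain` when the media type is omitted, as RFC 2397 describes. The PNG is the 1x1 image used in the examples in this thread.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes

PNG_URI = (
    "data:image/png;base64,"
    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR4nGNkYGBgAAAABQABWaDDsAAAAABJRU5ErkJggg=="
)


def parse_data_uri(uri: str) -> Tuple[str, bytes]:
    """Return (media_type, decoded_bytes) for a base64 or percent-encoded data URI."""
    if not uri.lower().startswith("data:"):
        raise ValueError("not a data: URI")
    header, comma, payload = uri[5:].partition(",")
    if not comma:
        raise ValueError("data: URI has no comma between header and payload")
    parts = header.split(";")
    # ";base64" must be the last item of the header when present.
    is_base64 = len(parts) > 1 and parts[-1] == "base64"
    media_type = parts[0].strip().lower() or "text/plain"
    # Percent-decoding first is harmless for base64 and required for the other form.
    raw = unquote_to_bytes(payload)
    data = base64.b64decode(raw, validate=True) if is_base64 else raw
    return media_type, data


print(parse_data_uri("data:text/plain,hello%20world"))
media_type, png = parse_data_uri(PNG_URI)
print(media_type, len(png), png[:8])
```

Using `validate=True` makes a malformed base64 payload raise instead of being silently truncated, which fits a server that wants to report a broken `data:` URI.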
Hi @localden
…p-sdks-8Rub5 🏠 Remote-Dev: homespace
Renames mcpFile to x-mcp-file to follow the x-mcp-* extension keyword convention SEP-2243 already shipped for the same inputSchema placement, and relaxes the host-side MUST-disregard so agentic hosts may forward a model-authored data: URI verbatim under their existing tool-approval policy. The hard security requirement is now scoped to what it actually protects: hosts never read user storage into the slot without an explicit consent gesture and never resolve a model-supplied reference. :house: Remote-Dev: homespace
…hint for forwarded values Adds an Unresolved Questions section for the vocabulary-URI resolvability and SDK pre-registration timing questions so reviewers track them rather than re-derive. Consolidates Future Work into a bulleted list and adds SEP-2631 plus the File Uploads WG as the routing for out-of-band transfer. Adds a SHOULD-level display hint (decoded size and media type) when a host forwards a model-supplied data URI, closing the one approval-UI gap the relaxation introduced that is specific to this slot. :house: Remote-Dev: homespace
… from another branch 🏠 Remote-Dev: homespace
The model would require explicit prompting to NOT fill out the argument, right? I'm assuming that models today would not leave it empty, and would happily hallucinate some input for this argument or proactively read some file and provide the encoded data. Or should the host not advertise the argument to the model at all for x-mcp-file?
caseychow-oai left a comment
Added some minor thoughts. This is really shaping up!
| "params": {
|   "name": "describe_image",
|   "arguments": {
|     "image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR4nGNkYGBgAAAABQABWaDDsAAAAABJRU5ErkJggg=="
I'd want to make sure that we also have a direction in the future for supporting remote files without needing to do a ton of surgery to the schema, especially if we want to represent basic data like MIME types and filenames.
Expanded the Future Work entry to cover this in 0d4e03b. Remote files and resource references carry the MIME type and filename as headers/metadata rather than inline, so the descriptor's accept and maxSize constraints transfer unchanged; the schema surface doesn't move.
| "x-mcp-file": {
|   "accept": ["image/png", "image/jpeg"],
|   "maxSize": 5242880
| }
How would fallback occur for clients that don't support this? (I did see below, but one thing I'd be curious about is whether we'd want a protocol-level capability to allow servers to know a client does or doesn't support file inputs, and if not, use one of the prior mechanisms to get as close as possible.)
| introduce an upload protocol for that case. Servers that need files larger
| than their declared `maxSize` **SHOULD** obtain them via [URL-mode
| elicitation][url-elicit], which already provides an out-of-band browser flow
| where the upload protocol is entirely server-controlled. The `x-mcp-file` slot
| carries only files that fit inline.
Another common fallback I've seen is using the apps extension to render a file uploader.
Good call. Added a note on MCP Apps as another out-of-band large-file path in 0d4e03b.
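For the inline versus out-of-band decision discussed in this thread, a server or host can estimate the decoded size of a base64 payload from its length before decoding it. The sketch below is illustrative only: the function names are hypothetical, the payload is assumed to be padded base64 without whitespace, and treating a missing `maxSize` as "no declared limit" is an assumption.

```python
from typing import Optional


def base64_decoded_size(payload: str) -> int:
    """Decoded byte count of a padded base64 string, computed without decoding."""
    padding = len(payload) - len(payload.rstrip("="))
    return len(payload) * 3 // 4 - padding


def fits_inline(payload: str, max_size: Optional[int]) -> bool:
    """True when the decoded payload is within max_size bytes."""
    if max_size is None:
        # Assumption: an absent maxSize means no limit was declared.
        return True
    return base64_decoded_size(payload) <= max_size


print(base64_decoded_size("aGVsbG8="))
print(fits_inline("aGVsbG8=", 5242880))
print(fits_inline("aGVsbG8=", 4))
```

A payload that fails this check is a candidate for one of the out-of-band paths mentioned above (URL-mode elicitation or an MCP Apps uploader) instead of the inline slot.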
Agree with this. Some potential wording to address it:
Maintainer Activity Check
Hi @localden! You're assigned to this SEP but there hasn't been any activity from you in 16 days. Please provide an update on:
If you're no longer able to sponsor this SEP, please let us know so we can find another maintainer.
This is an automated message from the SEP lifecycle bot.
| }
| ```
|
| A client that recognizes `x-mcp-file` renders a file picker filtered to PNG/JPEG,
This should be a MAY or implementation suggestion. I don't think we need to require human in the loop here.
I.e. if the model knows it has a list of photos and can select the right one that should be fine too right?
Relaxed in 0d4e03b. The host integration section now says hosts SHOULD require an affirmative selection rather than binding silently, but the form is implementation-defined and explicitly includes the model picking from a candidate list.
| The server asks the user for a file mid-flow. The same `x-mcp-file` keyword
| applies to `requestedSchema` properties:
|
| ```json
This will need to get updated to match MRTR semantics from SEP-2322: Multi Round-Trip Requests
Reconciled in 74729ac. Merged main to pick up the SEP-2322 types and spec docs (d25c6ce), then rewrote the elicitation example to show the InputRequiredResult / inputResponses flow and reworked the §Elicitation results error-handling rationale for the MRTR shapes. The x-mcp-file design itself is unchanged: it decorates a property inside requestedSchema, which MRTR didn't touch; only the JSON-RPC envelope around the example moved.
| }
| ```
|
| Clients that encounter `x-mcp-file` on a schema that does not match the permitted
If a Server is returning a malformed tool should we suggest something even stronger, like ignoring the tool altogether?
What are the chances that a client responding in this way will be successful?
Added in 0d4e03b. Clients MAY now omit the tool entirely when ignoring the keyword leaves it unusable (e.g. a malformed required field).
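One way a client could apply this rule is sketched below. It assumes the permitted shape is `{"type": "string", "format": "uri"}` or an array of such items, as stated in the PR summary. The function names and the `(keep_tool, file_args)` return value are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def is_uri_string(schema: Any) -> bool:
    return (
        isinstance(schema, dict)
        and schema.get("type") == "string"
        and schema.get("format") == "uri"
    )


def is_file_shape(prop: Any) -> bool:
    """True for a uri-format string, or an array whose items are uri-format strings."""
    if is_uri_string(prop):
        return True
    return (
        isinstance(prop, dict)
        and prop.get("type") == "array"
        and is_uri_string(prop.get("items"))
    )


def triage_tool(tool: Dict[str, Any]) -> Tuple[bool, List[str]]:
    """Return (keep_tool, file_args): which arguments get a picker, or drop the tool."""
    schema = tool.get("inputSchema", {})
    required = set(schema.get("required", []))
    file_args: List[str] = []
    for name, prop in schema.get("properties", {}).items():
        if not isinstance(prop, dict) or "x-mcp-file" not in prop:
            continue
        if is_file_shape(prop):
            file_args.append(name)
        elif name in required:
            # Malformed keyword on a required field: the tool is unusable, omit it.
            return False, []
        # Malformed keyword on an optional field: ignore the keyword, keep the tool.
    return True, file_args
```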
| leaves the argument absent and the host fills it from a user gesture:
|
| 1. The model emits a `tools/call` with the `x-mcp-file` argument absent.
| 2. The host detects the unfilled slot at the human-in-the-loop confirmation
I think we should remove human in the loop requirement here. This should be a suggestion.
Alternatively one could imagine having the AgentLoop implement this by instructing the LLM to return a file name here that matches the description.
| If a server needs the original filename, it **SHOULD** declare a separate
| ordinary string argument for it; the description on that argument is
| sufficient for hosts to know what to populate.
This seems like a potential security or accuracy concern -- it requires the model to specify the exact value twice. Why can't the uri specified by the model be included in the data here?
Clarified in 0d4e03b. The host populates both the file slot and the filename argument from the same selection, so they can't drift; the model never authors either. Added a cross-reference to the rationale for why name= doesn't travel inside the data URI (RFC 2397 doesn't define it).
| an annotation and otherwise ignore it. Clients that do not recognize the
| keyword see an ordinary `uri`-format string field. `StringSchema` gaining an
| optional field is additive.
"Clients that do not recognize the keyword see an ordinary `uri`-format string field."
This seems problematic if the client doesn't support this feature. Is this REQUIRED for clients to implement?
- Split host integration by how the host surfaces tools (output tokens vs. programmatic) and note that hosts should not present x-mcp-file properties to a model that emits arguments as output tokens
- Relax the human-in-the-loop requirement to a SHOULD; the affirmative selection form is implementation-defined and may include the model picking from a candidate list
- Allow clients to skip a tool entirely when a malformed required x-mcp-file field leaves it unusable
- Note MCP Apps as an additional out-of-band large-file path
- Clarify that the host populates both the file slot and any companion filename argument from the same selection
- Expand Future Work to cover the remote-file path explicitly
- Add a 'Why not a custom format value' alternatives entry
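The host-side flow described in this commit (the model leaves the file argument absent and the host fills it from an affirmative selection) might be sketched as follows. Everything here is hypothetical: `fill_file_slots` is not an SDK function, and `pick_file` stands in for whatever selection mechanism the host uses. Forwarding a model-authored `data:` URI verbatim follows the relaxation described earlier in the thread. Dropping any other model-supplied value rather than resolving it is an assumption about how a host would honor the "never resolve a model-supplied reference" rule.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def fill_file_slots(
    arguments: Dict[str, Any],
    file_args: Iterable[str],
    pick_file: Callable[[str], Optional[str]],
) -> Dict[str, Any]:
    """Return a copy of arguments with x-mcp-file slots filled from a selection."""
    filled = dict(arguments)
    for name in file_args:
        value = filled.get(name)
        if isinstance(value, str) and value.startswith("data:"):
            # Model-authored data URI: forwarded as-is under the host's approval policy.
            continue
        # Assumption: any other model-supplied value is a reference, so discard it.
        filled.pop(name, None)
        selection = pick_file(name)  # e.g. a picker dialog; None if the user cancels
        if selection is not None:
            filled[name] = selection
    return filled


call_args = fill_file_slots(
    {"prompt": "describe this"},
    ["image"],
    lambda name: "data:text/plain,hi",
)
print(call_args)
```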
Pushed 0d4e03b addressing the bulk of the open review feedback. The host integration section is reworked along the lines @pja-ant suggested: it now distinguishes hosts that surface tools such that the model emits
Also in 0d4e03b:
Still working through:
Brings the branch up to date with the current draft, including the SEP-2322 (MRTR) types and spec docs needed to reconcile this SEP's elicitation example. Auto-generated files (docs/seps/index.mdx, docs/specification/draft/schema.mdx) resolved toward main and regenerated in the next commit.
SEP-2322 is Approved and its types are in schema/draft. The elicitation example showed a standalone elicitation/create JSON-RPC request, which the MRTR spec says servers must no longer send; it's now embedded in an InputRequiredResult returned from the originating tools/call, with the ElicitResult carried in inputResponses on the client's retry. The 'Elicitation results' section's premise that an ElicitResult is a JSON-RPC response (and therefore can't be rejected with -32602) no longer holds; under MRTR the ElicitResult rides in the params of a new client request. The recommended recovery is unchanged in substance (re-elicit or fail the enclosing operation), reworded for the MRTR shapes, plus a note that protocol errors are reserved for structurally malformed inputResponses rather than file-validation failures. Also updates the two file-input schema examples for the per-request clientCapabilities _meta and resultType fields the merged draft requires, and regenerates docs.json / index.mdx / schema.mdx from the merged sources.
The MRTR reconciliation updated CallToolRequest and CallToolResult example JSON files but did not regenerate the rendered schema reference, so check:schema:md was failing in CI. :house: Remote-Dev: homespace
Maintainer Activity Check
Hi @localden! You're assigned to this SEP but there hasn't been any activity from you in 20 days. Please provide an update on:
If you're no longer able to sponsor this SEP, please let us know so we can find another maintainer.
This is an automated message from the SEP lifecycle bot.
Summary
This PR adds support for declarative file inputs in tools and elicitation forms, allowing servers to specify which arguments/fields should be rendered as file pickers by clients. Files are transmitted as RFC 2397 data URIs with embedded metadata.
Key Changes
- `fileInputs` capability to `ClientCapabilities` to allow clients to declare support for file input handling
- `inputFiles` field to `Tool` interface to declare which tool arguments accept file inputs, with optional MIME type and size constraints
- `requestedFiles` field to `ElicitRequestFormParams` for file inputs in elicitation forms
- `FileInputDescriptor` interface with optional `accept` (MIME type patterns) and `maxSize` (byte limit) fields for client-side validation hints
- `PrimitiveSchemaDefinition` to include `StringArraySchema` for multi-file inputs in elicitation forms
Implementation Details
- `data:<mediatype>;name=<filename>;base64,<data>`
- `name=` parameter is percent-encoded to preserve original filenames
- `fileInputs` capability
- `FileInputDescriptor` are advisory; servers must independently validate inputs
- `InvalidParamsError` and reason `"file_too_large"`
- `{"type": "string", "format": "uri"}` or arrays thereof
Prototype PRs
Related work
`_meta["openai/fileParams"]` is used in MCP Apps for the same purpose, but with an extra step to upload a file and get a fileId