
SEP-2356: File input support for tools and elicitation#2356

Open
ochafik wants to merge 29 commits into
mainfrom
claude/file-upload-sep-sdks-8Rub5

@ochafik ochafik commented Mar 5, 2026

Contributor

Summary

This PR adds support for declarative file inputs in tools and elicitation forms, allowing servers to specify which arguments or fields clients should render as file pickers. Files are transmitted as RFC 2397 data URIs, with the media type and filename embedded in the URI.

Key Changes

  • Client capability: Added fileInputs capability to ClientCapabilities to allow clients to declare support for file input handling
  • Tool file inputs: Added inputFiles field to Tool interface to declare which tool arguments accept file inputs, with optional MIME type and size constraints
  • Elicitation file inputs: Added requestedFiles field to ElicitRequestFormParams for file inputs in elicitation forms
  • File input descriptor: Introduced FileInputDescriptor interface with optional accept (MIME type patterns) and maxSize (byte limit) fields for client-side validation hints
  • Schema support: Extended PrimitiveSchemaDefinition to include StringArraySchema for multi-file inputs in elicitation forms
  • StringArraySchema: Added new schema type for flat arrays of strings, intended for multi-file inputs where each item is a data URI

Implementation Details

  • File inputs are encoded as RFC 2397 data URIs with format: data:<mediatype>;name=<filename>;base64,<data>
  • The name= parameter is percent-encoded to preserve original filenames
  • Servers MUST NOT include file input fields unless the client declared the fileInputs capability
  • All validation hints in FileInputDescriptor are advisory; servers must independently validate inputs
  • Servers SHOULD reject oversized files with InvalidParamsError and reason "file_too_large"
  • Schema properties for file inputs MUST be {"type": "string", "format": "uri"} or arrays thereof
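The wire format above can be illustrated with a pair of helpers that build and take apart a file data URI. This is a minimal sketch that assumes the `data:<mediatype>;name=<filename>;base64,<data>` layout exactly as this description states it. A later commit in this thread drops `name=` from the URI, so treat the filename handling as illustrative only. `encodeFileDataUri`, `parseFileDataUri`, and `ParsedFileUri` are hypothetical names, not SDK APIs.

```typescript
// Hypothetical helpers for the layout described above:
//   data:<mediatype>;name=<filename>;base64,<data>
interface ParsedFileUri {
  mediaType: string; // e.g. "image/png"
  name?: string;     // percent-decoded filename, if a name= parameter is present
  base64: boolean;   // true when the ";base64" marker is present
  data: string;      // payload after the comma, still encoded
}

// The caller supplies content that is already base64-encoded.
function encodeFileDataUri(mediaType: string, filename: string, base64Data: string): string {
  // name= is percent-encoded so filenames with spaces or ';' survive the URI syntax.
  return `data:${mediaType};name=${encodeURIComponent(filename)};base64,${base64Data}`;
}

function parseFileDataUri(uri: string): ParsedFileUri {
  if (!uri.startsWith("data:")) throw new Error("not a data: URI");
  const comma = uri.indexOf(",");
  if (comma < 0) throw new Error("missing ',' separator");
  const params = uri.slice("data:".length, comma).split(";");
  const data = uri.slice(comma + 1);
  let base64 = false;
  if (params[params.length - 1] === "base64") {
    base64 = true;
    params.pop();
  }
  // RFC 2397: an omitted media type defaults to text/plain.
  const mediaType = params[0] || "text/plain";
  let name: string | undefined;
  for (const param of params.slice(1)) {
    const eq = param.indexOf("=");
    if (eq > 0 && param.slice(0, eq) === "name") {
      name = decodeURIComponent(param.slice(eq + 1));
    }
  }
  return { mediaType, name, base64, data };
}
```

The parser also accepts the plain percent-encoded form (no `;base64`), which RFC 2397 permits for short textual payloads.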

Prototype PRs

Related work

claude added 4 commits March 4, 2026 20:03
Proposes Tool.inputFiles and ElicitRequestFormParams.requestedFiles to
let servers declaratively mark which arguments expect user-selected
files. Clients declaring the fileInputs capability render native file
pickers and encode selections as data URIs (with a standardized name=
media-type parameter for filenames).

Key design points:
- inputFiles is a sibling of inputSchema (not an annotation hint)
- Per-argument accept[] MIME filters and maxSize limits
- Supports single-file and array-of-file arguments
- Adds StringArraySchema to PrimitiveSchemaDefinition for elicitation
- Capability gates advertising, not acceptance: servers always accept
  well-formed data URIs regardless of negotiation
- Error convention: -32602 with data.reason for size/type violations
- Cites OpenAI Apps SDK openai/fileParams as prior art

https://claude.ai/code/session_01UE8PfZW3WmKXvoqtamBbtp
- elicitation.form.fileInputs nests under existing client cap
- Tool-side stays top-level (no ClientCapabilities.tools exists)
- Independent gating: clients can support one surface without the other
- Add Open Questions section debating alt placements: new
  ClientCapabilities.tools namespace vs single unified flag

https://claude.ai/code/session_01UE8PfZW3WmKXvoqtamBbtp
One top-level ClientCapabilities.fileInputs flag instead of the split
approach. Rationale captured:
- Underlying capability (file picker + data URI encoding) is singular
- Elicitation is already gated by the elicitation capability itself
- Simpler server check
- No ClientCapabilities.tools exists to nest under anyway

Removes Open Questions section; the placement debate is resolved.

https://claude.ai/code/session_01UE8PfZW3WmKXvoqtamBbtp
… to schema

Implements the file upload SEP with declarative file input metadata:
- FileInputDescriptor: { accept?: string[], maxSize?: number } - advisory MIME filter + byte limit
- Tool.inputFiles: maps argument names to FileInputDescriptor for native file pickers
- ElicitRequestFormParams.requestedFiles: symmetric support for elicitation forms
- StringArraySchema: new PrimitiveSchemaDefinition member for multi-file inputs
- ClientCapabilities.fileInputs: capability gate (server MUST NOT send inputFiles without it)

Files are transmitted as RFC 2397 data URIs: data:<mediatype>;name=<filename>;base64,<data>

https://claude.ai/code/session_01JxhHWiXrXgE4JWC27dznRN
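At this point in the history, the capability gate is a server-side rule: `inputFiles` is advertised only to clients that declared `fileInputs`. The filtering can be sketched with simplified stand-in types. `Tool`, `ClientCapabilities`, and `toolsForClient` here are illustrative and are not the generated schema.ts types. The value shape of `fileInputs` is also an assumption. Later revisions move the descriptor into `inputSchema` as `x-mcp-file` and reopen the capability question, so this sketch reflects only the design at this commit.

```typescript
// Simplified from this commit's description; not the published schema types.
interface FileInputDescriptor {
  accept?: string[]; // advisory MIME filters, e.g. ["image/png", "image/jpeg"]
  maxSize?: number;  // advisory byte limit
}

interface Tool {
  name: string;
  inputSchema: Record<string, unknown>;
  inputFiles?: Record<string, FileInputDescriptor>;
}

interface ClientCapabilities {
  fileInputs?: Record<string, unknown>; // presence signals support; shape assumed
}

// Advertise inputFiles only to clients that declared fileInputs. Per the design
// notes, this gates advertising, not acceptance: well-formed data URIs are still accepted.
function toolsForClient(tools: Tool[], caps: ClientCapabilities): Tool[] {
  if (caps.fileInputs !== undefined) return tools;
  return tools.map(({ inputFiles: _omitted, ...rest }) => rest);
}
```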
@ochafik ochafik changed the title from "Add declarative file input support for tools and elicitation" to "SEP 2356 - File input support for tools and elicitation" Mar 5, 2026
claude added 2 commits March 6, 2026 17:44
Introduces an Overview section in the SEP that walks reviewers through
the complete round trip on both surfaces before the formal spec:

- Tool: `describe_image` definition with `inputFiles`, paired with the
  matching `tools/call` request carrying a data-URI argument.
- Elicitation: `elicitation/create` request with `requestedFiles`,
  paired with the matching `ElicitResult` response carrying the
  data-URI content.

All examples use the same real (non-truncated) 1x1 PNG so the wire
encoding is concrete and copy-pasteable.

The same examples are added as validated JSON files under
schema/draft/examples/ and wired into schema.ts via @includecode so
they appear in the generated reference docs.

https://claude.ai/code/session_0168Rxur9BGcHnAzo3zpZkEH
Rename placeholder SEP file to 2356 (the PR number), fill in the
header title and PR link, and regenerate the SEP docs so the new
page appears in the community SEP index and navigation.

https://claude.ai/code/session_0168Rxur9BGcHnAzo3zpZkEH
Comment thread seps/2356-declarative-file-inputs-for-tools-and-elicitation.md Outdated
Replace 'future SEP' framing in three places with direct guidance:
inputFiles covers the inline case, URL-mode elicitation covers files
too large to embed. No new transport machinery needed.

:house: Remote-Dev: homespace
… handling by surface

- Gate StringArraySchema behind fileInputs capability (SEP prose + schema.ts
  docstring) so existing form-mode clients aren't broken by an unrecognized
  PrimitiveSchemaDefinition member
- Clarify maxSize applies per-file for array-typed arguments
- Reword schema-shape constraints to permit extra properties; add SHOULD-ignore
  rule for malformed inputFiles/requestedFiles entries
- Require format:"uri" on StringArraySchema.items for file fields
- Add percent-encoding test vector; require name= before ;base64 per RFC 2397
- Define accept matching: type/subtype only, case-insensitive, params stripped
- Scope file_uri_malformed to broken data: URIs; non-data URIs are
  server-defined
- Split error handling into tool-call (-32602) vs elicitation-result
  (re-elicit or fail enclosing op) subsections

:house: Remote-Dev: homespace
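The `accept` matching rule in this commit compares type/subtype only, case-insensitively, with parameters stripped. It can be sketched as below. Two behaviors here are assumptions rather than rules quoted from the SEP. First, `image/*` and `*/*` are treated as wildcards, based on the summary describing `accept` as MIME type patterns. Second, an absent or empty list matches any type. `matchesAccept` is a hypothetical helper.

```typescript
// Reduce a media type to lowercase "type/subtype", dropping ";param=value" parts.
function essence(mediaType: string): string {
  return mediaType.split(";")[0].trim().toLowerCase();
}

// True if the media type satisfies at least one accept pattern.
function matchesAccept(mediaType: string, accept?: string[]): boolean {
  if (!accept || accept.length === 0) return true; // assumption: no filter declared
  const [type, subtype] = essence(mediaType).split("/");
  return accept.some((pattern) => {
    const [pType, pSub] = essence(pattern).split("/");
    if (pType === "*" && pSub === "*") return true;
    return pType === type && (pSub === "*" || pSub === subtype);
  });
}
```

Because all hints are advisory, a server would run the same check again on receipt instead of trusting the client's picker filter.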
@localden localden changed the title from "SEP 2356 - File input support for tools and elicitation" to "SEP 2356: File input support for tools and elicitation" Mar 15, 2026
claude added 2 commits March 17, 2026 15:04
…p-sdks-8Rub5

# Conflicts:
#	docs/community/seps/index.mdx
#	docs/docs.json
#	docs/seps/2356-declarative-file-inputs-for-tools-and-elicitation.mdx
The committed mdx was out of sync with its source .md (admonition
line break). Main never tripped this because the workflow only runs
when seps/**/*.md changes; adding 2356.md surfaces the drift.
@localden localden self-assigned this Mar 18, 2026
@localden localden added the draft label (SEP proposal with a sponsor) and removed the proposal label (SEP proposal without a sponsor) Mar 18, 2026
@localden localden changed the title from "SEP 2356: File input support for tools and elicitation" to "SEP-2356: File input support for tools and elicitation" Mar 18, 2026
@jakobwennberg


Hey, great feature. Any timeline on this?

@ochafik ochafik marked this pull request as ready for review March 26, 2026 13:56
@ochafik ochafik requested a review from a team as a code owner March 26, 2026 13:56
The reason-code table conflicted with outputSchema conformance and would
have been the first place the protocol dictates a CallToolResult error
body. Validation failures now follow SEP-1303 with descriptive text only;
a general structured-error vocabulary is recorded as deferred work.

:house: Remote-Dev: homespace
…p-sdks-8Rub5

# Conflicts:
#	docs/docs.json
#	docs/seps/index.mdx

:house: Remote-Dev: homespace
Fills the preamble (author ochafik, sponsor localden), defines the
absent semantic for maxSize, grounds the Motivation section in a
concrete deployed workaround from the PR thread, adds an honest
Drawbacks section, and adds two negative test vectors (oversized
rejection in schema examples, misplaced-keyword snippet in prose).

:house: Remote-Dev: homespace
Relaxes the client-side MUST to RFC 2397 conformance with a SHOULD-base64
recommendation for binary content, and confirms servers MUST accept either
form. Percent-encoded data URIs (e.g., data:text/plain,hello%20world) are
valid per the RFC and natural for short textual payloads.

:house: Remote-Dev: homespace
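Servers must accept both forms, so any server-side size check has to measure either payload encoding. The sketch below computes decoded sizes without materializing the bytes. It assumes percent-encoded payloads are otherwise ASCII. It also treats an absent `maxSize` as "no declared limit", which is an assumption; the SEP defines its own absent semantic. The per-file rule for array-valued arguments follows the earlier commit clarifying that `maxSize` applies per file. All helper names are hypothetical.

```typescript
// Decoded length of a base64 payload, derived from its length and '=' padding.
function base64DecodedLength(payload: string): number {
  const clean = payload.replace(/\s+/g, "");
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// Each %XX escape decodes to one octet; other characters are assumed ASCII.
function percentDecodedLength(payload: string): number {
  const escapes = payload.match(/%[0-9A-Fa-f]{2}/g)?.length ?? 0;
  return payload.length - 2 * escapes;
}

// Decoded byte size of a data: URI in either RFC 2397 form.
function decodedSize(uri: string): number {
  const comma = uri.indexOf(",");
  if (!uri.startsWith("data:") || comma < 0) throw new Error("malformed data: URI");
  const params = uri.slice("data:".length, comma).split(";");
  const payload = uri.slice(comma + 1);
  return params[params.length - 1].toLowerCase() === "base64"
    ? base64DecodedLength(payload)
    : percentDecodedLength(payload);
}

// maxSize applies to each file individually, so arrays are checked item by item.
function exceedsMaxSize(value: string | string[], maxSize?: number): boolean {
  if (maxSize === undefined) return false; // assumption: no declared limit
  const limit: number = maxSize;
  const files = Array.isArray(value) ? value : [value];
  return files.some((file) => decodedSize(file) > limit);
}
```

For example, `data:text/plain,hello%20world` decodes to 11 bytes, and the 8-byte PNG signature `iVBORw0KGgo=` decodes to 8.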
@priyasharma123


Hi @localden
Thanks for the contribution! Could you please share an estimated timeline for when this feature is expected to be available in MCP (e.g., next release or specific version)?

Comment thread docs/seps/2356-declarative-file-inputs-for-tools-and-elicitation.mdx Outdated
Comment thread docs/seps/2356-declarative-file-inputs-for-tools-and-elicitation.mdx Outdated
Renames mcpFile to x-mcp-file to follow the x-mcp-* extension keyword
convention SEP-2243 already shipped for the same inputSchema placement,
and relaxes the host-side MUST-disregard so agentic hosts may forward a
model-authored data: URI verbatim under their existing tool-approval
policy. The hard security requirement is now scoped to what it actually
protects: hosts never read user storage into the slot without an explicit
consent gesture and never resolve a model-supplied reference.

:house: Remote-Dev: homespace
…hint for forwarded values

Adds an Unresolved Questions section for the vocabulary-URI resolvability
and SDK pre-registration timing questions so reviewers track them rather
than re-derive. Consolidates Future Work into a bulleted list and adds
SEP-2631 plus the File Uploads WG as the routing for out-of-band transfer.
Adds a SHOULD-level display hint (decoded size and media type) when a host
forwards a model-supplied data URI, closing the one approval-UI gap the
relaxation introduced that is specific to this slot.

:house: Remote-Dev: homespace
… from another branch

🏠 Remote-Dev: homespace
@clareliguori

clareliguori commented May 6, 2026

Contributor

Host integration on the tool surface

Tools are model-controlled; the model populates arguments. For an x-mcp-file argument on the tool surface, the typical flow is that the model leaves the argument absent and the host fills it from a user gesture:

The model would require explicit prompting to NOT fill out the argument, right? I'm assuming that models today would not leave it empty, and would happily hallucinate some input for this argument or proactively read some file and provide the encoded data.

Or should the host not advertise the argument to the model at all for x-mcp-file?

@caseychow-oai caseychow-oai left a comment

Added some minor thoughts. This is really shaping up!

"params": {
"name": "describe_image",
"arguments": {
"image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR4nGNkYGBgAAAABQABWaDDsAAAAABJRU5ErkJggg=="


I'd want to make sure that we also have a direction in the future for supporting remote files without needing to do a ton of surgery to the schema, especially if we want to represent basic data like MIME types and filenames.

Contributor

Expanded the Future Work entry to cover this in 0d4e03b. Remote files and resource references carry the MIME type and filename as headers/metadata rather than inline, so the descriptor's accept and maxSize constraints transfer unchanged; the schema surface doesn't move.

Comment on lines +84 to +87
"x-mcp-file": {
"accept": ["image/png", "image/jpeg"],
"maxSize": 5242880
}


How would fallback occur for clients that don't support this? (I did see below, but one thing I'd be curious about is whether we'd want a protocol-level capability to allow servers to know a client does or doesn't support file inputs, and if not, use one of the prior mechanisms to get as close as possible.)

Comment on lines +321 to +325
introduce an upload protocol for that case. Servers that need files larger
than their declared `maxSize` **SHOULD** obtain them via [URL-mode
elicitation][url-elicit], which already provides an out-of-band browser flow
where the upload protocol is entirely server-controlled. The `x-mcp-file` slot
carries only files that fit inline.


Another common fallback I've seen is using the apps extension to render a file uploader.

Contributor

Good call. Added a note on MCP Apps as another out-of-band large-file path in 0d4e03b.

@pja-ant

pja-ant commented May 8, 2026

Contributor

Host integration on the tool surface

Tools are model-controlled; the model populates arguments. For an x-mcp-file argument on the tool surface, the typical flow is that the model leaves the argument absent and the host fills it from a user gesture:

The model would require explicit prompting to NOT fill out the argument, right? I'm assuming that models today would not leave it empty, and would happily hallucinate some input for this argument or proactively read some file and provide the encoded data.

Or should the host not advertise the argument to the model at all for x-mcp-file?

Agree with this.

Some potential wording to address it:

Where the host surfaces tools such that the model emits arguments directly as output tokens, hosts SHOULD NOT present x-mcp-file-annotated properties to the model verbatim — models will attempt to populate them, either by hallucinating a value or by reading and re-emitting file bytes through the token stream. Such hosts SHOULD either omit the property from the model-visible schema (noting in the tool description that a file attachment is consumed) or replace it with a sentinel the model can emit to request host-side substitution.

Where the model constructs arguments programmatically — for example, via a code-execution environment that invokes an MCP client library — hosts SHOULD present the schema unchanged. The model populates the slot in code without the bytes entering the token stream, and the host forwards the value per the paragraph below.

In either case, the server-facing tools/call MUST carry the encoded value per Wire encoding.
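For a host whose model emits arguments as output tokens, the "omit the property" option in this suggested wording could look roughly like the sketch below. The sentinel alternative is not shown. The schema shapes are simplified stand-ins, and `toModelVisibleSchema` and `mergeArguments` are hypothetical host-side helpers.

```typescript
interface PropertySchema {
  type?: string;
  format?: string;
  description?: string;
  "x-mcp-file"?: { accept?: string[]; maxSize?: number };
}

interface ObjectSchema {
  type: "object";
  properties?: Record<string, PropertySchema>;
  required?: string[];
}

interface ModelView {
  schema: ObjectSchema;      // what the model sees
  hostFilledSlots: string[]; // slots the host fills from the user's selection
}

// Drop x-mcp-file properties so the model cannot hallucinate a value or
// re-emit file bytes through the token stream.
function toModelVisibleSchema(schema: ObjectSchema): ModelView {
  const properties: Record<string, PropertySchema> = {};
  const hostFilledSlots: string[] = [];
  for (const [name, prop] of Object.entries(schema.properties ?? {})) {
    if (prop["x-mcp-file"] !== undefined) hostFilledSlots.push(name);
    else properties[name] = prop;
  }
  const required = (schema.required ?? []).filter((n) => !hostFilledSlots.includes(n));
  return { schema: { type: "object", properties, required }, hostFilledSlots };
}

// Before sending tools/call, merge the model's arguments with the encoded
// values the host produced, so the server still receives the data URI.
function mergeArguments(
  modelArgs: Record<string, unknown>,
  hostFiles: Record<string, string | string[]>,
): Record<string, unknown> {
  return { ...modelArgs, ...hostFiles };
}
```

A host could also list `hostFilledSlots` in the tool description it shows the model, so the model knows a file attachment is consumed.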

@sep-automation-bot


Maintainer Activity Check

Hi @localden!

You're assigned to this SEP but there hasn't been any activity from you in 16 days.

Please provide an update on:

  • Current status of your review/work
  • Any blockers or concerns
  • Expected timeline for next steps

If you're no longer able to sponsor this SEP, please let us know so we can find another maintainer.


This is an automated message from the SEP lifecycle bot.


A client that recognizes `x-mcp-file` renders a file picker filtered to PNG/JPEG,

Contributor

This should be a MAY or implementation suggestion. I don't think we need to require human in the loop here.

I.e., if the model knows it has a list of photos and can select the right one, that should be fine too, right?

Contributor

Relaxed in 0d4e03b. The host integration section now says hosts SHOULD require an affirmative selection rather than binding silently, but the form is implementation-defined and explicitly includes the model picking from a candidate list.

The server asks the user for a file mid-flow. The same `x-mcp-file` keyword
applies to `requestedSchema` properties:


Contributor

This will need to get updated to match MRTR semantics from SEP-2322: Multi Round-Trip Requests

Contributor

Reconciled in 74729ac. Merged main to pick up the SEP-2322 types and spec docs (d25c6ce), then rewrote the elicitation example to show the InputRequiredResult / inputResponses flow and reworked the §Elicitation results error-handling rationale for the MRTR shapes. The x-mcp-file design itself is unchanged: it decorates a property inside requestedSchema, which MRTR didn't touch; only the JSON-RPC envelope around the example moved.


Clients that encounter `x-mcp-file` on a schema that does not match the permitted

Contributor

If a server is returning a malformed tool, should we suggest something even stronger, like ignoring the tool altogether?

What are the chances that a client responding in this way will be successful?

Contributor

Added in 0d4e03b. Clients MAY now omit the tool entirely when ignoring the keyword leaves it unusable (e.g. a malformed required field).
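One client-side policy following this reply can be sketched as follows. A malformed optional `x-mcp-file` field has the keyword ignored. A malformed required field makes the tool a candidate for omission, which the SEP allows (MAY) rather than requires. The "well-formed" test encodes the shape rule from the summary: a `uri`-format string, or an array of them, with extra properties permitted. Types and helper names are hypothetical simplifications.

```typescript
interface PropertySchema {
  type?: string;
  format?: string;
  items?: PropertySchema;
  "x-mcp-file"?: { accept?: string[]; maxSize?: number };
  [key: string]: unknown; // extra properties (title, description, ...) are allowed
}

interface ToolInputSchema {
  type: "object";
  properties?: Record<string, PropertySchema>;
  required?: string[];
}

// A file slot is well-formed if it is a uri-format string or an array of them.
function isWellFormedFileSlot(p: PropertySchema): boolean {
  const isUriString = (s?: PropertySchema) => s?.type === "string" && s.format === "uri";
  return isUriString(p) || (p.type === "array" && isUriString(p.items));
}

type ToolVerdict = "ok" | "ignore-keyword" | "omit-tool";

function classifyTool(schema: ToolInputSchema): ToolVerdict {
  let verdict: ToolVerdict = "ok";
  for (const [name, prop] of Object.entries(schema.properties ?? {})) {
    if (prop["x-mcp-file"] === undefined || isWellFormedFileSlot(prop)) continue;
    // A malformed required field leaves the tool unusable; a client MAY omit it.
    if (schema.required?.includes(name)) return "omit-tool";
    verdict = "ignore-keyword"; // treat the malformed keyword as absent
  }
  return verdict;
}
```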

leaves the argument absent and the host fills it from a user gesture:

1. The model emits a `tools/call` with the `x-mcp-file` argument absent.
2. The host detects the unfilled slot at the human-in-the-loop confirmation

Contributor

I think we should remove human in the loop requirement here. This should be a suggestion.

Alternatively, one could imagine having the AgentLoop implement this by instructing the LLM to return a file name here that matches the description.

Contributor

+1

Contributor

Reworked in 0d4e03b along the lines you and @kurtisvg suggested. The numbered flow now explicitly allows the host to offer the model a list of candidates to choose from, and the per-invocation confirmation is a SHOULD with the form left implementation-defined.

Comment on lines +268 to +270
If a server needs the original filename, it **SHOULD** declare a separate
ordinary string argument for it; the description on that argument is
sufficient for hosts to know what to populate.

Contributor

This seems like a potential security or accuracy concern -- it requires the model to specify the exact value twice. Why can't the uri specified by the model be included in the data here?

Contributor

Clarified in 0d4e03b. The host populates both the file slot and the filename argument from the same selection, so they can't drift; the model never authors either. Added a cross-reference to the rationale for why name= doesn't travel inside the data URI (RFC 2397 doesn't define it).
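The point that both values come from one selection can be made concrete with a small host-side helper. This is a sketch only. `FileSelection`, `bindSelection`, and the argument names in the usage note are hypothetical, and the host is assumed to have already base64-encoded the file bytes.

```typescript
// What the user picked in the host's file picker (hypothetical shape).
interface FileSelection {
  filename: string;
  mediaType: string;
  base64: string; // file bytes, already base64-encoded by the host
}

// Fill the x-mcp-file slot and an optional companion filename argument from the
// same selection, so the two values cannot drift and the model authors neither.
function bindSelection(
  selection: FileSelection,
  fileArg: string,
  filenameArg?: string,
): Record<string, string> {
  const args: Record<string, string> = {
    [fileArg]: `data:${selection.mediaType};base64,${selection.base64}`,
  };
  if (filenameArg !== undefined) args[filenameArg] = selection.filename;
  return args;
}
```

For a tool that declares `image` as the file slot and `image_name` as an ordinary string argument, `bindSelection(sel, "image", "image_name")` returns both values together.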

leaves the argument absent and the host fills it from a user gesture:

1. The model emits a `tools/call` with the `x-mcp-file` argument absent.
2. The host detects the unfilled slot at the human-in-the-loop confirmation

Contributor

+1

Comment on lines +539 to +541
an annotation and otherwise ignore it. Clients that do not recognize the
keyword see an ordinary `uri`-format string field. `StringSchema` gaining an
optional field is additive.

Contributor

Clients that do not recognize the keyword see an ordinary uri-format string field.

This seems problematic if the client doesn't support this feature. Is this REQUIRED for clients to implement?

- Split host integration by how the host surfaces tools (output tokens vs.
  programmatic) and note that hosts should not present x-mcp-file properties
  to a model that emits arguments as output tokens
- Relax the human-in-the-loop requirement to a SHOULD; the affirmative
  selection form is implementation-defined and may include the model picking
  from a candidate list
- Allow clients to skip a tool entirely when a malformed required x-mcp-file
  field leaves it unusable
- Note MCP Apps as an additional out-of-band large-file path
- Clarify that the host populates both the file slot and any companion
  filename argument from the same selection
- Expand Future Work to cover the remote-file path explicitly
- Add a 'Why not a custom format value' alternatives entry
@localden

Contributor

Pushed 0d4e03b addressing the bulk of the open review feedback.

The host integration section is reworked along the lines @pja-ant suggested: it now distinguishes hosts that surface tools such that the model emits arguments directly as output tokens (where the host should not present x-mcp-file properties to the model verbatim, since the model will try to populate them) from hosts where the model constructs arguments programmatically (where the schema is presented unchanged). That should close @caseychow-oai's hallucination concern.

Also in 0d4e03b:

  • Human-in-the-loop is now a SHOULD with the selection form left implementation-defined, per @CaitieM20 and @kurtisvg. Includes the model-picks-from-a-candidate-list case.
  • Clients MAY omit a tool whose malformed x-mcp-file field leaves it unusable.
  • MCP Apps noted as another out-of-band large-file path.
  • The Future Work entry on additional wire schemes now covers remote files explicitly.
  • Added a "Why not a custom format value?" alternatives entry covering why format alone can't carry the descriptor.

Still working through:

  • Capability gate (whether fileInputs should be a declared client capability vs. the current degrade-gracefully approach).
  • Reconciling the elicitation flow with SEP-2322 MRTR semantics.

localden and others added 3 commits May 13, 2026 07:52
Brings the branch up to date with the current draft, including the
SEP-2322 (MRTR) types and spec docs needed to reconcile this SEP's
elicitation example. Auto-generated files (docs/seps/index.mdx,
docs/specification/draft/schema.mdx) resolved toward main and
regenerated in the next commit.
SEP-2322 is Approved and its types are in schema/draft. The elicitation
example showed a standalone elicitation/create JSON-RPC request, which
the MRTR spec says servers must no longer send; it's now embedded in an
InputRequiredResult returned from the originating tools/call, with the
ElicitResult carried in inputResponses on the client's retry.

The 'Elicitation results' section's premise that an ElicitResult is a
JSON-RPC response (and therefore can't be rejected with -32602) no
longer holds; under MRTR the ElicitResult rides in the params of a new
client request. The recommended recovery is unchanged in substance
(re-elicit or fail the enclosing operation), reworded for the MRTR
shapes, plus a note that protocol errors are reserved for structurally
malformed inputResponses rather than file-validation failures.

Also updates the two file-input schema examples for the per-request
clientCapabilities _meta and resultType fields the merged draft requires,
and regenerates docs.json / index.mdx / schema.mdx from the merged
sources.
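On the server, the "re-elicit or fail the enclosing operation" recovery could look roughly like the sketch below. `ElicitResultLike` mirrors MCP's ElicitResult (an `action` of accept, decline, or cancel, plus optional `content`) but is simplified. The `InputRequiredResult` / `inputResponses` envelope is deliberately omitted. The retry budget, the `validate` callback, and the outcome names are all assumptions. File-validation problems are returned as outcomes here, not raised as protocol errors, in line with the note that protocol errors are reserved for structurally malformed `inputResponses`.

```typescript
// Simplified stand-in for an ElicitResult carried in inputResponses on the retry.
interface ElicitResultLike {
  action: "accept" | "decline" | "cancel";
  content?: Record<string, unknown>;
}

type FileOutcome =
  | { kind: "ok"; value: string }
  | { kind: "re-elicit"; reason: string } // ask again, explaining the problem
  | { kind: "fail"; reason: string }      // fail the enclosing operation
  | { kind: "declined"; action: "decline" | "cancel" };

// validate returns an error message, or undefined when the file is acceptable.
function handleElicitedFile(
  result: ElicitResultLike,
  field: string,
  validate: (dataUri: string) => string | undefined,
  attempt: number,
  maxAttempts = 2,
): FileOutcome {
  if (result.action !== "accept") return { kind: "declined", action: result.action };
  const value = result.content?.[field];
  const problem = typeof value !== "string" ? `missing file field "${field}"` : validate(value);
  if (problem === undefined) return { kind: "ok", value: value as string };
  return attempt < maxAttempts
    ? { kind: "re-elicit", reason: problem }
    : { kind: "fail", reason: problem };
}
```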
The MRTR reconciliation updated CallToolRequest and CallToolResult example
JSON files but did not regenerate the rendered schema reference, so
check:schema:md was failing in CI.

:house: Remote-Dev: homespace
@sep-automation-bot


Maintainer Activity Check

Hi @localden!

You're assigned to this SEP but there hasn't been any activity from you in 20 days.

Please provide an update on:

  • Current status of your review/work
  • Any blockers or concerns
  • Expected timeline for next steps

If you're no longer able to sponsor this SEP, please let us know so we can find another maintainer.


This is an automated message from the SEP lifecycle bot.


Labels

in-review (SEP proposal ready for review), SEP

Projects

Status: In Review
