
SEP-2663: Tasks Extension #2663

Open
LucaButBoring wants to merge 39 commits into modelcontextprotocol:main from LucaButBoring:feat/ext-tasks


@LucaButBoring
Contributor

@LucaButBoring LucaButBoring commented Apr 29, 2026

This SEP defines an extension that allows a server to respond to a tools/call request with an asynchronous task handle instead of a final result, allowing the client to retrieve the eventual result by polling. The extension introduces three methods: tasks/get, tasks/update, and tasks/cancel; a polymorphic-result discriminator (resultType: "task"); and a Task shape that carries a task status, in-progress server-to-client requests, and a final result or error. Task creation is server-directed: the client signals support by including the extension in its per-request capabilities, and the server decides on a per-request basis whether to materialize a task.
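To make the polymorphic result concrete, here is a rough TypeScript sketch of the shapes described above. The field names `taskId`, `status`, `inputRequests`, `pollIntervalMs`, and `ttlMs` come from this thread, and `CreateTaskResult` is shown flattened as discussed below. The `"working"` status value, the `InputRequest` placeholder, and the helper `isTaskResult` are assumptions for illustration; the normative definitions are in the PR's `schema.ts`.

```typescript
// Hypothetical, simplified shapes; see the PR's schema.ts for the real definitions.
type TaskStatus = "working" | "input_required" | "completed" | "failed" | "cancelled";

// Placeholder for an embedded server-to-client request (e.g. elicitation or sampling).
interface InputRequest {
  method: string;
  params?: Record<string, unknown>;
}

interface Task {
  taskId: string;
  status: TaskStatus;
  ttlMs?: number;          // integer milliseconds
  pollIntervalMs?: number; // integer milliseconds
  // Keyed by a server-chosen key that is unique for the task's lifetime.
  inputRequests?: Record<string, InputRequest>;
  result?: unknown;                            // present once status is "completed"
  error?: { code: number; message: string };   // present once status is "failed"
}

// Flattened CreateTaskResult: the discriminator sits alongside the task fields.
type CreateTaskResult = { resultType: "task" } & Task;

// An ordinary, immediate tools/call result (simplified).
interface CallToolResult {
  content: unknown[];
  isError?: boolean;
}

type ToolsCallResponse = CreateTaskResult | CallToolResult;

// The server decides whether to create a task, so the client must check every response.
function isTaskResult(r: ToolsCallResponse): r is CreateTaskResult {
  return "resultType" in r && r.resultType === "task";
}
```

A client can therefore issue tools/call the same way for every tool and branch only on the response.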

Tasks will become a foundational building block of MCP and are expected to be supported in future protocol versions. The experimental tasks feature in the 2025-11-25 specification served as a stopgap until the protocol's extension mechanism was available. Now that extensions have been formalized, moving tasks to an official extension gives the feature time to incubate and evolve based on additional real-world implementation feedback, without being constrained by the core specification's release cadence. Once the extension has stabilized and achieved broad adoption, it is intended to be promoted into the core protocol.

This proposal removes the version of tasks specified in the 2025-11-25 release. It is shaped by implementation feedback since that release and by several changes to the base protocol expected to arrive in the 2026-06-30 specification:

Motivation and Context

The experimental tasks feature served as an alternate execution mode for tool calls, elicitation, and sampling, allowing receivers to return a poll handle instead of blocking until a final result was ready. Implementation experience surfaced several challenges:

  1. The handshake is fragile. Tasks today expose method-level capabilities (tasks.requests.tools.call declares that tools/call MAY be task-augmented) alongside a tool-level execution.taskSupport field that declares whether a particular tool will accept the augmentation. Clients express their own support for tasks by passing a task parameter on their requests, but MUST NOT include it if the method/tool does not support tasks. A client that wants to opt into tasks must therefore prime its state with a tools/list call before issuing any task-augmented request, and cannot blindly attach a task parameter to every request to handle tools isomorphically. This is confusing, implicit, and easy to get wrong.

  2. tasks/result is a blocking trap. In the current flow, a client that observes input_required is required to call tasks/result prematurely so that the server has an SSE stream on which to side-channel elicitation or sampling requests. tasks/result then blocks until the entire operation completes. This forces long-lived persistent connections that many clients and servers do not want to implement, and it conflicts with SEP-2260, which disallows unsolicited server-to-client requests outright. Under SEP-2260, the SSE semantics that justified the blocking behavior no longer apply.

  3. tasks/list scoping cannot be defined. To avoid clients cancelling or retrieving results for tasks they shouldn't have access to, all tasks should be bound to some sort of "authorization context," the implementation of which is left to individual servers according to their existing bespoke permission models. However, in many cases, it is not possible to perform this binding, in which case the task ID becomes the only line of defense against contamination. In this scenario, it is unsafe for a server to support tasks/list at all. While it was possible for tasks to instead be bound to a session, SEP-2567 removes sessions from the protocol. There is no other natural scope a server can define unilaterally — task IDs can be unguessable handles that a server can recognize one at a time, but servers cannot reliably correlate two unrelated handles to the same caller without additional state.
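The SEP does not prescribe how task IDs are generated. One common way to make them unguessable is to draw 128 bits from a cryptographically secure RNG; the sketch below does this with Node's `randomBytes` and pairs it with a store that resolves one handle at a time but offers no enumeration, mirroring the absence of tasks/list. `newTaskId` and `TaskStore` are hypothetical names, and an unguessable ID is a last line of defense, not a substitute for binding tasks to an authorization context where that is possible.

```typescript
import { randomBytes } from "node:crypto";

// 16 random bytes (128 bits) encoded as unpadded base64url: 22 URL-safe characters.
function newTaskId(): string {
  return randomBytes(16).toString("base64url");
}

// Resolves a single known handle; deliberately has no list/iterate method.
class TaskStore<T> {
  private readonly tasks = new Map<string, T>();

  create(value: T): string {
    const id = newTaskId();
    this.tasks.set(id, value);
    return id;
  }

  get(id: string): T | undefined {
    return this.tasks.get(id);
  }
}
```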

Beyond implementation challenges, tasks face another structural issue: Client-hosted tasks are no longer expressible. SEP-1686 permitted clients to host tasks for elicitation and sampling, in part to avoid coupling tasks to tool calls. SEP-2260 makes any unsolicited server-to-client request invalid; every server-to-client polling request under client-hosted tasks would be unsolicited by definition.

This proposal intends to solve the above issues by redesigning certain aspects of the feature and moving tasks out to an official extension. Redefining tasks as an official extension gives the feature more time to incubate and evolve independently of the core specification, promoting adoption. As part of the redesign, this proposal consolidates the polling lifecycle into tasks/get and a new tasks/update to remove the blocking tasks/result method. The redesign allows servers to return tasks unsolicited (in response to ordinary, non-task-flagged requests) to eliminate the per-request opt-in and the tools/list warmup, relying instead on the extension capability as the single handshake point. Finally, this proposal removes client-hosted elicitation and sampling tasks in compliance with SEP-2260.
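A minimal client-side sketch of the consolidated lifecycle: poll tasks/get, answer any outstanding `inputRequests` via tasks/update, and stop at a terminal status, with no blocking tasks/result call. The `send` transport, the `answer` callback, the 1000 ms fallback interval, and the exact param names are assumptions, not an SDK API.

```typescript
type Status = "working" | "input_required" | "completed" | "failed" | "cancelled";

interface PolledTask {
  taskId: string;
  status: Status;
  pollIntervalMs?: number;
  inputRequests?: Record<string, unknown>;
  result?: unknown;
  error?: { code: number; message: string };
}

// Hypothetical transport: sends a JSON-RPC request and resolves with its result.
type Send = (method: string, params: Record<string, unknown>) => Promise<any>;

const TERMINAL: ReadonlySet<Status> = new Set<Status>(["completed", "failed", "cancelled"]);

async function pollTask(
  send: Send,
  taskId: string,
  answer: (key: string, request: unknown) => Promise<unknown>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<PolledTask> {
  const answered = new Set<string>(); // input request keys already delivered to the server
  for (;;) {
    const task: PolledTask = await send("tasks/get", { taskId });
    if (TERMINAL.has(task.status)) return task; // result or error is inlined here

    // Keys are unique per task, so a key seen again has already been handled.
    const inputResponses: Record<string, unknown> = {};
    for (const [key, request] of Object.entries(task.inputRequests ?? {})) {
      if (!answered.has(key)) inputResponses[key] = await answer(key, request);
    }
    const keys = Object.keys(inputResponses);
    if (keys.length > 0) {
      await send("tasks/update", { taskId, inputResponses });
      keys.forEach((k) => answered.add(k)); // mark only after the update was sent
    }
    await sleep(task.pollIntervalMs ?? 1000);
  }
}
```

Because tasks/update is eventually consistent, a request the client already answered may still appear on the next poll; tracking answered keys keeps the client from responding twice.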

How Has This Been Tested?

Conformance test suite: modelcontextprotocol/conformance#262

Breaking Changes

Described in proposal.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Supersedes #2557.


AI Use Disclosure: The extension SEP document in this PR was initially drafted using claude.ai with the previous iteration as a reference. I rewrote/rephrased many sections myself and verified its correctness, using claude.ai as a reviewer to iteratively scrub out issues.

@LucaButBoring LucaButBoring changed the title SEP-XXXX: Tasks Extension SEP-2663: Tasks Extension Apr 29, 2026
@LucaButBoring LucaButBoring requested review from a team as code owners April 29, 2026 01:14
@LucaButBoring LucaButBoring added this to the 2026-06-30-RC milestone Apr 29, 2026
@LucaButBoring LucaButBoring added SEP in-review SEP proposal ready for review. extension roadmap/agents Roadmap: Agent Communication (Tasks lifecycle) labels Apr 29, 2026
@LucaButBoring
Contributor Author

Moving discussion from #2557 over here @CaitieM20 @markdroth @Randgalt @kurtisvg @localden @pja-ant @dsp-ant @maxisbey @maciej-kisiel @ylxlpl

(Tagged everyone who commented on #2557)

@localden
Contributor

Thanks for putting this together, @LucaButBoring - I'll post the comment that I was typing earlier in #2557 and let you validate how much of this is still relevant.

Some notes beyond the other bits I called out in the review. There are a few places where I think the SEP is a little underspecified:

  1. CreateTaskResult and GetTaskResult both carry resultType: "task" but have different shapes. In schema.ts, CreateTaskResult is { task: Task } (nested), while GetTaskResult is Result & DetailedTask (flat). So a client switching on resultType === "task" then has to also check whether it got result.task.taskId or result.taskId. Is the nesting on CreateTaskResult intentional, or a holdover from before GetTaskResult was flattened?
  2. The inputRequests key contract stops at "SHOULD dedupe." The tasks spec says clients should dedupe by key, and the inputResponses JSDoc says keys match inputRequests keys. But it doesn't say whether a key is unique for the task's lifetime or can be reused after the server consumes the response.
  3. Retry of tasks/get with inputResponses. What happens if a client sends tasks/get { inputResponses: { k: ... } }, network blips, and client retries? If the response was a sampling result the server feeds into a downstream API call, that call just ran twice. IMO the smallest fix is to say the server MUST treat inputResponses keyed on a request it has already consumed as a no-op. That makes the key the idempotency token, and it's almost what the key-matching contract says already.
  4. "The same requests will be included" is ambiguous for partial responses. The Input Requests section says if the client polls again before providing all responses, the same requests reappear. Does "the same requests" mean the full original set (client must re-send what it already provided) or only the still-unfulfilled remainder? I'd read it as the remainder, but the text doesn't say so, and it interacts with the idempotency point above.
  5. The cancel behavior has two stories. The spec says servers MAY ignore cancellation but MUST support tasks/cancel, which I read as: always return a valid CancelTaskResult, possibly with a non-cancelled status. But the "Cancellation Not Supported" example returns a -32603 JSON-RPC error instead. Those are different contracts. Can we formalize that the response carries the task's current status, which may not be cancelled, and drop the -32603 example?
  6. ttl and pollInterval are now in different units. The schema still documents pollInterval in milliseconds while the SEP moves ttl to seconds. So { ttl: 60, pollInterval: 5000 } is 60 seconds next to 5000 milliseconds. @pja-ant raised this before and I don't see it landed yet. Both fields should match.
  7. The Failed example might be the wrong status under the new rule. The Task Flow Change section says failed is for JSON-RPC errors and application faults go to completed with isError: true. The Failed example shows error: { code: -32603, message: "API rate limit exceeded" }. A downstream API rate-limiting the tool is an application fault (exactly the case the new rule routes to completed). If -32603 here means the MCP server itself fell over, the message should say that; otherwise the example is the case the rule says not to use failed for.
  8. Is taskId alone always sufficient for tasks/get? requestState lets a server externalize lookup state to the client (a backend job ID, a serialized continuation) so it doesn't have to keep a mapping table - that makes sense. But in a fully stateless deployment a server could push that to the limit and put the entire task record in requestState, keeping nothing locally. At that point tasks/get { taskId } without requestState has nothing to look up, which runs into the "MUST NOT return CreateTaskResult until tasks/get would find it" guarantee. Should we be explicit about the taskId always being sufficient as a standalone index of a task?

A couple of schema regressions I noticed too:

  • CallToolRequestParams, CreateMessageRequestParams, and the ElicitRequest*Params types no longer extend anything after TaskAugmentedRequestParams was removed, so they've lost the RequestParams base and _meta? with it.
  • ServerRequest still includes GetTaskRequest and CancelTaskRequest even though client-hosted tasks are removed.

@localden localden moved this to In Review in SEP Review Pipeline Apr 29, 2026
@localden localden moved this from In Review to Review Batch in SEP Review Pipeline Apr 29, 2026
Comment thread seps/2663-tasks-extension.md Outdated
Comment thread seps/2663-tasks-extension.md Outdated
Comment thread seps/2663-tasks-extension.md
Comment thread seps/2663-tasks-extension.md
@LucaButBoring
Contributor Author

LucaButBoring commented Apr 29, 2026

@localden Thanks for the feedback, going through this:

  1. CreateTaskResult and GetTaskResult both carry resultType: "task" but have different shapes. In schema.ts, CreateTaskResult is { task: Task } (nested), while GetTaskResult is Result & DetailedTask (flat). So a client switching on resultType === "task" then has to also check whether it got result.task.taskId or result.taskId. Is the nesting on CreateTaskResult intentional, or a holdover from before GetTaskResult was flattened?

This revision limits resultType: "task" to CreateTaskResult to avoid any ambiguity; I noticed that issue while rewriting this. GetTaskResult was always flat; the distinction was that we made CreateTaskResult nested at the last minute in 2025-11-25 to allow switching on it. That nesting is a holdover from before we had resultType, so we can actually flatten CreateTaskResult, too.

edit: updated

  2. The inputRequests key contract stops at "SHOULD dedupe." The tasks spec says clients should dedupe by key, and the inputResponses JSDoc says keys match inputRequests keys. But it doesn't say whether a key is unique for the task's lifetime or can be reused after the server consumes the response.

This revision does require keys to be unique over the lifetime of a task, and not reused between distinct requests.

  3. Retry of tasks/get with inputResponses. What happens if a client sends tasks/get { inputResponses: { k: ... } }, network blips, and client retries? If the response was a sampling result the server feeds into a downstream API call, that call just ran twice. IMO the smallest fix is to say the server MUST treat inputResponses keyed on a request it has already consumed as a no-op. That makes the key the idempotency token, and it's almost what the key-matching contract says already.

Yup, that's how tasks/update works in this revision.
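A sketch of the server side of that idempotency contract, assuming the server keeps the outstanding input requests per task in memory: an `inputResponses` entry whose key was already consumed (a retry) is ignored, so the downstream work runs once. `PendingTask` and `applyInputResponses` are hypothetical names, and silently ignoring keys that were never issued is an assumption of this sketch rather than something the SEP states.

```typescript
interface PendingTask {
  outstanding: Map<string, unknown>; // issued to the client, awaiting a response
  consumed: Set<string>;             // keys whose responses were already applied
}

// Apply a tasks/update payload idempotently; returns the keys consumed by this call.
function applyInputResponses(
  task: PendingTask,
  inputResponses: Record<string, unknown>,
  onResponse: (key: string, response: unknown) => void,
): string[] {
  const applied: string[] = [];
  for (const [key, response] of Object.entries(inputResponses)) {
    // Retried or unknown key: no-op, so a network retry cannot repeat side effects.
    if (task.consumed.has(key) || !task.outstanding.has(key)) continue;
    task.outstanding.delete(key);
    task.consumed.add(key);
    onResponse(key, response); // e.g. resume the sampling/elicitation continuation
    applied.push(key);
  }
  return applied;
}
```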

  4. "The same requests will be included" is ambiguous for partial responses. The Input Requests section says if the client polls again before providing all responses, the same requests reappear. Does "the same requests" mean the full original set (client must re-send what it already provided) or only the still-unfulfilled remainder? I'd read it as the remainder, but the text doesn't say so, and it interacts with the idempotency point above.

I struck out that phrasing in this revision. It can now be either, since tasks/update is eventually consistent, but the new key uniqueness constraint means that this is fine from the client's perspective.

  5. The cancel behavior has two stories. The spec says servers MAY ignore cancellation but MUST support tasks/cancel, which I read as: always return a valid CancelTaskResult, possibly with a non-cancelled status. But the "Cancellation Not Supported" example returns a -32603 JSON-RPC error instead. Those are different contracts. Can we formalize that the response carries the task's current status, which may not be cancelled, and drop the -32603 example?

To deal with that, in this revision, tasks/cancel no longer has any result (and is also eventually consistent, like tasks/update).
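Since tasks/cancel returns no result, a server can simply record the request and let the task reach `cancelled` asynchronously; the client learns the outcome from a later tasks/get. In this sketch, `ServerTask`, `handleCancel`, and `checkpoint` are hypothetical names. A task that has already reached a terminal status keeps it, which is why a client cannot assume a cancellation took effect.

```typescript
type TaskState = "working" | "input_required" | "completed" | "failed" | "cancelled";

interface ServerTask {
  status: TaskState;
  cancelRequested: boolean;
}

const isTerminal = (s: TaskState): boolean => s === "completed" || s === "failed" || s === "cancelled";

// tasks/cancel handler: record the request and acknowledge with an empty result.
function handleCancel(task: ServerTask): Record<string, never> {
  if (!isTerminal(task.status)) task.cancelRequested = true;
  return {};
}

// Called by the worker at a safe point; applies a pending cancellation, if any.
function checkpoint(task: ServerTask): void {
  if (task.cancelRequested && !isTerminal(task.status)) task.status = "cancelled";
}
```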

  6. ttl and pollInterval are now in different units. The schema still documents pollInterval in milliseconds while the SEP moves ttl to seconds. So { ttl: 60, pollInterval: 5000 } is 60 seconds next to 5000 milliseconds. @pja-ant raised this before and I don't see it landed yet. Both fields should match.

A TTL in integer seconds makes sense, but I'm not sure if a polling interval in integer seconds does - 500ms would be a reasonable polling interval for a relatively quick, but high-variance (1s-20s) task. A duration is probably better expressed with units included in the value (e.g. "500ms"), but that would be nonstandard for us - I suppose I could name it pollIntervalMilliseconds, but that feels awkward and inconsistent in its own right, since nothing else includes units in the field name so far.

edit: updated to include units in the field names

  7. The Failed example might be the wrong status under the new rule. The Task Flow Change section says failed is for JSON-RPC errors and application faults go to completed with isError: true. The Failed example shows error: { code: -32603, message: "API rate limit exceeded" }. A downstream API rate-limiting the tool is an application fault (exactly the case the new rule routes to completed). If -32603 here means the MCP server itself fell over, the message should say that; otherwise the example is the case the rule says not to use failed for.

Noted, I'll update the phrasing here - it actually doesn't really mean the MCP server fell over either, the literal intent is just that if the inner request returns a JSON-RPC error, that's failed, and in every other case (including a tool call with isError: true), that's completed.

edit: updated
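The rule reduces to a small mapping from the outcome of the wrapped request to the terminal task status. `InnerOutcome` is a hypothetical simplification of a JSON-RPC response, not a schema type.

```typescript
// Simplified JSON-RPC response for the request the task wraps (e.g. tools/call).
type InnerOutcome =
  | { error: { code: number; message: string } }
  | { result: { isError?: boolean; [k: string]: unknown } };

function terminalStatus(outcome: InnerOutcome): "completed" | "failed" {
  // Only a JSON-RPC error makes the task "failed". A tool result with
  // isError: true is still a successful JSON-RPC response, so it is "completed".
  return "error" in outcome ? "failed" : "completed";
}
```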

  8. Is taskId alone always sufficient for tasks/get? requestState lets a server externalize lookup state to the client (a backend job ID, a serialized continuation) so it doesn't have to keep a mapping table - that makes sense. But in a fully stateless deployment a server could push that to the limit and put the entire task record in requestState, keeping nothing locally. At that point tasks/get { taskId } without requestState has nothing to look up, which runs into the "MUST NOT return CreateTaskResult until tasks/get would find it" guarantee. Should we be explicit about the taskId always being sufficient as a standalone index of a task?

I don't think there's an inconsistency here? requestState is already on the request shape for tasks/get - the requirement is that the client echoes whatever the server gives it. So, in the case where the full task record is in requestState, the server would return the initial value in CreateTaskResult, the client would pick that up, and then it would echo it in tasks/get, maintaining the full record through that flow.

edit: updated, I misinterpreted this - noted here

A couple of schema regressions I noticed too:

  • CallToolRequestParams, CreateMessageRequestParams, and the ElicitRequest*Params types no longer extend anything after TaskAugmentedRequestParams was removed, so they've lost the RequestParams base and _meta? with it.
  • ServerRequest still includes GetTaskRequest and CancelTaskRequest even though client-hosted tasks are removed.

Ah, I missed that on #2557 - I'll make sure this is handled correctly when I write the schema changes here.

@He-Pin
Contributor

He-Pin commented Apr 29, 2026

This is great; it allows integration of various organizational extensions.

@pja-ant
Contributor

pja-ant commented Apr 29, 2026

A TTL in integer seconds makes sense, but I'm not sure if a polling interval in integer seconds does - 500ms would be a reasonable polling interval for a relatively quick, but high-variance (1s-20s) task. A duration is probably better expressed with units included in the value (e.g. "500ms"), but that would be nonstandard for us - I suppose I could name it pollIntervalMilliseconds, but that feels awkward and inconsistent in its own right, since nothing else includes units in the field name so far.

The option space is:

  1. Have everything as seconds
  2. Allow different units, but don't include it in the name or value
  3. Allow different units, but use a string (e.g. "500ms")
  4. Allow different units, and add it to the name

IMO:

  1. Too limiting - seconds isn't appropriate for everything
  2. Strongly prefer we don't do this. We know what happens: https://en.wikipedia.org/wiki/Mars_Climate_Orbiter
  3. An option, but IMO having to parse is just annoying.
  4. My strong preference. It's simple and avoids any confusion. It's a little more verbose.

I agree that (4) is non-standard, but IMO we just make it the standard starting now and make sure that TTL lists also adopts this standard.

@LucaButBoring
Contributor Author

Following further discussions, ttlSeconds is now ttlMs and pollIntervalMilliseconds is now pollIntervalMs - both represent integer milliseconds. #2549 will use ttlMs as well.
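With both fields in integer milliseconds, client-side scheduling needs no unit conversion. A sketch under stated assumptions: the helper names, the 1000 ms default, measuring `ttlMs` from task creation, and the choice not to schedule a poll past expiry are all illustrative, not taken from the SEP.

```typescript
interface TaskTiming {
  ttlMs?: number;          // retention window, integer milliseconds
  pollIntervalMs?: number; // suggested delay between polls, integer milliseconds
}

const DEFAULT_POLL_INTERVAL_MS = 1000; // hypothetical client default

// Absolute expiry in epoch milliseconds (assumes TTL counts from creation).
function expiresAtMs(createdAtMs: number, t: TaskTiming): number | undefined {
  return t.ttlMs === undefined ? undefined : createdAtMs + t.ttlMs;
}

// Delay before the next tasks/get, never scheduled past the expiry time.
function nextPollDelayMs(nowMs: number, createdAtMs: number, t: TaskTiming): number {
  const interval = t.pollIntervalMs ?? DEFAULT_POLL_INTERVAL_MS;
  const expiry = expiresAtMs(createdAtMs, t);
  return expiry === undefined ? interval : Math.max(0, Math.min(interval, expiry - nowMs));
}
```

For example, `{ ttlMs: 60000, pollIntervalMs: 500 }` reads unambiguously as one minute of retention and half-second polling.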

@LucaButBoring
Contributor Author

Pushed changes to remove tasks from the core specification and adjust remaining references to it now that #2322 is merged: 82fb2c4

@panyam

panyam commented May 9, 2026

Wanted to propose a non-normative "Migration Notes" appendix as an expansion to the Backward Compatibility section. The existing section already mentions hybrid mode in one sentence (Implementations that need to bridge legacy clients can shim at the SDK level...). This turns that into actionable host-side guidance with per-connection routing rules, a recommended timeline, and an observability note for safe sunset.

We could perhaps add this between ## Backward Compatibility and ## Security Implications.

How does adding it verbatim sound? Happy to iterate on wording or scope, and if you'd prefer to keep this PR focused on the core extension definition, this could land as a sibling SEP or a separate follow-up PR instead.

## Migration Notes

This section provides non-normative guidance for implementations migrating from the experimental `2025-11-25` tasks feature to the extension defined in this proposal.

### Migration is gated on protocol version

A client and server negotiate a protocol version in `initialize`. The extension surfaces only on connections that negotiate `2026-06-30` or later. Connections that negotiate `2025-11-25` continue to use the experimental tasks feature unchanged.

This means a deployment can support both versions during transition without forcing simultaneous client and server upgrades. A v1 client connecting to a hybrid-capable server keeps working against the `2025-11-25` surface. A v2-aware client connecting to the same server negotiates `2026-06-30` and exercises the extension.

### Server migration

#### Hybrid pattern (recommended during transition)

A server **MAY** accept both `2025-11-25` and `2026-06-30` connections during transition. Per-connection behavior:

- On a `2025-11-25` connection, advertise `capabilities.tasks` in the initialize response and serve the experimental tasks methods.
- On a `2026-06-30` connection, advertise `capabilities.extensions["io.modelcontextprotocol/tasks"]` and serve the extension methods. **MUST NOT** advertise `capabilities.tasks` (per [Backward Compatibility](#backward-compatibility)).

The two surfaces share no per-task state and operate independently. A task created over the experimental surface is not visible from the extension surface and vice versa.

#### Post-migration

Once a server's `2025-11-25` traffic falls below an organization-defined threshold, the server **MAY** drop `2025-11-25` support entirely. After that point only `2026-06-30` (and later) connections succeed. Clients still pinned to `2025-11-25` receive a protocol-version negotiation failure on `initialize`, which is the explicit signal to upgrade.

### Client migration

A v1-only client requires code changes to use the extension. The wire-shape mapping is described in [Backward Compatibility](#backward-compatibility). In summary:

- `tasks/result` is removed. Replace with polling via `tasks/get`. The result is inlined on the `tasks/get` response when the task reaches a terminal status.
- The `task` parameter on `CallToolRequest` is removed. Replace with declaring the extension at session level (or per-request via SEP-2575) and handling the polymorphic `resultType` discriminator on the response.
- `tasks/cancel` returns an empty acknowledgement instead of a rich task envelope. Observe the resulting `cancelled` status via the next `tasks/get` call.
- The `tasks/list` method is removed. There is no replacement (per [Security Implications](#security-implications) below).

### Recommended migration timeline

The following is non-normative guidance.

- Servers **SHOULD** support the hybrid pattern for at least one specification release cycle after `2026-06-30` reaches general availability, to give clients time to upgrade.
- Servers **MAY** end hybrid support sooner if they have direct visibility into their client population and have confirmed `2025-11-25` traffic is zero.
- Clients **SHOULD** prioritize migrating to the extension before any given server stops advertising the experimental surface. A hybrid server that drops `2025-11-25` support is not visible to v1 clients.

### Observability for safe sunset

To support a deliberate sunset decision, both sides **SHOULD** instrument which surface is in use.

- Servers **SHOULD** log, per session, which protocol version was negotiated and which task surface (experimental or extension) the session exercised. Aggregate `2025-11-25` traffic counts inform the sunset decision.
- Clients **SHOULD** report which task surface they exercised against each server, via deployment-side telemetry or operator reporting channels. This helps server operators understand the migration pace from the client side.

The extension does not require these instrumentation patterns. They are listed here because the absence of a `tasks/list` method means servers cannot otherwise inspect cross-session client behavior, so structured logging is the only practical signal.

@LucaButBoring
Copy link
Copy Markdown
Contributor Author

@panyam I think that's too detailed, and we'd be better off considering this a wholesale replacement for most purposes - we'd be better served by reducing it to just a couple of rules:

  1. If the negotiated protocol version is 2025-11-25, we disallow capabilities.extensions["io.modelcontextprotocol/tasks"] and follow all of the v1 semantics.
  2. If the negotiated protocol version is 2026-06-30, we disallow capabilities.tasks and follow all of the v2 semantics.
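Under that hard bifurcation, the task surface is a pure function of the negotiated protocol version: 2025-11-25 uses only `capabilities.tasks`, and 2026-06-30 uses only the extension. In this sketch, `TaskSurface` and `taskSurfaceFor` are hypothetical; treating versions after 2026-06-30 as extension versions and older ones as having no tasks are assumptions, since the thread only addresses these two versions.

```typescript
const EXTENSION_KEY = "io.modelcontextprotocol/tasks";

type TaskSurface =
  | { kind: "experimental"; capability: "tasks" }        // capabilities.tasks
  | { kind: "extension"; extensionKey: typeof EXTENSION_KEY } // capabilities.extensions[...]
  | { kind: "none" };

function taskSurfaceFor(protocolVersion: string): TaskSurface {
  if (protocolVersion === "2025-11-25") {
    // Experimental tasks only; the extension key must not be advertised.
    return { kind: "experimental", capability: "tasks" };
  }
  // YYYY-MM-DD version strings compare correctly as plain strings.
  if (protocolVersion >= "2026-06-30") {
    // Extension only; capabilities.tasks must not be advertised (later versions: assumption).
    return { kind: "extension", extensionKey: EXTENSION_KEY };
  }
  return { kind: "none" }; // assumption: older versions have no task support
}
```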

As for some immediate issues with the proposed guidance, servers that weren't already adopting v1 tasks will generally not want to do so during the migration period, and clients won't generally have any way of signaling simultaneous support for both versions to servers beyond their capability declarations. We also need to consider how things interact with #2575 - initialize doesn't exist in the new version and we signal on per-request client capabilities instead and use server/discover to communicate server capabilities. I think at this point, the harder we bifurcate this the better, otherwise we'll get stuck in a backwards-compatibility mire. I think this also follows suit with what e.g. #2322 and similar proposals are doing.

Comment thread docs/specification/draft/basic/utilities/mrtr.mdx Outdated
Contributor


I think we should move the Tasks documentation under Extensions, since this will be an official extension like Apps and Auth instead of just removing it.

Contributor


Can we also get a rewrite of this section to match what the SEP is proposing?

Comment thread docs/specification/draft/basic/utilities/mrtr.mdx Outdated
Comment thread docs/specification/draft/basic/utilities/progress.mdx
Comment thread docs/specification/draft/basic/utilities/cancellation.mdx
Comment thread seps/2663-tasks-extension.md Outdated
Comment thread seps/2663-tasks-extension.md Outdated
Comment thread seps/2663-tasks-extension.md
Comment thread seps/2663-tasks-extension.md Outdated

#### Request State Management

Servers **MAY** set an optional `requestState` string on any `Task` object to pass opaque routing or state information back to the client. When a client receives a `Task` with a `requestState` value, it **MUST** echo back the exact value of that field in the `requestState` field of subsequent `tasks/get`, `tasks/update`, and `tasks/cancel` requests for the same task. The server can use this echoed value to recover routing context or cache task metadata without maintaining per-task server-side session data, enabling stateless, load-balanced deployments. `requestState` is a best-effort optimization — servers **MUST NOT** depend on receiving the latest value for correctness, and **MUST** tolerate receiving a stale or outdated value gracefully (e.g. by falling back to a canonical lookup).
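A client-side sketch of the echo rule as written here (this text was later removed from the SEP in 3f1c3cf): remember the most recent `requestState` seen for each task and attach it verbatim to follow-up requests. `RequestStateCache` is a hypothetical helper, and continuing to echo the last value when a later `Task` omits the field is an assumption the paragraph does not settle.

```typescript
interface TaskLike {
  taskId: string;
  requestState?: string;
}

// Remembers the latest opaque requestState per task and echoes it unchanged.
class RequestStateCache {
  private readonly latest = new Map<string, string>();

  // Call on every Task the server returns (CreateTaskResult, tasks/get, ...).
  observe(task: TaskLike): void {
    // Assumption: if a Task omits requestState, keep echoing the last value seen.
    if (task.requestState !== undefined) this.latest.set(task.taskId, task.requestState);
  }

  // Params for tasks/get, tasks/update, or tasks/cancel on the same task.
  paramsFor(taskId: string, extra: Record<string, unknown> = {}): Record<string, unknown> {
    const requestState = this.latest.get(taskId);
    return requestState === undefined ? { ...extra, taskId } : { ...extra, taskId, requestState };
  }
}
```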
Contributor

@CaitieM20 CaitieM20 May 11, 2026


Do we have a real use case for this? It seems like it's adding some weird complexity, since all the state is supposed to be associated with the taskId.

Also, it's unclear how this interacts with the get vs. the update methods (e.g. cancel, input, etc.).

Contributor


@markdroth would love your thoughts here

Contributor Author


Removing for now: 3f1c3cf

Comment thread seps/2663-tasks-extension.md Outdated

Labels

accepted-with-changes extension roadmap/agents Roadmap: Agent Communication (Tasks lifecycle) SEP

Projects

Status: Review Batch
