SEP-2663: Tasks Extension #2663
|
Moving discussion from #2557 over here @CaitieM20 @markdroth @Randgalt @kurtisvg @localden @pja-ant @dsp-ant @maxisbey @maciej-kisiel @ylxlpl (Tagged everyone who commented on #2557) |
|
Thanks for putting this together, @LucaButBoring - I'll post the comment that I was typing earlier in #2557 and let you validate how much of this is still relevant. Some notes beyond the other bits I called out in the review. There are a few places where I think the SEP is a little underspecified:
A couple of schema regressions I noticed too:
|
|
@localden Thanks for the feedback, going through this:
This revision limits edit: updated
This revision does require keys to be unique over the lifetime of a task, and not reused between distinct requests.
Yup, that's how
I struck out that phrasing in this revision, now it can actually be either, as
To deal with that, in this revision,
A TTL in integer seconds makes sense, but I'm not sure if a polling interval in integer seconds does - 500ms would be a reasonable polling interval for a relatively quick, but high-variance (1s-20s) task. A duration is probably better-expressed with units included in the value (e.g. edit: updated to include units in the field names
Noted, I'll update the phrasing here - it actually doesn't really mean the MCP server fell over either, the literal intent is just that if the inner request returns a JSON-RPC error, that's edit: updated
edit: updated, I misinterpreted this - noted here
Ah, I missed that on #2557 - I'll make sure this is handled correctly when I write the schema changes here. |
|
This is great, allows integration of various organizational extensions. |
The option space is:
IMO:
I agree that (4) is non-standard, but IMO we just make it the standard starting now and make sure that TTL lists also adopts this standard. |
|
Following further discussions, |
|
Wanted to propose a non-normative "Migration Notes" appendix as an expansion to the Backward Compatibility section. The existing section already mentions hybrid mode in one sentence ( We could perhaps add this between How does verbatim sound? Happy to iterate on wording or scope, and if you'd prefer to keep this PR focused on the core extension definition, this could land as a sibling SEP or a separate follow-up PR instead.
## Migration Notes
This section provides non-normative guidance for implementations migrating from the experimental `2025-11-25` tasks feature to the extension defined in this proposal.
### Migration is gated on protocol version
A client and server negotiate a protocol version in `initialize`. The extension surfaces only on connections that negotiate `2026-06-30` or later. Connections that negotiate `2025-11-25` continue to use the experimental tasks feature unchanged.
This means a deployment can support both versions during transition without forcing simultaneous client and server upgrades. A v1 client connecting to a hybrid-capable server keeps working against the `2025-11-25` surface. A v2-aware client connecting to the same server negotiates `2026-06-30` and exercises the extension.
### Server migration
#### Hybrid pattern (recommended during transition)
A server **MAY** accept both `2025-11-25` and `2026-06-30` connections during transition. Per-connection behavior:
- On a `2025-11-25` connection, advertise `capabilities.tasks` in the initialize response and serve the experimental tasks methods.
- On a `2026-06-30` connection, advertise `capabilities.extensions["io.modelcontextprotocol/tasks"]` and serve the extension methods. **MUST NOT** advertise `capabilities.tasks` (per [Backward Compatibility](#backward-compatibility)).
The two surfaces share no per-task state and operate independently. A task created over the experimental surface is not visible from the extension surface and vice versa.
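The per-connection rule above can be sketched as a single function over the negotiated protocol version. This is only an illustration: the function name `server_capabilities` is hypothetical, and the empty objects stand in for whatever capability payloads a real server would advertise.

```python
EXPERIMENTAL_VERSION = "2025-11-25"
EXTENSION_VERSION = "2026-06-30"
EXTENSION_KEY = "io.modelcontextprotocol/tasks"


def server_capabilities(negotiated_version: str) -> dict:
    """Return only the tasks-related capabilities for one connection.

    Hypothetical helper; the empty dicts are placeholders for the real
    capability payloads, which this sketch does not try to model.
    """
    # Protocol versions are ISO dates, so string comparison orders them.
    if negotiated_version >= EXTENSION_VERSION:
        # Extension surface only: capabilities.tasks must not appear here.
        return {"extensions": {EXTENSION_KEY: {}}}
    if negotiated_version == EXPERIMENTAL_VERSION:
        # Experimental surface, unchanged from the 2025-11-25 feature.
        return {"tasks": {}}
    # Any other version: advertise no task support at all.
    return {}
```

Branching on the negotiated version in one place keeps the two surfaces from being advertised together on the same connection.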
#### Post-migration
Once a server's `2025-11-25` traffic falls below an organization-defined threshold, the server **MAY** drop `2025-11-25` support entirely. After that point only `2026-06-30` (and later) connections succeed. Clients still pinned to `2025-11-25` receive a protocol-version negotiation failure on `initialize`, which is the explicit signal to upgrade.
### Client migration
A v1-only client requires code changes to use the extension. The wire-shape mapping is described in [Backward Compatibility](#backward-compatibility). In summary:
- `tasks/result` is removed. Replace with polling via `tasks/get`. The result is inlined on the `tasks/get` response when the task reaches a terminal status.
- The `task` parameter on `CallToolRequest` is removed. Replace with declaring the extension at session level (or per-request via SEP-2575) and handling the polymorphic `resultType` discriminator on the response.
- `tasks/cancel` returns an empty acknowledgement instead of a rich task envelope. Observe the resulting `cancelled` status via the next `tasks/get` call.
- The `tasks/list` method is removed. There is no replacement (per [Security Implications](#security-implications) below).
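As a rough illustration of the first two bullets, a migrated client might branch on `resultType` and poll `tasks/get` until a terminal status. Everything here beyond the method names and `resultType: "task"` is an assumption made for the sketch: the `send` callable is a hypothetical transport, the `taskId` and `status` field names and the flat response shape are guesses, and the terminal status names are carried over from the experimental feature.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status names; only `cancelled` is named in the notes above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

Send = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def call_tool(send: Send, name: str, arguments: Dict[str, Any],
              poll_seconds: float = 0.5,
              sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call a tool and, if the server returns a task, poll it to completion.

    `send(method, params)` is a hypothetical transport that returns the
    JSON-RPC result as a dict.
    """
    response = send("tools/call", {"name": name, "arguments": arguments})
    if response.get("resultType") != "task":
        # Ordinary synchronous result: nothing to poll.
        return response
    task = response
    while task["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        # tasks/get replaces the removed blocking tasks/result call.
        task = send("tasks/get", {"taskId": task["taskId"]})
    # The terminal tasks/get response carries the inlined result or error.
    return task
```

The fixed `poll_seconds` is a simplification; a real client would presumably honor whatever polling hint the server supplies.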
### Recommended migration timeline
The following is non-normative guidance.
- Servers **SHOULD** support the hybrid pattern for at least one specification release cycle after `2026-06-30` reaches general availability, to give clients time to upgrade.
- Servers **MAY** end hybrid support sooner if they have direct visibility into their client population and have confirmed `2025-11-25` traffic is zero.
- Clients **SHOULD** prioritize migrating to the extension before any given server stops advertising the experimental surface. Once a hybrid server drops `2025-11-25` support, v1 clients can no longer connect to it.
### Observability for safe sunset
To support a deliberate sunset decision, both sides **SHOULD** instrument which surface is in use.
- Servers **SHOULD** log, per session, which protocol version was negotiated and which task surface (experimental or extension) the session exercised. Aggregate `2025-11-25` traffic counts inform the sunset decision.
- Clients **SHOULD** report which task surface they exercised against each server, via deployment-side telemetry or operator reporting channels. This helps server operators understand the migration pace from the client side.
The extension does not require these instrumentation patterns. They are listed here because the absence of a `tasks/list` method means servers cannot otherwise inspect cross-session client behavior, so structured logging is the only practical signal. |
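One possible shape for the server-side counting, as a minimal sketch: the `SurfaceUsage` class and its method names are hypothetical, and a real deployment would likely emit structured log records rather than keep in-memory counters.

```python
from collections import Counter


class SurfaceUsage:
    """Counts sessions by (protocol version, task surface). Hypothetical helper."""

    def __init__(self) -> None:
        self.sessions: Counter = Counter()

    def record(self, protocol_version: str, surface: str) -> None:
        # surface is "experimental" or "extension", as in the notes above.
        self.sessions[(protocol_version, surface)] += 1

    def legacy_share(self, legacy_version: str = "2025-11-25") -> float:
        """Fraction of recorded sessions that negotiated the legacy version."""
        total = sum(self.sessions.values())
        if total == 0:
            return 0.0
        legacy = sum(count for (version, _), count in self.sessions.items()
                     if version == legacy_version)
        return legacy / total
```

An operator could compare `legacy_share()` against the organization-defined threshold when deciding whether to drop `2025-11-25` support.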
|
@panyam I think that's too detailed. We'd be better off treating this as a wholesale replacement for most purposes and reducing the guidance to just a couple of rules:
As for some immediate issues with the proposed guidance, servers that weren't already adopting v1 tasks will generally not want to do so during the migration period, and clients won't generally have any way of signaling simultaneous support for both versions to servers beyond their capability declarations. We also need to consider how things interact with #2575 - |
I think we should move the Tasks documentation under Extensions instead of just removing it, since this will be an official extension like Apps and Auth.
Can we also get a rewrite of this section to match what the SEP is proposing?
#### Request State Management
Servers **MAY** set an optional `requestState` string on any `Task` object to pass opaque routing or state information back to the client. When a client receives a `Task` with a `requestState` value, it **MUST** echo back the exact value of that field in the `requestState` field of subsequent `tasks/get`, `tasks/update`, and `tasks/cancel` requests for the same task. The server can use this echoed value to recover routing context or cache task metadata without maintaining per-task server-side session data, enabling stateless, load-balanced deployments. `requestState` is a best-effort optimization: servers **MUST NOT** depend on receiving the latest value for correctness, and **MUST** tolerate receiving a stale or outdated value gracefully (e.g. by falling back to a canonical lookup).
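To make the echo rule and the stale-value fallback concrete, here is a small sketch under stated assumptions. Both helper names are hypothetical, and treating `requestState` as a shard name is purely illustrative, since the value is opaque to clients and its meaning is up to each server.

```python
from typing import Any, Dict, Optional


def with_request_state(task: Dict[str, Any],
                       params: Dict[str, Any]) -> Dict[str, Any]:
    """Client side: copy params and echo the task's requestState verbatim."""
    out = dict(params)
    if "requestState" in task:
        # The value is opaque to the client; it is never parsed or modified.
        out["requestState"] = task["requestState"]
    return out


def find_task(task_id: str, request_state: Optional[str],
              shards: Dict[str, Dict[str, Any]]) -> Optional[Any]:
    """Server side: use the echoed hint if it helps, else look up canonically.

    `shards` maps a shard name to that shard's tasks by ID (illustrative only).
    """
    hinted = shards.get(request_state, {}) if request_state else {}
    if task_id in hinted:
        return hinted[task_id]
    # Missing or stale hint: fall back to scanning every shard.
    for tasks in shards.values():
        if task_id in tasks:
            return tasks[task_id]
    return None
```

The lookup never fails because of a wrong hint; a stale `requestState` only costs the slower canonical path.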
Do we have a real use case for this? Seems like it's adding some weird complexity, since all the state is supposed to be associated with the taskId.
Also, it's unclear how this interacts on the get vs. the update methods (i.e. cancel, input, etc.)...
Co-authored-by: Caitie McCaffrey <caitiem20@github.com>
This SEP defines an extension that allows a server to respond to a `tools/call` request with an asynchronous task handle instead of a final result, allowing the client to retrieve the eventual result by polling. The extension introduces three methods: `tasks/get`, `tasks/update`, and `tasks/cancel`; a polymorphic-result discriminator (`resultType: "task"`); and a `Task` shape that carries a task status, in-progress server-to-client requests, and a final result or error. Task creation is server-directed: the client signals support by including the extension in its per-request capabilities, and the server decides on a per-request basis whether to materialize a task.
Tasks will become a foundational building block of MCP and are expected to be supported in future protocol versions. The experimental `tasks` feature in the `2025-11-25` specification served as a stopgap until the protocol's extension mechanism was available. Now that extensions have been formalized, moving tasks to an official extension gives the feature time to incubate and evolve based on additional real-world implementation feedback, without being constrained by the core specification's release cadence. Once the extension has stabilized and achieved broad adoption, it is intended to be promoted into the core protocol.
This proposal removes the version of tasks specified in the `2025-11-25` release. It is shaped by implementation feedback since that release and by several changes to the base protocol expected to arrive in the `2026-06-30` specification:
Motivation and Context
The experimental tasks feature served as an alternate execution mode for tool calls, elicitation, and sampling, allowing receivers to return a poll handle instead of blocking until a final result was ready. Implementation experience surfaced several challenges:
- The handshake is fragile. Tasks today expose method-level capabilities (`tasks.requests.tools.call` declares that `tools/call` MAY be task-augmented) alongside a tool-level `execution.taskSupport` field that declares whether a particular tool will accept the augmentation. Clients express their own support for tasks by passing a `task` parameter on their requests, but MUST NOT include it if the method/tool does not support tasks. A client that wants to opt into tasks must therefore prime its state with a `tools/list` call before issuing any task-augmented request, and cannot blindly attach a `task` parameter to every request to handle tools isomorphically. This is confusing, implicit, and easy to get wrong.
- `tasks/result` is a blocking trap. In the current flow, a client that observes `input_required` is required to call `tasks/result` prematurely so that the server has an SSE stream on which to side-channel elicitation or sampling requests. `tasks/result` then blocks until the entire operation completes. This forces long-lived persistent connections that many clients and servers do not want to implement, and it conflicts with SEP-2260, which disallows unsolicited server-to-client requests outright. Under SEP-2260, the SSE semantics that justified the blocking behavior no longer apply.
- `tasks/list` scoping cannot be defined. To avoid clients cancelling or retrieving results for tasks they shouldn't have access to, all tasks should be bound to some sort of "authorization context," the implementation of which is left to individual servers according to their existing bespoke permission models. However, in many cases, it is not possible to perform this binding, in which case the task ID becomes the only line of defense against contamination. In this scenario, it is unsafe for a server to support `tasks/list` at all. While it was possible for tasks to instead be bound to a session, SEP-2567 removes sessions from the protocol. There is no other natural scope a server can define unilaterally; task IDs can be unguessable handles that a server can recognize one at a time, but servers cannot reliably correlate two unrelated handles to the same caller without additional state.
Beyond implementation challenges, tasks face another structural issue: client-hosted tasks are no longer expressible. SEP-1686 permitted clients to host tasks for elicitation and sampling, in part to avoid coupling tasks to tool calls. SEP-2260 makes any unsolicited server-to-client request invalid; every server-to-client polling request under client-hosted tasks would be unsolicited by definition.
This proposal intends to solve the above issues by redesigning certain aspects of the feature and moving tasks out to an official extension. Redefining tasks as an official extension gives the feature more time to incubate and evolve independently of the core specification, promoting adoption. As part of the redesign, this proposal consolidates the polling lifecycle into `tasks/get` and a new `tasks/update` to remove the blocking `tasks/result` method. The redesign allows servers to return tasks unsolicited (in response to ordinary, non-task-flagged requests) to eliminate the per-request opt-in and the `tools/list` warmup, relying instead on the extension capability as the single handshake point. Finally, this proposal removes client-hosted elicitation and sampling tasks in compliance with SEP-2260.
How Has This Been Tested?
Conformance test suite: modelcontextprotocol/conformance#262
Breaking Changes
Described in proposal.
Types of changes
Checklist
Additional context
Supersedes #2557.
AI Use Disclosure: The extension SEP document in this PR was initially drafted using claude.ai with the previous iteration as a reference. I rewrote/rephrased many sections myself and verified its correctness, using claude.ai as a reviewer to iteratively scrub out issues.