SEP-2549: TTL for List Results #2549
CaitieM20 wants to merge 4 commits into modelcontextprotocol:main from
pja-ant
left a comment
Looks good! Thanks for putting this together so quickly. A few relatively minor things inline.
| ```typescript | ||
| /** @internal */ | ||
| export interface PaginatedResult extends Result { |
While convenient (as all list results extend this), I feel like this belongs in a different interface. Not all paginated things may have TTLs and if we introduce something in future that wants pagination but no TTL (e.g. a tool result returning a paginated list of things) then I think we'd regret this.
totally fair, I can pull this out of the paginated interface
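A minimal sketch of what pulling `ttl` out of the paginated interface could look like. The names `CacheableResult`, `ListToolsResultSketch`, and `PagedSearchResultSketch` are hypothetical and do not come from the schema; `Result` is reduced to a stub so the snippet stands alone.

```typescript
// Stub of the base type so the sketch is self-contained.
interface Result {
  _meta?: Record<string, unknown>;
}

interface PaginatedResult extends Result {
  nextCursor?: string;
}

// Hypothetical mixin: carries the freshness hint independently of pagination.
interface CacheableResult {
  /** Freshness lifetime in seconds, as in the draft under review. */
  ttl?: number;
}

// A list result opts into both behaviours.
interface ListToolsResultSketch extends PaginatedResult, CacheableResult {
  tools: { name: string }[];
}

// A future paginated result that wants no TTL simply skips the mixin.
interface PagedSearchResultSketch extends PaginatedResult {
  items: string[];
}

const listExample: ListToolsResultSketch = { tools: [{ name: "echo" }], ttl: 300 };
const pagedExample: PagedSearchResultSketch = { items: ["a"], nextCursor: "p2" };
```

With this split, adding pagination to a new result type would not silently add a TTL to it.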
| * seconds after receiving the response. The client SHOULD NOT re-fetch | ||
| * before the TTL expires unless it receives a list_changed notification. | ||
| */ | ||
| ttl?: number; |
nit: ttlSeconds to encode units in schema?
Sadness: I just noticed that Task has ttl (and pollInterval) and it is milliseconds :(
(cc @LucaButBoring - may want to consider fixing this while it is experimental...)
| } | ||
| ``` | ||
| > **Open Question — TTL format**: An alternative representation is an ISO 8601 duration string (e.g., `"PT5M"` for 5 minutes). Integer seconds are simpler, consistent with HTTP `max-age`, and easier to compare arithmetically. ISO 8601 durations are more human-readable and used in some Azure/AWS APIs. Community input is welcome on which format to adopt. The remainder of this specification uses integer seconds for illustration. |
seconds are good IMO - keeps it simple
| ## Abstract | ||
| This SEP proposes adding an optional `ttl` (time-to-live) field to the result objects returned by `tools/list`, `prompts/list`, `resources/list`, and `resources/templates/list`. The TTL tells clients how long the response may be considered fresh before re-fetching. This allows clients to cache feature lists and poll on a predictable schedule, reducing reliance on server-push `list_changed` notifications while remaining fully backward compatible. TTL supplements rather than replaces the existing notification mechanism — both can coexist. |
Should we consider `resources/read` here also, so that you don't need to use resource subscriptions to check resource freshness?
I was going to say exactly the same thing.
The overall goal here is to make notifications (which essentially require an SSE stream) a purely optional optimization. That implies that we should attach TTLs to everything that currently has a notification. Looking at the schema, I see the following notification types that I think should be included in this SEP:
- ResourceListChangedNotification: already covered here
- PromptListChangedNotification: already covered here
- ToolListChangedNotification: already covered here
- ResourceUpdatedNotification: we should add this
I think the remaining notification types should not be added here, for the following reasons:
- InitializedNotification: going away in SEP-1442
- CancelledNotification: not relevant (and shouldn't be needed anywhere but the stdio transport once we have MRTR)
- RootsListChangedNotification: not relevant (and since this is a client-generated notification, we should probably remove this in a post-MRTR world)
- ProgressNotification: not relevant (might go away in favor of tasks in the long run)
- TaskStatusNotification: not relevant (and may get removed as we revamp tasks?)
- LoggingMessageNotification: not relevant (and may get removed?)
- ElicitationCompleteNotification: not relevant (and maybe we should remove this as part of MRTR, unless we decide to keep URL elicitations in the ephemeral workflow)
If this is the goal, have we considered a field where the server would return a server version per request? This would allow the client to invalidate its cache when a new server version is detected, which would be more direct than a TTL.
Server versioning is a big subject with various pitfalls, but assuming that most tools stay relatively constant between versions, it's likely that in most cases detecting the updated version will let the client adapt seamlessly.
The main issue generally with a TTL is that it will be hard to set except in cases where servers deploy consistently at a given time.
| ### No new capability flag | ||
| No new capability flag is needed. The `ttl` field is optional on the response object. Servers that do not wish to provide a TTL simply omit the field. Clients that do not understand the field ignore it per standard JSON handling of unknown properties. |
> Clients that do not understand the field ignore it per standard JSON handling of unknown properties.
I think this might want to be left out -- it implies clients can ignore it which contradicts the previous SHOULD's
| ### Error handling | ||
| - If `ttl` is present but is not a non-negative integer, the client SHOULD ignore it and behave as if it were absent. | ||
| - Clients MUST NOT treat a missing `ttl` as an implicit TTL of 0 or any other value. |
nit: SHOULD NOT?
I don't think we can stop them...
whoops, got too many double negatives there and it's confusing, rephrased. Goal is to say that if it's a negative integer, clients SHOULD ignore it.
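For illustration, the error-handling rule quoted above (ignore a `ttl` that is not a non-negative integer, and never invent a value for a missing one) could be sketched on the client side as follows. `normalizeTtl` is a hypothetical helper name, not something from the SEP or an SDK.

```typescript
/**
 * Returns the TTL in seconds if `raw` is a non-negative integer,
 * otherwise `undefined`, meaning "behave as if the field were absent".
 * A missing field stays `undefined`; it is never mapped to 0.
 */
function normalizeTtl(raw: unknown): number | undefined {
  if (typeof raw === "number" && Number.isInteger(raw) && raw >= 0) {
    return raw;
  }
  return undefined;
}

console.log(normalizeTtl(300));
console.log(normalizeTtl(-5));
```

Keeping "absent" (`undefined`) distinct from an explicit `0` lets the caller tell "no guidance" apart from "do not cache".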
markdroth
left a comment
Thanks for writing this up, Caitie! Overall, I think this is definitely the right direction.
| This approach has several limitations: | ||
| 1. **Stateless and HTTP-based transports**: Clients communicating over stateless transports (e.g., pure HTTP request/response without SSE or WebSocket) cannot receive server-push notifications. These clients have no guidance on when to re-poll and must either poll excessively or risk stale data. |
To be fair, the current notification mechanism does work with SSE streams. However, SSE streams are a problem to support in a lot of environments, for the same reasons that we argued in the MRTR SEP. So it might make sense to explicitly say here that we want to make SSE streams a purely optional part of the protocol, used only as an optimization.
yup fair, updated to clarify
| * SHOULD NOT serve a cached copy. | ||
| * - If positive, the client SHOULD consider the list fresh for this many | ||
| * seconds after receiving the response. The client SHOULD NOT re-fetch | ||
| * before the TTL expires unless it receives a list_changed notification. |
I think we should also make allowances for the client to re-fetch even before the TTL has expired if it has some other reason to believe that the list has been invalidated. For example, if the client makes a tool call and gets a result back with isError set to true, that indicates some sort of validation error, which might be caused by the tool schema having changed since the last time the tool list was fetched.
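A rough sketch of a client-side cache that treats the TTL as a freshness hint but still allows early invalidation, whether from a `list_changed` notification or from a tool call that comes back with `isError` set. `TtlListCache` and its methods are hypothetical names, and the clock is injectable only to keep the example deterministic. The sketch covers only entries that arrived with a valid `ttl`; what to do when the field is absent is left to the client's existing policy.

```typescript
// Sketch only: not an SDK API.
class TtlListCache<T> {
  private entry: { value: T; fetchedAtMs: number; ttlSeconds: number } | undefined;
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  /** Record a list together with the TTL (in seconds) it arrived with. */
  store(value: T, ttlSeconds: number): void {
    this.entry = { value, fetchedAtMs: this.now(), ttlSeconds };
  }

  /** Call on list_changed, or on any other signal that the list may be stale. */
  invalidate(): void {
    this.entry = undefined;
  }

  /** Returns the cached list only while it is still fresh. */
  getIfFresh(): T | undefined {
    const e = this.entry;
    // A TTL of 0 means the response should not be served from cache.
    if (e === undefined || e.ttlSeconds === 0) return undefined;
    const ageMs = this.now() - e.fetchedAtMs;
    return ageMs < e.ttlSeconds * 1000 ? e.value : undefined;
  }
}

// Usage: drop the cached tool list when a call result reports an error.
const toolCache = new TtlListCache<string[]>();
toolCache.store(["echo"], 300);
const callResult = { isError: true };
if (callResult.isError) toolCache.invalidate();
```

Freshness is checked lazily in `getIfFresh`, so nothing here polls in the background.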
| Clients SHOULD NOT treat TTL as a polling interval that triggers automatic background refetches. The TTL is a **freshness hint**: the client checks freshness when it needs the list, and re-fetches only if stale. Implementations that do choose to poll SHOULD apply jitter and backoff. | ||
| ### Interaction with `list_changed` notifications |
Have we considered how this will affect the SDK APIs? There probably aren't any major problems here, but let's do our due diligence.
I know that (e.g.) the python SDK automatically fetches the tool list if not cached before sending a tool call, so that it can perform input schema validation. That part can obviously just look at the TTL to determine when to refresh the tool list.
However, I think there's also a direct API call to fetch the tool list and return it to the application. I guess we'd need to change that API to return the cached tool list if we haven't yet hit the TTL, right? Or would we want it to proactively re-fetch, because it was explicitly asked to do so by the application?
What happens if the application is expecting that the tool list it was given by the SDK remains valid until it receives a list changed notification, but the server doesn't support list changed notifications? Given that tool list notifications are optional even today, I guess this is already possible, but I'm not sure how the SDKs handle this -- we should make sure there aren't any surprises here.
| When a list result includes `nextCursor` (indicating more pages), the `ttl` applies to the **entire paginated list**, not to individual pages. Specifically: | ||
| - The TTL SHOULD only appear on every page with the same value. Clients SHOULD use the TTL from the last page they fetched to determine freshness. |
Hmm. What happens if a client takes its sweet time fetching all the pages? Let's say the TTL is 1 hour, and it takes 45 mins for the client to fetch all the pages. It seems like this might artificially inflate the TTL.
Actually, what happens if the list gets updated between fetches of different pages? Presumably this is a problem even today, but how does the client know that the two pages actually come from two different lists? I'm wondering if we should have a generation ID on the paginated response, so that the client can tell if the list changes in the middle of fetching the pages.
(I realize that this may be slightly tangential to the main point of this SEP, but we should at least consider if there are problems here.)
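To make both concerns concrete, here is a sketch of how a client might assemble pages. It rests on two assumptions that are not in the SEP: a hypothetical `generation` field on each page (the generation ID floated above), and anchoring expiry to the time the first page was received so that a slow crawl does not stretch the TTL. `PageSketch` and `combinePages` are illustrative names.

```typescript
interface PageSketch<T> {
  items: T[];
  ttl?: number;          // seconds, as in the draft
  generation?: string;   // hypothetical; not part of the current proposal
  receivedAtMs: number;  // client-side receipt timestamp
}

type Combined<T> = { items: T[]; expiresAtMs?: number } | "restart";

function combinePages<T>(pages: PageSketch<T>[]): Combined<T> {
  if (pages.length === 0) return { items: [] };
  const first = pages[0];
  const items: T[] = [];
  for (const page of pages) {
    // A differing generation means the list changed mid-crawl: start over.
    if (page.generation !== first.generation) return "restart";
    items.push(...page.items);
  }
  // The draft says to use the TTL from the last page fetched.
  const ttl = pages[pages.length - 1].ttl;
  // Assumption: measure from the first page, not the last.
  const expiresAtMs = ttl === undefined ? undefined : first.receivedAtMs + ttl * 1000;
  return { items, expiresAtMs };
}
```

In the 1-hour TTL, 45-minute crawl scenario, this yields an expiry 15 minutes after the last page arrives rather than a full hour.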
| Integer seconds is the most common representation across these systems. | ||
| ### Why not use HTTP caching directly? |
At one point we were considering ETags. Did we discard that approach?
The reason I liked it is it means if tool descriptions haven't changed for weeks, you don't have to refetch at all. It also gives a good indicator of whether the client has actually refetched, so it makes it easier for a server to reject requests if it really cares.
A TTL is conceptually simpler, and maybe has nicer failure modes (e.g. does setting an ETag and never rejecting mean a client never re-lists? is that what you want?), but doesn't give the same level of control to server authors.
We discussed in transports and we just wanted to keep things simple for now since we're close to deadline. Could add the ETag stuff next version.
Motivation and Context
This SEP proposes adding an optional `ttl` (time-to-live) field to the result objects returned by `tools/list`, `prompts/list`, `resources/list`, and `resources/templates/list`. The TTL tells clients how long the response may be considered fresh before re-fetching. This allows clients to cache feature lists and poll on a predictable schedule, reducing reliance on server-push `list_changed` notifications while remaining fully backward compatible. TTL supplements rather than replaces the existing notification mechanism; both can coexist.
See SEP for more details.
How Has This Been Tested?
Not yet
Breaking Changes
No
Additional context
This is part of the list of transport priorities agreed upon by core maintainers in the December 2025 blog post.