RFC: add Tool.outputSchema and CallToolResult.structuredContent
#371
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -181,14 +181,19 @@ A tool definition includes: | |
| - `name`: Unique identifier for the tool | ||
| - `description`: Human-readable description of functionality | ||
| - `inputSchema`: JSON Schema defining expected parameters | ||
| - `outputSchema`: Optional JSON Schema defining expected output structure | ||
| - `annotations`: optional properties describing tool behavior | ||
|
|
||
| <Warning>For trust & safety and security, clients **MUST** consider | ||
| tool annotations to be untrusted unless they come from trusted servers.</Warning> | ||
|
|
||
| ### Tool Result | ||
|
|
||
| Tool results can contain multiple content items of different types: | ||
| Tool results may be **structured** or **unstructured**, depending on whether the tool definition specifies an [output schema](#output-schema). | ||
|
|
||
| **Structured** tool results are JSON objects that are valid with respect to the tool's output schema. | ||
|
|
||
| **Unstructured** tool results can contain multiple content items of different types: | ||
|
|
||
| #### Text Content | ||
|
|
||
|
|
@@ -235,6 +240,150 @@ or data, behind a URI that can be subscribed to or fetched again by the client l | |
| } | ||
| ``` | ||
|
|
||
| ### Output Schema | ||
|
|
||
| Tools that produce structured results can use the `outputSchema` property to provide a JSON Schema describing the expected structure of their output. | ||
|
|
||
| When a tool specifies an `outputSchema`: | ||
|
|
||
| 1. Clients **MUST** validate that results from that tool contain a `structuredContent` field whose contents validate against the declared `outputSchema`. | ||
|
|
||
| 2. Servers **MUST** provide structured results in `structuredContent` that conform to the declared `outputSchema` of the tool. | ||
|
|
||
| <Info> | ||
| For backwards compatibility, a tool that declares an `outputSchema` may also return unstructured results in the `content` field. | ||
|
Member
Should this be based on the MCP client version string during version negotiation rather than doing this unconditionally?
Contributor
Author
Seems useful to avoid mandating complex version-dependent logic here. (Paraphrasing offline discussion with @ihrpr:) In practice the SDKs will be handling construction of actual results, so it's useful to leave some freedom, and client/server devs will be spared the implementation details in any case. But even if both formats are sent unconditionally, the perf impact isn't huge, and definitely not worth baking complexity into the spec to avoid.

Why does it say "for backwards compatibility"? Does that mean structured output is always preferable over unstructured output? Why not specifically encourage tools to return both an unstructured and a structured response, so that the MCP client can pick the desired one based on the use case? For example, an unstructured response can be optimized for an LLM to read (e.g. when chatting with Claude), and a structured result can be used when further transformations are required. Furthermore, you might also want to allow structured results to contain extra data like IDs or timestamps that are not very meaningful for an LLM, but can be useful when doing data transformations. (This also contradicts the bullet points below.) |
||
| * If present, the unstructured result should be functionally equivalent to the structured result. (For example, serialized JSON can be returned in a `TextContent` block.) | ||
| * Clients that support `structuredContent` should ignore the `content` field if present. | ||
| </Info> | ||
|
|
||
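A minimal sketch of how a client might apply the two bullets above when a result carries both fields. The function name `read_tool_result` and the `supports_structured` flag are hypothetical, and the fallback assumes the server put serialized JSON in a `TextContent` block, which the text above suggests but does not require.

```python
import json
from typing import Any


def read_tool_result(result: dict[str, Any], supports_structured: bool) -> Any:
    """Pick the payload a client should consume from a tool call result."""
    if supports_structured and "structuredContent" in result:
        # Clients that support structuredContent ignore `content` when both are present.
        return result["structuredContent"]
    # Older clients fall back to the first text block (assumed to hold serialized JSON).
    for item in result.get("content", []):
        if item.get("type") == "text":
            return json.loads(item["text"])
    raise ValueError("result has neither structuredContent nor a text content block")
```

Because the two representations are meant to be functionally equivalent, both branches should hand the caller the same data.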
| Example tool with output schema: | ||
|
|
||
| ```json | ||
| { | ||
| "name": "get_weather_data", | ||
| "description": "Get current weather conditions and forecast data for a location", | ||
| "inputSchema": { | ||
| "type": "object", | ||
| "properties": { | ||
| "location": { | ||
| "type": "string", | ||
| "description": "City name or zip code" | ||
| }, | ||
| "units": { | ||
| "type": "string", | ||
| "enum": ["celsius", "fahrenheit"], | ||
| "default": "celsius", | ||
| "description": "Temperature unit" | ||
| } | ||
| }, | ||
| "required": ["location"] | ||
| }, | ||
| "outputSchema": { | ||
| "type": "object", | ||
| "properties": { | ||
| "current": { | ||
| "type": "object", | ||
| "properties": { | ||
| "temperature": { "type": "number" }, | ||
| "humidity": { "type": "number" }, | ||
| "conditions": { "type": "string" }, | ||
| "wind": { | ||
| "type": "object", | ||
| "properties": { | ||
| "speed": { "type": "number" }, | ||
| "direction": { "type": "string" } | ||
| }, | ||
| "required": ["speed", "direction"] | ||
| } | ||
| }, | ||
| "required": ["temperature", "humidity", "conditions", "wind"] | ||
| }, | ||
| "forecast": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "properties": { | ||
| "date": { "type": "string", "format": "date" }, | ||
| "high": { "type": "number" }, | ||
| "low": { "type": "number" }, | ||
| "conditions": { "type": "string" } | ||
| }, | ||
| "required": ["date", "high", "low", "conditions"] | ||
| } | ||
| }, | ||
| "location": { | ||
| "type": "object", | ||
| "properties": { | ||
| "city": { "type": "string" }, | ||
| "country": { "type": "string" }, | ||
| "coordinates": { | ||
| "type": "object", | ||
| "properties": { | ||
| "latitude": { "type": "number" }, | ||
| "longitude": { "type": "number" } | ||
| }, | ||
| "required": ["latitude", "longitude"] | ||
| } | ||
| }, | ||
| "required": ["city", "country", "coordinates"] | ||
| } | ||
| }, | ||
| "required": ["current", "forecast", "location"] | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Example valid response for this tool: | ||
|
Member
Can we also add an example of a valid response that is backward compatible? |
||
|
|
||
| ```json | ||
| { | ||
| "jsonrpc": "2.0", | ||
| "id": 5, | ||
| "result": { | ||
| "structuredContent": { | ||
| "current": { | ||
| "temperature": 22.5, | ||
| "humidity": 65, | ||
| "conditions": "Partly cloudy", | ||
| "wind": { | ||
| "speed": 12, | ||
| "direction": "NW" | ||
| } | ||
| }, | ||
| "forecast": [ | ||
| { | ||
| "date": "2024-03-28", | ||
| "high": 25, | ||
| "low": 18, | ||
| "conditions": "Sunny" | ||
| }, | ||
| { | ||
| "date": "2024-03-29", | ||
| "high": 23, | ||
| "low": 17, | ||
| "conditions": "Cloudy" | ||
| } | ||
| ], | ||
| "location": { | ||
| "city": "San Francisco", | ||
| "country": "US", | ||
| "coordinates": { | ||
| "latitude": 37.7749, | ||
| "longitude": -122.4194 | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| The `outputSchema` helps clients and LLMs understand and properly handle structured tool outputs by: | ||
|
|
||
| - Enabling strict schema validation of responses | ||
| - Providing type information for better integration with programming languages | ||
| - Guiding clients and LLMs to properly parse and utilize the returned data | ||
| - Supporting better documentation and developer experience | ||
|
|
||
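The client-side check in rule 1 (a `structuredContent` field must be present and must validate against the declared `outputSchema`) could be sketched as follows. This is an illustration only: `validate` and `check_result` are hypothetical names, and the validator covers just the keywords used in the weather example (`type`, `properties`, `required`, `items`). It ignores `format`, `enum`, and everything else, so a real client would more likely rely on a complete JSON Schema implementation.

```python
from typing import Any

# JSON Schema type names mapped to Python types; only the types used in the example.
_TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}


def validate(value: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of violations; an empty list means the value conforms."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected in _TYPES:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, but it is not a JSON number.
        if expected == "number" and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required property")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


def check_result(result: dict[str, Any], output_schema: dict[str, Any]) -> list[str]:
    """Apply rule 1: structuredContent must be present and must validate."""
    if "structuredContent" not in result:
        return ["$: structuredContent is missing"]
    return validate(result["structuredContent"], output_schema)
```

Returning a list of violations rather than raising on the first one is a design choice for this sketch; it lets a client log every mismatch in a single pass.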
| ## Error Handling | ||
|
|
||
| Tools use two error reporting mechanisms: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -101,14 +101,69 @@ | |
| "type": "object" | ||
| }, | ||
| "CallToolResult": { | ||
| "description": "The server's response to a tool call.\n\nAny errors that originate from the tool SHOULD be reported inside the result\nobject, with `isError` set to true, _not_ as an MCP protocol-level error\nresponse. Otherwise, the LLM would not be able to see that an error occurred\nand self-correct.\n\nHowever, any errors in _finding_ the tool, an error indicating that the\nserver does not support tool calls, or any other exceptional conditions,\nshould be reported as an MCP error response.", | ||
| "anyOf": [ | ||
| { | ||
| "$ref": "#/definitions/CallToolUnstructuredResult" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/CallToolStructuredResult" | ||
| } | ||
| ], | ||
| "description": "The server's response to a tool call.\n\nAny errors that originate from the tool SHOULD be reported inside the result\nobject, with `isError` set to true, _not_ as an MCP protocol-level error\nresponse. Otherwise, the LLM would not be able to see that an error occurred\nand self-correct.\n\nHowever, any errors in _finding_ the tool, an error indicating that the\nserver does not support tool calls, or any other exceptional conditions,\nshould be reported as an MCP error response." | ||
| }, | ||
| "CallToolStructuredResult": { | ||
| "description": "Tool result for tools that do declare an outputSchema.", | ||
| "properties": { | ||
| "_meta": { | ||
| "additionalProperties": {}, | ||
| "description": "This result property is reserved by the protocol to allow clients and servers to attach additional metadata to their responses.", | ||
| "type": "object" | ||
| }, | ||
| "content": { | ||
| "description": "If the Tool defines an outputSchema, this field MAY be present in the result.\nTools should use this field to provide compatibility with older clients that do not support structured content.\nClients that support structured content should ignore this field.", | ||
| "items": { | ||
| "anyOf": [ | ||
| { | ||
| "$ref": "#/definitions/TextContent" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/ImageContent" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/AudioContent" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/EmbeddedResource" | ||
| } | ||
| ] | ||
| }, | ||
| "type": "array" | ||
| }, | ||
| "isError": { | ||
| "description": "Whether the tool call ended in an error.\n\nIf not set, this is assumed to be false (the call was successful).", | ||
| "type": "boolean" | ||
| }, | ||
| "structuredContent": { | ||
|
Member
Thinking about this, given that outputSchema is any arbitrary json schema (which makes sense), should this be of type
Contributor
Author
Yeah, see above. On balance I think sticking with the top-level-object restriction in the rest of the protocol (and standard practice more generally) is worth the extra trouble of wrapping top-level primitives/arrays. |
||
| "additionalProperties": {}, | ||
| "description": "An object containing structured tool output.\n\nIf the Tool defines an outputSchema, this field MUST be present in the result, and contain a JSON object that matches the schema.", | ||
| "type": "object" | ||
| } | ||
| }, | ||
| "required": [ | ||
| "structuredContent" | ||
| ], | ||
| "type": "object" | ||
| }, | ||
| "CallToolUnstructuredResult": { | ||
| "description": "Tool result for tools that do not declare an outputSchema.", | ||
| "properties": { | ||
| "_meta": { | ||
| "additionalProperties": {}, | ||
| "description": "This result property is reserved by the protocol to allow clients and servers to attach additional metadata to their responses.", | ||
| "type": "object" | ||
| }, | ||
| "content": { | ||
| "description": "A list of content objects that represent the result of the tool call.\n\nIf the Tool does not define an outputSchema, this field MUST be present in the result.", | ||
| "items": { | ||
| "anyOf": [ | ||
| { | ||
|
|
@@ -358,6 +413,25 @@ | |
| ], | ||
| "type": "object" | ||
| }, | ||
| "ContentList": { | ||
| "items": { | ||
| "anyOf": [ | ||
| { | ||
| "$ref": "#/definitions/TextContent" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/ImageContent" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/AudioContent" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/EmbeddedResource" | ||
| } | ||
| ] | ||
| }, | ||
| "type": "array" | ||
| }, | ||
| "CreateMessageRequest": { | ||
| "description": "A request from the server to sample an LLM via the client. The client has full discretion over which model to select. The client should also inform the user before beginning sampling, to allow them to inspect the request (human in the loop) and decide whether to approve it.", | ||
| "properties": { | ||
|
|
@@ -1904,7 +1978,10 @@ | |
| "$ref": "#/definitions/ListToolsResult" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/CallToolResult" | ||
| "$ref": "#/definitions/CallToolUnstructuredResult" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/CallToolStructuredResult" | ||
| }, | ||
| { | ||
| "$ref": "#/definitions/CompleteResult" | ||
|
|
@@ -2049,6 +2126,12 @@ | |
| "name": { | ||
| "description": "The name of the tool.", | ||
| "type": "string" | ||
| }, | ||
| "outputSchema": { | ||
| "additionalProperties": true, | ||
| "description": "An optional JSON Schema object defining the structure of the tool's output.\n\nIf set, a CallToolResult for this Tool MUST contain a structuredContent field whose contents validate against this schema.\nIf not set, a CallToolResult for this Tool MUST contain a content field.", | ||
| "properties": {}, | ||
| "type": "object" | ||
| } | ||
| }, | ||
| "required": [ | ||
|
|
||
Should we explicitly have some b/w compat recommendations based on the MCP version string header, i.e. if the client is on an older version, we return the JSON-serialized text in a content text block?

My feeling was that we should avoid being overly prescriptive/fine-grained about logic based on client versioning, but adding a suggestion about serialized JSON in a text block is a great idea. Thanks!
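
For illustration, the serialized-JSON suggestion might look like this on the server side. This is a sketch under assumptions, not SDK code: `build_tool_result` is a hypothetical helper, and it sends both fields unconditionally rather than branching on the client version.

```python
import json
from typing import Any


def build_tool_result(structured: dict[str, Any]) -> dict[str, Any]:
    """Build a tool call result that both old and new clients can read."""
    return {
        # Clients that support structured content read this field.
        "structuredContent": structured,
        # Older clients get the same data as serialized JSON in a text block.
        "content": [{"type": "text", "text": json.dumps(structured)}],
    }
```

Deriving the text block from the structured object keeps the two representations equivalent by construction.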