Skip to content

Commit beb7be5

Browse files
QuentinAmbardQuentin Ambardclaude
authored
Fix MCP server crash on cancellation and structured_content validation (#411)
* Fix MCP server crash on request cancellation When a client cancels a long-running MCP request, there's a race condition between the cancellation and normal response paths: 1. Client cancels request → RequestResponder.cancel() sends error response and sets _completed = True 2. Middleware catches CancelledError and returns a ToolResult 3. MCP SDK tries to call message.respond(response) 4. Crash: assert not self._completed fails Fix: Re-raise CancelledError instead of returning a result, allowing the MCP SDK's cancellation handler to properly manage the response lifecycle. See: modelcontextprotocol/python-sdk#1153 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add structured_content to error responses for MCP SDK validation When tools have an outputSchema (auto-generated from return type like Dict[str, Any]), MCP SDK requires structured_content in all responses. The middleware was returning ToolResult without structured_content for error cases (timeout, exceptions), causing validation errors: "Output validation error: outputSchema defined but no structured output returned" Fix: Include structured_content with the same error data in all error responses. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix structured_content not populated for tools with return type annotations FastMCP auto-generates outputSchema from return type annotations (e.g., -> Dict[str, Any]) but doesn't populate structured_content in ToolResult. MCP SDK validation then fails: "outputSchema defined but no structured output" Fix: Intercept successful results and populate structured_content from JSON text content when missing. Only modifies results when: 1. structured_content is missing 2. There's exactly one TextContent item 3. The text is valid JSON that parses to a dict 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(mcp): apply async wrapper on all platforms to prevent cancellation crashes The asyncio.to_thread() wrapper was only applied on Windows, but it's needed on ALL platforms to enable proper cancellation handling. Without this fix, when a sync tool runs longer than the client timeout: 1. Client sends cancellation 2. Sync tool blocks event loop, can't receive CancelledError 3. Tool eventually returns, but MCP SDK already responded to cancel 4. AssertionError: "Request already responded to" → server crashes This was discovered when uploading 7,375 files triggered a timeout, crashing the MCP server on macOS. Extends the fix from PR #411 which added CancelledError handling in middleware - that fix only works when cancellation can propagate, which requires async execution via to_thread(). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix: don't set structured_content on error responses Setting structured_content causes MCP SDK to validate it against the tool's outputSchema. For error responses, the error dict {"error": True, ...} doesn't match the expected return type (e.g., Union[str, List[Dict]]), causing "Output validation error: 'result' is a required property". Fix: Only set structured_content for successful responses, not errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Quentin Ambard <quentin.ambard@databricks.com> Co-authored-by: Claude <noreply@anthropic.com>
1 parent edc4f67 commit beb7be5

2 files changed

Lines changed: 72 additions & 63 deletions

File tree

databricks-mcp-server/databricks_mcp_server/middleware.py

Lines changed: 52 additions & 54 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
Provides cross-cutting concerns like timeout and error handling for all MCP tool calls.
55
"""
66

7-
import asyncio
7+
import anyio
88
import json
99
import logging
1010
import traceback
@@ -44,7 +44,27 @@ async def on_call_tool(
4444
arguments = context.message.arguments
4545

4646
try:
47-
return await call_next(context)
47+
result = await call_next(context)
48+
49+
# Fix for FastMCP not populating structured_content automatically.
50+
# When a tool has a return type annotation (e.g., -> Dict[str, Any]),
51+
# FastMCP generates an outputSchema but doesn't set structured_content.
52+
# MCP SDK then fails validation: "outputSchema defined but no structured output"
53+
# We fix this by parsing the JSON text content and setting structured_content.
54+
if result and not result.structured_content and result.content:
55+
if len(result.content) == 1 and isinstance(result.content[0], TextContent):
56+
try:
57+
parsed = json.loads(result.content[0].text)
58+
if isinstance(parsed, dict):
59+
# Create new ToolResult with structured_content populated
60+
result = ToolResult(
61+
content=result.content,
62+
structured_content=parsed,
63+
)
64+
except (json.JSONDecodeError, TypeError):
65+
pass # Not valid JSON, leave as-is
66+
67+
return result
4868

4969
except TimeoutError as e:
5070
# In Python 3.11+, asyncio.TimeoutError is an alias for TimeoutError,
@@ -53,47 +73,33 @@ async def on_call_tool(
5373
"Tool '%s' timed out. Returning structured result.",
5474
tool_name,
5575
)
76+
# Don't set structured_content for errors - it would be validated against
77+
# the tool's outputSchema and fail (error dict doesn't match expected type)
5678
return ToolResult(
57-
content=[
58-
TextContent(
59-
type="text",
60-
text=json.dumps(
61-
{
62-
"error": True,
63-
"error_type": "timeout",
64-
"tool": tool_name,
65-
"message": str(e) or "Operation timed out",
66-
"action_required": (
67-
"Operation may still be in progress. "
68-
"Do NOT retry the same call. "
69-
"Use the appropriate get/status tool to check current state."
70-
),
71-
}
72-
),
73-
)
74-
]
79+
content=[TextContent(type="text", text=json.dumps({
80+
"error": True,
81+
"error_type": "timeout",
82+
"tool": tool_name,
83+
"message": str(e) or "Operation timed out",
84+
"action_required": (
85+
"Operation may still be in progress. "
86+
"Do NOT retry the same call. "
87+
"Use the appropriate get/status tool to check current state."
88+
),
89+
}))]
7590
)
7691

77-
except asyncio.CancelledError:
92+
except anyio.get_cancelled_exc_class():
93+
# Re-raise CancelledError so MCP SDK's handler catches it and skips
94+
# calling message.respond(). If we return a result here, the SDK will
95+
# try to respond, but the request may already be marked as responded
96+
# by the cancellation handler, causing an AssertionError crash.
97+
# See: https://github.com/modelcontextprotocol/python-sdk/pull/1153
7898
logger.warning(
79-
"Tool '%s' was cancelled. Returning structured result.",
99+
"Tool '%s' was cancelled. Re-raising to let MCP SDK handle cleanup.",
80100
tool_name,
81101
)
82-
return ToolResult(
83-
content=[
84-
TextContent(
85-
type="text",
86-
text=json.dumps(
87-
{
88-
"error": True,
89-
"error_type": "cancelled",
90-
"tool": tool_name,
91-
"message": "Operation was cancelled by the client",
92-
}
93-
),
94-
)
95-
]
96-
)
102+
raise
97103

98104
except Exception as e:
99105
# Log the full traceback for debugging
@@ -104,22 +110,14 @@ async def on_call_tool(
104110
traceback.format_exc(),
105111
)
106112

107-
# Return a structured error response
108-
error_message = str(e)
109-
error_type = type(e).__name__
110-
113+
# Return error as text content only - don't set structured_content.
114+
# Setting structured_content would cause MCP SDK to validate it against
115+
# the tool's outputSchema, which fails (error dict doesn't match expected type).
111116
return ToolResult(
112-
content=[
113-
TextContent(
114-
type="text",
115-
text=json.dumps(
116-
{
117-
"error": True,
118-
"error_type": error_type,
119-
"tool": tool_name,
120-
"message": error_message,
121-
}
122-
),
123-
)
124-
]
117+
content=[TextContent(type="text", text=json.dumps({
118+
"error": True,
119+
"error_type": type(e).__name__,
120+
"tool": tool_name,
121+
"message": str(e),
122+
}))]
125123
)

databricks-mcp-server/databricks_mcp_server/server.py

Lines changed: 20 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -72,15 +72,23 @@ async def async_wrapper(**kwargs):
7272
_patch_subprocess_stdin()
7373

7474

75-
def _patch_tool_decorator_for_windows():
76-
"""Wrap sync tool functions in asyncio.to_thread() on Windows.
75+
def _patch_tool_decorator_for_async():
76+
"""Wrap sync tool functions in asyncio.to_thread() on all platforms.
7777
7878
FastMCP's FunctionTool.run() calls sync functions directly on the asyncio
79-
event loop thread, which blocks the stdio transport's I/O tasks. On Windows
80-
with ProactorEventLoop this causes a deadlock where all MCP tools hang.
79+
event loop thread, which blocks the stdio transport's I/O tasks. This causes:
80+
81+
1. On Windows with ProactorEventLoop: deadlock where all MCP tools hang.
82+
83+
2. On ALL platforms: cancellation race conditions. When the MCP client
84+
cancels a request (e.g., timeout), the event loop can't propagate the
85+
CancelledError to blocking sync code. The sync function eventually
86+
returns, but the MCP SDK has already responded to the cancellation,
87+
causing "Request already responded to" assertion errors and crashes.
8188
8289
This patch intercepts @mcp.tool registration to wrap sync functions so they
83-
run in a thread pool, yielding control back to the event loop for I/O.
90+
run in a thread pool, yielding control back to the event loop for I/O and
91+
enabling proper cancellation handling via anyio's task cancellation.
8492
"""
8593
original_tool = mcp.tool
8694

@@ -132,11 +140,14 @@ async def _noop_lifespan(*args, **kwargs):
132140
# Register middleware (see middleware.py for details on each)
133141
mcp.add_middleware(TimeoutHandlingMiddleware())
134142

135-
# Apply async wrapper on Windows to prevent event loop deadlocks.
143+
# Apply async wrapper on ALL platforms to:
144+
# 1. Prevent event loop deadlocks (critical on Windows)
145+
# 2. Enable proper cancellation handling (critical on all platforms)
146+
# Without this, sync tools block the event loop, preventing CancelledError
147+
# propagation and causing "Request already responded to" crashes.
136148
# TODO: FastMCP 3.x automatically wraps sync functions in asyncio.to_thread().
137-
# Test if this Windows-specific patch is still needed with FastMCP 3.x.
138-
if sys.platform == "win32":
139-
_patch_tool_decorator_for_windows()
149+
# Test if this patch is still needed with FastMCP 3.x.
150+
_patch_tool_decorator_for_async()
140151

141152
# Import and register all tools (side-effect imports: each module registers @mcp.tool decorators)
142153
from .tools import ( # noqa: F401, E402

0 commit comments

Comments
 (0)