test: interaction-model end-to-end test suite with a requirements manifest #2691
Draft
maxisbey wants to merge 13 commits into
New tests/interaction/ suite asserting client<->server round trips through the public API only. Tests are organised around a requirements manifest (_requirements.py) mapping each test to the spec or SDK behaviour it exercises, with known divergences from the spec recorded on the requirement; test_coverage.py enforces that every non-deferred requirement is exercised by at least one test. Covers tools, prompts, resources, and ping against the low-level Server, plus MCPServer tool-call behaviours. Removes two 'pragma: no cover' comments on the ping send/answer paths now that they are covered.
…action tests
Extends the interaction suite with the initialize handshake (server identity, instructions, capability derivation, client identity and capabilities as seen by the server), completion round trips, logging notifications, and the MCPServer resource/prompt/structured-output behaviours. Records two more divergences on the requirements manifest: MCPServer reports unknown resources and prompts with error code 0 rather than the codes the spec documents. Removes the 'pragma: no cover' from the method-not-found fallback now that it is covered.
Covers the server-to-client half of the interaction model: sampling, form-mode elicitation, roots, progress in both directions, list_changed notifications, and request cancellation, all against the low-level Server through the public Client API. Records a further divergence: the server answers cancelled requests with an error response where the spec says no response should be sent. Removes five more 'pragma: no cover' comments on paths these tests now cover (server list_changed senders, the client roots send path, and the default elicitation callback).
…eta interaction tests
Covers URL-mode elicitation (including the elicitation/complete lifecycle and the -32042 rejection flow), resource subscriptions and update notifications, cursor pagination across all four list methods, request and session read timeouts, _meta round trips, and the MCPServer Context convenience methods. Removes the 'pragma: no cover' from the resource-updated send path now that it is covered.
…ction tests
Adds ClientSession-level tests for pre-initialization request rejection and protocol version negotiation, a proof that concurrent tool calls are dispatched simultaneously and answered independently, and tests pinning three behaviour gaps: tool-set mutations send no list_changed notification, logging/setLevel is not supported by MCPServer and no level filtering exists, and tool-enabled sampling is rejected because the high-level client cannot declare the sampling.tools capability.
A RecordingTransport wrapper tees every message crossing the client's transport boundary so the suite can assert properties that are invisible to API callers: request ids are unique and never null, notifications are never answered, and exactly one initialized notification is sent between the initialize response and the first feature request.
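A minimal sketch of how the wire-level properties named in this commit could be checked against a recorded message log. It is not the PR's RecordingTransport: the log shape (a list of direction/message pairs seen from the client side) and the function name check_wire_invariants are assumptions made for illustration, and any request other than initialize is treated as a feature request.

```python
from typing import Any

# One recorded entry: ("sent" | "received", JSON-RPC message as a dict),
# from the client's point of view. This shape is an assumption of the sketch.
Entry = tuple[str, dict[str, Any]]


def check_wire_invariants(log: list[Entry]) -> list[str]:
    """Return a description of every invariant violation found in the log."""
    problems: list[str] = []
    sent_request_ids: set[Any] = set()
    initialize_id: Any = None
    initialize_answered = False
    initialized_sent = 0
    first_feature_request_seen = False

    for direction, msg in log:
        has_method = "method" in msg
        has_id = "id" in msg

        if direction == "sent" and has_method and has_id:
            # A request from the client: ids must be non-null and unique.
            request_id = msg["id"]
            if request_id is None:
                problems.append(f"{msg['method']}: request id is null")
            elif request_id in sent_request_ids:
                problems.append(f"{msg['method']}: request id {request_id!r} reused")
            else:
                sent_request_ids.add(request_id)

            if msg["method"] == "initialize":
                initialize_id = request_id
            elif not first_feature_request_seen:
                first_feature_request_seen = True
                if initialized_sent != 1:
                    problems.append(
                        f"{initialized_sent} initialized notifications sent "
                        "before the first feature request (expected 1)"
                    )

        elif direction == "sent" and has_method:
            # A notification from the client (no id).
            if msg["method"] == "notifications/initialized":
                initialized_sent += 1
                if not initialize_answered:
                    problems.append("initialized sent before the initialize response")

        elif direction == "received" and not has_method:
            # A response: it must match a request the client actually sent.
            # A reply matching no request id would mean a notification
            # (or nothing at all) was answered.
            if msg.get("id") not in sent_request_ids:
                problems.append(f"response with unmatched id {msg.get('id')!r}")
            elif msg["id"] == initialize_id:
                initialize_answered = True

    return problems
```

Keeping the checker a pure function over a recorded log means the same assertions can run regardless of which transport produced the log.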
Drives the streamable HTTP Starlette app through httpx's ASGI transport so the full HTTP framing layer (session ids, SSE and JSON response encoding, stateful and stateless modes) runs with no sockets, threads, or subprocesses. Covers the handshake, tool calls and errors, mid-call notifications, the stateless rejection of server-initiated requests, and the routing of unrelated server messages to the standalone stream. Removes the 'pragma: no cover' comments these tests now cover (the session-manager accessors, the no-session-id validation path, and the related-request-id routing branch). The session-manager accessor's unreachable error guard keeps its pragma, moved onto the raise statement itself so the now-executed condition above it is measured normally.
…irements manifest
Fixes the spec deep links that pointed at non-existent anchors, records the divergences for the client's default not-supported answers (the spec names -32601 for roots and -32602 for an undeclared elicitation mode; the default callbacks answer -32600), and adds a logging:capability requirement noting that MCPServer emits log message notifications without declaring the logging capability. Also tightens behaviour sentences and docstrings to match what the tests assert, and adds a test pinning that Context.report_progress is a silent no-op when the caller supplied no progress token, removing the pragma on that path.
…n-rejection interaction tests
… two-way test coverage
…tiated requests over streamable HTTP
Adds tests/interaction/: an end-to-end suite that enumerates the MCP interaction model (one test per behaviour, asserting full client↔server round trips through the public API only), plus a requirements manifest (_requirements.py) that maps every behaviour the SDK must satisfy to the spec section that mandates it, the tests that prove it, and the gaps that remain (recorded as structured divergence/deferred data rather than skipped tests).
Motivation and Context
Two purposes: (1) a parity bar for upcoming internal rewrites of the send/receive path — the suite pins current observable behaviour exactly, so before/after runs prove equivalence; (2) a complete, reviewable ledger of the SDK's behaviour surface against the 2025-11-25 spec revision, including behaviours not yet implemented or not yet covered, each tracked with a reason. The ID vocabulary is aligned with the TypeScript SDK's end-to-end requirements suite so coverage can be compared across SDKs.
Status: work in progress. Currently 120 tests / 340 tracked requirements (207 deferred with reasons). Still to come in this PR: transport-parametrized execution (in-memory + streamable HTTP in-process, SSE), stdio E2E, transport/hosting conformance tests, authorization tests, and the remaining per-area gap tests.
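As a rough illustration of the manifest model described above, the sketch below shows one way a requirements ledger and its coverage check could fit together. It is not the contents of _requirements.py or test_coverage.py: the Requirement fields, the requirement decorator, the EXERCISED registry, and the IDs other than logging:capability are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Requirement:
    """One tracked behaviour (hypothetical shape, for illustration only)."""

    behaviour: str
    divergence: Optional[str] = None  # how the SDK currently departs from the spec
    deferred: Optional[str] = None  # reason no test exercises it yet
    issue: Optional[str] = None  # link to a tracking issue once filed


# Hypothetical manifest entries; only "logging:capability" is an ID named in this PR.
REQUIREMENTS: dict[str, Requirement] = {
    "ping:request": Requirement(behaviour="A ping request receives an empty result."),
    "logging:capability": Requirement(
        behaviour="Log message notifications are emitted to the client.",
        divergence="MCPServer emits them without declaring the logging capability.",
    ),
    "auth:flow": Requirement(
        behaviour="Authorization round trip.",
        deferred="Authorization tests are still to come.",
    ),
}

# test function name -> requirement IDs it claims to exercise
EXERCISED: dict[str, set[str]] = {}


def requirement(*ids: str) -> Callable[[Callable], Callable]:
    """Tag a test with the requirement IDs it exercises; unknown IDs fail fast."""
    unknown = [rid for rid in ids if rid not in REQUIREMENTS]
    if unknown:
        raise KeyError(f"unknown requirement ids: {unknown}")

    def decorate(test_fn: Callable) -> Callable:
        EXERCISED.setdefault(test_fn.__name__, set()).update(ids)
        return test_fn

    return decorate


def uncovered_requirements() -> list[str]:
    """IDs that are not deferred and that no registered test exercises."""
    covered = set().union(*EXERCISED.values())
    return sorted(
        rid
        for rid, req in REQUIREMENTS.items()
        if req.deferred is None and rid not in covered
    )


@requirement("ping:request")
def test_ping_round_trip() -> None:
    ...  # the real test would drive a client<->server round trip here
```

A coverage test in this style would simply assert that uncovered_requirements() is empty. Note that a requirement carrying a divergence still needs a test, whereas a deferred one is exempt until its reason is resolved.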
How Has This Been Tested?
The suite itself:
uv run --frozen pytest tests/interaction/ (in-memory and event-driven, ~1s, no sockets/threads/subprocesses, no sleeps).
100% line+branch coverage including the tests, pyright/ruff clean.
src/ changes are limited to removing # pragma: no cover markers on lines the new tests now execute.
Breaking Changes
None — tests and test-support only.
Types of changes
Checklist
tests/interaction/README.md)
Additional context
Conventions, the manifest model, and the divergence lifecycle are documented in tests/interaction/README.md. Known SDK gaps surfaced while writing the suite will be filed as issues separately; manifest entries link them via their issue field once filed.
AI Disclaimer