Feat/mcp resilience4j#1033
Open
Pratyay wants to merge 5 commits into
Adds a new mcp-resilience4j module providing circuit breaking, retry, rate limiting, time limiting, and bulkhead policies for McpClientTransport.
- ResilientMcpClientTransport: decorator wrapping any McpClientTransport with all five Resilience4j policies on sendMessage(), CB+Retry only on connect(). Policy order follows the standard hierarchy: Retry → CircuitBreaker → RateLimiter → TimeLimiter → Bulkhead.
- McpResilienceConfig: high-level fluent facade for configuring the transport wrapper via config objects or shared registries.
- 13 unit tests covering delegation, retry, circuit breaker, time limiter, and all transparent-delegation methods.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the five policies, their ordering rationale, quick-start examples for both McpResilienceConfig and the direct builder, registry usage with the name-collision warning, built-in observability logging, and a Google ADK integration pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Title: Add mcp-resilience4j module with transport-level resilience
Adds a new mcp-resilience4j module that wraps any McpClientTransport with
configurable Resilience4j policies, making MCP tool calls resilient to transient
failures, slow servers, and traffic spikes.
Motivation and Context
MCP tool calls cross a network. Without resilience, a slow or flaky MCP server
can cause cascading failures in AI agent pipelines: blocking threads indefinitely,
repeatedly hammering a server that cannot recover, or overwhelming a rate-limited
endpoint during a burst of parallel tool invocations.
McpClientTransport is the natural integration point: it is the single boundary
all MCP clients share, it is the interface frameworks like Google ADK expose for
custom transport injection, and wrapping it leaves the rest of the MCP client
stack entirely unchanged.
How Has This Been Tested
13 unit tests covering:
call receives CallNotPermittedException
All 13 tests pass locally (mvn test -pl mcp-resilience4j).
Breaking Changes
None. This is a new optional module. Existing code and dependencies are unchanged.
Types of Changes
Checklist
Additional Context
Policy ordering. Retry → CircuitBreaker → RateLimiter → TimeLimiter → Bulkhead
follows the standard Resilience4j recommended hierarchy. Retry is outermost so it
orchestrates the full inner chain per attempt. Bulkhead is innermost so concurrency
slots are released during Retry's backoff sleep rather than held, preventing slot
exhaustion from blocking healthy concurrent callers. RateLimiter is inside Retry so
each retry attempt consumes a token, keeping the local rate count aligned with actual
server-side request volume.
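The nesting described above can be illustrated with a minimal plain-Java sketch. It does not use Resilience4j or the module's real classes: `traced` and `retry` are hypothetical toy decorators over `Supplier`, and the simulated call fails once and then succeeds. The point is only to show that, with Retry outermost, every attempt passes through the rate limiter again and the bulkhead is exited before the backoff step is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class PolicyOrderSketch {
    // Hypothetical toy decorator: records when a named policy is entered and exited.
    static <T> Supplier<T> traced(String name, List<String> log, Supplier<T> inner) {
        return () -> {
            log.add(name + ":enter");
            try {
                return inner.get();
            } finally {
                log.add(name + ":exit");
            }
        };
    }

    // Hypothetical toy retry: re-invokes the whole inner chain on failure.
    static <T> Supplier<T> retry(int maxAttempts, List<String> log, Supplier<T> inner) {
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                log.add("Retry:attempt" + attempt);
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e;
                    if (attempt < maxAttempts) {
                        log.add("Retry:backoff"); // a real retry would wait here
                    }
                }
            }
            throw last;
        };
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        int[] calls = {0};
        // Simulated transport call: the first invocation fails, the second succeeds.
        Supplier<String> call = () -> {
            if (++calls[0] == 1) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        };
        // Outermost to innermost, matching the order stated in the text.
        Supplier<String> chain =
            retry(2, log,
                traced("CircuitBreaker", log,
                    traced("RateLimiter", log,
                        traced("TimeLimiter", log,
                            traced("Bulkhead", log, call)))));
        log.add("result:" + chain.get());
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In the recorded trace, `Bulkhead:exit` appears before `Retry:backoff`, and `RateLimiter:enter` appears once per attempt, which is the behaviour the ordering rationale relies on.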
sendMessage() applies all five policies. connect() applies only CircuitBreaker
and Retry; session establishment is not throttled or timed out.
Why not a client-level wrapper? An earlier design explored wrapping McpAsyncClient
directly. This was removed because McpAsyncClient has a package-private constructor
(not subclassable), and frameworks like Google ADK create McpSyncClient internally
with no injection point for a custom async client. The transport is the only hook
these frameworks expose.
ThreadPoolBulkhead is intentionally excluded. The semaphore Bulkhead is correct
for reactive code; injecting a ThreadPoolBulkheadOperator would force a thread-pool
handoff inside the reactive chain, competing with Reactor's own schedulers.
Registry name collisions. Resilience4j registries silently return a cached
instance when a name already exists, ignoring any supplied config. The builder logs
a WARN when this is detected. Callers sharing a registry across multiple transports
must use unique names per transport.
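The collision behaviour can be modelled with a small plain-Java sketch. This is not Resilience4j's registry or the module's builder: `ToyRegistry`, `ToyPolicy`, `ToyConfig`, and `resolve` are hypothetical names, and the identity check used to detect an ignored config is one possible approach, assumed here for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RegistryCollisionSketch {
    // Hypothetical config holder standing in for a real policy config.
    static final class ToyConfig {
        final int maxAttempts;
        ToyConfig(int maxAttempts) { this.maxAttempts = maxAttempts; }
    }

    // Hypothetical policy instance that remembers the config it was built with.
    static final class ToyPolicy {
        final String name;
        final ToyConfig config;
        ToyPolicy(String name, ToyConfig config) {
            this.name = name;
            this.config = config;
        }
    }

    // Toy get-or-create registry: an existing name wins and the new config is dropped.
    static final class ToyRegistry {
        private final Map<String, ToyPolicy> cache = new ConcurrentHashMap<>();

        ToyPolicy get(String name, ToyConfig config) {
            return cache.computeIfAbsent(name, n -> new ToyPolicy(n, config));
        }
    }

    // Resolves a policy and records a warning when the supplied config was ignored.
    static ToyPolicy resolve(ToyRegistry registry, String name,
                             ToyConfig requested, List<String> warnings) {
        ToyPolicy policy = registry.get(name, requested);
        if (policy.config != requested) {
            warnings.add("WARN: policy '" + name
                + "' already registered; supplied config ignored");
        }
        return policy;
    }

    static List<String> run() {
        ToyRegistry shared = new ToyRegistry();
        List<String> out = new ArrayList<>();
        ToyPolicy a = resolve(shared, "mcp", new ToyConfig(3), out);
        ToyPolicy b = resolve(shared, "mcp", new ToyConfig(10), out);        // name collision
        ToyPolicy c = resolve(shared, "mcp-search", new ToyConfig(10), out); // unique name
        out.add("a=" + a.config.maxAttempts
            + " b=" + b.config.maxAttempts
            + " c=" + c.config.maxAttempts);
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The second transport asked for 10 attempts but silently gets the instance configured with 3, while the uniquely named third transport gets its own config. This is why unique names per transport matter when a registry is shared.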