improvement(mcp): post-merge hardening — protocol negotiation + distributed OAuth lock + typed errors by waleedlatif1 · Pull Request #4722 · simstudioai/sim

waleedlatif1 · 2026-05-22T15:50:36Z

Summary

Inbound MCP server protocol-version negotiation (serve/[serverId]/route.ts): echoes the client's protocolVersion when supported (2025-06-18 / 2025-03-26 / 2024-11-05), falls back to our latest. Previously hardcoded to the oldest spec version, forcing modern clients into legacy behavior.
Distributed OAuth refresh lock (oauth/storage.ts): withMcpOauthRefreshLock now wraps fn in a Postgres advisory transaction lock (pg_advisory_xact_lock) in addition to the in-process Promise chain. Multi-task ECS deployments no longer race rotated refresh tokens.
Typed-error dispatch in categorizeError (utils.ts): dispatches on McpOauthAuthorizationRequiredError / UnauthorizedError / McpConnectionError first. Adds HTTP 502 for connection failures and 503 for cooldown. Substring matching kept only as a fallback for SDK/third-party errors we don't own a class for.
Drop dead allSettled().catch() (service.ts): each deferred side-effect already self-logs; just mark per-promise as handled to silence unhandled-rejection warnings.

Type of Change

Improvement

Testing

Tested manually. 298 MCP tests pass (8 new: 4 covering protocol-version negotiation paths, 4 covering typed-error dispatch). `bun run check:api-validation` and lint clean.

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

vercel · 2026-05-22T15:50:42Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		May 22, 2026 5:19pm

cursor · 2026-05-22T15:50:46Z

PR Summary

Medium Risk
Touches MCP protocol handshake and OAuth token-refresh coordination, plus adds Redis-based locking; failures here could impact connectivity/auth flows across deployments, though changes are additive with fallbacks and covered by new tests.

Overview
Adds MCP protocol version negotiation on the workflow MCP serve initialize path, echoing a client-requested version when supported and otherwise falling back to the SDK LATEST_PROTOCOL_VERSION (with new route tests covering supported/unsupported/omitted cases).

Hardens outbound MCP OAuth by upgrading withMcpOauthRefreshLock from in-process serialization to a distributed Redis mutex with TTL watchdog, polling, and bounded wait to avoid refresh-token rotation races across processes (with extensive new unit tests).

Improves reliability/ops behavior by pooling undici pinned-DNS Agents per resolvedIP (LRU-capped, no-close eviction) and by making categorizeError prefer typed MCP/SDK auth/connection errors (returning 401/502/503 accordingly); also removes a redundant allSettled().catch() in discovery deferred side-effects to avoid unhandled-rejection warnings.

^{Reviewed by Cursor Bugbot for commit 4c05f63. Configure here.}

greptile-apps · 2026-05-22T15:55:04Z

Greptile Summary

This PR hardens the MCP integration across four areas: protocol-version negotiation on the inbound serve endpoint (echo client's version if supported, fall back to latest), a two-tier distributed OAuth refresh lock replacing the single-process promise chain (Redis mutex with watchdog TTL extension + in-process chain), typed instanceof dispatch in categorizeError before substring fallback, and an LRU connection-pool for pinned undici Agents to reuse keep-alive sockets across McpClient instances.

Protocol negotiation (route.ts): negotiateProtocolVersion() echoes SUPPORTED_PROTOCOL_VERSIONS from the SDK; tests cover all four negotiation paths.
Distributed lock (storage.ts): withMcpOauthRefreshLock now chains an in-process serialization queue with a polled Redis mutex and a watchdog setInterval that extends the TTL while fn() runs; falls open (logs a warning) if Redis is unavailable.
categorizeError (utils.ts): adds McpConnectionError → 502/503 and UnauthorizedError → 401 paths before the substring fallback; pinnedAgents LRU pool in pinned-fetch.ts reuses undici Agents per resolved IP.

Confidence Score: 5/5

Safe to merge — all four changes are well-bounded, thoroughly tested, and the two-tier OAuth lock correctly preserves the per-caller fn() invariant.

No incorrect behavior in the changed paths. The distributed lock implementation correctly handles Redis unavailability, lock release on fn() throws, watchdog extension for long-running refreshes, and per-row key isolation. The protocol negotiation, typed error dispatch, and agent pool are all verified by the new tests. The one observation is a test-utility export in the production module, which does not affect runtime correctness.

apps/sim/lib/mcp/pinned-fetch.ts — __resetPinnedAgentsForTests is a test-only helper exported from the production module.

Important Files Changed

Filename	Overview
apps/sim/lib/mcp/oauth/storage.ts	Replaces single-process promise chain with two-tier Redis + in-process lock; watchdog extends TTL, falls open on Redis unavailability, correct cleanup in finally block.
apps/sim/lib/mcp/pinned-fetch.ts	Adds LRU agent pool (64 entries, no TTL) to reuse keep-alive connections; exports __resetPinnedAgentsForTests test helper from production module.
apps/sim/lib/mcp/utils.ts	Typed instanceof dispatch for McpConnectionError/UnauthorizedError/McpOauthAuthorizationRequiredError added before substring fallback; clean separation of concerns.
apps/sim/app/api/mcp/serve/[serverId]/route.ts	negotiateProtocolVersion() echoes SDK-supported versions from the client's request, correctly falls back to LATEST_PROTOCOL_VERSION.
apps/sim/lib/mcp/service.ts	Replaces dead Promise.allSettled().catch() with per-promise .catch(()=>{}) — all deferred side-effects already self-log via try/catch in their private methods.

Sequence Diagram

sequenceDiagram
    participant C1 as Caller 1 (in-process)
    participant C2 as Caller 2 (in-process)
    participant CP as Caller P (other process)
    participant IC as inflightChains
    participant R as Redis
    participant F as fn()

    C1->>IC: "get(lockKey) prev=resolved"
    C1->>IC: set(lockKey, next1)
    C2->>IC: "get(lockKey) prev=next1"
    C2->>IC: set(lockKey, next2)
    C1->>R: acquireLock(key, token1, 15s) true
    C1->>C1: start watchdog setInterval(5s)
    C1->>F: fn() runs
    CP->>R: acquireLock(key, tokenP, 15s) false
    CP->>CP: sleep(100ms), poll...
    C1->>R: extendLock(key, token1, 15s)
    F-->>C1: result1
    C1->>C1: clearInterval(watchdog)
    C1->>R: releaseLock(key, token1)
    C1->>IC: cleanup delete(lockKey)
    C2->>R: acquireLock(key, token2, 15s) true
    C2->>F: fn() runs own invocation
    F-->>C2: result2

_{Reviews (5): Last reviewed commit: "fix(mcp): throw instead of falling open ..." | Re-trigger Greptile}

…ibuted OAuth lock + typed error dispatch - Inbound MCP server now negotiates protocolVersion per MCP 2025-06-18: echoes the client's requested version when supported, falls back to our latest. Previously hardcoded to the oldest spec version (2024-11-05). - withMcpOauthRefreshLock now takes a Postgres advisory transaction lock in addition to the in-process Promise chain, so concurrent processes (multi-task ECS) serialize on a per-OAuth-row basis. Previously a refresh race across processes could rotate a token under another process's feet and force re-auth. - categorizeError dispatches on McpOauthAuthorizationRequiredError / UnauthorizedError / McpConnectionError first, only falling back to substring matching for SDK / third-party errors. Adds 502 for connection failures and 503 for cooldown. Tests cover all four typed cases. - discoverTools no longer pretends to handle deferred-side-effect rejections via a dead allSettled().catch() — each side-effect already self-logs; we just swallow per-promise to silence unhandled-rejection warnings.

- Replace `as never` cast in negotiateProtocolVersion with `as readonly string[]` on the array — preserves TypeScript narrowing on the comparison while satisfying the readonly-tuple `.includes()` constraint properly. - Document the pg_advisory_xact_lock tradeoff: session-level locks (`pg_advisory_lock`) would release the connection earlier, but PgBouncer transaction-pooling mode breaks them. xact_lock is the correct choice for Sim's deployment; if pool pressure becomes real, the comment notes the Redlock escape hatch.

- McpTool now extends `Pick<Tool, 'name' | 'description'>` from @modelcontextprotocol/sdk so name/description fields stay in sync with the SDK; serverId/serverName remain Sim-specific additions. - Drop file-header restatements ("MCP Types - for connecting to external MCP servers"), one-line wrapper docstrings ("Get connection status"), and narrative comment blocks that just restate visible code. - Keep only comments that document non-obvious "why" — OAuth refresh-lock tradeoff, in-flight dedup key composition, SDK Tool.inputSchema typing, preregistered-client semantics, postMessage handshake contract.

withMcpOauthRefreshLock now uses `coalesceLocally` + Redis acquireLock/ releaseLock with polling — the same primitives backing regular OAuth refresh (`app/api/auth/oauth/utils.ts`). No more pinning a Postgres connection for the duration of the SDK's OAuth HTTP refresh. - In-process dedup: shared promise via `coalesceLocally`. - Cross-process: Redis SET NX EX mutex; followers poll until the leader releases (30s max wait, 100ms poll), then acquire and run fn(). - Each MCP caller still constructs its own client (semantics preserved). - Falls open when Redis is unavailable — same behavior as the regular OAuth refresh code path.

waleedlatif1 · 2026-05-22T16:17:42Z

@greptile

waleedlatif1 · 2026-05-22T16:17:46Z

@cursor review

…ts + cover OAuth lock - McpClient.SUPPORTED_VERSIONS removed; getVersionInfo() and the inbound serve route both import LATEST_PROTOCOL_VERSION / SUPPORTED_PROTOCOL_VERSIONS directly from @modelcontextprotocol/sdk/types.js. New protocol revisions ship automatically with SDK upgrades. - pinned-fetch now caches undici Agents in a module-level LRU keyed by resolvedIP (max 64). Back-to-back MCP calls to the same server reuse the keep-alive connection pool instead of opening fresh TCP + TLS each time. - New integration tests for withMcpOauthRefreshLock covering: in-process dedup via coalesceLocally, cross-process serialization via Redis mutex, fall-open on Redis unavailable, lock release on throw, release-failure swallow, per-row key isolation.

withMcpOauthRefreshLock previously wrapped fn() in coalesceLocally, which returns the SAME promise (and the same resolved value) to all in-process callers. fn() returns a stateful McpClient — sharing it meant whichever caller finished first would disconnect the client while another was still mid-call, leaving in-flight RPC on a closed connection. Swap coalesceLocally for a per-row Promise chain: each caller waits for the previous to settle, then runs its OWN fn() (gets its own client). Cross-process Redis mutex semantics unchanged. The "shareable scalar" assumption that makes coalesceLocally correct for regular OAuth refresh (returns an access token string) does not hold for MCP, where each caller needs an independent connection.

waleedlatif1 · 2026-05-22T16:59:09Z

@greptile

waleedlatif1 · 2026-05-22T16:59:12Z

@cursor review

…inned agents - Redis refresh lock now uses a 15s TTL with a watchdog that extends every 5s while fn() runs. Long-running OAuth refreshes no longer lose the lock mid-flight and let another process race the same refresh. - Pinned-agent LRU eviction no longer calls `agent.close()`. Existing `createMcpPinnedFetch` closures hold the dispatcher reference and were using a closed Agent after eviction. We drop from the cache and let GC release the dispatcher once the last closure dies; undici closes idle keep-alive connections via its own internal timeout. - New tests: watchdog extends while fn() runs and stops once it settles; evicted agents are not closed and captured closures still work.

waleedlatif1 · 2026-05-22T17:05:31Z

@greptile

waleedlatif1 · 2026-05-22T17:05:35Z

@cursor review

…s deadline When the Redis refresh lock can't be acquired within REFRESH_MAX_WAIT_MS the previous code ran fn() uncoordinated — but another process can still be holding the lock (watchdog-extended) and refreshing the same OAuth row, recreating the exact race the lock prevents. Throw on deadline. The caller can retry; the Redis-down branch remains the only path that runs fn() uncoordinated (no coordination is possible there).

waleedlatif1 · 2026-05-22T17:11:05Z

@greptile

waleedlatif1 · 2026-05-22T17:11:09Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit 4c05f63. Configure here.}

…hods Earlier comment-trim pass went too far on a few exports — restored the TSDoc that explained non-obvious "why" decisions: - SimMcpOauthProviderInit.preregistered: when set, the SDK skips DCR. - McpServerConfig.userId: required for OAuth; selects which user's stored tokens to use. - McpOauthAuthorizationRequiredError: benign pending state vs failure. - McpToolsChangedCallback / McpClientOptions: notification semantics, DNS-rebinding pinning rationale, OAuth provider contract. - StoredMcpToolReference / StoredMcpTool: minimal vs extended use. - McpClient.connect: documents listChanged handler registration. - McpService.executeTool: documents session-error retry behavior. Pure-restatement comments ("Disconnect from MCP server") stay trimmed.

greptile-apps Bot reviewed May 22, 2026

View reviewed changes

Comment thread apps/sim/lib/mcp/oauth/storage.ts Outdated

Comment thread apps/sim/app/api/mcp/serve/[serverId]/route.ts Outdated

vercel Bot temporarily deployed to Preview May 22, 2026 16:00 Inactive

vercel Bot temporarily deployed to Preview May 22, 2026 16:07 Inactive

vercel Bot temporarily deployed to Preview May 22, 2026 16:15 Inactive

waleedlatif1 added 4 commits May 22, 2026 09:16

waleedlatif1 force-pushed the improvement/mcp-hardening branch from 14e22e9 to cdb3e14 Compare May 22, 2026 16:16

vercel Bot temporarily deployed to Preview May 22, 2026 16:17 Inactive

cursor Bot reviewed May 22, 2026

View reviewed changes

Comment thread apps/sim/lib/mcp/oauth/storage.ts Outdated

greptile-apps Bot reviewed May 22, 2026

View reviewed changes

Comment thread apps/sim/lib/mcp/oauth/storage.ts Outdated

vercel Bot temporarily deployed to Preview May 22, 2026 16:47 Inactive

vercel Bot temporarily deployed to Preview May 22, 2026 16:56 Inactive

cursor Bot reviewed May 22, 2026

View reviewed changes

Comment thread apps/sim/lib/mcp/oauth/storage.ts

Comment thread apps/sim/lib/mcp/pinned-fetch.ts

vercel Bot temporarily deployed to Preview May 22, 2026 17:03 Inactive

cursor Bot reviewed May 22, 2026

View reviewed changes

Comment thread apps/sim/lib/mcp/oauth/storage.ts Outdated

vercel Bot temporarily deployed to Preview May 22, 2026 17:10 Inactive

cursor Bot reviewed May 22, 2026

View reviewed changes

vercel Bot temporarily deployed to Preview May 22, 2026 17:19 Inactive

waleedlatif1 merged commit b2ad5e9 into staging May 22, 2026
9 checks passed

waleedlatif1 deleted the improvement/mcp-hardening branch May 22, 2026 17:19

waleedlatif1 mentioned this pull request May 22, 2026

v0.6.89: connectors ui, perf improvements, mcp hardening, og image #4729

Merged

Conversation

waleedlatif1 commented May 22, 2026

Summary

Type of Change

Testing

Checklist

Uh oh!

vercel Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

greptile-apps Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

waleedlatif1 commented May 22, 2026

Uh oh!

waleedlatif1 commented May 22, 2026

Uh oh!

Uh oh!

Uh oh!

waleedlatif1 commented May 22, 2026

Uh oh!

waleedlatif1 commented May 22, 2026

Uh oh!

Uh oh!

Uh oh!

waleedlatif1 commented May 22, 2026

Uh oh!

waleedlatif1 commented May 22, 2026

Uh oh!

Uh oh!

waleedlatif1 commented May 22, 2026

Uh oh!

waleedlatif1 commented May 22, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 22, 2026 •

edited

Loading

cursor Bot commented May 22, 2026 •

edited

Loading

greptile-apps Bot commented May 22, 2026 •

edited

Loading