Skip to content

v0.6.89: connectors ui, perf improvements, mcp hardening, og image#4729

Open
waleedlatif1 wants to merge 11 commits into
mainfrom
staging
Open

v0.6.89: connectors ui, perf improvements, mcp hardening, og image#4729
waleedlatif1 wants to merge 11 commits into
mainfrom
staging

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

waleedlatif1 and others added 10 commits May 22, 2026 06:40
)

* fix(combobox): show selected values in multi-select trigger label

The collapsed trigger was reading only `selectedOption` (the single-value
path) and falling back to the placeholder when nothing matched, so a
multi-select dropdown with 1+ checked items still rendered "Select one or
more channels" instead of the actual selections.

Added `multiSelectLabel` derived from `multiSelectValues`:
- 1 value  → that label
- 2 values → "A, B"
- 3+       → "A, B +N"

Trigger now prefers `multiSelectLabel` when present and falls back to the
single-select label / placeholder otherwise. Muted-text color also flips
off when multi has any selection.

* chore(kb-connectors): strip redundant field-level descriptions

Removed 41 inline `description:` lines from configFields across 16 connectors
(Slack, MS Teams, GCal, Gmail, Notion, Linear-adjacent, Discord, Dropbox,
Evernote, Fireflies, Google Sheets, Intercom, Obsidian, Outlook, Reddit,
ServiceNow, WordPress, Zendesk). They mostly restated the field title (e.g.
"Channels to sync messages from" under a "Channels" label) and cluttered
the add/edit modal. Field titles + placeholders already communicate intent.

Connector-level `description` (used in the connector picker grid) is
unchanged.

* test(leader-lock): use fake timers to deterministically test follower polling

The "follower does a final read after timeout" test (and the
"follower returns null after timeout" test) relied on real-clock
`setTimeout` and `Date.now()` with very tight bounds (pollIntervalMs=5,
maxWaitMs=9). Any CI scheduler jitter of >4ms would cause the second
in-loop poll to be skipped, the polls counter to end at 2 instead of 3,
and the assertion `expect(result).toBe('late-leader')` to fail.

Switched both tests to `vi.useFakeTimers()` so the schedule is driven by
mocked time advanced via `vi.advanceTimersByTimeAsync`. The intent is
unchanged — verify that the in-loop deadline triggers exactly one
post-deadline last-chance call to `onFollower` — but the assertions no
longer depend on wall-clock timing.

Verified across 5 sequential runs with zero flakes.

* improvement(kb-connectors): restore field descriptions as info-icon tooltips

Restores the 41 field-level `description` lines stripped in fc64421, but
instead of rendering them as inline muted-text paragraphs they're shown via a
small Info icon next to each field title. Hovering or focusing the icon
reveals the description in the existing emcn Tooltip. Keeps the modal layout
tight while preserving the per-field guidance.

Used <button type="button"> as the tooltip trigger (Radix asChild) rather
than <span tabIndex={0}> to satisfy a11y/noNoninteractiveTabindex.
…#4723)

* fix(hubspot): fall back to objectType default in selector fetchOptions

useSubBlockStore.getValue returns null for default-valued dropdowns
until the user interacts with them. The properties, pipelines, stages,
and ownerId selectors were treating that as "no selection" and
short-circuiting, so the dropdowns appeared empty even though the
trigger uses 'contact' as the visible default.

Adds resolveSelectedObjectType to mirror the rendered default, so the
selectors fire on first paint with a valid objectType.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(hubspot): validate credentialId in selector routes

Mirrors the Gmail/Webflow/Jira selector route security pattern by
rejecting non-alphanumeric credentialId values before authorization
or token refresh.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(hubspot): use resolveSelectedObjectType in pipelineId/stageId fetchOptions

Both selectors used inline `?? 'contact'` fallbacks while properties and
targetPropertyName already routed through the resolver. Switch to the
shared helper so custom-object handling stays consistent across every
cascading selector.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…ibuted OAuth lock + typed errors (#4722)

* improvement(mcp): post-merge hardening — protocol negotiation + distributed OAuth lock + typed error dispatch

- Inbound MCP server now negotiates protocolVersion per MCP 2025-06-18: echoes
  the client's requested version when supported, falls back to our latest.
  Previously hardcoded to the oldest spec version (2024-11-05).
- withMcpOauthRefreshLock now takes a Postgres advisory transaction lock in
  addition to the in-process Promise chain, so concurrent processes (multi-task
  ECS) serialize on a per-OAuth-row basis. Previously a refresh race across
  processes could rotate a token under another process's feet and force re-auth.
- categorizeError dispatches on McpOauthAuthorizationRequiredError /
  UnauthorizedError / McpConnectionError first, only falling back to substring
  matching for SDK / third-party errors. Adds 502 for connection failures and
  503 for cooldown. Tests cover all four typed cases.
- discoverTools no longer pretends to handle deferred-side-effect rejections
  via a dead allSettled().catch() — each side-effect already self-logs; we just
  swallow per-promise to silence unhandled-rejection warnings.

* fix(mcp): bugbot review on hardening PR

- Replace `as never` cast in negotiateProtocolVersion with `as readonly string[]`
  on the array — preserves TypeScript narrowing on the comparison while
  satisfying the readonly-tuple `.includes()` constraint properly.
- Document the pg_advisory_xact_lock tradeoff: session-level locks
  (`pg_advisory_lock`) would release the connection earlier, but PgBouncer
  transaction-pooling mode breaks them. xact_lock is the correct choice for
  Sim's deployment; if pool pressure becomes real, the comment notes the
  Redlock escape hatch.

* chore(mcp): trim verbose comments + reuse SDK Tool type in McpTool

- McpTool now extends `Pick<Tool, 'name' | 'description'>` from
  @modelcontextprotocol/sdk so name/description fields stay in sync with
  the SDK; serverId/serverName remain Sim-specific additions.
- Drop file-header restatements ("MCP Types - for connecting to external
  MCP servers"), one-line wrapper docstrings ("Get connection status"),
  and narrative comment blocks that just restate visible code.
- Keep only comments that document non-obvious "why" — OAuth refresh-lock
  tradeoff, in-flight dedup key composition, SDK Tool.inputSchema typing,
  preregistered-client semantics, postMessage handshake contract.

* improvement(mcp): swap PG advisory lock for Redis mutex on OAuth refresh

withMcpOauthRefreshLock now uses `coalesceLocally` + Redis acquireLock/
releaseLock with polling — the same primitives backing regular OAuth
refresh (`app/api/auth/oauth/utils.ts`). No more pinning a Postgres
connection for the duration of the SDK's OAuth HTTP refresh.

- In-process dedup: shared promise via `coalesceLocally`.
- Cross-process: Redis SET NX EX mutex; followers poll until the leader
  releases (30s max wait, 100ms poll), then acquire and run fn().
- Each MCP caller still constructs its own client (semantics preserved).
- Falls open when Redis is unavailable — same behavior as the regular
  OAuth refresh code path.

* improvement(mcp): use SDK protocol versions + pool pinned undici agents + cover OAuth lock

- McpClient.SUPPORTED_VERSIONS removed; getVersionInfo() and the inbound
  serve route both import LATEST_PROTOCOL_VERSION / SUPPORTED_PROTOCOL_VERSIONS
  directly from @modelcontextprotocol/sdk/types.js. New protocol revisions
  ship automatically with SDK upgrades.
- pinned-fetch now caches undici Agents in a module-level LRU keyed by
  resolvedIP (max 64). Back-to-back MCP calls to the same server reuse the
  keep-alive connection pool instead of opening fresh TCP + TLS each time.
- New integration tests for withMcpOauthRefreshLock covering: in-process
  dedup via coalesceLocally, cross-process serialization via Redis mutex,
  fall-open on Redis unavailable, lock release on throw, release-failure
  swallow, per-row key isolation.

* fix(mcp): serialize OAuth refresh callers; do not share McpClient

withMcpOauthRefreshLock previously wrapped fn() in coalesceLocally, which
returns the SAME promise (and the same resolved value) to all in-process
callers. fn() returns a stateful McpClient — sharing it meant whichever
caller finished first would disconnect the client while another was still
mid-call, leaving in-flight RPC on a closed connection.

Swap coalesceLocally for a per-row Promise chain: each caller waits for
the previous to settle, then runs its OWN fn() (gets its own client).
Cross-process Redis mutex semantics unchanged.

The "shareable scalar" assumption that makes coalesceLocally correct for
regular OAuth refresh (returns an access token string) does not hold for
MCP, where each caller needs an independent connection.

* fix(mcp): bugbot — TTL watchdog on OAuth lock + don't close evicted pinned agents

- Redis refresh lock now uses a 15s TTL with a watchdog that extends every
  5s while fn() runs. Long-running OAuth refreshes no longer lose the lock
  mid-flight and let another process race the same refresh.
- Pinned-agent LRU eviction no longer calls `agent.close()`. Existing
  `createMcpPinnedFetch` closures hold the dispatcher reference and were
  using a closed Agent after eviction. We drop from the cache and let GC
  release the dispatcher once the last closure dies; undici closes idle
  keep-alive connections via its own internal timeout.
- New tests: watchdog extends while fn() runs and stops once it settles;
  evicted agents are not closed and captured closures still work.

* fix(mcp): throw instead of falling open when refresh lock wait exceeds deadline

When the Redis refresh lock can't be acquired within REFRESH_MAX_WAIT_MS
the previous code ran fn() uncoordinated — but another process can still
be holding the lock (watchdog-extended) and refreshing the same OAuth
row, recreating the exact race the lock prevents.

Throw on deadline. The caller can retry; the Redis-down branch remains
the only path that runs fn() uncoordinated (no coordination is possible
there).

* docs(mcp): restore TSDoc that documented intent on exported types/methods

Earlier comment-trim pass went too far on a few exports — restored
the TSDoc that explained non-obvious "why" decisions:

- SimMcpOauthProviderInit.preregistered: when set, the SDK skips DCR.
- McpServerConfig.userId: required for OAuth; selects which user's
  stored tokens to use.
- McpOauthAuthorizationRequiredError: benign pending state vs failure.
- McpToolsChangedCallback / McpClientOptions: notification semantics,
  DNS-rebinding pinning rationale, OAuth provider contract.
- StoredMcpToolReference / StoredMcpTool: minimal vs extended use.
- McpClient.connect: documents listChanged handler registration.
- McpService.executeTool: documents session-error retry behavior.

Pure-restatement comments ("Disconnect from MCP server") stay trimmed.
… SSRF (#4725)

* fix(tools): pin resolved IP in DB connectors to prevent DNS-rebinding SSRF

`validateDatabaseHost` resolved an IP that was then discarded — drivers re-resolved
the hostname at connect time, enabling DNS-rebinding TOCTOU.

- mongodb: pass resolved IP via MongoClient `lookup` option
- mysql: pin TCP socket via `stream` factory; keep hostname for TLS servername
- postgresql: connect to resolved IP; pass `ssl` object with `servername` for SNI
- redis: parse URL explicitly and pass options-only (URL+options breaks override
  due to ioredis's lodash.defaults); pin host and set `tls.servername` for rediss
- neo4j: pin IP for plain `bolt://`; leave `bolt+s`/`neo4j+s` unchanged to keep
  Aura cert validation working (driver hardcodes servername with no override)

* chore(tools): remove explainer comments from DB connector SSRF fix

* fix(tools): add explicit TCP timeout to mysql stream factory

* fix(tools): unify postgres ssl handling to send SNI in preferred mode

* fix(tools): preserve postgres 'preferred' fallback behavior for backward compat

* fix(tools): reject non-numeric Redis URL db segment instead of silently using db 0
* fix(db): disable statement_timeout for migrations

* fix(ci): route migration workflow through guarded migrate.ts
* fix(large-refs): cleanup based on table read

* address comments

* address comments

* bubble up storage ref errors

* cleanup code

* do not attempt blob deletion for infra outage

* cleanup dup helper
@vercel
Copy link
Copy Markdown

vercel Bot commented May 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 23, 2026 12:57am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 22, 2026

PR Summary

High Risk
High risk because it changes log/large-value deletion behavior, adds new metadata tracking used during execution finalization, and modifies MCP protocol/OAuth coordination and outbound DB connector networking (SSRF/DNS pinning). Failures could affect data retention, token refresh reliability, or connectivity to external services.

Overview
Adds a new large execution value metadata layer (executionLargeValues + references/dependencies) and switches cleanup to delete large values by querying these tables (plus legacy workspaceFiles) instead of scanning execution_data, with bounded batch limits and tombstone pruning; also lowers Trigger.dev cleanup concurrency and tightens chunkedBatchDelete accounting.

Hardens execution logging to persist large-value reference sets transactionally, updates payload storage/materialization to register owners, track references, and aggressively clean up untracked durable uploads, and updates tests accordingly.

Improves MCP robustness by adding server-side protocol version negotiation (echo supported versions else fallback to SDK latest), aligning client version info to SDK SUPPORTED_PROTOCOL_VERSIONS, and introducing a distributed Redis-backed OAuth refresh lock with TTL watchdog.

Closes several security/robustness gaps by pinning resolved IPs for MongoDB/MySQL/Postgres/Neo4j/Redis connections (plus stricter Redis URL parsing) and validating HubSpot credentialId inputs; also includes UI/UX refinements across knowledge-base connectors, combobox multi-select trigger labeling, and small landing/footer tweaks, and updates migrations CI to run scripts/migrate.ts.

Reviewed by Cursor Bugbot for commit 5c19344. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This release bundles ten focused improvements: MCP post-merge hardening (protocol negotiation, distributed Redis OAuth lock, typed error categorization), DNS-rebinding SSRF fixes for all five DB connectors, a new metadata-driven large-value GC system backed by three new tables, copilot query narrowing, a combobox multi-select label fix, and minor UI/branding updates.

  • Large-value GC (large-value-metadata.ts, store.ts, cleanup-logs.ts, migration 0212): A reference-tracking system (execution_large_values, execution_large_value_references, execution_large_value_dependencies) replaces the previous JSONB substring-scan approach. markLargeValuesDeleted is now the authoritative deletion signal, but deleteLargeValueKeys deletes storage blobs before updating the DB, creating a potential consistency gap if the DB write fails. Additionally, addLargeValueReference is unconditionally awaited before the cache check in materializeLargeValueRef, adding 1–3 DB round trips to every large-value cache hit.
  • SSRF fixes (postgresql/utils.ts, mysql/utils.ts, mongodb/utils.ts, redis/execute/route.ts, neo4j/utils.ts): All five connectors now pin outbound connections to the pre-validated IP. PostgreSQL ssl: 'preferred' and Neo4j TLS connections still re-resolve DNS inside the driver (see inline comments), leaving a narrow TOCTOU window for those modes.
  • MCP hardening (oauth/storage.ts, serve/route.ts, client.ts): The OAuth refresh lock gains a cross-process Redis mutex with a watchdog extension loop; the workflow MCP serve route now negotiates the protocol version dynamically instead of hardcoding 2024-11-05.

Confidence Score: 3/5

The PR introduces a new large-value GC system and SSRF mitigations that work correctly in the common path, but the cleanup function deletes storage blobs before committing the DB tombstone, which can leave entries stranded if the DB write fails after the blob is already gone.

The storage-then-DB ordering in deleteLargeValueKeys means a transient DB failure after successful blob deletion produces a key that is permanently unresolvable if the storage backend is not idempotent on deletes. Two additional SSRF windows remain open (PostgreSQL ssl: 'preferred' and Neo4j TLS connections skip IP pinning). The addLargeValueReference write before the cache check in materializeLargeValueRef adds repeated DB overhead to what was previously a zero-DB cache-hit path. The MCP and combobox changes are clean, and the migration and migration-script changes look correct.

apps/sim/background/cleanup-logs.ts (storage-before-DB deletion order), apps/sim/lib/execution/payloads/store.ts (unconditional DB write before cache), apps/sim/app/api/tools/postgresql/utils.ts and apps/sim/app/api/tools/neo4j/utils.ts (remaining SSRF windows for specific SSL modes).

Security Review

  • DNS rebinding / SSRF (PostgreSQL ssl: 'preferred'): apps/sim/app/api/tools/postgresql/utils.ts does not apply IP pinning for ssl: 'preferred' mode; a second DNS resolution inside the postgres() driver happens after host validation, leaving a TOCTOU window.
  • DNS rebinding / SSRF (Neo4j TLS): apps/sim/app/api/tools/neo4j/utils.ts preserves the original hostname in the URI for bolt+s/neo4j+s connections, so the driver re-resolves DNS after validation.
  • No secrets or credentials appear to be exposed; OAuth tokens are encrypted at rest and the new Redis mutex keys contain only the DB row ID.
  • The distributed OAuth lock in apps/sim/lib/mcp/oauth/storage.ts correctly falls open (continues uncoordinated) when Redis is unavailable, rather than blocking forever.

Important Files Changed

Filename Overview
apps/sim/lib/execution/payloads/large-value-metadata.ts New 618-line module for GC-aware large value tracking; contains a BFS dependency closure that generates potentially large NOT IN (...) clauses and an atomicity gap in the deletion path.
apps/sim/lib/execution/payloads/store.ts Adds addLargeValueReference DB write before cache check in materializeLargeValueRef, making every large-value cache hit incur 1-3 DB round trips.
apps/sim/background/cleanup-logs.ts Major overhaul of cleanup logic to use the new metadata-driven GC system; deleteLargeValueKeys deletes storage blobs before marking DB records, creating a consistency gap if the DB update fails.
apps/sim/lib/mcp/oauth/storage.ts Adds Redis distributed mutex (watchdog + polling loop) on top of the existing in-process Promise chain for OAuth refresh serialization; falls open to uncoordinated mode on Redis unavailability.
apps/sim/app/api/tools/postgresql/utils.ts Pins TCP connection to validated IP for non-preferred SSL modes; ssl: 'preferred' connections still use the original hostname, leaving a TOCTOU SSRF window for that mode.
apps/sim/app/api/tools/neo4j/utils.ts Pins Neo4j connection URI to resolved IP when TLS is disabled; encrypted connections (bolt+s) still use the original hostname, leaving a TOCTOU window for that mode.
apps/sim/app/api/mcp/serve/[serverId]/route.ts Adds protocol version negotiation in the initialize handler; falls back to LATEST_PROTOCOL_VERSION for unknown versions instead of hardcoding 2024-11-05.
packages/db/migrations/0212_sturdy_guardsmen.sql Adds three new tables (execution_large_values, execution_large_value_references, execution_large_value_dependencies) with appropriate indexes and workspace-cascading foreign keys.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[storeLargeValue] -->|1. collect deps| B[collectLargeValueKeys]
    A -->|2. persist blob| C[persistValue → Blob Storage]
    C --> D{blob written?}
    D -->|yes| E[registerLargeValueOwner]
    E --> F[getDependencyClosure BFS]
    F --> G[(executionLargeValueDependencies)]
    E --> H[(executionLargeValues INSERT)]
    D -->|no| I[in-memory cache only]

    J[materializeLargeValueRef] -->|1. always await| K[addLargeValueReference]
    K --> L[(executionLargeValueReferences SELECT+INSERT)]
    K -->|2. then check| M[materializeLargeValueRefSync cache]
    M -->|cache hit| N[return value]
    M -->|cache miss| O[readLargeValueRefFromStorage]

    P[cleanupLargeExecutionValues] -->|query unreferenced keys| Q[unreferencedLargeValuePredicate]
    Q --> R[deleteLargeValueKeys]
    R -->|1. delete blobs| S[StorageService.deleteFiles]
    S -->|2. mark DB| T[markLargeValuesDeleted]
    T -->|3. cleanup metadata| U[deleteFileMetadata]
    R -->|if DB fails after storage delete| V[⚠️ phantom key - blob gone, DB not updated]

    style V fill:#ff6666,color:#fff
    style L fill:#ffcc00
Loading

Comments Outside Diff (1)

  1. apps/sim/app/api/tools/postgresql/utils.ts, line 11-28 (link)

    P2 security TOCTOU SSRF window remains open for ssl: 'preferred' connections

    pinIP = config.ssl !== 'preferred' is false when ssl is 'preferred', so host falls back to config.host. The DNS resolution happens again inside postgres(), after validateDatabaseHost has already confirmed the IP. A DNS rebinding attack during that window can redirect the connection to a private address. Encrypted connections mitigate some risk (TLS handshake would fail against a server with a mismatched cert), but plain-fallback scenarios under 'prefer' mode can succeed without TLS if the attacker serves a server that doesn't support it.

Reviews (1): Last reviewed commit: "improvement(kb-connectors): align connec..." | Re-trigger Greptile

Comment thread apps/sim/background/cleanup-logs.ts
Comment thread apps/sim/lib/execution/payloads/store.ts
Comment thread apps/sim/app/api/tools/neo4j/utils.ts
Comment thread apps/sim/lib/execution/payloads/large-value-metadata.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants