fix(realtime): re-validate socket role and evict revoked collaborators by waleedlatif1 · Pull Request #5050 · simstudioai/sim

waleedlatif1 · 2026-06-15T04:11:19Z

Summary

Realtime Socket.io authorized workflow access only at join-workflow and cached the workspace role in presence — a removed or downgraded collaborator kept live read/write access until they disconnected.
Re-validate the cached role against the permissions table on mutating events (workflow-operation, subblock-update, variable-update), bounded by a short TTL; refresh the role or evict the socket on change.
Add internal POST /api/permissions-updated so the app reconciles active rooms — evicting revoked users (cross-pod via socketsLeave) and refreshing downgraded roles.
Notify realtime from workspace member removal and permission-change routes.

Type of Change

Bug fix (security)

Testing

Added unit tests: authorizeSocketOperation (TTL/revoke/downgrade/transient-error/pre-upgrade presence), reconcileWorkspaceAccessChange (evict vs. refresh), and the /api/permissions-updated endpoint.
Full realtime suite: 123 passing. tsc clean (realtime + sim). Boundary/prune/api-validation/biome pass.

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Socket.io authorized workflow access only at join and cached the workspace role in presence, so a removed or downgraded collaborator kept live read/ write access until they disconnected. - Re-validate the cached role against the permissions table on mutating events, bounded by a short TTL; refresh or evict on change - Add /api/permissions-updated so the app reconciles active rooms, evicting revoked users (cross-pod) and refreshing downgraded roles - Notify realtime on workspace member removal and permission changes

vercel · 2026-06-15T04:11:24Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Jun 15, 2026 4:11am

cursor · 2026-06-15T04:11:27Z

PR Summary

High Risk
Changes authorization and eviction on the live collaboration path; mistakes could wrongly evict users or briefly allow stale write access, though TTL re-validation and push reconciliation are designed to limit exposure.

Overview
Fixes a gap where workflow access was only checked at join-workflow, so removed or downgraded collaborators could keep mutating over an open socket until they disconnected.

Per-operation authorization: Mutating paths (workflow-operation, subblock-update, variable-update) now call authorizeSocketOperation, which trusts the cached presence role for 15s then re-reads the live permissions table, updates presence via updateUserRole, and on full revocation returns ACCESS_REVOKED and runs evictRevokedSocket. Transient DB errors during re-validation fall back to the cached role instead of locking users out.

Push reconciliation: The main app calls notifyWorkspaceAccessChanged → realtime POST /api/permissions-updated → reconcileWorkspaceAccessChange, which for active rooms in the workspace either evicts all of the user’s sockets (with workflow-permissions-revoked, cross-pod socketsLeave) or refreshes the cached role on downgrade. Wired from workspace permission PATCH (only when the role actually changed) and member removal.

Presence now tracks roleCheckedAt on join; Redis gets an atomic Lua role update so role refreshes don’t race activity updates.

^{Reviewed by Cursor Bugbot for commit 5c9d711. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 5c9d711. Configure here.}

cursor · 2026-06-15T04:12:30Z

 return 1
 `

+/**


Activity update clobbers role

Medium Severity

Concurrent updateUserActivity and updateUserRole both read-modify-write the full presence JSON in Redis. If activity was read before a role refresh but written after, the stale snapshot can overwrite a downgraded or refreshed role and roleCheckedAt, so mutating ops may skip revalidation until the TTL expires.

Additional Locations (1)

apps/realtime/src/rooms/redis-manager.ts#L104-L121

^{Reviewed by Cursor Bugbot for commit 5c9d711. Configure here.}

Not a race here — both updateUserActivity and updateUserRole run as EVALSHA Lua scripts, and Redis executes Lua atomically (script execution is serialized; nothing interleaves between a script's HGET and HSET).

UPDATE_ACTIVITY_SCRIPT re-reads the presence blob inside its own atomic block (HGET → cjson.decode → set only cursor/selection/lastActivity → HSET cjson.encode), so it always preserves whatever role/roleCheckedAt are current at that instant. A concurrent UPDATE_ROLE_SCRIPT can't run between activity's read and write, so it can't be clobbered by a stale snapshot.

The only staleness window is the intended ≤15s role-revalidation TTL (and the push reconciliation makes full revocation immediate). Leaving the scripts as-is.

greptile-apps · 2026-06-15T04:19:03Z

Greptile Summary

This PR closes a security gap where a revoked or downgraded collaborator kept live Socket.IO read/write access until they disconnected, because the socket role was only checked once at join-workflow. The fix adds two complementary layers: a TTL-bounded re-validation (15 s) on every mutating socket event and a push-based reconciliation endpoint that the main app calls immediately after a permission change.

Pull-based (TTL): authorizeSocketOperation in permissions.ts re-queries the permissions table when the cached roleCheckedAt is stale; a DB error falls back to the cached role so legitimate users aren't locked out during a blip.
Push-based: notifyWorkspaceAccessChanged is called from both the member-removal and permission-update routes; the realtime server reconciles all active workflow rooms via reconcileWorkspaceAccessChange, evicting sockets where access was fully revoked and refreshing the cached role for downgrades.
Redis: A new atomic Lua script (UPDATE_ROLE_SCRIPT) patches role and roleCheckedAt in the presence hash without clobbering concurrent activity updates.

Confidence Score: 3/5

The core eviction and TTL re-validation logic is correct and well-tested, but notifyWorkspaceAccessChanged lacks a request timeout while being synchronously awaited inside the permission-change REST handlers — a stalled realtime connection can hang those API responses indefinitely.

The security fix itself is sound and the two-layer design (pull TTL + push reconcile) is solid. The main concern is that the new HTTP notification helper is awaited in both the member-removal and permission-update routes without any AbortSignal deadline. In a scenario where the realtime server accepts the TCP connection but stalls, the admin-facing REST API response hangs with no bound. The redundant per-workflow DB queries in reconcileWorkspaceAccessChange and the TTL thundering-herd are real but lower-urgency operational concerns.

apps/sim/lib/workspaces/permissions/realtime.ts (missing fetch timeout) and apps/realtime/src/rooms/access.ts (redundant DB queries per workflow room).

Important Files Changed

Filename	Overview
apps/realtime/src/middleware/permissions.ts	New `authorizeSocketOperation` function adds TTL-bounded DB re-validation of cached roles; thundering herd possible at TTL boundary for sockets with burst traffic.
apps/sim/lib/workspaces/permissions/realtime.ts	New best-effort notifier to the realtime server; lacks an `AbortSignal` timeout, so a stalled realtime connection can block the permission-change API response indefinitely.
apps/realtime/src/rooms/access.ts	New `reconcileWorkspaceAccessChange` function; calls `verifyWorkflowAccess` per workflow room (two DB queries each) when a single workspace-permission lookup would suffice for all workflows.
apps/realtime/src/routes/http.ts	New `POST /api/permissions-updated` endpoint; protected by the existing `checkInternalApiKey` guard but accepts empty strings for `workspaceId`/`userId`.
apps/realtime/src/rooms/redis-manager.ts	New `updateUserRole` Lua script atomically patches `role` and `roleCheckedAt` in the presence hash; NOSCRIPT retry path is correct.
apps/realtime/src/handlers/operations.ts	Replaces synchronous `checkRolePermission` with async `authorizeSocketOperation`; eviction path on `accessRevoked` is complete and correct.

Sequence Diagram

sequenceDiagram
    participant Admin as Admin (PATCH /permissions)
    participant SimAPI as sim API
    participant RealtimeHTTP as Realtime HTTP
    participant RoomManager as RoomManager
    participant Socket as Revoked Socket

    Admin->>SimAPI: PATCH /api/workspaces/[id]/permissions
    SimAPI->>SimAPI: Commit DB transaction (update permissions row)
    SimAPI->>RealtimeHTTP: POST /api/permissions-updated
    RealtimeHTTP->>RoomManager: handleWorkspaceAccessChange()
    RoomManager->>RoomManager: reconcileWorkspaceAccessChange()
    RoomManager->>SimAPI: verifyWorkflowAccess(userId, workflowId)
    alt Access revoked
        RoomManager->>Socket: emit(workflow-permissions-revoked)
        RoomManager->>RoomManager: removeUserFromRoom()
        RoomManager->>Socket: socketsLeave(workflowId)
    else Role downgraded
        RoomManager->>RoomManager: updateUserRole() [Lua atomic]
        RoomManager->>RoomManager: broadcastPresenceUpdate()
    end
    RealtimeHTTP-->>SimAPI: 200 OK
    SimAPI-->>Admin: 200 OK

    Note over Socket: TTL path on next mutating event
    Socket->>RoomManager: workflow-operation
    RoomManager->>RoomManager: authorizeSocketOperation()
    alt roleCheckedAt stale
        RoomManager->>SimAPI: verifyWorkflowAccess() [DB re-validation]
        alt Access revoked
            RoomManager->>Socket: emit(ACCESS_REVOKED)
            RoomManager->>RoomManager: evictRevokedSocket()
        else Role changed
            RoomManager->>RoomManager: updateUserRole() [Lua atomic]
        end
    end

_{Reviews (1): Last reviewed commit: "fix(realtime): re-validate socket role a..." | Re-trigger Greptile}

greptile-apps · 2026-06-15T04:19:06Z

+    const response = await fetch(`${getSocketServerUrl()}/api/permissions-updated`, {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'x-api-key': env.INTERNAL_API_SECRET,
+      },
+      body: JSON.stringify({ workspaceId, userId }),
+    })
+
+    if (!response.ok) {
+      logger.warn(
+        `Failed to notify realtime of access change for user ${userId} in workspace ${workspaceId} (${response.status})`
+      )
+    }


Missing fetch timeout blocks the caller

notifyWorkspaceAccessChanged is awaited synchronously in both the PATCH /api/workspaces/[id]/permissions handler and the DELETE /api/workspaces/members/[id] handler. If the realtime server accepts the TCP connection but never sends a response (e.g., it is under load, mid-restart, or hanging), fetch has no built-in deadline and the permission-change HTTP response will hang indefinitely — contradicting the "failures are logged but never block the permission mutation" promise in the JSDoc.

Add an AbortSignal.timeout() to bound the worst case: signal: AbortSignal.timeout(3000)

greptile-apps · 2026-06-15T04:19:07Z

+      const access = await verifyWorkflowAccess(userId, workflowId)
+
+      if (access.hasAccess) {
+        const role = access.role || 'read'
+        for (const target of targets) {
+          await manager.updateUserRole(workflowId, target.socketId, role)
+        }
+        await manager.broadcastPresenceUpdate(workflowId)
+        logger.info(
+          `Refreshed cached role to '${role}' for user ${userId} on workflow ${workflowId}`
+        )
+        continue


Redundant per-workflow DB queries for workspace-level permission

verifyWorkflowAccess issues two DB queries per call: one to look up the workflow row and one to check the workspace permission via authorizeWorkflowByWorkspacePermission. Because workspace permission is a single entry in the permissions table that applies uniformly to every workflow in the workspace, calling it inside the per-workflow loop means the same permission row is fetched N times — once for each active room the user is present in. For a user simultaneously in 10+ workflow rooms, reconciliation makes 20+ sequential DB round-trips when 2 would suffice.

A targeted optimization would check the workspace-level access once before the loop and skip the per-workflow permission query, using only the workflow-existence check inside the loop.

greptile-apps · 2026-06-15T04:19:08Z

+        if (typeof workspaceId !== 'string' || typeof userId !== 'string') {
+          sendError(res, 'workspaceId and userId are required', 400)
+          return
+        }


Empty strings bypass the type check and reach the room manager

typeof workspaceId !== 'string' accepts "" as valid, which would then call handleWorkspaceAccessChange('', ''). The DB query filters by workspaceId, so it returns 0 rows and the call is harmless today — but an empty-string guard costs nothing and makes the contract explicit.

Suggested change

if (typeof workspaceId !== 'string' || typeof userId !== 'string') {

sendError(res, 'workspaceId and userId are required', 400)

return

}

if (typeof workspaceId !== 'string' || typeof userId !== 'string' ||

!workspaceId || !userId) {

sendError(res, 'workspaceId and userId are required', 400)

return

}

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-15T04:19:09Z

+  const isStale = Date.now() - lastChecked >= ROLE_REVALIDATION_TTL_MS
+
+  if (isStale) {
+    try {
+      const access = await verifyWorkflowAccess(userId, workflowId)
+      if (!access.hasAccess) {
+        return {
+          allowed: false,
+          role,
+          reason: 'Access to this workflow has been revoked',
+          accessRevoked: true,
+        }
+      }
+      role = access.role || 'read'
+      await roomManager.updateUserRole(workflowId, socketId, role)
+    } catch (error) {
+      logger.warn(
+        `Failed to re-validate role for user ${userId} on workflow ${workflowId}; reusing cached role`,
+        error
+      )
+    }
+  }


Thundering herd of DB queries at each TTL boundary

When a socket is actively sending events (e.g., a large paste generates tens of workflow-operation frames in rapid succession), all inflight handlers snapshot presence.roleCheckedAt before any of them can write it back. Every handler sees isStale = true and concurrently calls verifyWorkflowAccess (two DB round-trips each). The first response to return then updates the room-manager timestamp, but the remaining in-flight queries are already underway.

Because TTL expiry is aligned to wall-clock time and batched edits are common in collaborative editors, this burst fires on every 15-second boundary for active sockets. The side-effect is harmless but wastes DB connections. A per-socket in-memory flag (e.g., a Set keyed by socketId) that is cleared on completion could collapse concurrent revalidations to one query.

cursor Bot reviewed Jun 15, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 15, 2026

View reviewed changes

waleedlatif1 merged commit 4ab8760 into staging Jun 15, 2026
15 checks passed

waleedlatif1 deleted the worktree-fix+realtime-stale-role-revocation branch June 15, 2026 04:19

This was referenced Jun 15, 2026

Revert "fix(realtime): re-validate socket role and evict revoked collaborators" #5051

Merged

v0.7.7: square, context.dev integrations, scheduled tasks styling changes, code hygiene #5064

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(realtime): re-validate socket role and evict revoked collaborators#5050

fix(realtime): re-validate socket role and evict revoked collaborators#5050
waleedlatif1 merged 1 commit into
stagingfrom
worktree-fix+realtime-stale-role-revocation

waleedlatif1 commented Jun 15, 2026

Uh oh!

vercel Bot commented Jun 15, 2026

Uh oh!

cursor Bot commented Jun 15, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 15, 2026

Uh oh!

waleedlatif1 Jun 15, 2026

Uh oh!

greptile-apps Bot commented Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

waleedlatif1 commented Jun 15, 2026

Summary

Type of Change

Testing

Checklist

Uh oh!

vercel Bot commented Jun 15, 2026

Uh oh!

cursor Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 15, 2026

Choose a reason for hiding this comment

Activity update clobbers role

Uh oh!

waleedlatif1 Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Jun 15, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cursor Bot commented Jun 15, 2026 •

edited

Loading