Skip to content

improvement(sockets): workflow switching state machine#4104

Merged
icecrasher321 merged 2 commits intostagingfrom
improvement/sockets-fsm
Apr 11, 2026
Merged

improvement(sockets): workflow switching state machine#4104
icecrasher321 merged 2 commits intostagingfrom
improvement/sockets-fsm

Conversation

@icecrasher321
Copy link
Copy Markdown
Collaborator

@icecrasher321 icecrasher321 commented Apr 11, 2026

Summary

Workflow switching state machine to prevent timing based issues.

Type of Change

  • Bug fix
  • Other: Reliability improvement

Testing

Tested manually. Add tests for state machine.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 11, 2026 1:43am

Request Review

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 11, 2026

PR Summary

Medium Risk
Medium risk because it refactors core realtime room-join and operation emission paths; regressions could break collaboration presence/updates or leave users stuck without realtime despite being connected.

Overview
Introduces a SocketJoinController state machine (with capped exponential backoff retries) to coordinate workflow room membership and ignore stale join results during rapid workflow switching.

Updates the client socket-provider to use explicit vs route workflow targets, gate presence/cursor/selection/state events to the currently visible workflow, and prevent emitting operations/updates when the socket is not joined to the intended room (including changing emitWorkflowOperation to require an explicit workflowId).

Improves UX around realtime disruptions by adding an isRetryingWorkflowJoin flag, surfacing a unified “Reconnecting…”/“Joining workflow…” notification, and expanding server join-workflow-error payloads to include workflowId, code, and retryable (with new tests covering the join controller/target helpers, operation queue gating, and server handler errors).

Reviewed by Cursor Bugbot for commit e5f55af. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 11, 2026

Greptile Summary

This PR introduces a SocketJoinController state machine to coordinate Socket.IO room membership during workflow switching, replacing the previous ad-hoc join/leave logic. The controller centralises all join, leave, retry, and blocked-workflow logic into a pure, testable class that emits SocketJoinCommand objects, while the provider simply executes those commands. The refactor eliminates a class of race conditions where rapid workflow switching could leave the client in the wrong room or apply stale server responses.

The overall architecture is well-designed and comprehensively tested. Two minor cleanups are worth addressing before merge: a misleading log message when workflow visibility guards fire, and a dead retriesExhausted field in the error-result interface that is always false.

Confidence Score: 5/5

Safe to merge; both remaining findings are P2 style/cleanup suggestions with no impact on runtime correctness.

The core state-machine design is sound, all key race-condition scenarios are covered by unit tests, and the refactor correctly preserves operation-cancellation semantics on workflow leave. The two open findings are a misleading log level and a dead interface field — neither affects runtime behavior.

No files require special attention; the two style suggestions are in socket-provider.tsx (log message) and socket-join-controller.ts (dead interface field).

Important Files Changed

Filename Overview
apps/sim/app/workspace/providers/socket-join-controller.ts New state machine class; logic is sound and well-tested, but retriesExhausted in SocketJoinErrorResult is always false — either exhaustion logic is missing or the field is dead code.
apps/sim/app/workspace/providers/socket-join-target.ts Small utility with clear semantics for resolving the target workflow from route vs explicit IDs; no issues.
apps/sim/app/workspace/providers/socket-provider.tsx Major refactor to use the state-machine commands; overall correct, but emitSubblockUpdate / emitVariableUpdate log 'no socket connection' even when the guard fires for a workflow-ID mismatch or visibility issue.
apps/sim/app/workspace/providers/socket-join-controller.test.ts Comprehensive unit tests covering all key state-machine transitions (rapid switching, stale success, delete, retry, backoff); no issues.
apps/sim/stores/operation-queue/store.ts Updated emit-function signatures to accept workflowId as first param; currentRegisteredWorkflowId module-level guard is unchanged and correct.
apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/cursors/cursors.tsx Correctly hides remote cursors during workflow transitions by comparing currentWorkflowId (socket-joined) against activeWorkflowId (registry); no issues.

Sequence Diagram

sequenceDiagram
    participant UI as UI / URL change
    participant SP as SocketProvider
    participant JC as SocketJoinController
    participant S as Socket.IO Server

    UI->>SP: urlWorkflowId changes → requestWorkflow("B")
    SP->>JC: requestWorkflow("B")
    JC-->>SP: [join "B"]
    SP->>S: emit join-workflow{B}
    Note over SP: pendingJoin="B", currentWorkflowId still "A"<br/>Cursors hidden (activeId≠currentId)

    alt Rapid switch to C before response
        UI->>SP: requestWorkflow("C")
        SP->>JC: requestWorkflow("C")
        JC-->>SP: [] (pending for B, flush blocked)
        S-->>SP: join-workflow-success{B}
        SP->>JC: handleJoinSuccess("B")
        JC-->>SP: [join "C"] (desired≠B → ignored)
        SP->>S: emit join-workflow{C}
        S-->>SP: join-workflow-success{C}
        SP->>JC: handleJoinSuccess("C")
        JC-->>SP: [] (apply=true)
        SP->>SP: setCurrentWorkflowId("C")
    else Retryable error
        S-->>SP: join-workflow-error{B, retryable:true}
        SP->>JC: handleJoinError({B, retryable:true})
        JC-->>SP: [schedule-retry{B, delay}]
        SP->>SP: setTimeout → retryJoin("B")
        SP->>JC: retryJoin("B")
        JC-->>SP: [join "B"]
        SP->>S: emit join-workflow{B}
    end
Loading

Reviews (1): Last reviewed commit: "improvement(sockets): workflow switching..." | Re-trigger Greptile

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit e5f55af. Configure here.

@icecrasher321 icecrasher321 merged commit bad78cc into staging Apr 11, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant