
improvement(executor): subflows, hitl handling cleanup #4604

Open
icecrasher321 wants to merge 10 commits into staging from improvement/subflow-orch-cleanup

Conversation

@icecrasher321
Collaborator

Summary

Refactors parallel batching onto explicit DAG control flow while hardening nested subflow execution, pause/resume, run-from-block, and output resolution edge cases.

Type of Change

  • Other: Code quality

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented May 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | May 15, 2026 7:13pm |

@cursor

cursor Bot commented May 14, 2026

PR Summary

High Risk
Touches core DAG construction/execution, including edge activation/cascade logic, parallel batching control flow, and pause snapshot state; regressions could stall workflows or mis-route subflow exits/continues. Added tests reduce risk but changes span multiple critical runtime paths.

Overview
Refactors subflow handling to use unified subflowId/subflowType metadata for loop/parallel sentinels and nodes, replacing ad-hoc loopId/parallelId/isParallelSentinel fields and centralizing sentinel node creation.

Makes parallel batching explicit in DAG control flow by wiring parallel_continue back-edges, preparing batches at parallel sentinel start, and advancing batch state at sentinel end (instead of dynamically queuing expanded nodes). Adds “start→end” exit bypass edges for empty/skip-at-start subflows and tightens nested subflow edge wiring/terminal detection using shared control-handle sets.
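The "start→end" bypass idea can be sketched as follows; every name and shape here is illustrative only, not the PR's actual API:

```typescript
// Illustrative model of the exit bypass: when a loop/parallel resolves to
// zero iterations (or is skipped at its start), a direct edge from the start
// sentinel to the end sentinel lets execution exit the subflow cleanly
// instead of stalling with no activatable edges.
type Edge = { source: string; target: string; handle: string }

function addSubflowStartExitBypass(edges: Edge[], subflowId: string): void {
  edges.push({
    source: `${subflowId}-sentinel-start`,
    target: `${subflowId}-sentinel-end`,
    handle: 'subflow-exit-bypass',
  })
}

const edges: Edge[] = []
addSubflowStartExitBypass(edges, 'parallel-1')
```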

Hardens pause/resume by serializing/restoring deactivatedEdges plus “nodes with activated edges”, normalizing legacy edge keys, fixing edge-clearing to be source-based (no prefix collisions), and marking resumed pause targets as activated before readiness checks. Improves output scoping for cloned/branch nodes (including writing outputs to global branch IDs) and updates loop/parallel orchestrators to handle skip-at-start loops and batch continuation correctly.
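The "source-based, no prefix collisions" point can be sketched like this, assuming edge keys are serialized as a JSON array of [source, target] (the review notes mention a JSON-array key format; the exact shape is an assumption):

```typescript
// Serialize an edge key as a JSON array rather than a joined string, so the
// source id can be recovered without ambiguity.
const serializeEdgeKey = (source: string, target: string): string =>
  JSON.stringify([source, target])

// Clear all edges originating from `source` by exact match on the parsed
// source id. A startsWith() check on raw string keys would let "node-1"
// collide with "node-10", which is the prefix-collision bug described above.
function clearEdgesFromSource(keys: Set<string>, source: string): void {
  for (const key of [...keys]) {
    const [src] = JSON.parse(key) as [string, string]
    if (src === source) keys.delete(key)
  }
}

const keys = new Set([
  serializeEdgeKey('node-1', 'node-2'),
  serializeEdgeKey('node-10', 'node-2'),
])
clearEdgesFromSource(keys, 'node-1')
// node-10's edge survives; a prefix-based clear would have removed both
```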

Reviewed by Cursor Bugbot for commit 99634ae.

@greptile-apps
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR refactors the workflow execution engine by unifying subflow metadata (loopId/parallelId/isParallelSentinel → subflowId/subflowType), moving parallel batch expansion from scope-initialization time to sentinel-start time via prepareCurrentBatch, and adding snapshot serialization support for deactivatedEdges/nodesWithActivatedEdge to enable correct HITL resume across parallel branches.

  • Metadata unification: NodeMetadata now uses subflowId/subflowType pair instead of three separate fields, simplifying sentinel identification across loops and parallels.
  • Batch-on-demand expansion: prepareCurrentBatch is called at sentinel-start rather than scope-init, enabling correct handling of 0-iteration parallels/loops via addSubflowStartExitBypass.
  • HITL buffer patching: updateResumeOutputInAggregationBuffers patches both loop currentIterationOutputs and parallel branchOutputs in the snapshot so resumed workflows have correct aggregation state.
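The batch-on-demand flow above can be sketched roughly as follows (all names and shapes are assumptions based on the review notes, not the PR's code):

```typescript
// The start sentinel asks for the current batch when it executes; a null
// return corresponds to the 0-iteration case, where the start→end bypass
// path would fire instead of queuing expanded nodes up front.
interface ParallelScope {
  items: unknown[]
  batchSize: number
  batchIndex: number
}

function prepareCurrentBatch(scope: ParallelScope): unknown[] | null {
  const start = scope.batchIndex * scope.batchSize
  const batch = scope.items.slice(start, start + scope.batchSize)
  return batch.length > 0 ? batch : null
}

// Batch state advances at sentinel end; returning true would drive the
// parallel_continue back-edge to the start sentinel for the next batch.
function advanceToNextBatch(scope: ParallelScope): boolean {
  scope.batchIndex += 1
  return prepareCurrentBatch(scope) !== null
}

const scope: ParallelScope = { items: ['a', 'b', 'c'], batchSize: 2, batchIndex: 0 }
```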

Confidence Score: 3/5

Not safe to merge without investigating the handleParentSubflowCompletion P1 regression for nested subflows inside parallels

The P1 finding in handleParentSubflowCompletion affects result correctness for any workflow that nests a subflow (loop or parallel) inside a parallel branch with additional downstream blocks. The O(n) scan in getScopedBlockOutput is a P2 concern. The rest of the refactor is well-structured and the metadata unification and batch-on-demand expansion are sound.

apps/sim/executor/orchestrators/node.ts (handleParentSubflowCompletion logic) and apps/sim/executor/execution/state.ts (getScopedBlockOutput scan)

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/executor/orchestrators/node.ts | Adds handleParentSubflowCompletion to register nested subflow results in parent parallel scope; P1 issue where non-terminal nested subflows inject extra entries into branchOutputs |
| apps/sim/executor/execution/state.ts | getBlockOutput rewritten with branch-aware getScopedBlockOutput; O(n) scan on hot path is a P2 performance concern |
| apps/sim/executor/execution/edge-manager.ts | Edge key format migrated to JSON array; adds restoreDeactivatedEdges, markNodeWithActivatedEdge, and normalizeSerializedEdgeKey for legacy snapshot compatibility |
| apps/sim/executor/execution/engine.ts | handleNodeCompletion now called in _pauseMetadata path; serializePauseSnapshot receives edgeManager parameter; pendingDynamicNodes queue removed |
| apps/sim/executor/execution/executor.ts | restoreSnapshotParallelBatches returns RestoredClonedSubflowInfo[]; registerRestoredClonedSubflows rebuilds subflowParentMap for HITL resume |
| apps/sim/executor/orchestrators/parallel.ts | prepareCurrentBatch moved to sentinel-start time; advanceToNextBatch replaces async scheduleNextBatch; initializeParallelScope no longer expands batches |
| apps/sim/executor/dag/construction/edges.ts | addSubflowStartExitBypass creates sentinel-start→sentinel-end bypass for 0-iteration subflows; addEdge gains registerIncoming parameter |
| apps/sim/executor/utils/parallel-expansion.ts | cloneDAGNode defers edge wiring to remapClonedEdges; wireSentinelEdges rebuilds incomingEdges from scratch; collectInternalEdgeTopology extracted |
| apps/sim/lib/workflows/executor/human-in-the-loop-manager.ts | updateResumeOutputInAggregationBuffers patches loop and parallel aggregation buffers in snapshot for correct HITL resume |
| apps/sim/executor/dag/types.ts | NodeMetadata unified: isParallelSentinel/parallelId/loopId replaced by subflowType/subflowId pair; isActive removed from DAGEdge |
| apps/sim/executor/orchestrators/loop.ts | 0-iteration loops now set skippedAtStart=true instead of emitting empty events; evaluateLoopContinuation checks flag and creates exit result |
| apps/sim/executor/dag/construction/sentinels.ts | New shared factory createSubflowSentinelNode used by both loop and parallel sentinel constructors |
| apps/sim/executor/variables/resolvers/block.ts | blockIdsInSubflows guard prevents cross-subflow fallback lookups; setNodeOutput writes to both local and global branch IDs |

Sequence Diagram

```mermaid
sequenceDiagram
    participant E as Engine
    participant SE as SentinelStart
    participant PO as ParallelOrchestrator
    participant B as BranchBlocks
    participant EE as SentinelEnd
    participant NO as NodeOrchestrator

    E->>SE: processNode(sentinelStart)
    SE->>PO: prepareCurrentBatch(ctx, scope)
    PO->>PO: expandParallel() — clone DAG nodes
    PO->>PO: remapClonedEdges() — wire edges
    PO->>PO: markDirty(cloned nodes)
    SE-->>E: sentinelStart complete
    loop For each branch
        E->>B: processNode(branchBlock)
        B-->>E: output registered in blockStates
    end
    E->>EE: processNode(sentinelEnd)
    EE->>NO: handleParentSubflowCompletion()
    NO->>PO: handleParallelBranchCompletion(branchIndex)
    PO->>PO: branchOutputs[i].push(results)
    alt More batches remaining
        EE->>SE: PARALLEL_CONTINUE back-edge
        SE->>PO: prepareCurrentBatch() — next batch
    else All batches done
        EE-->>E: parallel.results assembled
    end
```

Comments Outside Diff (1)

  1. apps/sim/executor/orchestrators/node.ts, lines 541-585

    P1 handleParentSubflowCompletion injects extra entries into branchOutputs for non-terminal nested subflows

    When a parallel body contains a subflow followed by additional blocks (e.g., [innerLoop → agentBlock]), handleParentSubflowCompletion fires on the inner loop's sentinel-end and pushes {results:[...]} into branchOutputs[branchIndex]. Later, agentBlock completes and appends its own output to the same branch slot, producing [{results:[...]}, {agentOutput}] instead of [{agentOutput}]. This corrupts parallel.results[i] for any consumer that expects a flat per-block output array for the branch.
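    A minimal TypeScript illustration of the reported shape corruption, using assumed output shapes (this is not the PR's code):

```typescript
// A parallel branch [innerLoop -> agentBlock], where both the inner
// subflow's sentinel-end and the terminal block write into the same slot.
const branchOutputs: Record<string, unknown>[][] = [[]]

// The inner loop's sentinel-end fires handleParentSubflowCompletion first
// and pushes its aggregate into the branch slot...
branchOutputs[0].push({ results: ['inner-loop-output'] })
// ...then the terminal agent block appends the branch's real output.
branchOutputs[0].push({ agentOutput: 'final' })

// Consumers expecting [{ agentOutput: 'final' }] now see two entries.
```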

Reviews (6). Last reviewed commit: "fix type issue"

@icecrasher321
Collaborator Author

bugbot run

@icecrasher321
Collaborator Author

@greptile

@icecrasher321
Collaborator Author

@greptile

@icecrasher321
Collaborator Author

bugbot run

@icecrasher321
Collaborator Author

@greptile

@icecrasher321
Collaborator Author

@greptile

@icecrasher321
Collaborator Author

bugbot run

@icecrasher321
Collaborator Author

bugbot run

@icecrasher321
Collaborator Author

@greptile

@icecrasher321
Collaborator Author

bugbot run


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.




```ts
if (output._pauseMetadata) {
await this.nodeOrchestrator.handleNodeCompletion(this.context, nodeId, output)
```


Paused HITL parallel branches record wrong output

High Severity

handleNodeCompletion is now called for paused (HITL) blocks before the pause is recorded. When a HITL block is a parallel branch node (isParallelBranch: true), this causes handleParallelBranchCompletion to push { _pauseMetadata: ... } into scope.branchOutputs for that branch. Because scope.branchOutputs is serialized into the pause snapshot, it persists across resume. When the workflow resumes and the parallel sentinel end eventually fires, aggregateParallelResults uses the stored pause metadata as the branch's actual output — not the human's response — producing incorrect aggregated results. The new test for this feature (records paused block completion before returning paused result) uses a node with no isParallelBranch metadata, so this path is not exercised.
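One way such a path could be guarded, sketched with assumed shapes (this is not the PR's actual fix, just an illustration of the suggested behavior):

```typescript
// Skip branch-output recording when the completion payload is a pause
// marker, so scope.branchOutputs never captures _pauseMetadata and the
// resumed human response is what aggregation eventually sees.
interface BlockOutput {
  _pauseMetadata?: unknown
  [key: string]: unknown
}

function recordBranchCompletion(
  branchOutputs: BlockOutput[][],
  branchIndex: number,
  output: BlockOutput
): void {
  if (output._pauseMetadata !== undefined) return // paused, not a real result
  branchOutputs[branchIndex].push(output)
}

const outputs: BlockOutput[][] = [[]]
recordBranchCompletion(outputs, 0, { _pauseMetadata: { reason: 'hitl' } })
recordBranchCompletion(outputs, 0, { answer: 42 })
```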

Additional Locations (1)
