Skip to content

improvement(executor): faster, more responsive workflow cancellation#4630

Merged
waleedlatif1 merged 4 commits into
stagingfrom
waleedlatif1/investigate-screenshot
May 16, 2026
Merged

improvement(executor): faster, more responsive workflow cancellation#4630
waleedlatif1 merged 4 commits into
stagingfrom
waleedlatif1/investigate-screenshot

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

Summary

  • Unify cancellation through signalCancelled() so every source (abortSignal, pub/sub, backstop) wakes the wait loops immediately
  • Thread abortSignal from LLM provider tool-call loops through executeTool → internal fetch / secureFetchWithPinnedIP / MCP fetch so in-flight tool HTTP aborts instead of orphaning
  • Replace 500ms Redis polling with Redis pub/sub on execution:cancel; keep the Redis key as durable backstop checked once at engine start
  • Migrate executeTool to an options-object signature across 14 providers, 5 handlers, and copilot integration

Type of Change

  • Improvement

Testing

Tested manually on a parallel workflow that fans out LLM agents calling Exa tools — Stop now flips state within ~tens of ms and in-flight tool HTTP shows as cancelled. 125 targeted unit tests pass, TSC clean, check:api-validation passes.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 16, 2026 1:55am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 16, 2026

PR Summary

Medium Risk
Medium risk because it changes core workflow execution cancellation semantics (pub/sub subscription, backstop checks, wait-loop waking) and threads abort signals through tool HTTP execution across many callers, which could alter timing and error paths.

Overview
Improves workflow cancellation responsiveness by replacing periodic Redis polling in ExecutionEngine with a shared pub/sub cancellation channel plus a one-time Redis durable backstop check at engine start; all cancellation sources now funnel through signalCancelled() to immediately wake waitForAny/AllExecutions, and the engine unsubscribes on completion.

Refactors executeTool to an options-object API ({ skipPostProcess, executionContext, signal }) and updates block handlers, Copilot tool execution, and all LLM providers to use it; the optional signal is now propagated into internal fetches, SSRF-protected external fetches, and MCP tool calls so in-flight tool HTTP can be aborted. Adds/updates unit tests to cover pub/sub wakeups, unsubscribe cleanup, backstop behavior, and caller-signal abort propagation.

Reviewed by Cursor Bugbot for commit d5f86b2. Configure here.

Comment thread apps/sim/lib/execution/cancellation.ts
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 16, 2026

Greptile Summary

This PR replaces 500 ms Redis-polling cancellation with a unified pub/sub + durable-backstop model, threads AbortSignal from LLM provider tool-call loops all the way through executeTool into internal fetch/secureFetchWithPinnedIP/MCP HTTP calls, and migrates executeTool to an options-object signature across 14 providers, 5 handlers, and the copilot integration.

  • Cancellation architecture: signalCancelled() unifies all three sources (AbortSignal, pub/sub event, Redis backstop); abortPromise wakes waitForAnyExecution and waitForAllExecutions races immediately on cancel.
  • Signal threading: request.abortSignal is now forwarded to every executeTool call inside Anthropic and OpenAI (Responses) tool loops, and executeToolRequest chains the signal into its per-attempt AbortController with correct caller-vs-timeout discrimination.
  • Signature migration: executeTool(id, params, options?) replaces the old positional overload; all call sites in providers, handlers, and copilot fallback are updated.

Confidence Score: 5/5

Safe to merge; the core cancellation path (pub/sub wake → abortPromise → engine returns) is correct and well-tested.

The two observations are bounded edge-cases that don't affect the primary cancellation flow. All three previously-identified issues are confirmed fixed. 125 targeted tests pass and TSC is clean.

engine.ts waitForAllExecutions and tools/index.ts retry-sleep signal handling are worth a second look before adding more complex parallel/Response-block workflows.

Important Files Changed

Filename Overview
apps/sim/executor/execution/engine.ts Core engine refactored to use pub/sub cancellation and abortPromise races; waitForAllExecutions has a minor gap where Promise.allSettled after abort fires has no second cancellation guard
apps/sim/lib/execution/cancellation.ts Clean implementation: durable Redis key written first, then pub/sub published; all three branches (unavailable, success, write-failed) now publish the cancellation event
apps/sim/tools/index.ts executeTool migrated to options-object signature; signal properly threaded to internal fetch/secureFetchWithPinnedIP/MCP; retry backoff sleeps don't check signal.aborted
apps/sim/providers/anthropic/core.ts abortSignal correctly threaded to createMessage() and to both executeTool call sites in the tool-call loops
apps/sim/providers/openai/core.ts abortSignal forwarded to postResponses fetch and to executeTool in the parallel tool-execution map; looks correct
apps/sim/executor/execution/engine.test.ts New pub/sub wake test covers the fast-cancellation path; backstop test tightened to toBe(1) reflecting one-shot behavior
apps/sim/lib/execution/cancellation.test.ts Covers all three markExecutionCancelled branches; verifies write-before-publish ordering and publish-on-redis-failure

Sequence Diagram

sequenceDiagram
    participant Client
    participant CancelAPI
    participant PubSub as PubSubChannel
    participant Redis
    participant Engine as ExecutionEngine
    participant Provider as LLM Provider
    participant ET as executeTool
    participant Fetch as HTTP Fetch

    Client->>CancelAPI: POST /cancel
    CancelAPI->>Redis: "SET execution:cancel:{id}"
    CancelAPI->>PubSub: "publish({executionId})"
    PubSub-->>Engine: subscribe callback fires
    Engine->>Engine: signalCancelled()
    Note over Engine: abortPromise resolves
    Engine->>Engine: waitForAnyExecution wakes
    Engine-->>Client: status cancelled
    Provider->>ET: executeTool with signal
    ET->>Fetch: fetch with chained signal
    Fetch--xProvider: AbortError
Loading

Reviews (2): Last reviewed commit: "chore(tests): rename stale 'cancellation..." | Re-trigger Greptile

Comment thread apps/sim/lib/execution/cancellation.ts
Comment thread apps/sim/executor/execution/engine.test.ts Outdated
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit d5f86b2. Configure here.

@waleedlatif1 waleedlatif1 merged commit f8ae249 into staging May 16, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/investigate-screenshot branch May 16, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant