Skip to content

perf(execution): parallelize preflight gates, cache deployed state, memoize Anthropic client#5098

Merged
waleedlatif1 merged 11 commits into
stagingfrom
perf/preflight-and-anthropic-speedups
Jun 16, 2026
Merged

perf(execution): parallelize preflight gates, cache deployed state, memoize Anthropic client#5098
waleedlatif1 merged 11 commits into
stagingfrom
perf/preflight-and-anthropic-speedups

Conversation

@waleedlatif1

Copy link
Copy Markdown
Collaborator

Summary

  • Memoize the Anthropic + Azure-Anthropic SDK clients (new providers/anthropic/client-cache.ts) keyed by apiKey (+ beta header; + baseURL/version/pinnedIP for Azure) so HTTP keep-alive connections are reused instead of a fresh TLS handshake per request. apiKey is the tenant boundary.
  • Parallelize the read-only preflight gates in lib/execution/preprocessing.ts (ban + subscription, then usage + org-member + rate-limit) while preserving the exact error precedence (ban 403 → usage 402 → rate 429) and keeping the sole write (admission reservation) strictly last.
  • Parallelize the independent workflow-state load and env-var load in execution-core.
  • Cache deployed workflow state by the immutable deploymentVersionId with deep-clone-on-read, oldest-first eviction, and a 5-min TTL that bounds the credential-mapping edge across ECS tasks.
  • Parallelize the independent personal-subscription + membership queries in getHighestPrioritySubscription.
  • BYOK: drop the redundant getWorkspaceById existence check (auth already validates the workspace) and read the key list fresh on every call — BYOK isn't a hot query, so this keeps revocation/rotation effective immediately with zero cross-instance staleness.

Safety / no stale data

  • Billing, usage, ban, and permission reads are not cached and not routed to the read replica — they stay fresh on the primary, per the dbReplica convention (never for auth/workflow-state/billing-enforcement).
  • The only caches are: the deployed-state snapshot (keyed by an immutable id, TTL-bounded) and the Anthropic client (per-process, the only correct design for connection reuse).
  • Preflight parallelization is behavior-identical: same status codes, same error messages, same logging, same precedence, exactly one error-log write per rejection.

Type of Change

  • Performance improvement (no behavioral change)

Testing

  • bun run type-check clean
  • 108 affected tests pass (new: Anthropic client cache, BYOK fresh-read + rotation, deployed-state cache hit/clone-isolation/active-version-select/invalidation/TTL, preflight ban-precedence, getHighestPrioritySubscription)
  • Also fixed a pre-existing vitest class-mock incompatibility that had execution-core.test.ts fully red on staging
  • bun run check:api-validation passes

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…emoize Anthropic client

- Memoize Anthropic + Azure-Anthropic SDK clients (new client-cache.ts) keyed
  by apiKey (+beta header; +baseURL/version/pinnedIP for Azure) so HTTP
  keep-alive connections are reused instead of a fresh TLS handshake per call.
  apiKey is the tenant boundary.
- Parallelize the read-only preflight gates in preprocessing.ts (ban +
  subscription, then usage + org-member + rate-limit) while preserving exact
  error precedence (ban 403 -> usage 402 -> rate 429) and keeping the sole
  write (admission reservation) last.
- Parallelize the independent workflow-state and env-var loads in execution-core.
- Cache deployed workflow state by immutable deploymentVersionId with
  deep-clone-on-read, oldest-first eviction, and a 5-min TTL bounding the
  credential-mapping edge across ECS tasks.
- Parallelize the independent personal-subscription + membership queries in
  getHighestPrioritySubscription.
- BYOK: drop the redundant getWorkspaceById existence check (auth already
  validates the workspace); read the key list fresh every call for zero
  cross-instance staleness.

Billing/usage/ban/permission reads stay fresh on the primary (no cache, no
replica). Adds tests for every new mechanism and fixes a pre-existing vitest
class-mock incompatibility that had execution-core.test.ts fully red on staging.
@vercel

vercel Bot commented Jun 16, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 16, 2026 7:43pm

Request Review

@cursor

cursor Bot commented Jun 16, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes touch execution gating order (rate limit deferred), deployed-state caching with credential remap TTL, and BYOK lookup behavior; behavior is heavily tested but concurrent gates and cache invalidation warrant careful review on deploy.

Overview
This PR speeds up workflow execution and provider calls by overlapping independent I/O and adding bounded, intentional caches, without changing billing enforcement semantics.

Execution preflight refactors ban, subscription, usage, and rate-limit checks so ban runs in parallel with subscription fetch, usage runs after subscription is available, and rate limiting runs only after ban and usage pass—preserving 403 → 402 → 429 precedence and avoiding rate-limit token consumption on rejected requests.

Workflow core loads deployed/draft workflow state and personal/workspace env vars concurrently before logging and executor construction.

Deployed workflow state is cached in an LRU keyed by immutable deploymentVersionId (deep-clone on read, 5-minute TTL, explicit invalidation) so repeated runs skip expensive migrations while redeploy/rollback still picks up new versions.

Billing plan lookup parallelizes personal subscription and org membership queries; org member usage documents cap-first short-circuiting to skip usage work when uncapped.

BYOK drops a redundant workspace existence check and re-reads keys from the DB on every call so revocation stays immediate across tasks.

Provider SDK clients (Anthropic, Azure Anthropic, Bedrock, vLLM) are memoized via a shared LRU cache keyed by tenant/credential dimensions to reuse keep-alive connections.

Tests cover gate precedence, parallel subscription queries, deployed-state cache behavior, BYOK fresh reads, client-cache isolation, and Vitest 4–compatible class mocks.

Reviewed by Cursor Bugbot for commit be0228e. Configure here.

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/lib/execution/preprocessing.ts Outdated
@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces several independent performance improvements across the execution path: it parallelizes the ban+subscription preflight gates, defers the stateful rate-limit gate until those pass (preserving 403→402→429 precedence), and concurrently loads workflow state and env vars inside executeWorkflowCore. It also memoizes provider SDK clients (Anthropic, Azure Anthropic, Bedrock, vLLM) in a shared LRU cache keyed by API key and connection parameters, and caches post-migration deployed workflow state by the immutable deploymentVersionId with deep-clone-on-read and a TTL that bounds credential-remap staleness.

  • Preflight parallelization (preprocessing.ts): ban check and subscription fetch run concurrently; usage gate runs after subscription resolves; rate-limit gate is still strictly sequential to avoid debiting quota for rejected requests. Error precedence (403 → 402 → 429) is enforced at the aggregation point, and the tests cover all conflict cases.
  • Provider client cache (providers/client-cache.ts): single LRU cache shared across providers, keyed by API key (tenant boundary) plus all connection-varying parameters; updateAgeOnGet: true makes TTL idle-based so warm connections survive indefinitely.
  • Deployed-state cache (persistence/utils.ts): LRU cache keyed by immutable deploymentVersionId with absolute TTL (intentionally not idle-expiring) to bound credential-remap staleness; active-version SELECT still runs on every call so rollback/redeploy is immediately visible.

Confidence Score: 5/5

Safe to merge — all read-only gates are parallelized correctly, the stateful rate-limit gate is still strictly sequential, error precedence is preserved, and the deployed-state cache deep-clones on every read to prevent cross-request state corruption.

The preflight refactor preserves observable behavior (same status codes, same error messages, same precedence), the deployed-state cache is keyed on an immutable ID with per-read deep cloning so cached entries cannot be mutated across requests, and the client cache is correctly tenant-isolated by API key. The test suite covers ban precedence conflicts, cache hit/miss/clone isolation, TTL invalidation, and LRU eviction. No correctness issues found.

No files require special attention.

Important Files Changed

Filename Overview
apps/sim/providers/client-cache.ts New LRU cache for provider SDK clients; correct idle-TTL config, tenant isolation via API key in key, and test-only clear hook.
apps/sim/lib/execution/preprocessing.ts Preflight gates parallelized; ban+subscription concurrent, usage sequential after subscription, rate-limit gate strictly last. Error precedence (403→402→429) correctly enforced via null-coalescing at aggregation point.
apps/sim/lib/workflows/persistence/utils.ts Deployed-state LRU cache added; keyed by immutable deploymentVersionId, deep-clones on read, absolute TTL bounds credential-remap staleness, active-version SELECT still runs every call.
apps/sim/lib/workflows/executor/execution-core.ts Workflow state load and env-var load now run concurrently; early resolution of personalEnvUserId guards the parallelization correctly.
apps/sim/lib/billing/core/plan.ts Personal-subscription and membership queries parallelized via Promise.all; org-existence and org-sub follow-ups correctly remain sequential (data dependency).
apps/sim/lib/api-key/byok.ts Removed redundant getWorkspaceById guard (auth already validates workspace); reads key list fresh on every call for immediate revocation.
apps/sim/providers/anthropic/index.ts Anthropic client creation routed through getCachedProviderClient; cache key correctly encodes apiKey and beta-header flag.
apps/sim/providers/azure-anthropic/index.ts Azure Anthropic client cached; pinnedIP extracted into a local variable and included in cache key so IP changes produce a fresh client.
apps/sim/providers/vllm/index.ts vLLM OpenAI client cached; key encodes apiKey, baseUrl, and pinnedIP so credential or endpoint changes yield a fresh client; DNS validation still runs on every request.
apps/sim/providers/bedrock/index.ts BedrockRuntimeClient cached; key encodes region and full credential (accessKeyId + secretKey) so a rotated secret produces a fresh client.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant C as Caller
    participant P as preprocessExecution
    participant B as BanCheck
    participant S as SubscriptionFetch
    participant U as UsageCheckTask
    participant RL as RateLimitGate
    participant DB as Database

    C->>P: preprocessExecution(options)
    par Ban + Subscription (concurrent)
        P->>B: getActivelyBannedUserIds()
        B->>DB: SELECT banned_user_ids
        DB-->>B: result
        B-->>P: "GateFailure | null"
    and
        P->>S: getHighestPrioritySubscription()
        S->>DB: SELECT personal_subs + memberships (parallel)
        DB-->>S: results
        S-->>P: subscription
    end
    P->>P: await Promise.all([banCheck, subscriptionFetch])

    P->>U: usageCheckTask (uses subscription)
    U->>DB: checkServerSideUsageLimits
    DB-->>U: usage
    U-->>P: "{failure, snapshot}"

    alt banFailure OR usageFailure
        P->>P: recordPreprocessingError()
        P-->>C: 403 / 402 error response
    else Both pass
        P->>RL: runRateLimitGate() [stateful - sequential]
        RL->>DB: checkRateLimitWithSubscription (consumes token)
        DB-->>RL: result
        alt Rate limit exceeded
            RL-->>P: GateFailure (429)
            P-->>C: 429 error response
        else All gates pass
            RL-->>P: null
            P->>DB: STEP 7 admission reservation
            P-->>C: success
        end
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant C as Caller
    participant P as preprocessExecution
    participant B as BanCheck
    participant S as SubscriptionFetch
    participant U as UsageCheckTask
    participant RL as RateLimitGate
    participant DB as Database

    C->>P: preprocessExecution(options)
    par Ban + Subscription (concurrent)
        P->>B: getActivelyBannedUserIds()
        B->>DB: SELECT banned_user_ids
        DB-->>B: result
        B-->>P: "GateFailure | null"
    and
        P->>S: getHighestPrioritySubscription()
        S->>DB: SELECT personal_subs + memberships (parallel)
        DB-->>S: results
        S-->>P: subscription
    end
    P->>P: await Promise.all([banCheck, subscriptionFetch])

    P->>U: usageCheckTask (uses subscription)
    U->>DB: checkServerSideUsageLimits
    DB-->>U: usage
    U-->>P: "{failure, snapshot}"

    alt banFailure OR usageFailure
        P->>P: recordPreprocessingError()
        P-->>C: 403 / 402 error response
    else Both pass
        P->>RL: runRateLimitGate() [stateful - sequential]
        RL->>DB: checkRateLimitWithSubscription (consumes token)
        DB-->>RL: result
        alt Rate limit exceeded
            RL-->>P: GateFailure (429)
            P-->>C: 429 error response
        else All gates pass
            RL-->>P: null
            P->>DB: STEP 7 admission reservation
            P-->>C: success
        end
    end
Loading

Reviews (7): Last reviewed commit: "test(execution): make RateLimiter mock c..." | Re-trigger Greptile

Comment thread apps/sim/providers/anthropic/client-cache.ts Outdated
Comment thread apps/sim/lib/workflows/persistence/utils.ts Outdated
Comment thread apps/sim/providers/anthropic/client-cache.ts Outdated
The rate-limit gate is not read-only — checkRateLimitWithSubscription consumes
a token — so running it in parallel with the read-only gates debited rate-limit
quota for requests that the ban (403) or usage (402) gates reject, which the
original sequential flow never did.

Move the rate-limit gate to run sequentially after the ban and usage gates pass,
preserving the read-only gates' parallelism (ban + subscription + usage) and the
exact ban -> usage -> rate precedence. Add regression tests asserting the rate
limiter is not consumed when an earlier gate rejects, and is consumed once when
they pass.

Caught by Cursor Bugbot review.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile review

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 45e0fe6. Configure here.

Tighten the gate overview to match the sequential rate-limit gate and drop
inline notes that duplicated it or the runRateLimitGate doc.
… for deployed state

- client-cache: add updateAgeOnGet so the TTL is genuinely idle-based (active
  clients keep their warm keep-alive connections; the JSDoc now matches behavior).
- deployed-state: replace the hand-rolled Map + manual FIFO eviction/TTL with
  LRUCache (real LRU eviction, built-in TTL), matching the effectiveDecryptedEnv
  and integration-tool-schema caches. TTL stays absolute (not reset on read) so
  the credential-migration remap still propagates across ECS tasks.

Both per review feedback from Greptile.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile review

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit eb074db. Configure here.

The 'consumes the rate-limit gate once' test reached the STEP 7 admission
reservation, which depends on Redis — it passed locally (reserve throws and is
swallowed) but failed in CI (reserve returns not-reserved -> 429). Pass
skipConcurrencyReservation so the test isolates the rate gate deterministically.
…drock, vllm)

Generalize the Anthropic client cache into one shared memoizer
(providers/client-cache.ts) and apply it only where each new client owns its own
connection pool — so reuse actually keeps connections warm:

- bedrock: AWS SDK clients hold a per-client connection pool (reuse is the AWS
  best practice). Keyed by region + credential identity.
- vllm: a pinned endpoint creates its own undici Agent per call; key by the
  resolved IP so DNS re-validation still runs each request.
- anthropic + azure-anthropic: migrated onto the shared memoizer.

Deliberately NOT applied to the OpenAI-compatible providers, groq, cerebras, or
google: their SDKs share a process-global keep-alive pool (Node openai-sdk module
singleton agent; anthropic/global undici), so a fresh client per request already
reuses connections and memoization would add complexity with ~no benefit. litellm
uses a plain shared-agent client (no pinning) and is likewise skipped.

Bounded LRU (max 1000, 30m idle TTL) with no close-on-eviction, avoiding the
unbounded-growth and eviction-closes-in-use-client failure modes seen in similar
client caches.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile review

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/providers/bedrock/index.ts
…y id

A corrected secret under the same access key id would otherwise keep serving the
stale cached client until TTL/eviction. Caught by Cursor Bugbot.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile review

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 0a269ad. Configure here.

…r client cache in tests

- preprocessing.test: re-establish the checkOrgMemberUsageLimit mock in beforeEach
  (the only gate mock not re-set). In the full suite its implementation was reset
  so the success-path test got undefined -> threw -> 500 -> success:false. Mirrors
  how checkServerSideUsageLimits is handled.
- client-cache: add clearProviderClientCacheForTests; call it in the bedrock and
  vllm test beforeEach so construction assertions always start from a cache miss
  now that those providers memoize their client.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile review

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit de40ca6. Configure here.

The RateLimiter mock used an arrow factory (vi.fn(() => ({...}))). vitest 4.x
(CI) rejects `new` on an arrow-implemented mock ("not a constructor"); 3.2.4
allowed it. The new rate-gate test is the first to actually `new RateLimiter()`,
so it surfaced the failure only in CI. Switch the mock to a regular function and
drop the speculative beforeEach re-establishments that didn't address it.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile review

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit be0228e. Configure here.

@waleedlatif1 waleedlatif1 merged commit cc56408 into staging Jun 16, 2026
16 checks passed
@waleedlatif1 waleedlatif1 deleted the perf/preflight-and-anthropic-speedups branch June 16, 2026 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant