perf(execution): parallelize preflight gates, cache deployed state, memoize Anthropic client by waleedlatif1 · Pull Request #5098 · simstudioai/sim

waleedlatif1 · 2026-06-16T18:01:19Z

Summary

Memoize the Anthropic + Azure-Anthropic SDK clients (new providers/anthropic/client-cache.ts) keyed by apiKey (+ beta header; + baseURL/version/pinnedIP for Azure) so HTTP keep-alive connections are reused instead of a fresh TLS handshake per request. apiKey is the tenant boundary.
Parallelize the read-only preflight gates in lib/execution/preprocessing.ts (ban + subscription, then usage + org-member + rate-limit) while preserving the exact error precedence (ban 403 → usage 402 → rate 429) and keeping the sole write (admission reservation) strictly last.
Parallelize the independent workflow-state load and env-var load in execution-core.
Cache deployed workflow state by the immutable deploymentVersionId with deep-clone-on-read, oldest-first eviction, and a 5-min TTL that bounds the credential-mapping edge across ECS tasks.
Parallelize the independent personal-subscription + membership queries in getHighestPrioritySubscription.
BYOK: drop the redundant getWorkspaceById existence check (auth already validates the workspace) and read the key list fresh on every call — BYOK isn't a hot query, so this keeps revocation/rotation effective immediately with zero cross-instance staleness.

Safety / no stale data

Billing, usage, ban, and permission reads are not cached and not routed to the read replica — they stay fresh on the primary, per the dbReplica convention (never for auth/workflow-state/billing-enforcement).
The only caches are: the deployed-state snapshot (keyed by an immutable id, TTL-bounded) and the Anthropic client (per-process, the only correct design for connection reuse).
Preflight parallelization is behavior-identical: same status codes, same error messages, same logging, same precedence, exactly one error-log write per rejection.

Type of Change

Performance improvement (no behavioral change)

Testing

bun run type-check clean
108 affected tests pass (new: Anthropic client cache, BYOK fresh-read + rotation, deployed-state cache hit/clone-isolation/active-version-select/invalidation/TTL, preflight ban-precedence, getHighestPrioritySubscription)
Also fixed a pre-existing vitest class-mock incompatibility that had execution-core.test.ts fully red on staging
bun run check:api-validation passes

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…emoize Anthropic client - Memoize Anthropic + Azure-Anthropic SDK clients (new client-cache.ts) keyed by apiKey (+beta header; +baseURL/version/pinnedIP for Azure) so HTTP keep-alive connections are reused instead of a fresh TLS handshake per call. apiKey is the tenant boundary. - Parallelize the read-only preflight gates in preprocessing.ts (ban + subscription, then usage + org-member + rate-limit) while preserving exact error precedence (ban 403 -> usage 402 -> rate 429) and keeping the sole write (admission reservation) last. - Parallelize the independent workflow-state and env-var loads in execution-core. - Cache deployed workflow state by immutable deploymentVersionId with deep-clone-on-read, oldest-first eviction, and a 5-min TTL bounding the credential-mapping edge across ECS tasks. - Parallelize the independent personal-subscription + membership queries in getHighestPrioritySubscription. - BYOK: drop the redundant getWorkspaceById existence check (auth already validates the workspace); read the key list fresh every call for zero cross-instance staleness. Billing/usage/ban/permission reads stay fresh on the primary (no cache, no replica). Adds tests for every new mechanism and fixes a pre-existing vitest class-mock incompatibility that had execution-core.test.ts fully red on staging.

vercel · 2026-06-16T18:01:25Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Jun 16, 2026 7:43pm

cursor · 2026-06-16T18:01:32Z

PR Summary

Medium Risk
Changes touch execution gating order (rate limit deferred), deployed-state caching with credential remap TTL, and BYOK lookup behavior; behavior is heavily tested but concurrent gates and cache invalidation warrant careful review on deploy.

Overview
This PR speeds up workflow execution and provider calls by overlapping independent I/O and adding bounded, intentional caches, without changing billing enforcement semantics.

Execution preflight refactors ban, subscription, usage, and rate-limit checks so ban runs in parallel with subscription fetch, usage runs after subscription is available, and rate limiting runs only after ban and usage pass—preserving 403 → 402 → 429 precedence and avoiding rate-limit token consumption on rejected requests.

Workflow core loads deployed/draft workflow state and personal/workspace env vars concurrently before logging and executor construction.

Deployed workflow state is cached in an LRU keyed by immutable deploymentVersionId (deep-clone on read, 5-minute TTL, explicit invalidation) so repeated runs skip expensive migrations while redeploy/rollback still picks up new versions.

Billing plan lookup parallelizes personal subscription and org membership queries; org member usage documents cap-first short-circuiting to skip usage work when uncapped.

BYOK drops a redundant workspace existence check and re-reads keys from the DB on every call so revocation stays immediate across tasks.

Provider SDK clients (Anthropic, Azure Anthropic, Bedrock, vLLM) are memoized via a shared LRU cache keyed by tenant/credential dimensions to reuse keep-alive connections.

Tests cover gate precedence, parallel subscription queries, deployed-state cache behavior, BYOK fresh reads, client-cache isolation, and Vitest 4–compatible class mocks.

^{Reviewed by Cursor Bugbot for commit be0228e. Configure here.}

waleedlatif1 · 2026-06-16T18:09:31Z

@greptile

waleedlatif1 · 2026-06-16T18:09:36Z

@cursor review

greptile-apps · 2026-06-16T18:17:51Z

Greptile Summary

This PR introduces several independent performance improvements across the execution path: it parallelizes the ban+subscription preflight gates, defers the stateful rate-limit gate until those pass (preserving 403→402→429 precedence), and concurrently loads workflow state and env vars inside executeWorkflowCore. It also memoizes provider SDK clients (Anthropic, Azure Anthropic, Bedrock, vLLM) in a shared LRU cache keyed by API key and connection parameters, and caches post-migration deployed workflow state by the immutable deploymentVersionId with deep-clone-on-read and a TTL that bounds credential-remap staleness.

Preflight parallelization (preprocessing.ts): ban check and subscription fetch run concurrently; usage gate runs after subscription resolves; rate-limit gate is still strictly sequential to avoid debiting quota for rejected requests. Error precedence (403 → 402 → 429) is enforced at the aggregation point, and the tests cover all conflict cases.
Provider client cache (providers/client-cache.ts): single LRU cache shared across providers, keyed by API key (tenant boundary) plus all connection-varying parameters; updateAgeOnGet: true makes TTL idle-based so warm connections survive indefinitely.
Deployed-state cache (persistence/utils.ts): LRU cache keyed by immutable deploymentVersionId with absolute TTL (intentionally not idle-expiring) to bound credential-remap staleness; active-version SELECT still runs on every call so rollback/redeploy is immediately visible.

Confidence Score: 5/5

Safe to merge — all read-only gates are parallelized correctly, the stateful rate-limit gate is still strictly sequential, error precedence is preserved, and the deployed-state cache deep-clones on every read to prevent cross-request state corruption.

The preflight refactor preserves observable behavior (same status codes, same error messages, same precedence), the deployed-state cache is keyed on an immutable ID with per-read deep cloning so cached entries cannot be mutated across requests, and the client cache is correctly tenant-isolated by API key. The test suite covers ban precedence conflicts, cache hit/miss/clone isolation, TTL invalidation, and LRU eviction. No correctness issues found.

No files require special attention.

Important Files Changed

Filename	Overview
apps/sim/providers/client-cache.ts	New LRU cache for provider SDK clients; correct idle-TTL config, tenant isolation via API key in key, and test-only clear hook.
apps/sim/lib/execution/preprocessing.ts	Preflight gates parallelized; ban+subscription concurrent, usage sequential after subscription, rate-limit gate strictly last. Error precedence (403→402→429) correctly enforced via null-coalescing at aggregation point.
apps/sim/lib/workflows/persistence/utils.ts	Deployed-state LRU cache added; keyed by immutable deploymentVersionId, deep-clones on read, absolute TTL bounds credential-remap staleness, active-version SELECT still runs every call.
apps/sim/lib/workflows/executor/execution-core.ts	Workflow state load and env-var load now run concurrently; early resolution of personalEnvUserId guards the parallelization correctly.
apps/sim/lib/billing/core/plan.ts	Personal-subscription and membership queries parallelized via Promise.all; org-existence and org-sub follow-ups correctly remain sequential (data dependency).
apps/sim/lib/api-key/byok.ts	Removed redundant getWorkspaceById guard (auth already validates workspace); reads key list fresh on every call for immediate revocation.
apps/sim/providers/anthropic/index.ts	Anthropic client creation routed through getCachedProviderClient; cache key correctly encodes apiKey and beta-header flag.
apps/sim/providers/azure-anthropic/index.ts	Azure Anthropic client cached; pinnedIP extracted into a local variable and included in cache key so IP changes produce a fresh client.
apps/sim/providers/vllm/index.ts	vLLM OpenAI client cached; key encodes apiKey, baseUrl, and pinnedIP so credential or endpoint changes yield a fresh client; DNS validation still runs on every request.
apps/sim/providers/bedrock/index.ts	BedrockRuntimeClient cached; key encodes region and full credential (accessKeyId + secretKey) so a rotated secret produces a fresh client.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant C as Caller
    participant P as preprocessExecution
    participant B as BanCheck
    participant S as SubscriptionFetch
    participant U as UsageCheckTask
    participant RL as RateLimitGate
    participant DB as Database

    C->>P: preprocessExecution(options)
    par Ban + Subscription (concurrent)
        P->>B: getActivelyBannedUserIds()
        B->>DB: SELECT banned_user_ids
        DB-->>B: result
        B-->>P: "GateFailure | null"
    and
        P->>S: getHighestPrioritySubscription()
        S->>DB: SELECT personal_subs + memberships (parallel)
        DB-->>S: results
        S-->>P: subscription
    end
    P->>P: await Promise.all([banCheck, subscriptionFetch])

    P->>U: usageCheckTask (uses subscription)
    U->>DB: checkServerSideUsageLimits
    DB-->>U: usage
    U-->>P: "{failure, snapshot}"

    alt banFailure OR usageFailure
        P->>P: recordPreprocessingError()
        P-->>C: 403 / 402 error response
    else Both pass
        P->>RL: runRateLimitGate() [stateful - sequential]
        RL->>DB: checkRateLimitWithSubscription (consumes token)
        DB-->>RL: result
        alt Rate limit exceeded
            RL-->>P: GateFailure (429)
            P-->>C: 429 error response
        else All gates pass
            RL-->>P: null
            P->>DB: STEP 7 admission reservation
            P-->>C: success
        end
    end

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant C as Caller
    participant P as preprocessExecution
    participant B as BanCheck
    participant S as SubscriptionFetch
    participant U as UsageCheckTask
    participant RL as RateLimitGate
    participant DB as Database

    C->>P: preprocessExecution(options)
    par Ban + Subscription (concurrent)
        P->>B: getActivelyBannedUserIds()
        B->>DB: SELECT banned_user_ids
        DB-->>B: result
        B-->>P: "GateFailure | null"
    and
        P->>S: getHighestPrioritySubscription()
        S->>DB: SELECT personal_subs + memberships (parallel)
        DB-->>S: results
        S-->>P: subscription
    end
    P->>P: await Promise.all([banCheck, subscriptionFetch])

    P->>U: usageCheckTask (uses subscription)
    U->>DB: checkServerSideUsageLimits
    DB-->>U: usage
    U-->>P: "{failure, snapshot}"

    alt banFailure OR usageFailure
        P->>P: recordPreprocessingError()
        P-->>C: 403 / 402 error response
    else Both pass
        P->>RL: runRateLimitGate() [stateful - sequential]
        RL->>DB: checkRateLimitWithSubscription (consumes token)
        DB-->>RL: result
        alt Rate limit exceeded
            RL-->>P: GateFailure (429)
            P-->>C: 429 error response
        else All gates pass
            RL-->>P: null
            P->>DB: STEP 7 admission reservation
            P-->>C: success
        end
    end

_{Reviews (7): Last reviewed commit: "test(execution): make RateLimiter mock c..." | Re-trigger Greptile}

The rate-limit gate is not read-only — checkRateLimitWithSubscription consumes a token — so running it in parallel with the read-only gates debited rate-limit quota for requests that the ban (403) or usage (402) gates reject, which the original sequential flow never did. Move the rate-limit gate to run sequentially after the ban and usage gates pass, preserving the read-only gates' parallelism (ban + subscription + usage) and the exact ban -> usage -> rate precedence. Add regression tests asserting the rate limiter is not consumed when an earlier gate rejects, and is consumed once when they pass. Caught by Cursor Bugbot review.

waleedlatif1 · 2026-06-16T18:19:48Z

@greptile review

waleedlatif1 · 2026-06-16T18:19:50Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit 45e0fe6. Configure here.}

Tighten the gate overview to match the sequential rate-limit gate and drop inline notes that duplicated it or the runRateLimitGate doc.

… for deployed state - client-cache: add updateAgeOnGet so the TTL is genuinely idle-based (active clients keep their warm keep-alive connections; the JSDoc now matches behavior). - deployed-state: replace the hand-rolled Map + manual FIFO eviction/TTL with LRUCache (real LRU eviction, built-in TTL), matching the effectiveDecryptedEnv and integration-tool-schema caches. TTL stays absolute (not reset on read) so the credential-migration remap still propagates across ECS tasks. Both per review feedback from Greptile.

waleedlatif1 · 2026-06-16T18:31:25Z

@greptile review

waleedlatif1 · 2026-06-16T18:31:26Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit eb074db. Configure here.}

The 'consumes the rate-limit gate once' test reached the STEP 7 admission reservation, which depends on Redis — it passed locally (reserve throws and is swallowed) but failed in CI (reserve returns not-reserved -> 429). Pass skipConcurrencyReservation so the test isolates the rate gate deterministically.

…drock, vllm) Generalize the Anthropic client cache into one shared memoizer (providers/client-cache.ts) and apply it only where each new client owns its own connection pool — so reuse actually keeps connections warm: - bedrock: AWS SDK clients hold a per-client connection pool (reuse is the AWS best practice). Keyed by region + credential identity. - vllm: a pinned endpoint creates its own undici Agent per call; key by the resolved IP so DNS re-validation still runs each request. - anthropic + azure-anthropic: migrated onto the shared memoizer. Deliberately NOT applied to the OpenAI-compatible providers, groq, cerebras, or google: their SDKs share a process-global keep-alive pool (Node openai-sdk module singleton agent; anthropic/global undici), so a fresh client per request already reuses connections and memoization would add complexity with ~no benefit. litellm uses a plain shared-agent client (no pinning) and is likewise skipped. Bounded LRU (max 1000, 30m idle TTL) with no close-on-eviction, avoiding the unbounded-growth and eviction-closes-in-use-client failure modes seen in similar client caches.

waleedlatif1 · 2026-06-16T18:57:02Z

@greptile review

waleedlatif1 · 2026-06-16T18:57:03Z

@cursor review

…y id A corrected secret under the same access key id would otherwise keep serving the stale cached client until TTL/eviction. Caught by Cursor Bugbot.

waleedlatif1 · 2026-06-16T19:08:57Z

@greptile review

waleedlatif1 · 2026-06-16T19:08:58Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit 0a269ad. Configure here.}

…r client cache in tests - preprocessing.test: re-establish the checkOrgMemberUsageLimit mock in beforeEach (the only gate mock not re-set). In the full suite its implementation was reset so the success-path test got undefined -> threw -> 500 -> success:false. Mirrors how checkServerSideUsageLimits is handled. - client-cache: add clearProviderClientCacheForTests; call it in the bedrock and vllm test beforeEach so construction assertions always start from a cache miss now that those providers memoize their client.

waleedlatif1 · 2026-06-16T19:30:52Z

@greptile review

waleedlatif1 · 2026-06-16T19:30:53Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit de40ca6. Configure here.}

The RateLimiter mock used an arrow factory (vi.fn(() => ({...}))). vitest 4.x (CI) rejects `new` on an arrow-implemented mock ("not a constructor"); 3.2.4 allowed it. The new rate-gate test is the first to actually `new RateLimiter()`, so it surfaced the failure only in CI. Switch the mock to a regular function and drop the speculative beforeEach re-establishments that didn't address it.

waleedlatif1 · 2026-06-16T20:02:14Z

@greptile review

waleedlatif1 · 2026-06-16T20:02:16Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit be0228e. Configure here.}

cursor Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread apps/sim/lib/execution/preprocessing.ts Outdated

greptile-apps Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread apps/sim/providers/anthropic/client-cache.ts Outdated

Comment thread apps/sim/lib/workflows/persistence/utils.ts Outdated

Comment thread apps/sim/providers/anthropic/client-cache.ts Outdated

vercel Bot temporarily deployed to Preview June 16, 2026 18:19 Inactive

cursor Bot reviewed Jun 16, 2026

View reviewed changes

chore(execution): trim redundant preflight comments

a258a38

Tighten the gate overview to match the sequential rate-limit gate and drop inline notes that duplicated it or the runRateLimitGate doc.

vercel Bot temporarily deployed to Preview June 16, 2026 18:25 Inactive

vercel Bot temporarily deployed to Preview June 16, 2026 18:30 Inactive

cursor Bot reviewed Jun 16, 2026

View reviewed changes

vercel Bot temporarily deployed to Preview June 16, 2026 18:40 Inactive

vercel Bot temporarily deployed to Preview June 16, 2026 18:56 Inactive

cursor Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread apps/sim/providers/bedrock/index.ts

chore(perf): trim verbose comments to terse why-notes

b6d207f

vercel Bot temporarily deployed to Preview June 16, 2026 19:02 Inactive

chore(perf): drop obvious inline comments, keep nuance as TSDoc

a349fc7

vercel Bot temporarily deployed to Preview June 16, 2026 19:06 Inactive

fix(bedrock): key client cache on full credential, not just access ke…

0a269ad

…y id A corrected secret under the same access key id would otherwise keep serving the stale cached client until TTL/eviction. Caught by Cursor Bugbot.

vercel Bot temporarily deployed to Preview June 16, 2026 19:08 Inactive

cursor Bot reviewed Jun 16, 2026

View reviewed changes

vercel Bot temporarily deployed to Preview June 16, 2026 19:30 Inactive

cursor Bot reviewed Jun 16, 2026

View reviewed changes

vercel Bot temporarily deployed to Preview June 16, 2026 19:43 Inactive

cursor Bot reviewed Jun 16, 2026

View reviewed changes

waleedlatif1 merged commit cc56408 into staging Jun 16, 2026
16 checks passed

waleedlatif1 deleted the perf/preflight-and-anthropic-speedups branch June 16, 2026 20:49

Conversation

waleedlatif1 commented Jun 16, 2026

Summary

Safety / no stale data

Type of Change

Testing

Checklist

Uh oh!

vercel Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

waleedlatif1 commented Jun 16, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented Jun 16, 2026 •

edited

Loading

cursor Bot commented Jun 16, 2026 •

edited

Loading

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading