perf(execution): parallelize preflight gates, cache deployed state, memoize Anthropic client#5098
Conversation
…emoize Anthropic client - Memoize Anthropic + Azure-Anthropic SDK clients (new client-cache.ts) keyed by apiKey (+beta header; +baseURL/version/pinnedIP for Azure) so HTTP keep-alive connections are reused instead of a fresh TLS handshake per call. apiKey is the tenant boundary. - Parallelize the read-only preflight gates in preprocessing.ts (ban + subscription, then usage + org-member + rate-limit) while preserving exact error precedence (ban 403 -> usage 402 -> rate 429) and keeping the sole write (admission reservation) last. - Parallelize the independent workflow-state and env-var loads in execution-core. - Cache deployed workflow state by immutable deploymentVersionId with deep-clone-on-read, oldest-first eviction, and a 5-min TTL bounding the credential-mapping edge across ECS tasks. - Parallelize the independent personal-subscription + membership queries in getHighestPrioritySubscription. - BYOK: drop the redundant getWorkspaceById existence check (auth already validates the workspace); read the key list fresh every call for zero cross-instance staleness. Billing/usage/ban/permission reads stay fresh on the primary (no cache, no replica). Adds tests for every new mechanism and fixes a pre-existing vitest class-mock incompatibility that had execution-core.test.ts fully red on staging.
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryMedium Risk Overview Execution preflight refactors ban, subscription, usage, and rate-limit checks so ban runs in parallel with subscription fetch, usage runs after subscription is available, and rate limiting runs only after ban and usage pass—preserving 403 → 402 → 429 precedence and avoiding rate-limit token consumption on rejected requests. Workflow core loads deployed/draft workflow state and personal/workspace env vars concurrently before logging and executor construction. Deployed workflow state is cached in an LRU keyed by immutable Billing plan lookup parallelizes personal subscription and org membership queries; org member usage documents cap-first short-circuiting to skip usage work when uncapped. BYOK drops a redundant workspace existence check and re-reads keys from the DB on every call so revocation stays immediate across tasks. Provider SDK clients (Anthropic, Azure Anthropic, Bedrock, vLLM) are memoized via a shared LRU cache keyed by tenant/credential dimensions to reuse keep-alive connections. Tests cover gate precedence, parallel subscription queries, deployed-state cache behavior, BYOK fresh reads, client-cache isolation, and Vitest 4–compatible class mocks. Reviewed by Cursor Bugbot for commit be0228e. Configure here. |
|
@greptile |
|
@cursor review |
Greptile SummaryThis PR introduces several independent performance improvements across the execution path: it parallelizes the ban+subscription preflight gates, defers the stateful rate-limit gate until those pass (preserving 403→402→429 precedence), and concurrently loads workflow state and env vars inside
Confidence Score: 5/5Safe to merge — all read-only gates are parallelized correctly, the stateful rate-limit gate is still strictly sequential, error precedence is preserved, and the deployed-state cache deep-clones on every read to prevent cross-request state corruption. The preflight refactor preserves observable behavior (same status codes, same error messages, same precedence), the deployed-state cache is keyed on an immutable ID with per-read deep cloning so cached entries cannot be mutated across requests, and the client cache is correctly tenant-isolated by API key. The test suite covers ban precedence conflicts, cache hit/miss/clone isolation, TTL invalidation, and LRU eviction. No correctness issues found. No files require special attention. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant C as Caller
participant P as preprocessExecution
participant B as BanCheck
participant S as SubscriptionFetch
participant U as UsageCheckTask
participant RL as RateLimitGate
participant DB as Database
C->>P: preprocessExecution(options)
par Ban + Subscription (concurrent)
P->>B: getActivelyBannedUserIds()
B->>DB: SELECT banned_user_ids
DB-->>B: result
B-->>P: "GateFailure | null"
and
P->>S: getHighestPrioritySubscription()
S->>DB: SELECT personal_subs + memberships (parallel)
DB-->>S: results
S-->>P: subscription
end
P->>P: await Promise.all([banCheck, subscriptionFetch])
P->>U: usageCheckTask (uses subscription)
U->>DB: checkServerSideUsageLimits
DB-->>U: usage
U-->>P: "{failure, snapshot}"
alt banFailure OR usageFailure
P->>P: recordPreprocessingError()
P-->>C: 403 / 402 error response
else Both pass
P->>RL: runRateLimitGate() [stateful - sequential]
RL->>DB: checkRateLimitWithSubscription (consumes token)
DB-->>RL: result
alt Rate limit exceeded
RL-->>P: GateFailure (429)
P-->>C: 429 error response
else All gates pass
RL-->>P: null
P->>DB: STEP 7 admission reservation
P-->>C: success
end
end
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant C as Caller
participant P as preprocessExecution
participant B as BanCheck
participant S as SubscriptionFetch
participant U as UsageCheckTask
participant RL as RateLimitGate
participant DB as Database
C->>P: preprocessExecution(options)
par Ban + Subscription (concurrent)
P->>B: getActivelyBannedUserIds()
B->>DB: SELECT banned_user_ids
DB-->>B: result
B-->>P: "GateFailure | null"
and
P->>S: getHighestPrioritySubscription()
S->>DB: SELECT personal_subs + memberships (parallel)
DB-->>S: results
S-->>P: subscription
end
P->>P: await Promise.all([banCheck, subscriptionFetch])
P->>U: usageCheckTask (uses subscription)
U->>DB: checkServerSideUsageLimits
DB-->>U: usage
U-->>P: "{failure, snapshot}"
alt banFailure OR usageFailure
P->>P: recordPreprocessingError()
P-->>C: 403 / 402 error response
else Both pass
P->>RL: runRateLimitGate() [stateful - sequential]
RL->>DB: checkRateLimitWithSubscription (consumes token)
DB-->>RL: result
alt Rate limit exceeded
RL-->>P: GateFailure (429)
P-->>C: 429 error response
else All gates pass
RL-->>P: null
P->>DB: STEP 7 admission reservation
P-->>C: success
end
end
Reviews (7): Last reviewed commit: "test(execution): make RateLimiter mock c..." | Re-trigger Greptile |
The rate-limit gate is not read-only — checkRateLimitWithSubscription consumes a token — so running it in parallel with the read-only gates debited rate-limit quota for requests that the ban (403) or usage (402) gates reject, which the original sequential flow never did. Move the rate-limit gate to run sequentially after the ban and usage gates pass, preserving the read-only gates' parallelism (ban + subscription + usage) and the exact ban -> usage -> rate precedence. Add regression tests asserting the rate limiter is not consumed when an earlier gate rejects, and is consumed once when they pass. Caught by Cursor Bugbot review.
|
@greptile review |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 45e0fe6. Configure here.
Tighten the gate overview to match the sequential rate-limit gate and drop inline notes that duplicated it or the runRateLimitGate doc.
… for deployed state - client-cache: add updateAgeOnGet so the TTL is genuinely idle-based (active clients keep their warm keep-alive connections; the JSDoc now matches behavior). - deployed-state: replace the hand-rolled Map + manual FIFO eviction/TTL with LRUCache (real LRU eviction, built-in TTL), matching the effectiveDecryptedEnv and integration-tool-schema caches. TTL stays absolute (not reset on read) so the credential-migration remap still propagates across ECS tasks. Both per review feedback from Greptile.
|
@greptile review |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit eb074db. Configure here.
The 'consumes the rate-limit gate once' test reached the STEP 7 admission reservation, which depends on Redis — it passed locally (reserve throws and is swallowed) but failed in CI (reserve returns not-reserved -> 429). Pass skipConcurrencyReservation so the test isolates the rate gate deterministically.
…drock, vllm) Generalize the Anthropic client cache into one shared memoizer (providers/client-cache.ts) and apply it only where each new client owns its own connection pool — so reuse actually keeps connections warm: - bedrock: AWS SDK clients hold a per-client connection pool (reuse is the AWS best practice). Keyed by region + credential identity. - vllm: a pinned endpoint creates its own undici Agent per call; key by the resolved IP so DNS re-validation still runs each request. - anthropic + azure-anthropic: migrated onto the shared memoizer. Deliberately NOT applied to the OpenAI-compatible providers, groq, cerebras, or google: their SDKs share a process-global keep-alive pool (Node openai-sdk module singleton agent; anthropic/global undici), so a fresh client per request already reuses connections and memoization would add complexity with ~no benefit. litellm uses a plain shared-agent client (no pinning) and is likewise skipped. Bounded LRU (max 1000, 30m idle TTL) with no close-on-eviction, avoiding the unbounded-growth and eviction-closes-in-use-client failure modes seen in similar client caches.
|
@greptile review |
|
@cursor review |
…y id A corrected secret under the same access key id would otherwise keep serving the stale cached client until TTL/eviction. Caught by Cursor Bugbot.
|
@greptile review |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 0a269ad. Configure here.
…r client cache in tests - preprocessing.test: re-establish the checkOrgMemberUsageLimit mock in beforeEach (the only gate mock not re-set). In the full suite its implementation was reset so the success-path test got undefined -> threw -> 500 -> success:false. Mirrors how checkServerSideUsageLimits is handled. - client-cache: add clearProviderClientCacheForTests; call it in the bedrock and vllm test beforeEach so construction assertions always start from a cache miss now that those providers memoize their client.
|
@greptile review |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit de40ca6. Configure here.
The RateLimiter mock used an arrow factory (vi.fn(() => ({...}))). vitest 4.x
(CI) rejects `new` on an arrow-implemented mock ("not a constructor"); 3.2.4
allowed it. The new rate-gate test is the first to actually `new RateLimiter()`,
so it surfaced the failure only in CI. Switch the mock to a regular function and
drop the speculative beforeEach re-establishments that didn't address it.
|
@greptile review |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit be0228e. Configure here.
Summary
providers/anthropic/client-cache.ts) keyed byapiKey(+ beta header; + baseURL/version/pinnedIP for Azure) so HTTP keep-alive connections are reused instead of a fresh TLS handshake per request.apiKeyis the tenant boundary.lib/execution/preprocessing.ts(ban + subscription, then usage + org-member + rate-limit) while preserving the exact error precedence (ban 403 → usage 402 → rate 429) and keeping the sole write (admission reservation) strictly last.execution-core.deploymentVersionIdwith deep-clone-on-read, oldest-first eviction, and a 5-min TTL that bounds the credential-mapping edge across ECS tasks.getHighestPrioritySubscription.getWorkspaceByIdexistence check (auth already validates the workspace) and read the key list fresh on every call — BYOK isn't a hot query, so this keeps revocation/rotation effective immediately with zero cross-instance staleness.Safety / no stale data
dbReplicaconvention (never for auth/workflow-state/billing-enforcement).Type of Change
Testing
bun run type-checkcleangetHighestPrioritySubscription)execution-core.test.tsfully red on stagingbun run check:api-validationpassesChecklist