Skip to content

improvement(redis): strip idempotency body and cap mothership stream zsets#4625

Merged
waleedlatif1 merged 3 commits into
stagingfrom
waleedlatif1/memory-leak-investigation
May 16, 2026
Merged

improvement(redis): strip idempotency body and cap mothership stream zsets#4625
waleedlatif1 merged 3 commits into
stagingfrom
waleedlatif1/memory-leak-investigation

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

Summary

Two targeted fixes that reduce Redis memory pressure following the 2026-05-15 OOM incident:

  • Idempotency body stripping (apps/sim/lib/core/idempotency/service.ts): webhook + polling idempotency now store only a { success, status } marker instead of the full response body. As a webhook receiver, the provider's retry only needs a 2xx — not our cached response. TTLs unchanged (webhook 7d to cover Gmail/Pub-Sub retry window, polling 3d).
  • Mothership stream ZSET cap (apps/sim/lib/copilot/request/session/buffer.ts): appendEvents now calls ZREMRANGEBYRANK to enforce eventLimit (default 5,000) on every append. Previously the limit was dead code — streams grew unbounded until the 1h TTL fired.

Both are additive and backward-compatible. New storeResultBody config flag defaults to true so callers (Stripe-style idempotency) that need the cached body are unaffected.

Test plan

  • bun run lint clean
  • Inverted buffer test asserts zremrangebyrank is called on every append
  • Verify in staging that webhook idempotency keys are ~50 bytes vs ~500-1700 bytes
  • Verify mothership stream ZSETs cap at eventLimit under load

@vercel
Copy link
Copy Markdown

vercel Bot commented May 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 16, 2026 0:28am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 16, 2026

PR Summary

Medium Risk
Moderate risk because it changes idempotency behavior for webhookIdempotency/pollingIdempotency (duplicates may now return undefined instead of a cached body) and enforces hard trimming of stream replay history, which could affect consumers relying on older events.

Overview
Reduces Redis memory pressure by making idempotency storage optionally omit operation result bodies via a new storeResultBody flag (default true), and configuring webhookIdempotency + pollingIdempotency to persist only a lightweight completion marker.

Also caps mothership stream event history by trimming the Redis ZSET to eventLimit on every appendEvents call (ZREMRANGEBYRANK), with the buffer test updated to assert trimming happens on each append.

Reviewed by Cursor Bugbot for commit 78b3d96. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 16, 2026

Greptile Summary

Two targeted Redis memory fixes following a 2026-05-15 OOM incident: the mothership stream buffer now enforces its eventLimit cap on every append (previously dead code let ZSETs grow unbounded until the 1-hour TTL fired), and the webhook/polling idempotency services now store only a { success, status } marker instead of the full response body.

  • buffer.ts: A single ZREMRANGEBYRANK key 0 -(eventLimit+1) is inserted into the appendEvents pipeline immediately after ZADD, trimming the oldest entries so the ZSET is capped at eventLimit (default 5,000) on every write. The test pins the exact stop argument (-5_001) to guard the off-by-one boundary.
  • service.ts: New storeResultBody flag (defaults true to preserve Stripe-style callers) is set to false on webhookIdempotency and pollingIdempotency; duplicate calls still short-circuit but the cached value returned on the completed/in-progress paths will be undefined as T for these two instances.

Confidence Score: 4/5

Safe to merge for the buffer cap change; the idempotency body-stripping is intentionally lossy for webhook and polling paths where callers don't read the return value, but that assumption isn't enforced by the type system.

The pollingIdempotency change is the one place worth a second look: duplicate polling calls that arrive while the first is in-progress or after it completes will receive undefined as T from executeWithIdempotency. TypeScript's cast suppresses the type error, so any polling handler that reads the resolved value will silently receive undefined on the duplicate path without a compile-time signal.

apps/sim/lib/core/idempotency/service.ts — specifically any callers of pollingIdempotency.executeWithIdempotency that read the resolved value

Important Files Changed

Filename Overview
apps/sim/lib/copilot/request/session/buffer.ts Adds ZREMRANGEBYRANK to the append pipeline to enforce eventLimit on every write; previously the limit was declared but never applied, causing unbounded ZSET growth
apps/sim/lib/copilot/request/session/buffer.test.ts Inverts the trimming assertion to verify zremrangebyrank is called with exact arguments (key, 0, -5_001) on every append, pinning the off-by-one boundary
apps/sim/lib/core/idempotency/service.ts Adds storeResultBody config flag (default true); webhookIdempotency and pollingIdempotency are set to false, storing only a { success, status } marker instead of the full response body to reduce Redis memory pressure

Reviews (2): Last reviewed commit: "test(buffer): pin exact ZREMRANGEBYRANK ..." | Re-trigger Greptile

Comment thread apps/sim/lib/copilot/request/session/buffer.test.ts
Pinning -5_001 (= -(DEFAULT_EVENT_LIMIT) - 1) so the off-by-one
boundary is directly validated; expect.any(Number) would have passed
a wrong formula like -eventLimit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

Re Greptile's outside-diff note on waitForResult returning undefined as T when storeResultBody: false:

Audited all callers and this is safe:

  • All pollingIdempotency callers (gmail.ts, outlook.ts, imap.ts, rss.ts, google-drive.ts, google-calendar.ts, google-sheets.ts) invoke as bare await pollingIdempotency.executeWithIdempotency(...) and discard the return value.
  • webhookIdempotency caller (background/webhook-execution.ts:268) returns the value to jobQueue.completeJob(jobId, output). A duplicate-delivery returning undefined here just stores undefined as the BullMQ job output, which is benign — the original delivery already executed the workflow and recorded its real output.

The contract is also documented on the storeResultBody field: "Duplicate calls still short-circuit but resolve to undefined". No code change needed.

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 78b3d96. Configure here.

@waleedlatif1 waleedlatif1 merged commit 93f7be4 into staging May 16, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/memory-leak-investigation branch May 16, 2026 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant