Skip to content

improvement(redis): strip idempotency body and cap mothership stream zsets#4624

Closed
waleedlatif1 wants to merge 65 commits into
stagingfrom
waleedlatif1/memory-leak-investigation
Closed

improvement(redis): strip idempotency body and cap mothership stream zsets#4624
waleedlatif1 wants to merge 65 commits into
stagingfrom
waleedlatif1/memory-leak-investigation

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

Summary

  • Stop persisting the full workflow result on webhook/polling idempotency keys — as receivers we only need a dedup marker, not a replayable body. Drops per-key size from a fat tail of multi-KB-to-400-KB Gmail/Outlook payloads down to ~150 B.
  • Add the missing ZREMRANGEBYRANK trim to appendEvents so mothership stream ZSETs respect the existing COPILOT_STREAM_EVENT_LIMIT config (was dead code; biggest observed key was 3,755 members and climbing).
  • Together these eliminate the two prefixes that drove Redis Cloud into OOM on 2026-05-15.

Type of Change

  • Bug fix

Testing

Tested manually — audited dedup-path callers (`processor.ts` passes the return to `jobQueue.completeJob` for logging only; polling sites discard it) to confirm body-stripping is observably safe. Updated `buffer.test.ts` to assert the trim is called.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

waleedlatif1 and others added 30 commits April 3, 2026 23:30
…ership workflow edits via sockets, ui improvements
…ration, signup method feature flags, SSO improvements
* feat(posthog): Add tracking on mothership abort (#4023)

Co-authored-by: Theodore Li <theo@sim.ai>

* fix(login): fix captcha headers for manual login  (#4025)

* fix(signup): fix turnstile key loading

* fix(login): fix captcha header passing

* Catch user already exists, remove login form captcha
…nts, secrets performance, polling refactors, drag resources in mothership
…endar triggers, docs updates, integrations/models pages improvements
…mat, logs performance improvements

fix(csp): add missing analytics domains, remove unsafe-eval, fix workspace CSP gap (#4179)
fix(landing): return 404 for invalid dynamic route slugs (#4182)
improvement(seo): optimize sitemaps, robots.txt, and core web vitals across sim and docs (#4170)
fix(gemini): support structured output with tools on Gemini 3 models (#4184)
feat(brightdata): add Bright Data integration with 8 tools (#4183)
fix(mothership): fix superagent credentials (#4185)
fix(logs): close sidebar when selected log disappears from filtered list; cleanup (#4186)
v0.6.46: mothership streaming fixes, brightdata integration
icecrasher321 and others added 25 commits April 27, 2026 12:37
…rizations, mothership positional table row insertion, CI improvements, org-external users, file viewer improvements
v0.6.62: fix new copilot chat creation and selection on refresh
…ixes, db query optimizations, contract boundaries code hygiene, CORS, toast improvements, tables infinite query, executor robustness, reranker support
…ogs block, parallel-in-loop wall clock, gpt-image-2
…s, logs panel width, tables UI/DB decoupling

v0.6.67: VFS upload fix, posthog/copilot correlation, exa date filters, logs panel width, tables UI/DB decoupling
…ering upgrades, data drains, security hardening, paginated dropdowns
…ntegrations, robots.txt update, workday hardening
v0.6.72: billing pool contention fix
…personation fixes, md rendering, doc/pdf/pptx generation improvements
…pentelemetry updates, data drains to snowflake, blob, datadog, bigquery
…ip md polish

v0.6.75: scheduler claim-budget drain, helm chart hardening, mothership md polish
v0.6.78: file block get
@vercel
Copy link
Copy Markdown

vercel Bot commented May 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 16, 2026 0:18am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 16, 2026

PR Summary

Medium Risk
Moderate risk: changes idempotency semantics for webhookIdempotency/pollingIdempotency so duplicate calls may now return undefined instead of a cached body, which could break any hidden callers relying on the replayed result. Stream trimming is a straightforward Redis ZSET maintenance change but affects retention behavior under load.

Overview
Prevents unbounded Redis growth by trimming mothership stream event history on every appendEvents call via ZREMRANGEBYRANK, enforcing the existing COPILOT_STREAM_EVENT_LIMIT.

Adds storeResultBody to IdempotencyService and configures webhookIdempotency and pollingIdempotency to persist only a small completion marker (not the operation result), so duplicates short-circuit without caching large response payloads. Updates the session buffer test to assert trimming occurs on append.

Reviewed by Cursor Bugbot for commit 832b782. Configure here.

@gitguardian
Copy link
Copy Markdown

gitguardian Bot commented May 16, 2026

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
29606901 Triggered Generic High Entropy Secret a54dcbe apps/sim/providers/utils.test.ts View secret
32763747 Triggered Generic Password 3e9849b helm/sim/tests/validators_test.yaml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 16, 2026

Greptile Summary

This PR fixes two Redis memory drivers behind the 2026-05-15 OOM incident: it strips the full workflow result from webhook/polling idempotency keys (only a { success, status } marker is stored), and activates the previously-dead ZREMRANGEBYRANK trim inside appendEvents so mothership stream ZSETs are now bounded by COPILOT_STREAM_EVENT_LIMIT.

  • service.ts: Introduces storeResultBody config option (defaults to true); both webhookIdempotency and pollingIdempotency singletons set it to false. Duplicate calls still short-circuit; callers of those two instances are verified to discard the return value.
  • buffer.ts: Adds pipeline.zremrangebyrank(key, 0, -config.eventLimit - 1) immediately after zadd, trimming the sorted set to the newest eventLimit entries on every append.
  • buffer.test.ts: Flips the trim assertion from not.toHaveBeenCalled (reflecting the previous dead-code state) to a positive expectation that zremrangebyrank is called with the correct key and index pair.

Confidence Score: 5/5

Safe to merge — both changes are targeted, well-reasoned, and address a confirmed production OOM.

The ZREMRANGEBYRANK stop index (-eventLimit - 1) is arithmetically correct: when the set is under the limit the resulting rank is negative and Redis returns an empty range (no-op); at and above the limit it evicts the correct surplus. All callers of webhookIdempotency and pollingIdempotency are await-without-capture, so the undefined returned on duplicate calls is silently discarded exactly as intended. The test mock's handling of negative stop indices matches real Redis semantics. No behavioural regressions are visible on any changed path.

No files require special attention.

Important Files Changed

Filename Overview
apps/sim/lib/core/idempotency/service.ts Adds storeResultBody config flag; sets it to false on webhookIdempotency and pollingIdempotency. All call-sites verified to discard the duplicate-path return value. Logic is correct and well-documented.
apps/sim/lib/copilot/request/session/buffer.ts Activates the previously-dead ZREMRANGEBYRANK trim inside the appendEvents pipeline. Stop index 0, -config.eventLimit - 1 is semantically correct: when the set has ≤ eventLimit members the computed rank stays negative and Redis treats it as an empty range (no-op); when the set exceeds the limit, the oldest surplus members are evicted.
apps/sim/lib/copilot/request/session/buffer.test.ts Test correctly updated: the old assertion that zremrangebyrank is NOT called is replaced with a positive assertion that it IS called with the expected key and a numeric stop index. The mock's handling of negative stop indices correctly mirrors real Redis semantics.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming webhook or polling event] --> B{atomicallyClaim}
    B -->|first time| C[execute operation]
    C --> D{storeResultBody?}
    D -->|true - default| E[store result + body in Redis]
    D -->|false - webhook/polling| F[store marker only: success + status]
    E --> G[return actual result]
    F --> G

    B -->|duplicate - completed| I[return existingResult.result - undefined if storeResultBody=false]
    B -->|duplicate - in-progress| J[waitForResult polls - returns undefined if storeResultBody=false]
    B -->|duplicate - failed| K[throw or retry]

    subgraph trim [appendEvents buffer.ts]
        L[pipeline.zadd] --> M[pipeline.zremrangebyrank 0 to -eventLimit-1 - keeps newest eventLimit entries]
        M --> N[pipeline.expire and set seq]
    end
Loading

Reviews (1): Last reviewed commit: "chore(redis): trim verbose comments on i..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants