Skip to content

fix(logs-cleanup): listing active workspaces into mem + download time streaming lims#4692

Open
icecrasher321 wants to merge 5 commits into
stagingfrom
improvement/cleanup-logs-mem-reads
Open

fix(logs-cleanup): listing active workspaces into mem + download time streaming lims#4692
icecrasher321 wants to merge 5 commits into
stagingfrom
improvement/cleanup-logs-mem-reads

Conversation

@icecrasher321
Copy link
Copy Markdown
Collaborator

@icecrasher321 icecrasher321 commented May 21, 2026

Summary

WIP

Fixes #(issue)

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: Performance + Memory Safety

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 21, 2026 6:01pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 21, 2026

PR Summary

Medium Risk
Touches several request/response handling paths (file parsing/uploads, CSV imports, media tools) and execution log persistence, so regressions could affect uploads/downloads and tooling reliability despite being primarily defensive size-limit work.

Overview
Introduces a new stream-limits utility (with tests) and wires it through file parsing, uploads, CSV import routes, and multiple tool proxy endpoints to preflight Content-Length, cap streamed reads, and return 413s instead of materializing oversized payloads.

Updates files/parse to validate file reference shape, cap per-request multi-file parsed output (and per-file remaining download bytes), and enforce download/parsed-output limits across external, cloud, and local sources.

Refactors retention job dispatch to page active workspaces (avoiding loading all into memory), adds per-job concurrencyKey, and scopes log cleanup key-reference checks by workspace.

Adds execution-log compaction to keep stored executionData under ~500KB by summarizing large inputs/outputs and persisting size/truncation metadata.

Reviewed by Cursor Bugbot for commit 0b28132. Configure here.

Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts
Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 21, 2026

Greptile Summary

This PR introduces a new stream-limits.ts utility module providing safe stream/response reading with byte-cap enforcement (PayloadSizeLimitError), and applies it broadly across file parsing, tool execution, storage downloads, and HTTP request handling. It also fixes the cleanup-logs workspace scoping (narrowing the large-value reference check to the batch's own workspace IDs) and refactors the cleanup dispatcher to paginate active-workspace queries instead of loading all rows at once.

  • Stream size limits: New readStreamToBufferWithLimit, readResponseToBufferWithLimit, and related helpers are wired into tool response reading, DocuSign document downloads, Google Slides exports, file parsing routes, Blob/S3 downloads, and HTTP request tool responses, each with appropriate per-context byte caps.
  • Cleanup-logs query scoping: Reference check for large-value storage keys is now scoped to workspace_id = ANY(workspaceIds), avoiding a full-table scan and correctly reflecting that storage keys are workspace-local UUIDs.
  • Execution log compaction: ExecutionLogger now summarizes oversized executionData payloads through a three-tier fallback (summarize → minimal → metadata-only) before persisting, preventing large logs from exceeding the 500 KB stored-payload cap.
  • Cleanup dispatcher pagination: buildCleanupChunks now uses cursor-based pages (WORKSPACE_SCOPE_PAGE_SIZE = 500) rather than a single unbounded query, and maintains chunkCountByPlan across pages for globally unique chunk labels.

Confidence Score: 4/5

The stream-limiting changes are generally safe to merge; the cleanup dispatcher still carries previously flagged unresolved concerns around execution order and sequential enqueue performance.

The core stream-limit plumbing is well-structured and the cleanup-logs workspace scoping is correct. The previously noted issues in cleanup-dispatcher.ts (inline-runner precedence inversion, sequential chunk enqueue, always-extra housekeeping chunk) remain in the code and affect production cleanup job behavior.

apps/sim/lib/billing/cleanup-dispatcher.ts warrants another pass; apps/sim/lib/core/utils/stream-limits.ts has the body-already-read risk on the arrayBuffer→text fallback path.

Important Files Changed

Filename Overview
apps/sim/lib/billing/cleanup-dispatcher.ts Pagination added for workspace listing; multiple previously-flagged concerns remain: inline runner precedence over Trigger.dev, sequential job enqueue, always-extra housekeeping chunk
apps/sim/lib/core/utils/stream-limits.ts New utility providing PayloadSizeLimitError and safe stream/response reading helpers; readResponseToBufferWithLimit has a body-already-read risk on the arrayBuffer→text fallback path (previously flagged)
apps/sim/background/cleanup-logs.ts Workspace-scoped reference check added to filterLargeValueKeysWithoutRetainedReferences; correct given UUID-per-execution storage keys, improves query performance significantly
apps/sim/lib/logs/execution/logger.ts New compactExecutionDataForStorage with three-tier fallback handles oversized log payloads; logic is sound, recordStoredByteSize converges in ≤2 iterations
apps/sim/tools/index.ts Tool response body reading now size-capped at 10MB via readToolResponseBody; shouldRetryWithoutReadingBody consistently mirrors the outer retry check; handleResponseSizeLimitError added for user-friendly error messages
apps/sim/app/api/files/parse/route.ts Multi-file parsing switched from parallel to sequential with output budget tracking; validateFileReferenceShape uses untrimmed filePath.length inconsistently with trimmed prefix checks
apps/sim/lib/uploads/providers/s3/client.ts downloadFromS3 now accepts optional maxBytes, uses readNodeStreamToBufferWithLimit replacing custom stream accumulator; pre-checks ContentLength before reading body
apps/sim/lib/uploads/providers/blob/client.ts downloadFromBlob updated similarly to S3; removes streamToBuffer helper in favor of shared readNodeStreamToBufferWithLimit
apps/sim/lib/core/security/input-validation.server.ts Content-Length pre-check added to secureFetchWithPinnedIP; retryable statuses (429, 5xx) resolve without body instead of throwing PayloadSizeLimitError, allowing caller retry logic to proceed

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Tool Request] --> B{Internal Route?}
    B -- Yes --> C[fetch internal URL]
    B -- No --> D[secureFetchWithPinnedIP\nwith maxResponseBytes]
    C --> E{nullBody or\nshouldRetryWithout\nReadingBody?}
    D --> F{nullBody or\nshouldRetryWithout\nReadingBody?}
    E -- Yes --> G[cancel body\nResponse null]
    E -- No --> H[readToolResponseBody\nmax 10MB]
    F -- Yes --> I[cancel body\nResponse null]
    F -- No --> J[readToolResponseBody\nmax 10MB]
    G --> K{Retryable & not\nlast attempt?}
    I --> K
    H --> K
    J --> K
    K -- Yes --> L[sleep & retry]
    L --> B
    K -- No --> M{response.ok?}
    M -- No --> N[Parse error body\nfrom buffered response]
    M -- Yes --> O[Build success result]
    H -- PayloadSizeLimitError --> P[handleResponseSizeLimitError\nuser-friendly message]
    J -- PayloadSizeLimitError --> P
Loading

Reviews (3): Last reviewed commit: "address comments" | Re-trigger Greptile

Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

Comment thread apps/sim/app/api/files/parse/route.ts Outdated
Comment thread apps/sim/background/cleanup-logs.ts
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 0b28132. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant