improvement(execution, connectors): offload large function inputs, increase connector limits + better error propagation by icecrasher321 · Pull Request #5089 · simstudioai/sim

icecrasher321 · 2026-06-16T03:42:06Z

Summary

Fixes a class of 10 MB limit failures across workflow execution and KB connectors.

Function blocks: over-budget resolved block-output context values are offloaded to durable large-value refs and lazily re-read in the sandbox (sim.values.read), so a JS function can merge medium files without busting the 10 MB inter-block request-body cap (the original "Seedance" merge failure).
KB connectors — never silent: oversized files now surface as visible failed KB documents (with a reason) instead of being silently dropped — at listing time (GitHub/S3/Dropbox/OneDrive/SharePoint) and fetch time (GitLab/Azure/Google Drive via a shared ConnectorFileTooLargeError).
KB connectors — memory safety: unbounded response.text() downloads replaced with streaming readBodyWithLimit (cancels past the cap; closes a Dropbox OOM/DoS gap).
KB connectors — cap raised: per-file limit raised from a hardcoded 10 MB to the canonical 100 MB KB document limit (CONNECTOR_MAX_FILE_BYTES), except Google Drive's export path (Google's hard 10 MB export-API limit).
Sync engine: classifyExternalDoc classification, bulk skipDocuments (failed rows, excluded from the stuck-doc retry sweep), byte-bounded batch concurrency so the raised cap can't OOM the worker, and a metadata.fileSize ?? size fallback so skipped rows show the real size.
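The streaming read described in the memory-safety item can be sketched as below. This is a minimal illustration, not the PR's `readBodyWithLimit`: the names `readBodyWithCap` and `BodyTooLargeError` are hypothetical, and the real helper may return a different type or throw `ConnectorFileTooLargeError` instead.

```typescript
// Hypothetical error type; the PR's ConnectorFileTooLargeError may look different.
class BodyTooLargeError extends Error {
  readonly limitBytes: number
  constructor(limitBytes: number) {
    super(`Response body exceeds ${limitBytes} bytes`)
    this.name = 'BodyTooLargeError'
    this.limitBytes = limitBytes
  }
}

// Read a fetch Response chunk by chunk and stop as soon as the cap is passed,
// so an oversized download is never fully buffered in memory.
async function readBodyWithCap(response: Response, limitBytes: number): Promise<Uint8Array> {
  if (!response.body) return new Uint8Array(0)
  const reader = response.body.getReader()
  const chunks: Uint8Array[] = []
  let total = 0
  while (true) {
    const result = await reader.read()
    if (result.done) break
    total += result.value.byteLength
    if (total > limitBytes) {
      await reader.cancel() // abandon the rest of the stream
      throw new BodyTooLargeError(limitBytes)
    }
    chunks.push(result.value)
  }
  // Concatenate the collected chunks into one buffer.
  const out = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    out.set(chunk, offset)
    offset += chunk.byteLength
  }
  return out
}
```

A caller would catch the error and turn it into a skipped document rather than letting the sync fail.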

Type of Change

Bug fix

Testing

WiP

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…onnector size limits

Addresses a class of 10 MB limit failures:

- executor/variables: offload over-budget function block-output context values to durable large-value refs (lazy `sim.values.read`) so JS function blocks can merge medium files without exceeding the 10 MB inter-block request-body cap.
- connectors: stream downloads via `readBodyWithLimit` (memory-safe), and surface oversized files as visible `failed` KB documents instead of silently dropping them: at listing time for github/s3/dropbox/onedrive/sharepoint, at fetch time for gitlab/azure/google-drive via a shared `ConnectorFileTooLargeError`. Raise the per-file cap from a hardcoded 10 MB to the canonical 100 MB KB document limit (`CONNECTOR_MAX_FILE_BYTES`), except Google Drive's export path (Google's hard 10 MB export-API limit).
- sync-engine: `classifyExternalDoc` + bulk `skipDocuments` (failed rows with a reason, excluded from retry), byte-bounded batch concurrency to cap peak worker memory at the raised cap, and a `metadata.fileSize ?? size` fallback.

# Conflicts: # apps/sim/connectors/utils.test.ts

vercel · 2026-06-16T03:42:11Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
docs	Ready	Preview, Comment	Jun 16, 2026 3:47am

cursor · 2026-06-16T04:06:26Z

PR Summary

Medium Risk
Changes touch workflow execution payload paths and connector sync memory/visibility behavior across many integrations; regressions could affect function runs or KB sync outcomes, but bounds and tests reduce OOM and silent-drop risk.

Overview
Addresses 10 MB request-body failures in workflow execution and silent drops / unsafe downloads in knowledge-base connector syncs.

Function blocks: When resolved block-output values would exceed a ~6 MB combined inline budget (data + display in the function request), the resolver offloads them to durable large-value refs and rewrites JS code to load via sim.values.read, with largeValueKeys updated for the route. Non-JS runtimes and executions without durable context still inline values.
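The budget rule in the paragraph above can be illustrated with a small planner. This is a sketch under assumptions: the exact budget constant, the way footprints are measured, and the function name `planPlacements` are not taken from the PR, which describes the budget only as "~6 MB".

```typescript
// Assumed value; the summary describes the budget only as "~6 MB".
const INLINE_BUDGET_BYTES = 6 * 1024 * 1024

type Placement = { key: string; mode: 'inline' | 'offload' }

// Walk the values in order. A value stays inline while the running footprint
// fits the budget; otherwise it is marked for offload to a durable ref, which
// the sandbox would later load lazily (sim.values.read in the PR).
function planPlacements(
  sizes: Array<{ key: string; bytes: number }>,
  budgetBytes: number = INLINE_BUDGET_BYTES
): Placement[] {
  let remaining = budgetBytes
  return sizes.map(({ key, bytes }): Placement => {
    if (bytes <= remaining) {
      remaining -= bytes
      return { key, mode: 'inline' }
    }
    return { key, mode: 'offload' }
  })
}
```

Under this rule a small value that arrives after a large one can still be inlined, because only the values actually kept inline consume the budget.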

KB connectors: Introduces shared CONNECTOR_MAX_FILE_BYTES (100 MB, aligned with manual KB uploads), readBodyWithLimit, markSkipped / stubOrSkipBySize, and ConnectorFileTooLargeError. File-storage connectors (Dropbox, OneDrive, SharePoint, Google Drive media, S3, GitHub listing, Azure DevOps, Zoom transcripts, etc.) stop filtering or truncating oversized files; they stream downloads with a cap and surface oversize as skippedReason instead of returning null. Google Workspace export stays on Google’s 10 MB API limit with export-specific handling.
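A listing-time size check along these lines might look as follows. The call shape `(stub, size, max)` mirrors the diff quoted in the review below, but the body, the `DocStub` type, and the wording of the reason string are assumptions made for illustration.

```typescript
// Minimal stand-in for the PR's ExternalDocument; only the fields used here.
interface DocStub {
  externalId: string
  title: string
  skippedReason?: string
}

const MAX_BYTES = 100 * 1024 * 1024 // 100 MB, the cap named in the PR

// Return the stub unchanged when it fits (or when the size is unknown),
// otherwise return a copy tagged with a human-readable reason.
function stubOrSkip(stub: DocStub, sizeBytes: number | undefined, maxBytes: number): DocStub {
  if (sizeBytes === undefined || sizeBytes <= maxBytes) return stub
  const mb = (n: number) => (n / (1024 * 1024)).toFixed(1)
  return {
    ...stub,
    skippedReason: `File is ${mb(sizeBytes)} MB; limit is ${mb(maxBytes)} MB`,
  }
}
```

Returning a tagged stub instead of `null` is what keeps the oversized file visible to the sync engine.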

Sync engine: Adds skippedReason on ExternalDocument, classifyExternalDoc (new skips → failed rows; already-indexed oversize → last-known-good), skipDocuments bulk inserts, and chunkOpsByByteBudget (64 MB in-flight) so raised per-file limits don’t OOM workers. Stuck-doc retry excludes rows with no storageKey.
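Byte-bounded batching of the kind attributed to `chunkOpsByByteBudget` can be sketched like this. The function below is a hypothetical reimplementation: the `SizedOp` shape and the choice to give an over-budget operation its own batch are assumptions, not confirmed behaviour of the PR.

```typescript
interface SizedOp {
  id: string
  bytes: number
}

// Group ops so each batch stays within both a byte budget and a count limit.
// An op that alone exceeds the budget still gets a batch of its own.
function chunkByByteBudget(ops: SizedOp[], budgetBytes: number, maxPerBatch: number): SizedOp[][] {
  const batches: SizedOp[][] = []
  let current: SizedOp[] = []
  let currentBytes = 0
  for (const op of ops) {
    const overBudget = currentBytes + op.bytes > budgetBytes
    const full = current.length >= maxPerBatch
    if (current.length > 0 && (overBudget || full)) {
      batches.push(current)
      current = []
      currentBytes = 0
    }
    current.push(op)
    currentBytes += op.bytes
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```

With a 64 MB budget and a batch size of 5 (the figures shown in the flowchart further down), five 100 MB files would run one at a time instead of five at once.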

Documentation in the memory-load-check skill is extended with KB connector size-handling guidance; tests cover utils, sync classification/chunking, and function-context offload.

Reviewed by Cursor Bugbot for commit 26cf668.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursor · 2026-06-16T04:08:48Z

-    let documents = supportedFiles.map(fileToStub)
+    let documents = candidateFiles.map((entry) =>
+      stubOrSkipBySize(fileToStub(entry), entry.size, MAX_FILE_SIZE)
+    )


maxFiles counts oversized skips

Medium Severity

Oversized files are now kept in connector listings as skipped stubs, but the same maxFiles / maxObjects counters still treat them like normal listed documents. A cap can be exhausted by failed skip rows before indexable files are ever listed, which regresses sync coverage compared to when oversized paths were filtered out before counting.

Additional Locations (2)

apps/sim/connectors/github/github.ts#L213-L215

apps/sim/connectors/s3/s3.ts#L523-L534
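One possible remedy for the issue above is to let only indexable documents consume the cap. Bugbot does not prescribe a fix, so this is a sketch of one option; `takeWithCap` and `ListedDoc` are hypothetical names, and the real connectors paginate rather than filter an in-memory array.

```typescript
interface ListedDoc {
  id: string
  skippedReason?: string
}

// Keep skipped stubs visible without letting them consume the maxFiles cap.
// Listing stops at the first indexable doc that would exceed the cap.
function takeWithCap(docs: ListedDoc[], maxFiles: number): ListedDoc[] {
  const out: ListedDoc[] = []
  let indexable = 0
  for (const doc of docs) {
    if (doc.skippedReason) {
      out.push(doc) // visible as a failed row, but not counted
      continue
    }
    if (indexable >= maxFiles) break
    out.push(doc)
    indexable++
  }
  return out
}
```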


cursor · 2026-06-16T04:08:48Z

+      logger.warn('Failed to offload oversized function context value; keeping inline', {
+        error: toError(error).message,
+      })
+      return null


Offload failure keeps oversized inline

Medium Severity

When a function block context value exceeds the inline budget, maybeOffloadInlineFunctionContextValue stores it via storeLargeValue. If that store fails, the code logs and returns null, so resolution falls back to inlining the full value anyway—recreating the ~10 MB request-body overflow this change is meant to prevent.


greptile-apps · 2026-06-16T04:15:59Z

Greptile Summary

This PR fixes two classes of 10 MB limit failures: function blocks with large block-output context values now offload oversized refs to durable storage and read them lazily in the sandbox, while KB connector downloads are byte-capped with readBodyWithLimit, raised from 10 MB to the canonical 100 MB KB limit, and oversized files are surfaced as visible failed documents (with a reason) instead of being silently dropped.

Function block offload: A 6 MB inline budget is tracked per function-block resolution; values that would exceed it are stored via storeLargeValue and replaced with a sim.values.read(ref) call in generated code. The budget does not update when storeLargeValue fails (catch path), so a storage outage causes all subsequent large values to attempt offload, fail, and fall back inline.
KB connector size safety: CONNECTOR_MAX_FILE_BYTES is now shared across all connectors. The sync engine gains classifyExternalDoc, chunkOpsByByteBudget, and skipDocuments to batch-insert oversized files as content-less failed rows and exclude them from the stuck-document retry sweep via isNotNull(storageKey).
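The classification step can be pictured as a small decision function. This sketch follows the outcomes named in the summaries (new skip becomes a failed row, skip of an already-indexed document keeps the last-known-good content); the type names, the use of a content hash to detect changes, and the lookup map are assumptions, not the PR's `classifyExternalDoc` signature.

```typescript
type Classification = 'skip-new' | 'skip-existing' | 'add' | 'update' | 'unchanged'

interface ExtDoc {
  externalId: string
  contentHash: string
  skippedReason?: string
}

// existingHashes maps externalId to the hash of the content already indexed.
function classify(doc: ExtDoc, existingHashes: Map<string, string>): Classification {
  const knownHash = existingHashes.get(doc.externalId)
  if (doc.skippedReason) {
    // New oversized docs become failed rows; indexed ones keep their old content.
    return knownHash === undefined ? 'skip-new' : 'skip-existing'
  }
  if (knownHash === undefined) return 'add'
  return knownHash === doc.contentHash ? 'unchanged' : 'update'
}
```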

Confidence Score: 4/5

Safe to merge with awareness of two edge-case gaps that do not affect the happy path.

The connector streaming and skip-visibility changes are well-tested and correct across all nine connectors. The sync engine isNotNull(storageKey) retry guard is safe because addDocument always sets a storage key before writing the DB row. Two gaps exist on error paths: the offload catch block in the resolver does not charge the failed value footprint to the budget, so a storage outage leaves all large values inline; and a deferred update op that becomes too large at hydration time silently skips without incrementing any result counter.

Files that warrant attention: apps/sim/executor/variables/resolver.ts (offload catch path) and apps/sim/lib/knowledge/connectors/sync-engine.ts (update-to-skip counter omission).

Important Files Changed

Filename	Overview
apps/sim/executor/variables/resolver.ts	Adds maybeOffloadInlineFunctionContextValue to offload oversized block-output refs to durable storage. Budget accounting is correct for the happy path but the catch block does not charge the footprint to the budget, so a storage outage causes all subsequent large values to retry offloading before falling back inline.
apps/sim/lib/knowledge/connectors/sync-engine.ts	Adds classifyExternalDoc, chunkOpsByByteBudget, skipDocuments, and buildSkippedDocumentRow to surface oversized files as visible failed rows. The isNotNull(storageKey) retry guard is correct. One counter omission: update ops that become too large at fetch time do not increment any result field.
apps/sim/connectors/utils.ts	Adds shared size-limit utilities: CONNECTOR_MAX_FILE_BYTES, markSkipped, stubOrSkipBySize, sizeLimitSkipReason, and ConnectorFileTooLargeError. Well-factored and tested.
apps/sim/connectors/google-drive/google-drive.ts	Keeps MAX_EXPORT_SIZE at 10 MB for the Workspace export path while raising the binary download cap to 100 MB. Handles the exportSizeLimitExceeded 403 as a typed skip. The 403 body is read via unbounded response.text() before the streaming guard.
apps/sim/connectors/dropbox/dropbox.ts	Replaces unbounded response.text() with readBodyWithLimit; raises cap to 100 MB; surfaces oversized files as failed rows via ConnectorFileTooLargeError catch. Logic is correct.
apps/sim/connectors/zoom/zoom.ts	Adds fileSize to stub metadata, streams VTT download with readBodyWithLimit, and surfaces oversized transcripts as failed rows. Correct.
apps/sim/connectors/types.ts	Adds optional skippedReason field to ExternalDocument with clear documentation. Non-breaking addition.
apps/sim/connectors/utils.test.ts	Adds unit tests for readBodyWithLimit, markSkipped, sizeLimitSkipReason, and ConnectorFileTooLargeError with good edge-case coverage.
apps/sim/lib/knowledge/connectors/sync-engine.test.ts	New tests for classifyExternalDoc and chunkOpsByByteBudget covering all classification paths and byte-budget batching edge cases.
apps/sim/executor/variables/resolver.test.ts	New test suite for the offload path covering single oversized value, budget-splitting, small inline case, no-execution-id guard, and non-JavaScript runtime guard.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Connector listDocuments] --> B{size > CONNECTOR_MAX_FILE_BYTES?}
    B -- yes --> C[stubOrSkipBySize - skippedReason set]
    B -- no --> D[Normal stub - contentDeferred=true]
    C --> E[classifyExternalDoc]
    D --> E
    E -- skip + new --> F[skipDocuments bulk insert - failed row storageKey=null]
    E -- skip + existing --> G[unchanged - keep last-known-good]
    E -- add/update --> H[chunkOpsByByteBudget 64MB + SYNC_BATCH_SIZE=5]
    H --> I[deferredOps - getDocument]
    I --> J{fullDoc.skippedReason?}
    J -- yes + add op --> K[push to skipExtDocs - skipDocuments called after]
    J -- yes + update op --> L[return null - no counter increment]
    J -- no --> M[addDocument / updateDocument - storageKey always set]
    M --> N[Retry sweep: isNotNull storageKey - excludes failed rows]

    subgraph Function Block Offload
        P[resolveTemplateCode] --> Q{inline footprint <= budget?}
        Q -- yes --> R[store inline - reduce budget]
        Q -- no --> S[storeLargeValue]
        S -- success --> T[LargeValueRef - sim.values.read in sandbox]
        S -- failure --> U[fallback inline - budget NOT updated]
    end

Comments Outside Diff (2)

apps/sim/executor/variables/resolver.ts, line 1367-1373 (link)

Offload fallback silently leaves the budget un-consumed

When storeLargeValue throws, the catch block returns null (keeps the value inline — correct) but does not subtract the value's footprint from offloadState.inlineFootprintRemaining. All remaining large values in the same call then each separately attempt to offload, each fail, each log a warning, and each fall back to inline — so if durable storage is unavailable, every oversized block-ref ends up inlined anyway. The combined inline payload can then exceed the 10 MB request-body cap that this offload feature exists to prevent, restoring the exact failure mode being fixed.

A simple fix is to charge the footprint to the budget inside the catch so subsequent values know the budget is exhausted and skip the offload attempt entirely (still falling back inline, but without the repeated store calls and logs).
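One way to read this suggestion is sketched below. It is an interpretation, not the PR's code: the `storeFailed` flag is hypothetical (charging the footprint alone would not stop later values from retrying the store), `store` stands in for `storeLargeValue` and is synchronous for brevity, and the fallback still inlines the value, so the payload can exceed the cap while storage is down.

```typescript
interface OffloadState {
  inlineFootprintRemaining: number
  storeFailed: boolean // hypothetical flag, not part of the PR
}

type Resolved = { mode: 'inline' } | { mode: 'ref'; ref: string }

function resolveValue(bytes: number, state: OffloadState, store: () => string): Resolved {
  if (bytes <= state.inlineFootprintRemaining) {
    state.inlineFootprintRemaining -= bytes
    return { mode: 'inline' }
  }
  if (!state.storeFailed) {
    try {
      return { mode: 'ref', ref: store() }
    } catch {
      // Remember the outage so later values skip the store call and its log line.
      state.storeFailed = true
    }
  }
  // Fallback: charge the footprint so the budget reflects what was actually inlined.
  state.inlineFootprintRemaining -= bytes
  return { mode: 'inline' }
}
```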

apps/sim/connectors/google-drive/google-drive.ts, line 416-422 (link)

Unbounded response.text() on the 403 body before the streaming guard

response.text() fully buffers the error body into memory before checking for exportSizeLimitExceeded. For Google's own export endpoint this is safe (error payloads are tiny), but it is slightly inconsistent with the PR's goal of never doing unbounded reads. If this pattern is ever copied to a less-trusted endpoint it would bypass the readBodyWithLimit guard. Consider reading the error body through readBodyWithLimit with a small constant cap (for example 4096 bytes). Note that truncating afterwards, as in response.text().catch(() => '').then(t => t.slice(0, 4096)), limits only what is inspected; the full body is still buffered first.

Reviews (1): Last reviewed commit: "update skill"

greptile-apps · 2026-06-16T04:16:04Z

+            if (fullDoc?.skippedReason) {
+              if (op.type === 'add') {
+                skipExtDocs.push({
+                  ...op.extDoc,
+                  skippedReason: fullDoc.skippedReason,
+                  contentHash: fullDoc.contentHash ?? op.extDoc.contentHash,
+                  metadata: { ...op.extDoc.metadata, ...fullDoc.metadata },
+                })
+              }
+              return null
+            }


update op skipped at fetch time produces no result counter increment

When a deferred op.type === 'update' is hydrated and the freshly fetched document carries skippedReason (the file grew past the cap and is only discovered to be oversized at download time), the code correctly preserves the previously-indexed content (last-known-good) and returns null. However, this path increments neither result.docsUnchanged, result.docsFailed, nor any other counter. Every sync that exercises this branch will emit a total (docsAdded + docsUpdated + docsUnchanged + docsFailed) that is smaller than the number of documents seen, making the sync log stats non-auditable. Adding result.docsUnchanged++ before return null here would keep the counters accurate without changing behaviour.
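The accounting invariant behind this comment (every document seen lands in exactly one counter) can be shown with a small tally. The counter names come from the comment above; the `Outcome` shape and the choice to count a skipped add under `docsFailed` are assumptions for illustration.

```typescript
interface SyncResult {
  docsAdded: number
  docsUpdated: number
  docsUnchanged: number
  docsFailed: number
}

// Result of hydrating one deferred op; `skipped` means the fetched doc carried a skippedReason.
type Outcome = { op: 'add' | 'update'; skipped: boolean }

function tally(outcomes: Outcome[]): SyncResult {
  const result: SyncResult = { docsAdded: 0, docsUpdated: 0, docsUnchanged: 0, docsFailed: 0 }
  for (const outcome of outcomes) {
    if (outcome.skipped) {
      if (outcome.op === 'add') result.docsFailed++ // assumed: new oversized doc is a failed row
      else result.docsUnchanged++ // update keeps last-known-good content
    } else if (outcome.op === 'add') {
      result.docsAdded++
    } else {
      result.docsUpdated++
    }
  }
  return result
}
```

With the `docsUnchanged` increment in place, the four counters always sum to the number of outcomes.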

icecrasher321 added 4 commits June 15, 2026 20:25

Merge branch 'staging' into improvement/ten-mb-lims

2019e88

# Conflicts: # apps/sim/connectors/utils.test.ts

fix zoom

66b0f58

update skill

26cf668

vercel Bot deployed to Preview June 16, 2026 03:47

icecrasher321 changed the title ~~fix(execution): offload large function inputs~~ improvement(execution, connectors): offload large function inputs, increase connector limits + better error propagation Jun 16, 2026

icecrasher321 marked this pull request as ready for review June 16, 2026 04:06

cursor Bot reviewed Jun 16, 2026

greptile-apps Bot reviewed Jun 16, 2026

icecrasher321 wants to merge 4 commits into staging from improvement/ten-mb-lims
