Skip to content

fix(snapshots): Chunk image uploads to avoid fd exhaustion and 413 errors#3249

Merged
NicoHinderling merged 2 commits intomasterfrom
fix/chunk-snapshot-uploads
Apr 15, 2026
Merged

fix(snapshots): Chunk image uploads to avoid fd exhaustion and 413 errors#3249
NicoHinderling merged 2 commits intomasterfrom
fix/chunk-snapshot-uploads

Conversation

@NicoHinderling
Copy link
Copy Markdown
Contributor

Uploading hundreds of images via build snapshots fails in two ways:

  1. Too many open files (EMFILE): All image file handles are opened upfront and held until the batch send completes. This exceeds OS file descriptor limits (~256 on macOS, ~1024 on Linux).
  2. 413 Payload Too Large: The objectstore client batches up to 1000 operations into a single HTTP request. With hundreds of images the multipart body exceeds the server's size limit.

Both failures were confirmed with a real 753-image upload. 734 out of 753 images failed with 413 errors.

This PR splits upload_images() into two phases:

  • Phase 1 (pre-process): Iterate all images to detect filename collisions, compute SHA-256 hashes, read sidecar metadata, and build manifest entries — without opening any files for upload.
  • Phase 2 (chunked upload): Upload in batches of 100 images. For each batch, open files, build a fresh session.many() builder, and send. This bounds both open file descriptors and HTTP request payload size.

Also includes minor code quality improvements: simplified scope extraction, iterator-based collision formatting, and removal of redundant bindings.

@NicoHinderling NicoHinderling marked this pull request as ready for review March 26, 2026 20:28
@NicoHinderling NicoHinderling requested review from a team and szokeasaurusrex as code owners March 26, 2026 20:28
Comment thread src/commands/build/snapshots.rs Outdated

const IMAGE_EXTENSIONS: &[&str] = &["png", "jpg", "jpeg"];
const MAX_PIXELS_PER_IMAGE: u64 = 40_000_000;
const UPLOAD_BATCH_SIZE: usize = 100;
Copy link
Copy Markdown
Contributor Author

@NicoHinderling NicoHinderling Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open to your guys' thoughts, but through some testing, seems to be a good number

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once these two fixes are available, the custom batching can be removed again. Objectstore client batches internally:

Copy link
Copy Markdown
Contributor Author

NicoHinderling commented Mar 26, 2026

@NicoHinderling NicoHinderling force-pushed the fix/chunk-snapshot-uploads branch from 3b2f5fa to 938a884 Compare March 26, 2026 21:17
@NicoHinderling NicoHinderling changed the base branch from master to graphite-base/3249 March 26, 2026 21:28
@NicoHinderling NicoHinderling force-pushed the fix/chunk-snapshot-uploads branch from 938a884 to 835e271 Compare March 26, 2026 21:28
@NicoHinderling NicoHinderling changed the base branch from graphite-base/3249 to 03-26-perf_snapshots_parallelize_image_hashing_with_rayon March 26, 2026 21:29
@NicoHinderling NicoHinderling marked this pull request as draft March 26, 2026 21:33
@NicoHinderling
Copy link
Copy Markdown
Contributor Author

synced offline with @lcian , putting in draft mode since while fd limit is real, we on client side apparently shouldn't be handling the batching

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment thread src/commands/build/snapshots.rs Outdated
Base automatically changed from 03-26-perf_snapshots_parallelize_image_hashing_with_rayon to master March 27, 2026 15:02
@jan-auer
Copy link
Copy Markdown
Member

jan-auer commented Apr 2, 2026

Updated objectstore-client to 0.1.4. The fd limit is resolved and we also limit requests to 100MB internally now in addition to the preexisting 1000 ops / request limit. Upload with a provided sample succeeds now.

@NicoHinderling NicoHinderling force-pushed the fix/chunk-snapshot-uploads branch from 6333876 to f30b441 Compare April 10, 2026 20:20
@NicoHinderling NicoHinderling marked this pull request as ready for review April 13, 2026 21:18
Comment thread src/commands/build/snapshots.rs
NicoHinderling and others added 2 commits April 14, 2026 09:02
Wrap upload errors in anyhow::Error before formatting so {:#} walks the
full cause chain instead of showing only the top-level message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@NicoHinderling NicoHinderling force-pushed the fix/chunk-snapshot-uploads branch from cfdf12b to ced473a Compare April 14, 2026 16:02
Copy link
Copy Markdown
Member

@lcian lcian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@NicoHinderling NicoHinderling merged commit 3d7f705 into master Apr 15, 2026
27 checks passed
@NicoHinderling NicoHinderling deleted the fix/chunk-snapshot-uploads branch April 15, 2026 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants