[adapters] Avoid merge backpressure in the output buffer by ryzhyk · Pull Request #6442 · feldera/feldera

ryzhyk · 2026-06-10T08:44:48Z

Two improvements to avoid merge backpressure in the output buffer. See commit messages for details.

Added a platform test, since (de)serialization of connector config involves the manager.

Describe Manual Test Plan

Checklist

Unit tests added/updated
Integration tests added/updated
Documentation updated
Changelog updated

Breaking Changes?

Mark if you think the answer is yes for any of these components:

OpenAPI / REST HTTP API / feldera-types / manager (What is a breaking change?)
Feldera SQL (Syntax, Semantics)
feldera-sqllib (incl. dependencies fxp, etc.) (What is a breaking change?)
Python SDK (What is a breaking change?)
fda (CLI arguments)
Adapters (including configuration)
Storage Format / Checkpoints
Others (specify)

Describe Incompatible Changes

max_output_buffer_size_records defaults to 10M.

This commit reduces the likelihood of the following situation:

* A large transaction (e.g., backfill) produces a large output batch, say 1B records. This batch consists of `workers` spines, each with potentially dozens of batches.
* These batches are pushed into a single spine inside the output buffer. Once the number of batches in the spine exceeds 128, backpressure kicks in.

We've seen pipelines with 32 workers spend >1hr waiting for backpressure.

We address this in two ways. First, we insert all batches into the spine before waiting for backpressure. This is likely to trigger fewer, larger merges. Second, we postpone checking for backpressure until additional batches retrieved from the output queue are added to the buffer. This avoids waiting for the merger when the buffer is large enough to be sent to the connector.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
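The difference between the two insertion strategies in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToySpine`, `push_eager`, `push_deferred`, and `flush_threshold` are hypothetical names, the "merger" here collapses everything into one batch in a single step, and only the 128-batch limit comes from the commit message. It is not the code in this PR.

```rust
/// Batch-count limit taken from the commit message; everything else is hypothetical.
const BACKPRESSURE_THRESHOLD: usize = 128;

/// Toy stand-in for the output buffer's spine: it tracks batch sizes only.
#[derive(Default)]
struct ToySpine {
    batches: Vec<usize>, // record count of each unmerged batch
    waits: usize,        // number of times a caller blocked on the merger
}

impl ToySpine {
    /// Total number of buffered records.
    fn len(&self) -> usize {
        self.batches.iter().sum()
    }

    /// Add a batch without checking for backpressure.
    fn insert_without_blocking(&mut self, records: usize) {
        self.batches.push(records);
    }

    fn needs_backpressure(&self) -> bool {
        self.batches.len() > BACKPRESSURE_THRESHOLD
    }

    /// Simulate blocking until the merger has collapsed all batches into one.
    fn wait_for_merger(&mut self) {
        if self.needs_backpressure() {
            let total = self.len();
            self.batches = vec![total];
            self.waits += 1;
        }
    }
}

/// Eager strategy: check for backpressure after every inserted batch.
fn push_eager(spine: &mut ToySpine, batches: &[usize]) {
    for &records in batches {
        spine.insert_without_blocking(records);
        spine.wait_for_merger();
    }
}

/// Deferred strategy: insert every batch first, then wait for the merger
/// only if the buffer is not yet large enough to be sent to the connector.
/// Returns true when the buffer is ready to be flushed.
fn push_deferred(spine: &mut ToySpine, batches: &[usize], flush_threshold: usize) -> bool {
    for &records in batches {
        spine.insert_without_blocking(records);
    }
    let ready_to_flush = spine.len() >= flush_threshold;
    if !ready_to_flush {
        spine.wait_for_merger();
    }
    ready_to_flush
}
```

In this model, pushing 300 batches eagerly blocks on the merger twice, while the deferred strategy does not block at all if the combined buffer already meets the flush threshold.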

In the past, if the user set max_output_buffer_time_millis, but not max_output_buffer_size_records, then the output buffer would hold data for the specified duration regardless of its size. The user was required to also configure max_output_buffer_size_records to force a large buffer to be sent immediately.

The downside of not setting max_output_buffer_size_records is that quickly adding more batches to an already large output buffer could cause expensive backpressure stalls. This commit changes the default to 10,000,000 records, meaning that once the buffer reaches this size it will be sent immediately.

The purpose of the output buffer is to avoid the small file problem for connectors such as Delta. In this type of use case, waiting for exactly max_output_buffer_time_millis is not a hard requirement, so the new default should be harmless. However, it is a behavioral change, which I documented in the changelog.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
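The behavioral change described in this commit message can be sketched as a flush decision. This is a minimal illustration, not the adapter's implementation: `BufferLimits`, `effective_max_records`, and `should_flush` are hypothetical names, and the sketch assumes the buffer is sent when either limit is reached. Only the two setting names and the 10,000,000 default come from the commit message.

```rust
use std::time::Duration;

/// New default from the commit message above.
const DEFAULT_MAX_OUTPUT_BUFFER_SIZE_RECORDS: usize = 10_000_000;

/// Hypothetical, simplified view of the two output buffer settings.
struct BufferLimits {
    /// Corresponds to `max_output_buffer_time_millis`, if set.
    max_time: Option<Duration>,
    /// Corresponds to `max_output_buffer_size_records`, if set.
    max_records: Option<usize>,
}

impl BufferLimits {
    /// The size limit in effect: the user's value, or the new default.
    fn effective_max_records(&self) -> usize {
        self.max_records
            .unwrap_or(DEFAULT_MAX_OUTPUT_BUFFER_SIZE_RECORDS)
    }

    /// Assumed rule: flush when the buffer is large enough or old enough.
    fn should_flush(&self, buffered_records: usize, age: Duration) -> bool {
        let too_big = buffered_records >= self.effective_max_records();
        let too_old = self.max_time.map_or(false, |limit| age >= limit);
        too_big || too_old
    }
}
```

With only the time limit configured, a buffer holding 10,000,000 records is now flushed even if it is one second old, whereas previously it would have waited for the full duration.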

Signed-off-by: feldera-bot <feldera-bot@feldera.com>

mihaibudiu · 2026-06-10T16:00:40Z


        ## Unreleased

+        - The default value of `max_output_buffer_size_records` is now 10,000,000


I thought we were moving towards a world where sizes are expressed in bytes

blp · 2026-06-10T16:28:25Z

+        let batch = Arc::unwrap_or_clone(
+            batch
+                .as_any()
+                .downcast::<SerBatchImpl<
+                    TypedBatch<T::Key, T::Val, T::R, <T::InnerTrace as DynTrace>::Batch>,
+                    KD,
+                    VD,
+                >>()
+                .unwrap(),
+        );
+        self.batch
+            .inner_mut()
+            .insert_without_blocking(batch.batch.into_inner())


This is almost the same as insert(), so it might be worthwhile to factor out the common code.

blp · 2026-06-10T16:31:10Z

+    fn insert_without_blocking(&mut self, batch: impl Into<Arc<Self::Batch>>) -> bool {
+        self.data = Self::merge(
+            self,
+            batch.into().as_ref(),
+            &self.key_filter,
+            &self.value_filter,
+        )
+        .data;
+
+        false
+    }


I would change insert() to a call to this new function to avoid redundant code.
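The suggested refactoring could look roughly like the sketch below. `ToyTrace` is a hypothetical stand-in that merges by extending and sorting a vector; the real trait, its batch types, and the body of the existing `insert()` are not shown on this page and are not assumed here. The non-blocking method returns `false` as in the diff above, and the sketch makes no assumption about what that flag means.

```rust
/// Hypothetical stand-in for a trace; it keeps its records sorted.
struct ToyTrace {
    data: Vec<i64>,
}

impl ToyTrace {
    /// Merge a batch into the trace. Returns `false`, mirroring the diff above.
    fn insert_without_blocking(&mut self, batch: Vec<i64>) -> bool {
        self.data.extend(batch);
        self.data.sort_unstable();
        false
    }

    /// `insert()` delegates to the non-blocking variant, so the merge
    /// logic lives in exactly one place.
    fn insert(&mut self, batch: Vec<i64>) {
        let _ = self.insert_without_blocking(batch);
    }
}
```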

mythical-fred

Typo in commit subject: [adaprers] → [adapters]. Worth a quick git rebase -i fix before merge since Feldera enforces linear history on main.

Otherwise LGTM — clean decomposition, good commit messages explaining the why, solid platform test.

ryzhyk requested a review from blp June 10, 2026 08:44

ryzhyk added the connectors Issues related to the adapters/connectors crate label Jun 10, 2026

ryzhyk force-pushed the non_blocking_outpu_buffer branch from c186e35 to db5a23f Compare June 10, 2026 08:48

[ci] apply automatic fixes

a9446f9

Signed-off-by: feldera-bot <feldera-bot@feldera.com>

ryzhyk changed the title ~~[adapters] Avoid merge backpress in the output buffer~~ [adapters] Avoid merge backpressure in the output buffer Jun 10, 2026

mihaibudiu reviewed Jun 10, 2026


blp approved these changes Jun 10, 2026


mythical-fred reviewed Jun 11, 2026


ryzhyk wants to merge 3 commits into main from non_blocking_outpu_buffer


5 participants

