Commit a1178aa

[adapters] Auto-tune step size for the number of workers.
Step size is the number of records pushed to the circuit from each connector. Our previous default of 10,000 records was selected before we introduced splitters and accumulators, which break up large outputs across multiple steps. Back then, a large input could easily explode, causing performance and OOM issues. Nowadays, there is no real reason to keep input steps small. A reasonable default is to ingest 10K records per worker thread, which approximates how we split up the work within the circuit.

This commit keeps the old `max_batch_size` setting for backward compatibility. When it is not specified, the new `max_worker_batch_size` setting is used to compute the max batch size as `max_worker_batch_size × num_workers`. The default value is 10,000, meaning that by default a pipeline with 8 workers will ingest 80K records per connector per step.

Why not remove the input step cap altogether and ingest all buffered data at once (after all, it's already kept in memory anyway)?

- The InputUpsert operator is not yet implemented as a splitter and processes the entire input in one step, leading to potentially large output batches (expensive to sort!).
- Very large batches can increase input/output latency, leading to the sawtooth throughput pattern, which users don't like.

The current solution is not ideal. We probably want to cap batch size in bytes, not records. We may also want to cap input size across all connectors attached to a table, not per connector. Those improvements will require more work.

Empirically, this commit improves ingestion speed 2x for pipelines with many delta connectors.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
1 parent b7c3878 commit a1178aa
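The auto-tuning rule described in the commit message can be sketched as a small standalone function. This is a hedged illustration, not the actual implementation: `effective_batch_size` and this pared-down `ConnectorConfig` are hypothetical stand-ins for the real types in `feldera_types::config` and `ControllerInner::max_connector_batch_size`.

```rust
// Hypothetical sketch of the batch-size selection rule introduced in
// this commit; names mirror the real code but the types are simplified.

const DEFAULT_MAX_WORKER_BATCH_SIZE: u64 = 10_000;

struct ConnectorConfig {
    max_batch_size: Option<u64>,
    max_worker_batch_size: Option<u64>,
}

/// An explicit `max_batch_size` wins; otherwise the per-worker cap
/// (defaulting to 10,000) is scaled by the number of worker threads.
fn effective_batch_size(config: &ConnectorConfig, num_workers: usize) -> usize {
    if let Some(max) = config.max_batch_size {
        return max as usize;
    }
    let per_worker = config
        .max_worker_batch_size
        .unwrap_or(DEFAULT_MAX_WORKER_BATCH_SIZE) as usize;
    // Guard against a zero worker count, as the real code does.
    per_worker * num_workers.max(1)
}

fn main() {
    // All defaults: a pipeline with 8 workers ingests up to 80K records
    // per connector per step.
    let defaults = ConnectorConfig { max_batch_size: None, max_worker_batch_size: None };
    assert_eq!(effective_batch_size(&defaults, 8), 80_000);

    // A legacy explicit cap is honored as-is, regardless of worker count.
    let legacy = ConnectorConfig { max_batch_size: Some(10_000), max_worker_batch_size: None };
    assert_eq!(effective_batch_size(&legacy, 8), 10_000);
}
```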

File tree

8 files changed: +101 −38 lines changed


crates/adapters/src/adhoc/table.rs

Lines changed: 3 additions & 3 deletions

@@ -29,8 +29,7 @@ use datafusion::physical_plan::{
     DisplayAs, DisplayFormatType, ExecutionPlan, Partitioning, PlanProperties,
 };
 use feldera_types::config::{
-    ConnectorConfig, FormatConfig, InputEndpointConfig, TransportConfig, default_max_batch_size,
-    default_max_queued_records,
+    ConnectorConfig, FormatConfig, InputEndpointConfig, TransportConfig, default_max_queued_records,
 };
 use feldera_types::program_schema::SqlIdentifier;
 use feldera_types::serde_with_context::serde_config::{
@@ -287,7 +286,8 @@ impl DataSink for AdHocTableSink {
             }),
             index: None,
             output_buffer_config: Default::default(),
-            max_batch_size: default_max_batch_size(),
+            max_batch_size: None,
+            max_worker_batch_size: None,
             max_queued_records: default_max_queued_records(),
             paused: false,
             labels: vec![],

crates/adapters/src/controller.rs

Lines changed: 33 additions & 2 deletions

@@ -159,7 +159,8 @@ pub use feldera_types::config::{
     RuntimeConfig, TransportConfig,
 };
 use feldera_types::config::{
-    FileBackendConfig, FtConfig, FtModel, OutputBufferConfig, StorageBackendConfig, SyncConfig,
+    DEFAULT_MAX_WORKER_BATCH_SIZE, FileBackendConfig, FtConfig, FtModel, OutputBufferConfig,
+    StorageBackendConfig, SyncConfig,
 };
 use feldera_types::constants::{STATE_FILE, STEPS_FILE};
 use feldera_types::format::json::{JsonFlavor, JsonParserConfig, JsonUpdateFormat};
@@ -4786,6 +4787,8 @@ pub struct ControllerInner {
     // The mutex is acquired from async context by actix and
     // from the sync context by the circuit thread.
     transaction_info: Mutex<TransactionInfo>,
+
+    /// Workers local to this host.
     workers: Range<usize>,
 
     /// Current transaction number.
@@ -4946,6 +4949,32 @@ impl ControllerInner {
         ))
     }
 
+    /// Compute max_batch_size for a connector.
+    ///
+    /// `max_batch_size` is a (soft) bound on the number of records ingested in one step from
+    /// the connector.
+    ///
+    /// If the connector config specifies a `max_batch_size`, it is used as is.
+    ///
+    /// Otherwise, `max_batch_size` is computed as the number of workers times `config.max_worker_batch_size` (if specified)
+    /// or `DEFAULT_MAX_WORKER_BATCH_SIZE` otherwise.
+    pub fn max_connector_batch_size(&self, connector_config: &ConnectorConfig) -> usize {
+        if let Some(max_batch_size) = connector_config.max_batch_size {
+            return max_batch_size as usize;
+        };
+
+        let num_local_workers = std::cmp::max(self.workers.len(), 1);
+
+        let max_worker_batch_size =
+            if let Some(max_worker_batch_size) = connector_config.max_worker_batch_size {
+                max_worker_batch_size as usize
+            } else {
+                DEFAULT_MAX_WORKER_BATCH_SIZE as usize
+            };
+
+        max_worker_batch_size * num_local_workers
+    }
+
     fn last_checkpoint(&self) -> LastCheckpoint {
         self.last_checkpoint.lock().unwrap().clone()
     }
@@ -6224,11 +6253,13 @@ impl InputProbe {
         connector_config: &ConnectorConfig,
         controller: Arc<ControllerInner>,
     ) -> Self {
+        let max_batch_size = controller.max_connector_batch_size(connector_config);
+
         Self {
             endpoint_id,
             endpoint_name: endpoint_name.to_owned(),
             controller,
-            max_batch_size: connector_config.max_batch_size as usize,
+            max_batch_size,
             transaction_in_progress: AtomicBool::new(false),
         }
     }

crates/adapters/src/server.rs

Lines changed: 5 additions & 5 deletions

@@ -72,9 +72,7 @@ use feldera_types::runtime_status::{
 use feldera_types::suspend::{SuspendError, SuspendableResponse};
 use feldera_types::time_series::TimeSeries;
 use feldera_types::{
-    checkpoint::CheckpointMetadata,
-    config::{TransportConfig, default_max_batch_size},
-    transport::http::HttpInputConfig,
+    checkpoint::CheckpointMetadata, config::TransportConfig, transport::http::HttpInputConfig,
 };
 use feldera_types::{query::AdhocQueryArgs, transport::http::SERVER_PORT_FILE};
 use futures::StreamExt;
@@ -1997,7 +1995,8 @@ async fn create_http_input_endpoint(
             format: Some(format),
             index: None,
             output_buffer_config: Default::default(),
-            max_batch_size: default_max_batch_size(),
+            max_batch_size: None,
+            max_worker_batch_size: None,
             max_queued_records: HttpInputTransport::default_max_buffered_records(),
             paused: false,
             labels: vec![],
@@ -2177,7 +2176,8 @@ async fn output_endpoint(
             )?),
             index: None,
             output_buffer_config: Default::default(),
-            max_batch_size: default_max_batch_size(),
+            max_batch_size: None,
+            max_worker_batch_size: None,
             max_queued_records: HttpOutputTransport::default_max_buffered_records(),
             paused: false,
             labels: vec![],

crates/adapters/src/transport/clock.rs

Lines changed: 2 additions & 1 deletion

@@ -69,7 +69,8 @@ pub fn now_endpoint_config(config: &PipelineConfig) -> InputEndpointConfig {
         }),
         index: None,
         output_buffer_config: OutputBufferConfig::default(),
-        max_batch_size: 1,
+        max_batch_size: Some(1),
+        max_worker_batch_size: None,
         // This must be >1; otherwise the controller will pause the connector after every input.
         max_queued_records: 2,
         paused: false,

crates/adapters/src/transport/kafka/ft/test.rs

Lines changed: 5 additions & 3 deletions

@@ -27,7 +27,7 @@ use feldera_macros::IsNone;
 use feldera_sqllib::{ByteArray, SqlString, Variant};
 use feldera_types::config::{
     ConnectorConfig, FormatConfig, FtModel, InputEndpointConfig, OutputBufferConfig,
-    TransportConfig, default_max_batch_size, default_max_queued_records,
+    TransportConfig, default_max_queued_records,
 };
 use feldera_types::deserialize_table_record;
 use feldera_types::program_schema::{ColumnType, Field, Relation, SqlIdentifier};
@@ -1396,7 +1396,8 @@ fn test_offset(
             }),
             index: None,
             output_buffer_config: OutputBufferConfig::default(),
-            max_batch_size: default_max_batch_size(),
+            max_batch_size: None,
+            max_worker_batch_size: None,
             max_queued_records: default_max_queued_records(),
             paused: false,
             labels: Vec::new(),
@@ -1835,7 +1836,8 @@ fn test_input_partition(
             }),
             index: None,
             output_buffer_config: OutputBufferConfig::default(),
-            max_batch_size: default_max_batch_size(),
+            max_batch_size: None,
+            max_worker_batch_size: None,
             max_queued_records: default_max_queued_records(),
             paused: false,
             labels: Vec::new(),

crates/feldera-types/src/config.rs

Lines changed: 26 additions & 19 deletions

@@ -43,13 +43,7 @@ pub const fn default_max_queued_records() -> u64 {
     1_000_000
 }
 
-/// Default maximum batch size for connectors, in records.
-///
-/// If you change this then update the comment on
-/// [ConnectorConfig::max_batch_size].
-pub const fn default_max_batch_size() -> u64 {
-    10_000
-}
+pub const DEFAULT_MAX_WORKER_BATCH_SIZE: u64 = 10_000;
 
 pub const DEFAULT_CLOCK_RESOLUTION_USECS: u64 = 1_000_000;
 
@@ -1368,22 +1362,35 @@ pub struct ConnectorConfig {
     #[serde(flatten)]
     pub output_buffer_config: OutputBufferConfig,
 
-    /// Maximum batch size, in records.
+    /// Maximum number of records from this connector to process in a single batch.
+    ///
+    /// When set, this caps how many records are taken from the connector’s input
+    /// buffer and pushed through the circuit at once.
     ///
-    /// This is the maximum number of records to process in one batch through
-    /// the circuit. The time and space cost of processing a batch is
-    /// asymptotically superlinear in the size of the batch, but very small
-    /// batches are less efficient due to constant factors.
+    /// This is typically configured lower than `max_queued_records` to allow the
+    /// connector time to restart and refill its buffer while a batch is being
+    /// processed.
+    ///
+    /// Not all input adapters honor this limit.
+    ///
+    /// If this is not set, the batch size is derived from `max_worker_batch_size`.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub max_batch_size: Option<u64>,
+
+    /// Maximum number of records processed per batch, per worker thread.
     ///
-    /// This should usually be less than `max_queued_records`, to give the
-    /// connector a round-trip time to restart and refill the buffer while
-    /// batches are being processed.
+    /// When `max_batch_size` is not set, this setting is used to cap
+    /// the number of records that can be taken from the connector’s input
+    /// buffer and pushed through the circuit at once. The effective batch size is computed as:
+    /// `max_worker_batch_size × workers`.
     ///
-    /// Some input adapters might not honor this setting.
+    /// This provides an alternative to `max_batch_size` that automatically adjusts batch
+    /// size as the number of worker threads changes to maintain constant amount of
+    /// work per worker per batch.
    ///
-    /// The default is 10,000.
-    #[serde(default = "default_max_batch_size")]
-    pub max_batch_size: u64,
+    /// Defaults to 10,000 records per worker.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub max_worker_batch_size: Option<u64>,
 
     /// Backpressure threshold.
     ///

crates/pipeline-manager/src/db/types/program.rs

Lines changed: 2 additions & 1 deletion

@@ -888,7 +888,8 @@ mod tests {
             format: None,
             index: None,
             output_buffer_config: Default::default(),
-            max_batch_size: 0,
+            max_batch_size: Some(0),
+            max_worker_batch_size: None,
             max_queued_records: 0,
             paused: false,
             labels: vec![],

docs.feldera.com/docs/connectors/index.mdx

Lines changed: 25 additions & 4 deletions

@@ -116,10 +116,31 @@ The following attributes are common to all connectors:
   circuit pauses execution until the backlog subsides. By default,
   this is 1,000,000.
 
-* <a name="max_batch_size">`max_batch_size`</a> - For an input
-  connector, the approximate maximum number of records that the
-  pipeline will process in a single pipeline step. By default, this
-  is 10,000.
+* <a name="max_batch_size">`max_batch_size`</a> - Maximum number of records from this connector to process in a single batch.
+
+  When set, this caps how many records are taken from the connector’s input
+  buffer and pushed through the circuit at once.
+
+  This is typically configured lower than `max_queued_records` to allow the
+  connector time to restart and refill its buffer while a batch is being
+  processed.
+
+  Not all input adapters honor this limit.
+
+  If this is not set, the batch size is derived from `max_worker_batch_size`.
+
+* <a name="max_worker_batch_size">`max_worker_batch_size`</a> - Maximum number of records processed per batch, per worker thread.
+
+  When `max_batch_size` is not set, this setting is used to cap
+  the number of records that can be taken from the connector’s input
+  buffer and pushed through the circuit at once. The effective batch size is computed as:
+  `max_worker_batch_size × workers`.
+
+  This provides an alternative to `max_batch_size` that automatically adjusts batch
+  size as the number of worker threads changes to maintain constant amount of
+  work per worker per batch.
+
+  Defaults to 10,000 records per worker.
 
 * `index` *(Output connectors only)* The name of an index created by a SQL
   CREATE INDEX statement that defines
