Configure storage for datafusion instances used by connectors.

Some connectors (delta, iceberg) use datafusion to access the underlying table. Each connector creates its own datafusion session, which is probably the right thing to do, as they operate on completely disjoint resources. However today all these sessions are configured to evaluate queries in memory. This is usually ok, since most queries issued by connectors don't need much intermediate state. One exception is the ORDER BY query used by the delta connector in CDC mode. This ends up sorting a potentially very large array of all updates in a transaction log entry, in memory.

The solution is to allow datafusion to use storage, at long as storage is configured for the pipeline, similar to the datafusion session used to run ad hoc queries: https://github.com/feldera/feldera/blob/main/crates/adapters/src/adhoc.rs#L34

This will require the connector to access pipeline config to determine storage settings.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configure storage for datafusion instances used by connectors. #6153

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Configure storage for datafusion instances used by connectors. #6153

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions