Is your feature request related to a problem? Please describe.
When running `store.materialize_incremental()` with a Spark-based offline store, Feast currently converts Spark DataFrames to Arrow tables using `toPandas()` or `collect()`.
This loads the entire dataset into driver memory, causing `OutOfMemoryError` or `spark.driver.maxResultSize exceeded` failures on large datasets.
Although Feast already supports a staging configuration for exporting data, it is not yet used during materialization. As a result, large-scale jobs cannot leverage staging to spill data to disk or remote storage.
Describe the solution you'd like
Enable the existing staging configuration to be used for materialization in the Spark offline store.
When enabled, Feast should write intermediate Spark DataFrames to the staging location (e.g. local disk or S3) before reading them back as Arrow tables for online ingestion.
Example configuration:

```yaml
offline_store:
  type: spark
  staging_location: s3://bucket/tmp/feast_arrow
  staging_allow_materialize: true
```
This would allow Feast to handle large datasets safely without driver OOM, by spilling intermediate data to the configured staging location.
Describe alternatives you've considered
- Increasing driver memory (`spark.driver.memory`, `spark.driver.maxResultSize`): only delays the problem.
- Using `toLocalIterator()`: too slow, and still limited by driver memory.
Additional context
During `feast materialize` with large Spark datasets, jobs fail with driver OOM even though executors have available resources.
Example error:

```
Total size of serialized results (2.1 GiB) is bigger than spark.driver.maxResultSize (2.0 GiB)
Caused by: java.lang.OutOfMemoryError: Java heap space
```
Allowing materialization to use the staging location would make Feast's Spark integration far more scalable and production-ready.