feat: Time-based chunked materialization to prevent OOM on dense datasets#6277

Open
alan-gauthier-jt wants to merge 1 commit into feast-dev:master from alan-gauthier-jt:chunked-materialize

Conversation


@alan-gauthier-jt alan-gauthier-jt commented Apr 14, 2026

What this PR does / why we need it:

Problem

FeatureStore.materialize() and FeatureStore.materialize_incremental() load the full requested time range into memory in a single pass. On production deployments with:

  • Large time windows (multi-day or multi-week backfills)
  • High-frequency event timestamps (e.g. 10-minute ETL batches, sub-minute sensor data)
  • Limited worker memory

…this causes out-of-memory (OOM) crashes that are difficult to recover from. Feast offers no built-in workaround, so users must fall back on external orchestration scripts that split the range themselves.

Solution

This PR introduces native time-based chunked materialization directly into the Feast SDK. When a chunk_size is configured, each feature view's time range is split into consecutive, non-overlapping windows of the given width, and materialize_single_feature_view is called once per window. This caps peak memory usage to the cost of a single chunk regardless of total range size.

Key design decisions:

  • Backward-compatible: no existing behaviour changes when chunk_size is not set (the default).
  • Three-tier priority for chunk size resolution: call-time argument > feature_store.yaml project default > no chunking.
  • Per-chunk registry.apply_materialization: each successfully materialized chunk immediately updates most_recent_end_time. A crash mid-run allows materialize_incremental to resume from the last committed chunk rather than reprocessing the entire range.
  • CONTINUE failure strategy: an opt-in chunk_failure_strategy: continue setting collects failed chunks and emits a summary warning instead of aborting, enabling partial-success backfills.
  • Watermark gap-safety: when CONTINUE is active, the watermark only advances through the contiguous prefix of successful chunks. If chunk N fails, chunks N+1, N+2, … still run (so the online store receives their data idempotently), but most_recent_end_time is not advanced past the failed window. The next materialize_incremental call therefore retries from the correct start time — no data is permanently lost.
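
The watermark gap-safety rule above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the function name `advance_watermark` and the `(start, end)` tuple representation are assumptions for clarity.

```python
def advance_watermark(chunks, failed_indices):
    """Return the highest end time the watermark may advance to under
    CONTINUE: the end of the contiguous prefix of successful chunks,
    or None if the very first chunk failed.

    chunks: ordered list of (chunk_start, chunk_end) tuples
    failed_indices: set of chunk indices that failed
    """
    watermark = None
    for i, (_, chunk_end) in enumerate(chunks):
        if i in failed_indices:
            break  # stop at the first failure; later successes don't count
        watermark = chunk_end
    return watermark
```

Chunks after the first failure may still have written data to the online store, but because the watermark stops at the failed window, the next incremental run re-covers the gap idempotently.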

Changes

| File | Change |
| --- | --- |
| `feast/feature_view_utils.py` | New `generate_time_chunks(start, end, chunk_size)` generator — pure, tested, reusable |
| `feast/repo_config.py` | New `ChunkFailureStrategy` enum; extended `MaterializationConfig` with `chunk_size` and `chunk_failure_strategy` fields |
| `feast/feature_store.py` | New `_resolve_chunk_size()` helper; `chunk_size` parameter on both `materialize()` and `materialize_incremental()`; chunked inner loops with per-chunk checkpointing |
| `feast/cli/cli.py` | New `--chunk-hours`, `--chunk-minutes`, `--chunk-seconds` flags on both `feast materialize` and `feast materialize-incremental` |
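
A minimal sketch of what `generate_time_chunks` does, consistent with the behavior described in this PR (consecutive, non-overlapping windows, with the last chunk clipped to the range end); the real implementation lives in `feast/feature_view_utils.py` and may differ in detail:

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

def generate_time_chunks(
    start: datetime, end: datetime, chunk_size: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (chunk_start, chunk_end) windows that exactly
    cover [start, end) with no gaps and no overlap; the final chunk is
    clipped to `end` when the range is not an exact multiple."""
    if chunk_size <= timedelta(0):
        raise ValueError("chunk_size must be a positive timedelta")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + chunk_size, end)
        yield cursor, chunk_end
        cursor = chunk_end
```

Because it is a lazy generator, a multi-week backfill never materializes the full chunk list in memory either.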

Usage examples

Python SDK

from datetime import datetime, timedelta

# Call-time override
fs.materialize(
    start_date=datetime(2026, 1, 1),
    end_date=datetime(2026, 3, 1),
    chunk_size=timedelta(hours=6),  # 236 chunks instead of one giant call
)

# Incremental with chunking
fs.materialize_incremental(
    end_date=datetime.utcnow(),
    chunk_size=timedelta(minutes=30),
)

feature_store.yaml project default

materialization:
  chunk_size: 21600          # 6 hours, in seconds
  chunk_failure_strategy: continue   # skip bad chunks, warn at end

CLI flags

feast materialize 2026-01-01T00:00:00 2026-03-01T00:00:00 --chunk-hours 6
feast materialize 2026-01-01T00:00:00 2026-03-01T00:00:00 --chunk-minutes 30
feast materialize-incremental 2026-03-25T00:00:00 --chunk-hours 12
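
One plausible way the three CLI flags combine into a single `timedelta` (a sketch only — the actual flag handling is in `feast/cli/cli.py`, and the function name here is hypothetical):

```python
from datetime import timedelta
from typing import Optional

def chunk_size_from_flags(
    hours: Optional[int], minutes: Optional[int], seconds: Optional[int]
) -> Optional[timedelta]:
    """Combine --chunk-hours / --chunk-minutes / --chunk-seconds into a
    single timedelta; return None (no chunking) when no flag is given."""
    if hours is None and minutes is None and seconds is None:
        return None
    return timedelta(hours=hours or 0, minutes=minutes or 0, seconds=seconds or 0)
```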

Which issue(s) this PR fixes:

Fixes # (memory exhaustion / OOM during large-range materialization — no existing tracking issue; can be linked once opened)

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

New unit tests added

tests/unit/test_feature_view_utils.py — 13 tests for generate_time_chunks:

  • Exact multiple of chunk size
  • Remainder chunk (last chunk smaller than chunk_size)
  • Chunk size larger than total range (single chunk)
  • Minute and second granularity
  • Contiguity and full-coverage invariants
  • Timezone-aware datetimes
  • Empty range (yields nothing)
  • Zero / negative chunk_size raises ValueError
  • Returns a lazy generator (not a list)

tests/unit/test_chunk_materialization.py — 15 tests for FeatureStore integration:

  • _resolve_chunk_size priority: call-time > config > None
  • No chunking → single materialize_single_feature_view call (backward-compatible)
  • N-hour chunk size → correct number of calls
  • Chunk boundaries are contiguous and cover the full range
  • Config chunk_size is used when no call-time override
  • Call-time override supersedes config
  • CONTINUE strategy: failed chunks are skipped with a warning; successful chunks are committed
  • CONTINUE strategy (all succeed): all chunks commit and watermark reaches end_date
  • STOP strategy (default): first failure re-raises immediately
  • registry.apply_materialization is called once per chunk (checkpointing)
  • materialize_incremental with chunk_size splits correctly
  • Regression: CONTINUE watermark does not advance past a failed chunk (data-loss prevention)

Misc

  • This feature is intentionally scoped to the existing local, Spark, and Ray compute engines without any engine-specific changes — chunking is applied at the FeatureStore layer, above the engine abstraction, so all engines benefit automatically.
  • Parallel chunk execution (running multiple chunks concurrently) is a natural follow-up but is out of scope for this PR to keep the change reviewable and testable.
  • The chunk_size field in feature_store.yaml currently accepts a timedelta integer (total seconds). A human-readable string format (e.g. "6h", "30m") is a potential follow-up using a Pydantic validator.
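
For the human-readable follow-up mentioned above, a parser along these lines could back a Pydantic `field_validator` on `MaterializationConfig`; this is a hypothetical sketch, not part of this PR:

```python
import re
from datetime import timedelta

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_chunk_size(value) -> timedelta:
    """Accept a plain integer (seconds, the current format) or a
    human-readable string such as "6h" or "30m"."""
    if isinstance(value, int):
        return timedelta(seconds=value)
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if not match:
        raise ValueError(f"invalid chunk_size: {value!r}")
    return timedelta(seconds=int(match.group(1)) * _UNITS[match.group(2)])
```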

@alan-gauthier-jt alan-gauthier-jt requested a review from a team as a code owner April 14, 2026 13:30
@alan-gauthier-jt alan-gauthier-jt changed the title feat(materialization): add time chunks management feat: Time-based chunked materialization to prevent OOM on dense datasets Apr 14, 2026
Signed-off-by: Alan Gauthier <alan.gauthier@jobteaser.com>