feat: Time-based chunked materialization to prevent OOM on dense datasets #6277
Open
alan-gauthier-jt wants to merge 1 commit into feast-dev:master from
Signed-off-by: Alan Gauthier <alan.gauthier@jobteaser.com>
What this PR does / why we need it:
Problem
FeatureStore.materialize() and FeatureStore.materialize_incremental() load the full requested time range into memory in a single pass. On production deployments with: … this causes out-of-memory (OOM) crashes that are difficult to recover from, and Feast itself offers no built-in workaround. Users currently have to write external orchestration scripts to get around this.
Solution
This PR introduces native time-based chunked materialization directly into the Feast SDK. When a chunk_size is configured, each feature view's time range is split into consecutive, non-overlapping windows of the given width, and materialize_single_feature_view is called once per window. This caps peak memory usage at roughly the cost of a single chunk, regardless of the total range size.
Key design decisions:
- Opt-in: behavior is unchanged when chunk_size is not set (the default).
- Priority: call-time chunk_size > feature_store.yaml project default > no chunking.
- Per-chunk checkpointing via registry.apply_materialization: each successfully materialized chunk immediately updates most_recent_end_time, so if a run crashes midway, materialize_incremental resumes from the last committed chunk instead of reprocessing the entire range.
- CONTINUE failure strategy: the opt-in chunk_failure_strategy: continue setting collects failed chunks and emits a summary warning instead of aborting, enabling partial-success backfills.
- Watermark safety: when CONTINUE is active, the watermark only advances through the contiguous prefix of successful chunks. If chunk N fails, chunks N+1, N+2, … still run (their data still reaches the online store, and rewriting it on a retry is idempotent), but most_recent_end_time is not advanced past the failed window. The next materialize_incremental call therefore retries from the correct start time, so no data is permanently lost.
Changes
| File | Change |
|---|---|
| feast/feature_view_utils.py | generate_time_chunks(start, end, chunk_size) generator (pure, tested, reusable) |
| feast/repo_config.py | ChunkFailureStrategy enum; MaterializationConfig extended with chunk_size and chunk_failure_strategy fields |
| feast/feature_store.py | _resolve_chunk_size() helper; chunk_size parameter on both materialize() and materialize_incremental(); chunked inner loops with per-chunk checkpointing |
| feast/cli/cli.py | --chunk-hours, --chunk-minutes and --chunk-seconds flags on both feast materialize and feast materialize-incremental |
Usage examples
Python SDK
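With this PR, chunk_size is passed directly to FeatureStore.materialize() or FeatureStore.materialize_incremental() (assumed here to be a datetime.timedelta, such as timedelta(hours=6)). The self-contained sketch below mirrors the loop that this argument switches on, without importing Feast. generate_time_chunks reuses the helper name from feast/feature_view_utils.py, but its body is an assumption, as are the hypothetical resolve_chunk_size and run_chunks functions. The materialize and commit_watermark callables stand in for materialize_single_feature_view and registry.apply_materialization.

```python
import warnings
from datetime import datetime, timedelta
from enum import Enum
from typing import Callable, Iterator, List, Optional, Tuple

Window = Tuple[datetime, datetime]


class ChunkFailureStrategy(Enum):
    # Mirrors the PR's enum name; the string values are an assumption.
    STOP = "stop"
    CONTINUE = "continue"


def resolve_chunk_size(
    call_time: Optional[timedelta], config_default: Optional[timedelta]
) -> Optional[timedelta]:
    # Priority: call-time argument > feature_store.yaml default > None (no chunking).
    return call_time if call_time is not None else config_default


def generate_time_chunks(
    start: datetime, end: datetime, chunk_size: timedelta
) -> Iterator[Window]:
    # Sketch only: consecutive, non-overlapping windows covering [start, end);
    # the last window is truncated at end.
    if chunk_size <= timedelta(0):
        raise ValueError("chunk_size must be positive")
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + chunk_size, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


def run_chunks(
    start: datetime,
    end: datetime,
    chunk_size: Optional[timedelta],
    materialize: Callable[[datetime, datetime], None],
    commit_watermark: Callable[[datetime], None],
    strategy: ChunkFailureStrategy = ChunkFailureStrategy.STOP,
) -> List[Window]:
    # Hypothetical driver loop; returns the windows that failed.
    if chunk_size is None:
        windows: List[Window] = [(start, end)]  # no chunking: one full-range call
    else:
        windows = list(generate_time_chunks(start, end, chunk_size))
    failed: List[Window] = []
    for chunk_start, chunk_end in windows:
        try:
            materialize(chunk_start, chunk_end)
        except Exception:
            if strategy is ChunkFailureStrategy.STOP:
                raise  # STOP: abort on the first failure
            failed.append((chunk_start, chunk_end))
            continue  # CONTINUE: keep loading later chunks
        # Checkpoint only while every earlier chunk succeeded (contiguous prefix),
        # so the watermark never moves past a failed window.
        if not failed:
            commit_watermark(chunk_end)
    if failed:
        warnings.warn(f"{len(failed)} chunk(s) failed; watermark held at {failed[0][0]}")
    return failed


# Usage: 6-hour chunks over one day, with the second chunk failing.
day = datetime(2024, 1, 1)
committed: List[datetime] = []


def flaky(s: datetime, e: datetime) -> None:
    if s == day + timedelta(hours=6):
        raise RuntimeError("simulated failure")


chunk = resolve_chunk_size(timedelta(hours=6), None)
failed = run_chunks(day, day + timedelta(days=1), chunk, flaky, committed.append,
                    ChunkFailureStrategy.CONTINUE)
print(committed)  # only the first chunk's end (06:00) is committed
```

Holding the watermark at the first failed window means the next incremental run also repeats the later chunks that already succeeded; the design relies on online-store writes being idempotent to make that repetition harmless.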
feature_store.yaml project default
CLI flags
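The PR adds --chunk-hours, --chunk-minutes and --chunk-seconds to feast materialize and feast materialize-incremental, but the description does not say how the three combine. The sketch below assumes they are additive and that omitting all three leaves the feature_store.yaml default in charge. It uses the standard library's argparse purely for illustration, and chunk_size_from_flags is a hypothetical helper, not Feast's CLI code.

```python
import argparse
from datetime import timedelta
from typing import Optional


def chunk_size_from_flags(
    hours: Optional[int], minutes: Optional[int], seconds: Optional[int]
) -> Optional[timedelta]:
    # Hypothetical: no flags means no call-time override (the config default still applies).
    if hours is None and minutes is None and seconds is None:
        return None
    # Assumption: the three flags add up to a single chunk width.
    size = timedelta(hours=hours or 0, minutes=minutes or 0, seconds=seconds or 0)
    if size <= timedelta(0):
        raise ValueError("chunk flags must add up to a positive duration")
    return size


# Flag names taken from the PR; the parser itself is illustrative only.
parser = argparse.ArgumentParser(prog="materialize-sketch")
parser.add_argument("--chunk-hours", type=int)
parser.add_argument("--chunk-minutes", type=int)
parser.add_argument("--chunk-seconds", type=int)

args = parser.parse_args(["--chunk-hours", "1", "--chunk-minutes", "30"])
print(chunk_size_from_flags(args.chunk_hours, args.chunk_minutes, args.chunk_seconds))  # 1:30:00
```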
Which issue(s) this PR fixes:
Fixes # (memory exhaustion / OOM during large-range materialization — no existing tracking issue; can be linked once opened)
Checks
- … (git commit -s)
Testing Strategy
New unit tests added
- tests/unit/test_feature_view_utils.py: 13 tests for generate_time_chunks:
  - … chunk_size)
  - … chunk_size raises ValueError
- tests/unit/test_chunk_materialization.py: 15 tests for FeatureStore integration:
  - _resolve_chunk_size priority: call-time > config > None
  - … materialize_single_feature_view call (backward-compatible)
  - … chunk_size is used when there is no call-time override
  - CONTINUE strategy: failed chunks are skipped with a warning; successful chunks are committed
  - CONTINUE strategy (all succeed): all chunks commit and the watermark reaches end_date
  - STOP strategy (default): the first failure re-raises immediately
  - registry.apply_materialization is called once per chunk (checkpointing)
  - materialize_incremental with chunk_size splits correctly
  - CONTINUE watermark does not advance past a failed chunk (data-loss prevention)
Misc
- Works with the local, spark and ray compute engines without any engine-specific changes: chunking is applied at the FeatureStore layer, above the engine abstraction, so all engines benefit automatically.
- The chunk_size field in feature_store.yaml currently accepts a timedelta given as an integer number of seconds. A human-readable string format (e.g. "6h", "30m") is a potential follow-up using a Pydantic validator.
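For the human-readable chunk_size follow-up, a parser could accept both today's integer-seconds form and short strings such as "6h" or "30m". This is a hypothetical sketch: parse_chunk_size and its s/m/h/d unit set are assumptions, and in the config model it could sit behind a Pydantic field validator, which is not shown here.

```python
import re
from datetime import timedelta
from typing import Union

# Hypothetical unit suffixes; only "6h" and "30m" appear as examples in the PR.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_PATTERN = re.compile(r"^\s*(\d+)\s*([smhd])\s*$")


def parse_chunk_size(value: Union[int, str, timedelta]) -> timedelta:
    """Hypothetical parser: timedelta, integer seconds (current format), or "<n><unit>"."""
    if isinstance(value, bool):
        # bool is a subclass of int; reject it instead of treating True as 1 second.
        raise TypeError("chunk_size must not be a boolean")
    if isinstance(value, timedelta):
        size = value
    elif isinstance(value, int):
        size = timedelta(seconds=value)  # today's integer-seconds form
    else:
        match = _PATTERN.match(value)
        if match is None:
            raise ValueError(f"unrecognized chunk_size: {value!r}")
        amount, unit = match.groups()
        size = timedelta(**{_UNITS[unit]: int(amount)})
    if size <= timedelta(0):
        raise ValueError("chunk_size must be positive")
    return size


print(parse_chunk_size("6h"))  # 6:00:00
print(parse_chunk_size(1800))  # 0:30:00
```

Keeping integers valid means existing feature_store.yaml files would keep working if such a format were added.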