Proposal: Version-qualified feature refs in get_historical_features (offline store versioning) #6389

@Abhishek8108

Summary

Feature view versioning (#6101) added @v<N> syntax for version-qualified online reads and feast materialize --version v<N> for writing to version-specific online tables. The offline retrieval path (get_historical_features) has no equivalent: version qualifiers in feature refs are silently stripped during ibis prefix matching but never used to route to a different source table.

This means a team that pins their online serving to driver_stats@v2 cannot generate training data from the same v2 schema — breaking training-serving consistency, which is one of the primary motivations for versioning.

The gap

Online store (works today):

store.get_online_features(features=["driver_stats@v2:trips_today"], ...)
# → routes to version-specific table: {project}_driver_stats_v2

Offline store (silently ignores version today):

store.get_historical_features(entity_df=..., features=["driver_stats@v2:trips_today"])
# → routes to the same source table regardless of @v2
# → version tag stripped in _parse_feature_ref / ibis prefix matching

Design questions

I'd like to get your input on the right approach before writing any code, since there are a few viable directions with different tradeoffs:

Option A — Version-specific source tables (mirrors online store pattern)
When enable_online_feature_view_versioning is enabled and a versioned ref is used, route to a version-suffixed offline source table (e.g. {project}_driver_stats_v2). Requires that versioned offline tables exist — either pre-created by an upstream pipeline or written by a future feast materialize-offline --version v2.

Pros: mirrors online store semantics exactly, strong training-serving consistency guarantee.
Cons: offline source data is usually externally managed; teams would need to maintain versioned source tables alongside their online tables, which may be impractical.
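
To make the routing in Option A concrete, here is a minimal sketch of how a version-qualified ref might map to a version-suffixed table name. The helper names (split_versioned_ref, resolve_offline_table) and the default_table parameter are hypothetical and are not existing Feast APIs; the naming scheme simply follows the {project}_driver_stats_v2 example above.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for "<view>[@v<N>]:<feature>" refs; not a Feast internal.
_REF_PATTERN = re.compile(
    r"^(?P<view>[^@:]+)(?:@v(?P<version>\d+))?:(?P<feature>[^@:]+)$"
)


def split_versioned_ref(ref: str) -> Tuple[str, Optional[int], str]:
    """Split a feature ref into (view name, version or None, feature name)."""
    match = _REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"Unrecognized feature ref: {ref!r}")
    version = match.group("version")
    return (
        match.group("view"),
        int(version) if version is not None else None,
        match.group("feature"),
    )


def resolve_offline_table(project: str, ref: str, default_table: str) -> str:
    """Return the version-suffixed table for pinned refs, else the default source."""
    view, version, _ = split_versioned_ref(ref)
    if version is None:
        # Unversioned refs keep reading from the configured source table.
        return default_table
    return f"{project}_{view}_v{version}"
```

With this sketch, resolve_offline_table("demo", "driver_stats@v2:trips_today", "raw.driver_stats") would yield demo_driver_stats_v2, while an unversioned ref would fall back to raw.driver_stats. Whether the versioned table actually exists is a separate check that this sketch does not attempt.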

Option B — Schema-level version resolution (read from current table, project v<N> schema)
Keep reading from the same source table, but apply the schema of the pinned version when selecting columns. Useful when the underlying data hasn't changed but the feature view's field definitions have evolved.

Pros: no need to maintain separate offline tables, works with externally managed sources.
Cons: weaker guarantee — if the source data itself changed between versions, this doesn't help.
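
As a rough illustration of Option B, the sketch below projects rows from a single source table onto the field list recorded for a pinned version. Everything here is an assumption for illustration: VERSION_SCHEMAS stands in for whatever the registry would store per version, the key column names and the avg_rating field are made up, and project_to_version is not an existing Feast function.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical per-version field lists; in practice these would come from the registry.
VERSION_SCHEMAS: Dict[int, List[str]] = {
    1: ["trips_today"],
    2: ["trips_today", "avg_rating"],
}


def project_to_version(
    rows: Sequence[Dict[str, Any]],
    version: int,
    key_columns: Sequence[str] = ("driver_id", "event_timestamp"),
) -> List[Dict[str, Any]]:
    """Keep only the key columns plus the fields defined by the pinned version."""
    if version not in VERSION_SCHEMAS:
        raise KeyError(f"No schema recorded for version v{version}")
    columns = list(key_columns) + VERSION_SCHEMAS[version]
    for row in rows:
        missing = [c for c in columns if c not in row]
        if missing:
            # The current source no longer carries a column the pinned schema needs.
            raise ValueError(f"Source row is missing columns for v{version}: {missing}")
    return [{c: row[c] for c in columns} for row in rows]
```

The ValueError branch shows the limit noted in the cons: projection can only detect a column that has disappeared, not a column whose meaning or values changed between versions.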

Option C — Raise VersionedOfflineReadNotSupported (explicit error instead of silent ignore)
At minimum, detect @v<N> refs in get_historical_features and raise a clear error rather than silently stripping the version tag. This prevents silent training-serving skew today while the longer-term design is decided.
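
A minimal sketch of the Option C guard might look like the following. The exception class matches the name proposed above but does not exist in Feast today, and reject_versioned_refs is a hypothetical helper; where such a check would be wired into get_historical_features is left open.

```python
import re
from typing import Iterable, List


class VersionedOfflineReadNotSupported(Exception):
    """Proposed error for version-qualified refs on the offline path (hypothetical)."""


# Matches an "@v<N>" qualifier that sits directly before the ":" separator.
_VERSION_QUALIFIER = re.compile(r"@v\d+(?=:)")


def reject_versioned_refs(feature_refs: Iterable[str]) -> None:
    """Raise if any feature ref carries a version qualifier; otherwise do nothing."""
    offending: List[str] = [
        ref for ref in feature_refs if _VERSION_QUALIFIER.search(ref)
    ]
    if offending:
        raise VersionedOfflineReadNotSupported(
            "Version-qualified feature refs are not supported for offline "
            f"retrieval: {offending}. Remove the @v<N> qualifier or read the "
            "versioned features from the online store."
        )
```

Calling reject_versioned_refs(["driver_stats@v2:trips_today"]) would raise, while a list of unversioned refs passes through silently, so existing callers would be unaffected.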

Questions for you

  1. Is offline store versioning on the roadmap for #2728 (Improved feature view and model versioning), or is it intentionally out of scope?
  2. If in scope, does Option A or B better match the intended semantics?
  3. Would Option C (explicit error) be a useful interim step regardless?

Happy to take this forward once there's a design direction. I've been working through the Snowflake online store versioning (#6380) and the type_map offline fixes (#6388) so I have reasonable familiarity with both sides of the retrieval path.
