Support Data Labeling and LabelViews #5456

@franciscojavierarceo

Is your feature request related to a problem? Please describe.
Historically, Feast has played a key role in feature development, particularly around dataset preparation for model development and feature serving for online inference.

Pictorially, you can think of it like this:

Image

Yet labels are the core piece of a training dataset that makes model training successful. Without labels, features are a waste of time (excluding semi/self-supervised learning).

Given the work on the compute engine, my proposal is to expand Feast to cover the entire training dataset preparation life cycle, which would include labels and their correction.

A proof of concept was developed in the UI to educate users about this: #5410

We should expand this properly so that users can define a LabelView in the online store that can be used to store labels explicitly.

Describe the solution you'd like
A LabelView that can be used to write data to the online and offline store.

Describe alternatives you've considered
The proposed LabelView could look something like:

from datetime import timedelta

from feast import Entity, Field, FileSource, ValueType
from feast.types import Bool, Float32

# NOTE: LabelView is the class proposed in this issue; it does not exist in Feast yet.

# 1) Define the entity the labels are keyed on
customer = Entity(name="customer_id", value_type=ValueType.INT64)

# 2) Point to your label data in e.g. Parquet
label_source = FileSource(
    path="gs://my-bucket/churn_labels/*.parquet",
    timestamp_field="label_timestamp",
    created_timestamp_column="created_ts",
)

# 3) Declare the LabelView
customer_churn = LabelView(
    name="customer_churn",
    entities=[customer],
    schema=[
        Field(name="churned", dtype=Bool),
        Field(name="risk_score", dtype=Float32),
    ],
    batch_source=label_source,
    ttl=timedelta(days=90),
    description="Customer churn flag and risk score for training/monitoring.",
)
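
To make "labels and their correction" a bit more concrete, here is a minimal plain-Python sketch of one way corrections could be resolved: when several rows exist for the same customer and label timestamp, the most recently created row wins. This is only an assumption about the semantics, not something this issue specifies, and it is not Feast code; `LabelRow` and `resolve_labels` are hypothetical names, and the fields simply mirror the columns in the example above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class LabelRow:
    """Hypothetical label record; fields mirror the example columns above."""
    customer_id: int
    churned: bool
    label_timestamp: datetime  # event time the label refers to
    created_ts: datetime       # time this row was written


def resolve_labels(rows: Iterable[LabelRow]) -> Dict[Tuple[int, datetime], LabelRow]:
    """Keep one row per (customer_id, label_timestamp): the latest created one."""
    resolved: Dict[Tuple[int, datetime], LabelRow] = {}
    for row in rows:
        key = (row.customer_id, row.label_timestamp)
        current = resolved.get(key)
        # A later created_ts is treated as a correction of the earlier row.
        if current is None or row.created_ts > current.created_ts:
            resolved[key] = row
    return resolved


rows = [
    LabelRow(1, False, datetime(2025, 1, 1), datetime(2025, 1, 2)),
    # Correction for the same customer and event time, written later.
    LabelRow(1, True, datetime(2025, 1, 1), datetime(2025, 1, 5)),
    LabelRow(2, False, datetime(2025, 1, 1), datetime(2025, 1, 2)),
]
resolved = resolve_labels(rows)
```

Under this assumption a corrected label never overwrites history: both rows stay in the source data, and only the read path picks the newest one.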
