Commit 7279c75

feat: Feast-MLflow Integration (feast-dev#6235)
* mlflow-feast integration
  - refine MLflow integration, UI runs endpoint, and lockfiles
  - fix issues
  - handle fallback
  - added model to features table
  - address CI
  - add docs
  - fix-lint
  - update docs
  - update-logic
  - add-integration-test
  - mlflowclient wrapper
  - address-reviews
  - update docs
  - fix-lint
  - fix-auto-log
  - update feastMlflowClient
  - changing init
  - fixing docs
  - remove doc
  - update format
  - fix-test
  - update docs
  - addressed reviews
  - fix: update .secrets.baseline to remove pragma-allowlisted entry
  - ui-fallback
  - update requirements
* fix-CI
* fix-lint
* fix: remove unused imports in ui_server.py

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
1 parent 7ab3502 commit 7279c75

40 files changed

Lines changed: 9372 additions & 4366 deletions

.secrets.baseline

Lines changed: 1 addition & 1 deletion
@@ -1539,5 +1539,5 @@
 }
 ]
 },
-"generated_at": "2026-05-14T10:20:01Z"
+"generated_at": "2026-05-20T07:55:48Z"
 }

docs/reference/mlflow.md

Lines changed: 347 additions & 0 deletions
@@ -0,0 +1,347 @@
1+
# MLflow Integration
2+
3+
Feast provides **native integration** with [MLflow](https://mlflow.org/) for automatic feature lineage tracking alongside ML experiments. When enabled, feature retrievals made inside an active MLflow run are logged to that run.
4+
5+
## Overview
6+
7+
- **Which features did this model use?** -- auto-logged on every `get_historical_features()` / `get_online_features()` call
8+
- **Which feature service should I use to serve this model?** -- resolved from model URI via `store.mlflow.resolve_features()`
9+
- **Can I reproduce the exact training data?** -- entity DataFrame saved as an MLflow artifact
10+
- **Which models break if I change a feature view?** -- reverse index via the Feast UI `/api/mlflow-feature-usage` endpoint
11+
- **When was the feature store last updated?** -- `feast apply` and `feast materialize` logged to a separate ops experiment
12+
13+
### Capabilities
14+
15+
| Capability | How |
|---|---|
| Auto-log feature metadata | Tags on every retrieval inside an active MLflow run |
| Entity DataFrame archival | `entity_df.parquet` artifact for full reproducibility |
| Model registration with lineage | `feast.feature_service` tag propagated to model versions |
| Training-to-prediction linkage | `store.mlflow.load_model()` links prediction runs back to training runs |
| Model-to-feature resolution | Map any model URI back to its Feast feature service |
| Operation audit trail | `feast apply` / `feast materialize` logged to `{project}-feast-ops` |
| `store.mlflow` API | Single entry point: zero `import mlflow`, zero client objects |
| Feast UI integration | Per-feature-view usage stats and registered model associations |
25+
26+
## Installation
27+
28+
MLflow is an optional dependency:
29+
30+
```bash
# Quote the extra so shells such as zsh do not treat the brackets as a glob.
pip install 'feast[mlflow]'
```
33+
34+
## Configuration
35+
36+
Add the `mlflow` section to your `feature_store.yaml`:
37+
38+
```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

mlflow:
  enabled: true
  tracking_uri: http://127.0.0.1:5000  # optional, falls back to MLFLOW_TRACKING_URI env var
  auto_log: true                       # default
  auto_log_entity_df: false            # default
  entity_df_max_rows: 100000           # default
  log_operations: false                # default
  ops_experiment_suffix: "-feast-ops"  # default
```
55+
56+
### Configuration options
57+
58+
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | bool | `false` | Master switch for the entire integration |
| `tracking_uri` | string | *(none)* | MLflow tracking server URI. Falls back to `MLFLOW_TRACKING_URI` env var, then MLflow default (`./mlruns`) |
| `auto_log` | bool | `true` | Automatically log feature metadata on every retrieval when an active MLflow run exists |
| `auto_log_entity_df` | bool | `false` | Save the entity DataFrame as `entity_df.parquet` artifact on historical retrieval |
| `entity_df_max_rows` | int | `100000` | Skip entity DataFrame artifact upload for DataFrames exceeding this limit |
| `log_operations` | bool | `false` | Log `feast apply` and `feast materialize` to a separate MLflow experiment |
| `ops_experiment_suffix` | string | `"-feast-ops"` | Suffix appended to project name for the operations experiment |
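
The defaults in the table can be summarized as a small data class. This is a sketch for orientation only: `MlflowSettings` and `ops_experiment_name` are hypothetical names, not part of Feast's API; only the field names and default values come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MlflowSettings:
    """Hypothetical mirror of the `mlflow` YAML section with its documented defaults."""

    enabled: bool = False
    tracking_uri: Optional[str] = None
    auto_log: bool = True
    auto_log_entity_df: bool = False
    entity_df_max_rows: int = 100_000
    log_operations: bool = False
    ops_experiment_suffix: str = "-feast-ops"

    def ops_experiment_name(self, project: str) -> str:
        # The operations experiment is the project name plus the suffix.
        return f"{project}{self.ops_experiment_suffix}"


settings = MlflowSettings(enabled=True)
print(settings.ops_experiment_name("my_project"))  # my_project-feast-ops
```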
67+
68+
### Tracking URI resolution
69+
70+
The tracking URI is resolved in this order:
71+
72+
1. `tracking_uri` field in `feature_store.yaml`
73+
2. `MLFLOW_TRACKING_URI` environment variable
74+
3. MLflow's default (`./mlruns` local directory)
75+
76+
This means you can omit `tracking_uri` from the YAML and set `MLFLOW_TRACKING_URI` in your environment instead. If neither is set, MLflow falls back to its default local `./mlruns` directory.
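
The resolution order can be sketched as a small function. This is an illustration under stated assumptions, not Feast's implementation: `resolve_tracking_uri` and its parameters are hypothetical, and returning `None` in the last step stands for "leave the choice to MLflow's own default".

```python
import os
from typing import Mapping, Optional


def resolve_tracking_uri(
    yaml_tracking_uri: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical helper mirroring the documented resolution order."""
    # 1. An explicit `tracking_uri` in feature_store.yaml wins.
    if yaml_tracking_uri:
        return yaml_tracking_uri
    # 2. Otherwise use the MLFLOW_TRACKING_URI environment variable.
    env_uri = environ.get("MLFLOW_TRACKING_URI")
    if env_uri:
        return env_uri
    # 3. Otherwise return None so that MLflow applies its own default.
    return None
```

Passing the environment as a parameter keeps the function easy to exercise without mutating `os.environ`.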
77+
78+
## What gets logged
79+
80+
### Tags on retrieval runs
81+
82+
When `auto_log: true` and an active MLflow run exists, each `get_historical_features()` or `get_online_features()` call records:
83+
84+
| Tag | Example | Description |
|-----|---------|-------------|
| `feast.project` | `my_project` | Feast project name |
| `feast.retrieval_type` | `historical` / `online` | Type of feature retrieval |
| `feast.feature_service` | `driver_activity_v1` | Auto-resolved feature service name (if matched) |
| `feast.feature_views` | `driver_hourly_stats` | Comma-separated feature view names |
| `feast.feature_refs` | `driver_hourly_stats:conv_rate,...` | All feature references |
| `feast.entity_count` | `200` | Number of entities in the request |
| `feast.feature_count` | `5` | Number of features retrieved |
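
To make the relationship between these tags concrete, here is a minimal sketch of how such a tag set could be assembled from a list of feature references. `build_retrieval_tags` is a hypothetical helper, and it assumes references have the form `view:feature` and are joined with commas; the integration's real code may differ.

```python
from typing import Dict, List, Optional


def build_retrieval_tags(
    project: str,
    retrieval_type: str,
    feature_refs: List[str],
    entity_count: int,
    feature_service: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: assemble the documented tag names as strings."""
    # Keep each feature view once, in first-seen order.
    views = list(dict.fromkeys(ref.split(":", 1)[0] for ref in feature_refs))
    tags = {
        "feast.project": project,
        "feast.retrieval_type": retrieval_type,
        "feast.feature_views": ",".join(views),
        "feast.feature_refs": ",".join(feature_refs),
        "feast.entity_count": str(entity_count),
        "feast.feature_count": str(len(feature_refs)),
    }
    # The feature service tag is only present when a service was matched.
    if feature_service is not None:
        tags["feast.feature_service"] = feature_service
    return tags


tags = build_retrieval_tags(
    "my_project",
    "historical",
    ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    entity_count=200,
)
```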
93+
94+
### Metrics
95+
96+
| Metric | Example | Description |
|--------|---------|-------------|
| `feast.job_submission_sec` | `0.4321` | Feature retrieval duration in seconds |
99+
100+
### Artifacts
101+
102+
When `auto_log_entity_df: true` and the entity DataFrame has fewer than `entity_df_max_rows` rows:
103+
104+
| Artifact | Description |
|----------|-------------|
| `entity_df.parquet` | Full entity DataFrame used in the retrieval |
107+
108+
When a model is logged via `store.mlflow.log_model()`:
109+
110+
| Artifact | Description |
|----------|-------------|
| `feast_features.json` | JSON list of feature references the model was trained on |
113+
114+
### Entity DataFrame metadata
115+
116+
Regardless of `auto_log_entity_df`, the following metadata is logged when present:
117+
118+
| Tag / Param | When | Description |
|-------------|------|-------------|
| `feast.entity_df_type` | Always | `dataframe`, `sql`, or `range` |
| `feast.entity_df_rows` | DataFrame input | Row count |
| `feast.entity_df_columns` | DataFrame input | Column names |
| `feast.entity_df_query` | SQL input | The SQL query string |
| `feast.start_date` / `feast.end_date` | Range-based input | Date range |
125+
126+
### Operation logs
127+
128+
When `log_operations: true`, `feast apply` and `feast materialize` create self-contained runs in the `{project}{ops_experiment_suffix}` experiment (with the default suffix, `my_project-feast-ops` for a project named `my_project`):
129+
130+
**Apply runs:**
131+
132+
| Tag / Metric | Example |
|--------------|---------|
| `feast.operation` | `apply` |
| `feast.project` | `my_project` |
| `feast.feature_views_changed` | `driver_hourly_stats,order_stats` |
| `feast.feature_services_changed` | `driver_activity_v1` |
| `feast.entities_changed` | `driver,restaurant` |
| `feast.apply.feature_views_count` | `2` |
| `feast.apply.feature_services_count` | `1` |
| `feast.apply.entities_count` | `2` |
142+
143+
**Materialize runs:**
144+
145+
| Tag / Metric | Example |
|--------------|---------|
| `feast.operation` | `materialize` / `materialize_incremental` |
| `feast.project` | `my_project` |
| `feast.materialize.feature_views` | `driver_hourly_stats` |
| `feast.materialize.start_date` | `2024-01-01T00:00:00` |
| `feast.materialize.end_date` | `2024-01-02T00:00:00` |
| `feast.materialize.duration_sec` | `12.3456` |
153+
154+
## Usage
155+
156+
### Automatic logging (zero code)
157+
158+
With the configuration above, feature metadata is logged automatically whenever there is an active MLflow run. No explicit `import mlflow` is needed — just use `store.mlflow`:
159+
160+
```python
from feast import FeatureStore

store = FeatureStore(".")

# entity_df: your entity DataFrame (entity keys plus event timestamps), built elsewhere.
with store.mlflow.start_run(run_name="my_training"):
    training_df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    # The run is now tagged with feast.feature_refs, feast.feature_views, etc.

    model = train(training_df)  # train() is a placeholder for your own training code
    store.mlflow.log_model(model, "model")
```
175+
176+
No extra code needed — the tags are written automatically.
177+
178+
### `store.mlflow` API (recommended)
179+
180+
`store.mlflow` is the primary way to interact with the Feast–MLflow integration. It provides Feast-enhanced versions of common MLflow operations. For anything else, reach the raw `mlflow` module through `store.mlflow.mlflow` or the `MlflowClient` instance through `store.mlflow.client`:
181+
182+
```python
from feast import FeatureStore
from sklearn.linear_model import LogisticRegression

store = FeatureStore(".")

# Training
with store.mlflow.start_run(run_name="v1_training"):
    df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()

    # X, y: feature matrix and labels derived from df (omitted here)
    model = LogisticRegression().fit(X, y)
    store.mlflow.log_model(model, "model")  # Feast-enhanced: saves feast_features.json
    train_run_id = store.mlflow.active_run_id

# Register model (auto-tags version with feast.feature_service)
store.mlflow.register_model(f"runs:/{train_run_id}/model", "driver_model")

# Prediction (auto-links to training run)
with store.mlflow.start_run(run_name="prediction"):
    model = store.mlflow.load_model("models:/driver_model/1")
    online_features = store.get_online_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_rows=[{"driver_id": 1001}],
    )
    predictions = model.predict(...)
```
211+
212+
### `feast.mlflow` module API (alternative)
213+
214+
For users who prefer a module-level import, `feast.mlflow` is a **drop-in replacement for `import mlflow`** that delegates to the same `store.mlflow` client under the hood:
215+
216+
```python
import feast.mlflow
from feast import FeatureStore

store = FeatureStore(".")  # auto-registers with feast.mlflow

with feast.mlflow.start_run(run_name="training"):
    df = store.get_historical_features(...).to_df()
    feast.mlflow.log_params({"lr": "0.01"})  # plain passthrough
    feast.mlflow.log_metrics({"f1": 0.85})   # plain passthrough
    feast.mlflow.log_model(model, "model")   # Feast-enhanced
```
228+
229+
#### Store resolution
230+
231+
`feast.mlflow` resolves its `FeatureStore` in this order:
232+
233+
1. **Explicit `feast.mlflow.init(store)`** — if called, overrides everything
234+
2. **Auto-registered** — the most recently created `FeatureStore` with `mlflow.enabled=true` registers itself automatically
235+
3. **Auto-discovery** — falls back to `FeatureStore(".")` from the current directory
236+
237+
In most cases, simply creating a `FeatureStore(...)` is enough — no `init()` needed.
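
The three-step lookup can be modeled with a few lines of plain Python. This is a hypothetical sketch of the ordering only, not the module's actual code: `StoreResolver` and its method names are invented for illustration, and the `discover` callable stands in for `FeatureStore(".")`.

```python
from typing import Callable, Optional


class StoreResolver:
    """Hypothetical model of the documented three-step store lookup."""

    def __init__(self, discover: Callable[[], object]) -> None:
        self._explicit: Optional[object] = None
        self._auto_registered: Optional[object] = None
        self._discover = discover  # stands in for FeatureStore(".")

    def init(self, store: object) -> None:
        # Step 1: an explicit binding overrides everything else.
        self._explicit = store

    def register(self, store: object) -> None:
        # Step 2: the most recently created enabled store replaces earlier ones.
        self._auto_registered = store

    def resolve(self) -> object:
        if self._explicit is not None:
            return self._explicit
        if self._auto_registered is not None:
            return self._auto_registered
        # Step 3: fall back to discovery from the current directory.
        return self._discover()


resolver = StoreResolver(discover=lambda: "discovered-store")
first = resolver.resolve()           # nothing registered yet: discovery is used
resolver.register("store-a")
resolver.register("store-b")
second = resolver.resolve()          # the most recent registration wins
resolver.init("explicit-store")
third = resolver.resolve()           # the explicit binding overrides both
```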
238+
239+
#### Error handling
240+
241+
`feast.mlflow` raises clear errors on first use if something is misconfigured:
242+
243+
| Condition | Error |
|-----------|-------|
| No `feature_store.yaml` in cwd and no store created | `RuntimeError` with guidance to call `feast.mlflow.init(store)` |
| `mlflow.enabled` is not set to `true` | `RuntimeError` with guidance to set `mlflow.enabled=true` |
| `mlflow` pip package not installed | `ImportError` with guidance to run `pip install feast[mlflow]` |
248+
249+
When `mlflow.enabled` is `false` (or omitted), `store.mlflow` returns `None`, allowing callers to guard with `if store.mlflow:`. The `feast.mlflow` module raises `RuntimeError` only when you attempt to use it without an enabled store.
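
The `if store.mlflow:` guard can be wrapped in a helper so that calling code behaves the same whether or not the integration is enabled. The sketch below is illustrative: `feature_service_for` is a hypothetical function, and the `SimpleNamespace` objects are stand-ins so the example runs without Feast or MLflow installed.

```python
from types import SimpleNamespace
from typing import Any, Optional


def feature_service_for(store: Any, model_uri: str) -> Optional[str]:
    """Return the feature service for a model, or None if MLflow is disabled."""
    # store.mlflow is documented to be None when mlflow.enabled is false or omitted.
    if not store.mlflow:
        return None
    return store.mlflow.resolve_features(model_uri)


# Stand-in objects (assumptions for illustration, not real FeatureStore instances).
disabled_store = SimpleNamespace(mlflow=None)
enabled_store = SimpleNamespace(
    mlflow=SimpleNamespace(resolve_features=lambda uri: "driver_activity_v1")
)

print(feature_service_for(disabled_store, "models:/driver_model/1"))  # None
print(feature_service_for(enabled_store, "models:/driver_model/1"))   # driver_activity_v1
```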
250+
251+
### Feast-enhanced functions
252+
253+
These functions add automatic Feast tagging and lineage on top of their MLflow counterparts:
254+
255+
| Function | Enhancement |
|----------|-------------|
| `store.mlflow.start_run(run_name, tags)` | Auto-tags run with `feast.project` |
| `store.mlflow.log_model(model, path, flavor)` | Auto-attaches `feast_features.json` artifact |
| `store.mlflow.register_model(model_uri, name)` | Auto-tags model version with `feast.feature_service` |
| `store.mlflow.load_model(model_uri)` | Auto-tags prediction run with training lineage |
261+
262+
**Supported model flavors for `log_model()`:** `sklearn`, `pytorch`, `xgboost`, `lightgbm`, `tensorflow`, `keras`, `pyfunc`.
263+
264+
### Feast-only functions
265+
266+
These are unique to the Feast integration and have no `mlflow` equivalent:
267+
268+
| Function | Description |
|----------|-------------|
| `store.mlflow.resolve_features(model_uri)` | Resolve model URI to Feast feature service name |
| `store.mlflow.get_training_entity_df(run_id, ...)` | Recover entity DataFrame from a past MLflow run |
| `store.mlflow.log_training_dataset(df, dataset_name)` | Log a training DataFrame as an MLflow dataset input |
| `store.mlflow.active_run_id` | Current active MLflow run ID (or `None`) |
| `store.mlflow.client` | The underlying `MlflowClient` instance for advanced queries |
| `feast.mlflow.init(store)` | Explicitly bind `feast.mlflow` module to a `FeatureStore` (optional) |
276+
277+
### Passthrough behavior
278+
279+
The `feast.mlflow` module delegates any attribute not listed above to the raw `mlflow` module. This means you can use `feast.mlflow` as a drop-in replacement for `import mlflow`:
280+
281+
```python
feast.mlflow.log_params(params)  # passes through to mlflow.log_params
feast.mlflow.log_metrics(metrics)
feast.mlflow.set_tag("env", "staging")
feast.mlflow.MlflowClient()
```
287+
288+
`store.mlflow` does **not** have this passthrough — it only exposes the Feast-enhanced and Feast-only methods listed above. To access raw `mlflow` functions from `store.mlflow`, use the escape hatches:
289+
290+
```python
store.mlflow.client.log_param(run_id, "lr", "0.01")  # via MlflowClient instance
store.mlflow.mlflow.log_params(params)               # via raw mlflow module
```
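
The difference between "enhanced" attributes and passthrough attributes can be illustrated with Python's `__getattr__` hook, which is only consulted when normal lookup fails. This is a generic sketch of the delegation pattern and not the integration's code: `EnhancedFacade` is hypothetical, and the standard-library `math` module stands in for the raw `mlflow` module so the example runs anywhere.

```python
import math  # stand-in for the raw `mlflow` module


class EnhancedFacade:
    """Hypothetical facade: own methods first, everything else delegated."""

    def __init__(self, fallback_module) -> None:
        self._fallback = fallback_module

    def sqrt(self, x: float) -> float:
        # An "enhanced" override: defined here, so __getattr__ is never consulted.
        return round(self._fallback.sqrt(x), 2)

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails: plain passthrough.
        return getattr(self._fallback, name)


facade = EnhancedFacade(math)
print(facade.sqrt(2))      # 1.41 (enhanced version)
print(facade.floor(2.7))   # 2    (delegated to math.floor)
```

A module can offer the same behavior by defining a module-level `__getattr__` function, which Python supports since version 3.7.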
294+
295+
### Resolve a model back to its feature service
296+
297+
```python
from feast import FeatureStore

store = FeatureStore(".")
fs_name = store.mlflow.resolve_features("models:/driver_model/1")
# Returns: "driver_activity_v1"
```
304+
305+
Resolution order:
306+
1. Model version tag `feast.feature_service` (set by `register_model()`)
307+
2. Training run tag `feast.feature_service` (set by auto-logging)
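
The two-step order amounts to a lookup with a fallback. The sketch below works on plain tag dictionaries; `resolve_feature_service` is a hypothetical function written for illustration, not the body of `resolve_features()`.

```python
from typing import Mapping, Optional

TAG = "feast.feature_service"


def resolve_feature_service(
    model_version_tags: Mapping[str, str],
    training_run_tags: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical helper mirroring the documented resolution order."""
    # 1. Prefer the tag on the model version.
    # 2. Fall back to the tag on the training run.
    return model_version_tags.get(TAG) or training_run_tags.get(TAG)


from_version = resolve_feature_service({TAG: "driver_activity_v2"}, {TAG: "driver_activity_v1"})
from_run = resolve_feature_service({}, {TAG: "driver_activity_v1"})
unresolved = resolve_feature_service({}, {})
```

In MLflow itself, tags of this kind are exposed on the objects returned by `MlflowClient.get_model_version(name, version)` (its `tags` attribute) and `MlflowClient.get_run(run_id)` (its `data.tags` attribute).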
308+
309+
### Reproduce training from a past run
310+
311+
```python
from feast import FeatureStore

store = FeatureStore(".")

# Recover the entity DataFrame that was archived with the original run.
entity_df = store.mlflow.get_training_entity_df(run_id="abc123")

with store.mlflow.start_run(run_name="retrain_v2"):
    new_df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    model = train(new_df)  # train() is a placeholder for your own training code
    store.mlflow.log_model(model, "model")
```
326+
327+
This requires `auto_log_entity_df: true` to have been enabled when the original run was recorded.
328+
329+
## Feast UI integration
330+
331+
The Feast UI server exposes three API endpoints that aggregate data from MLflow:
332+
333+
| Endpoint | Description |
|----------|-------------|
| `/api/mlflow-runs` | All Feast-tagged MLflow runs with linked registered models |
| `/api/mlflow-feature-usage` | Per-feature-view usage stats (run count, last used, associated models) |
| `/api/mlflow-feature-models` | Reverse index of feature refs to registered models |
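
A reverse index of this kind is an inversion of the "model to feature refs" mapping. The following sketch shows the idea on plain dictionaries; `build_feature_model_index` and the sample data are hypothetical, and the endpoint's actual response shape is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def build_feature_model_index(
    model_features: Mapping[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Invert {model name: feature refs} into {feature ref: sorted model names}."""
    index = defaultdict(set)
    for model_name, refs in model_features.items():
        for ref in refs:
            index[ref].add(model_name)
    return {ref: sorted(models) for ref, models in index.items()}


# Illustrative input only.
index = build_feature_model_index(
    {
        "driver_model": ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
        "eta_model": ["driver_hourly_stats:conv_rate"],
    }
)
```

Looking up a feature reference in `index` answers the question of which registered models depend on it.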
338+
339+
The feature view detail page in the Feast UI displays:
340+
- **MLflow Training Runs** count and **Last Used** date in the header stats
341+
- An **MLflow Usage** panel showing training run count, relative last-used time, and a table of registered models that depend on the feature view
342+
343+
Start the Feast UI with:
344+
345+
```bash
feast ui --host 127.0.0.1 --port 8888
```
