Conversation
```python
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "entity_df.parquet")
    entity_df.to_parquet(path, index=False)
    mlflow.log_artifact(path)
```
🟡 Mixing global mlflow.log_artifact() with explicit-URI client causes artifact/metadata to target different servers
In _auto_log_entity_df_info, tags and params are logged via a locally-created MlflowClient(tracking_uri=tracking_uri) (lines 304, 309-318), but the entity DataFrame artifact is uploaded via the global mlflow.log_artifact(path) (line 327). The global function uses the tracking URI set by mlflow.set_tracking_uri(), not the explicit tracking_uri from the config. If the global tracking URI is changed by another library in the process, or if _init_mlflow_tracking failed silently (caught by except Exception at sdk/python/feast/feature_store.py:249), the artifact would be uploaded to a different server than where the tags/params were logged, splitting metadata across two servers.
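A minimal sketch of the fix, assuming `client` is the `MlflowClient(tracking_uri=tracking_uri)` already created in `_auto_log_entity_df_info`, `run_id` is the run the tags/params were written to, and `entity_df` comes from the surrounding method:

```python
import os
import tempfile

# Upload the artifact through the same explicit-URI client used for the
# tags/params, so artifact and metadata always land on the same server.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "entity_df.parquet")
    entity_df.to_parquet(path, index=False)
    client.log_artifact(run_id, path)  # MlflowClient method, not the global mlflow.log_artifact
```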
```python
try:
    import mlflow

    tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
```
🔴 UI server ignores MLFLOW_TRACKING_URI env var, falls back to hardcoded localhost
In ui_server.py, both the /api/mlflow-runs and /api/mlflow-feature-models endpoints resolve the MLflow tracking URI using mlflow_cfg.tracking_uri or "http://127.0.0.1:5000". This reads the raw tracking_uri field from the config and falls back to a hardcoded localhost URL, completely bypassing the MLFLOW_TRACKING_URI environment variable. In contrast, the rest of the codebase (feature_store.py:243, feature_store.py:325, feature_store.py:1696, feature_store.py:2885) correctly calls mlflow_cfg.get_tracking_uri() which checks the env var via sdk/python/feast/mlflow_integration/config.py:19-29. When a user sets MLFLOW_TRACKING_URI without setting tracking_uri in YAML (which is a very common deployment pattern, and documented in the PR's own docs at docs/reference/mlflow.md:51), the UI endpoints will incorrectly connect to http://127.0.0.1:5000 instead of the env-var-specified server.
```diff
- tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
+ tracking_uri = mlflow_cfg.get_tracking_uri() or "http://127.0.0.1:5000"
```
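For reference, a hedged sketch of what `get_tracking_uri()` is described as doing in `sdk/python/feast/mlflow_integration/config.py:19-29` (the actual precedence between the YAML field and the env var may differ):

```python
import os
from typing import Optional


class MlflowConfig:
    tracking_uri: Optional[str] = None

    def get_tracking_uri(self) -> Optional[str]:
        # Assumed behavior: use the explicit YAML field if set, otherwise
        # fall back to the MLFLOW_TRACKING_URI environment variable.
        return self.tracking_uri or os.environ.get("MLFLOW_TRACKING_URI")
```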
```python
try:
    import mlflow

    tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
```
🔴 Second UI endpoint also ignores MLFLOW_TRACKING_URI env var
Same issue as in the /api/mlflow-runs endpoint: the /api/mlflow-feature-models endpoint at sdk/python/feast/ui_server.py:234 uses mlflow_cfg.tracking_uri or "http://127.0.0.1:5000" instead of mlflow_cfg.get_tracking_uri(), causing the MLFLOW_TRACKING_URI environment variable to be ignored.
```python
)

# Emit MLflow event for materialization (Phase 7)
_mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
```
🟡 Undefined _retrieval_start in materialize_incremental means the logged duration is always zero
In materialize_incremental, the code at line 2115 references _retrieval_start, which is never assigned anywhere in that method. dir() without arguments returns the names in the current local scope, so '_retrieval_start' in dir() is always false here and _mat_duration silently falls back to 0 -- the duration is never actually measured. The guard is also fragile: a later refactor that introduces a local _retrieval_start for an unrelated purpose would make the check pass and silently subtract against that value, logging a wrong duration. Either measure the duration explicitly or drop the probe.
```diff
- _mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
+ _mat_duration = 0
```
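If the duration is actually wanted, a minimal sketch of measuring it explicitly instead (the placeholder comment stands in for whatever materialization work the method performs):

```python
import time

_mat_start = time.monotonic()
# ... perform the actual materialization work here ...
_mat_duration = time.monotonic() - _mat_start  # real elapsed seconds, no dir() probing
```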
```python
if tracking_uri:
    mlflow.set_tracking_uri(tracking_uri)

experiment_name = f"{project}{ops_experiment_suffix}"
mlflow.set_experiment(experiment_name)
```
🔴 log_apply_to_mlflow and log_materialize_to_mlflow mutate global MLflow tracking URI and experiment, corrupting concurrent user runs
Both log_apply_to_mlflow (sdk/python/feast/mlflow_integration/logger.py:185-189) and log_materialize_to_mlflow (sdk/python/feast/mlflow_integration/logger.py:251-255) call mlflow.set_tracking_uri(tracking_uri) and mlflow.set_experiment(experiment_name), which mutate process-wide global state. If a user has an active MLflow run in a different experiment (e.g. during feast apply inside a training script), these calls redirect subsequent MLflow operations to the ops experiment. While the functions try to restore mlflow.set_experiment(project) afterward (lines 224 and 274), the tracking URI is never restored. In the exception path, if mlflow.start_run() raises (lines 202 and 258), the experiment remains set to the ops experiment. This corrupts any concurrent or subsequent user operations in the same process.
Prompt for agents
The functions log_apply_to_mlflow and log_materialize_to_mlflow in sdk/python/feast/mlflow_integration/logger.py mutate process-wide global state by calling mlflow.set_tracking_uri() and mlflow.set_experiment(). This corrupts any concurrent user MLflow runs in the same process.
The fix should save and restore the original tracking URI and experiment before/after the ops logging, or preferably use the MlflowClient API exclusively (which already takes tracking_uri as a parameter) instead of the global mlflow module-level functions. The MlflowClient.create_run() method allows creating runs in specific experiments without mutating global state. This pattern is already used in log_feature_retrieval_to_mlflow which correctly uses client = mlflow.MlflowClient(tracking_uri=tracking_uri) without touching global state.
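A minimal sketch of the client-only approach, assuming `tracking_uri`, `experiment_name`, and `project` come from the surrounding function (the tag and param keys here are illustrative, not the PR's actual keys):

```python
import mlflow

client = mlflow.MlflowClient(tracking_uri=tracking_uri)

# Resolve the ops experiment without touching the global default experiment.
experiment = client.get_experiment_by_name(experiment_name)
experiment_id = (
    experiment.experiment_id if experiment else client.create_experiment(experiment_name)
)

# Create, populate, and close a run scoped to that experiment only; no
# process-wide tracking URI or experiment is ever mutated.
run = client.create_run(experiment_id=experiment_id, tags={"feast.operation": "apply"})
client.log_param(run.info.run_id, "project", project)
client.set_terminated(run.info.run_id)
```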
```python
def mlflow(self):
    """Get the MLflow configuration."""
    if not self._mlflow:
        if isinstance(self.mlflow_config, Dict):
            from feast.mlflow_integration.config import MlflowConfig

            self._mlflow = MlflowConfig(**self.mlflow_config)
        elif self.mlflow_config:
            self._mlflow = self.mlflow_config
    return self._mlflow
```
🟡 mlflow property guards its cache with truthiness instead of an identity check
The mlflow property in RepoConfig (sdk/python/feast/repo_config.py:497-506) uses if not self._mlflow: as the cache guard. For standard usage this works: a constructed MlflowConfig is truthy even with enabled=False (Pydantic models are truthy), so the config is built once and cached, and when mlflow_config is None (the default) the property simply keeps returning None. The guard only misbehaves if the cached value is itself falsy -- e.g. a custom MlflowConfig subclass overriding __bool__ -- in which case it would be re-created on every access. This matches the existing openlineage property, so it is consistent and not actually a bug for standard usage; still, if self._mlflow is None: would be the more robust guard.
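A sketch of the same property with an identity check as the guard (behavior otherwise unchanged):

```python
@property
def mlflow(self):
    """Get the MLflow configuration (built once, then cached)."""
    if self._mlflow is None and self.mlflow_config is not None:
        if isinstance(self.mlflow_config, dict):
            from feast.mlflow_integration.config import MlflowConfig

            self._mlflow = MlflowConfig(**self.mlflow_config)
        else:
            self._mlflow = self.mlflow_config
    return self._mlflow
```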
```markdown
- **Online feature retrieval** -- `get_online_features()` tags the run with the same metadata
- **Entity DataFrame archival** -- optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
```
Not sure this is actually useful.
Actually, this is not the updated one; I thought of it for backtracking from an MLflow run to the workbench experiment, but it is now omitted.
```markdown
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
- **Training-to-prediction linkage** -- `FeastMlflowClient.load_model()` links prediction runs back to their training runs
- **Feast MLflow Client** -- a thin wrapper that eliminates direct `import mlflow` in user code
```
```python
_consecutive_failures = 0


def log_feature_retrieval_to_mlflow(
```
I guess if we have this enabled, why do we even need the wrapper? Can't we fully handle this for the user?
I guess my question is: can't we just hide all of the MLflow client usage from the user?
The auto-logging in logger.py is fully handled for users who already use mlflow.start_run() or client.start_run(). The FeastMlflowClient wrapper is for users who want to avoid import mlflow entirely and get additional Feast-specific behavior on top, e.g. logging a model, registering it, or loading it for inference. So FeastMlflowClient is for users who want the extra lineage features without touching MLflow directly.
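A rough usage sketch of the two paths described above; the `FeastMlflowClient` import path and constructor are assumptions based on this thread, and `store`, `entity_df`, and `feature_refs` are presumed to exist:

```python
import mlflow

# Path 1: the user already drives MLflow directly -- the auto-logging in
# logger.py tags their active run with feast.* metadata automatically.
with mlflow.start_run():
    training_df = store.get_historical_features(
        entity_df=entity_df, features=feature_refs
    ).to_df()

# Path 2: the user never imports mlflow themselves and uses the wrapper
# for the extra Feast lineage behavior (log/register/load models).
from feast.mlflow_integration.client import FeastMlflowClient  # assumed module path

feast_client = FeastMlflowClient(store)  # assumed constructor
model = feast_client.load_model("models:/my_model/1")  # links prediction run back to its training run
```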
🟡 get_online_features_async missing MLflow auto-logging unlike its sync counterpart
The sync get_online_features method at sdk/python/feast/feature_store.py:2942-2997 includes MLflow auto-logging of feature retrieval metadata, but the async counterpart get_online_features_async at sdk/python/feast/feature_store.py:3001-3047 has no MLflow logging at all. Any feature retrieval through the async path will silently skip MLflow tracking, breaking the "every retrieval is logged" contract described in the docs. This is inconsistent since the methods serve the same purpose, and users switching between sync and async will lose telemetry.
(Refers to lines 3037-3047)
Prompt for agents
The get_online_features_async method at feature_store.py:3001-3047 is missing the MLflow auto-logging that was added to the sync get_online_features method at lines 2954-2997. The same pattern of logging feature refs, entity count, duration, and feature service name should be added after the await call in get_online_features_async. Note that the logging code itself is synchronous (just HTTP calls to MLflow), so it can be called directly after the await. Follow the same pattern as get_online_features: capture _retrieval_start before the provider call, then after the response, wrap the MLflow logging in a try/except.
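A minimal sketch of mirroring that pattern; the provider call and the keyword arguments to `log_feature_retrieval_to_mlflow` are assumptions standing in for the real signatures:

```python
import time

async def get_online_features_async(self, features, entity_rows, **kwargs):
    _retrieval_start = time.monotonic()
    response = await provider.get_online_features_async(features, entity_rows, **kwargs)
    try:
        # The logging helper is synchronous (plain HTTP calls to MLflow),
        # so it can run directly after the await.
        log_feature_retrieval_to_mlflow(
            feature_refs=features,
            entity_count=len(entity_rows),
            duration_seconds=time.monotonic() - _retrieval_start,
        )
    except Exception:
        pass  # telemetry must never break feature retrieval
    return response
```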
```python
if self.config.mlflow.auto_log_entity_df:
    self._auto_log_entity_df_info(
        entity_df, start_date=start_date, end_date=end_date
    )
```
🟡 Entity DataFrame metadata is only logged when auto_log_entity_df is True, contradicting documented behavior
The documentation at docs/reference/mlflow.md:116-124 explicitly states: "Regardless of auto_log_entity_df, the following metadata is logged when present" — listing feast.entity_df_type, feast.entity_df_rows, feast.entity_df_columns, etc. However, _auto_log_entity_df_info (which logs ALL entity df metadata including both lightweight tags/params and the heavier parquet artifact) is only called when self.config.mlflow.auto_log_entity_df is True at sdk/python/feast/feature_store.py:1753. With the default configuration (auto_log_entity_df: false), no entity df metadata is logged at all. The intent per the docs is that the lightweight metadata (type, row count, column names) should always be logged, and only the artifact upload should be gated by auto_log_entity_df.
Prompt for agents
The _auto_log_entity_df_info method at feature_store.py:314-370 handles both lightweight metadata logging (entity_df_type, entity_df_rows, entity_df_columns tags/params) and the heavier artifact upload (entity_df.parquet). Currently it is only called when auto_log_entity_df is True (line 1753), but the documentation states that metadata should always be logged regardless of auto_log_entity_df.
The fix should split the logic: always call the metadata logging portion when auto_log is enabled (i.e., move the metadata logging part outside the auto_log_entity_df guard), and only gate the artifact upload (parquet file) on auto_log_entity_df. One approach is to refactor _auto_log_entity_df_info to accept a flag controlling whether to upload the artifact, or split it into two methods: one for metadata (always called) and one for the artifact (called only when auto_log_entity_df is True).
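A sketch of the split at the call site; `_log_entity_df_metadata` and `_log_entity_df_artifact` are hypothetical names for the two refactored halves:

```python
# Always log the lightweight tags/params (entity_df type, row count, columns).
self._log_entity_df_metadata(entity_df, start_date=start_date, end_date=end_date)

# Gate only the heavier parquet artifact upload on the config flag.
if self.config.mlflow.auto_log_entity_df:
    self._log_entity_df_artifact(entity_df)
```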
What this PR does / why we need it:
Demo video: final_mlflow_demo.mp4

- **Auto-logging**: Feature retrieval metadata is tagged on the active MLflow run (`feast.feature_refs`, `feast.feature_views`, `feast.feature_service`, `feast.entity_count`, etc.)
- **Entity DataFrame archival**: Optionally saves the training entity DataFrame as an MLflow artifact (`entity_df.parquet`) for full reproducibility
- **Model-to-feature-service resolution**: `resolve_feature_service_from_model_uri()` maps any MLflow model URI back to its Feast feature service, enabling serving pipelines to auto-discover which features a model needs
- **Entity DataFrame reconstruction**: `get_entity_df_from_mlflow_run()` rebuilds the exact entity DataFrame from a past run's artifacts, enabling training reproducibility
- **Configuration**: Controlled entirely via `feature_store.yaml` under a new `mlflow:` block

Which issue(s) this PR fixes: