feat: Feast-MLflow Integration#6235

Open
Vperiodt wants to merge 15 commits into feast-dev:master from Vperiodt:feast-mlflow

Conversation

Contributor

@Vperiodt Vperiodt commented Apr 8, 2026

What this PR does / why we need it:

final_mlflow_demo.mp4
  • Auto-logging: feature retrieval metadata is tagged on the active MLflow run (feast.feature_refs, feast.feature_views, feast.feature_service, feast.entity_count, etc.)
  • Entity DataFrame archival: optionally saves the training entity DataFrame as an MLflow artifact (entity_df.parquet) for full reproducibility
  • Model-to-feature-service resolution: resolve_feature_service_from_model_uri() maps any MLflow model URI back to its Feast feature service, enabling serving pipelines to auto-discover which features a model needs
  • Entity DataFrame reconstruction: get_entity_df_from_mlflow_run() rebuilds the exact entity DataFrame from a past run's artifacts, enabling training reproducibility
  • Configuration: controlled entirely via feature_store.yaml under a new mlflow: block
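As a sketch, the `mlflow:` block might look like the following. The field names `enabled`, `tracking_uri`, `auto_log_entity_df`, and `ops_experiment_suffix` all appear elsewhere in this PR's discussion, but the exact schema, defaults, and values shown here are illustrative:

```yaml
mlflow:
  enabled: true
  # Optional; the code also honors the MLFLOW_TRACKING_URI environment variable
  tracking_uri: http://mlflow.example.com:5000
  # Gates the entity_df.parquet artifact upload
  auto_log_entity_df: false
  # Suffix for the separate ops experiment used by apply/materialize logging (illustrative value)
  ops_experiment_suffix: _ops
```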

Which issue(s) this PR fixes:

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Misc



github-advanced-security[bot]

This comment was marked as resolved.

@Vperiodt Vperiodt marked this pull request as ready for review April 9, 2026 12:06
@Vperiodt Vperiodt requested a review from a team as a code owner April 9, 2026 12:06
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@Vperiodt Vperiodt changed the title Feast-MLflow Integration feat: Feast-MLflow Integration Apr 15, 2026
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 19 additional findings in Devin Review.


Comment thread sdk/python/feast/feature_store.py Outdated
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "entity_df.parquet")
    entity_df.to_parquet(path, index=False)
    mlflow.log_artifact(path)
Contributor

@devin-ai-integration devin-ai-integration bot Apr 15, 2026


🟡 Mixing global mlflow.log_artifact() with explicit-URI client causes artifact/metadata to target different servers

In _auto_log_entity_df_info, tags and params are logged via a locally-created MlflowClient(tracking_uri=tracking_uri) (lines 304, 309-318), but the entity DataFrame artifact is uploaded via the global mlflow.log_artifact(path) (line 327). The global function uses the tracking URI set by mlflow.set_tracking_uri(), not the explicit tracking_uri from the config. If the global tracking URI is changed by another library in the process, or if _init_mlflow_tracking failed silently (caught by except Exception at sdk/python/feast/feature_store.py:249), the artifact would be uploaded to a different server than where the tags/params were logged, splitting metadata across two servers.


Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 20 additional findings in Devin Review.


Comment thread sdk/python/feast/ui_server.py Outdated
try:
    import mlflow

    tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
Contributor


🔴 UI server ignores MLFLOW_TRACKING_URI env var, falls back to hardcoded localhost

In ui_server.py, both the /api/mlflow-runs and /api/mlflow-feature-models endpoints resolve the MLflow tracking URI using mlflow_cfg.tracking_uri or "http://127.0.0.1:5000". This reads the raw tracking_uri field from the config and falls back to a hardcoded localhost URL, completely bypassing the MLFLOW_TRACKING_URI environment variable. In contrast, the rest of the codebase (feature_store.py:243, feature_store.py:325, feature_store.py:1696, feature_store.py:2885) correctly calls mlflow_cfg.get_tracking_uri() which checks the env var via sdk/python/feast/mlflow_integration/config.py:19-29. When a user sets MLFLOW_TRACKING_URI without setting tracking_uri in YAML (which is a very common deployment pattern, and documented in the PR's own docs at docs/reference/mlflow.md:51), the UI endpoints will incorrectly connect to http://127.0.0.1:5000 instead of the env-var-specified server.

Suggested change
tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
tracking_uri = mlflow_cfg.get_tracking_uri() or "http://127.0.0.1:5000"
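The intended resolution order can be sketched with a small stdlib helper. `get_tracking_uri` here is illustrative; the PR's actual `MlflowConfig.get_tracking_uri` may resolve in a different order:

```python
import os

def get_tracking_uri(config_uri=None, env_var="MLFLOW_TRACKING_URI"):
    """Resolve a tracking URI: explicit config value first, then the
    environment variable, else None. (Illustrative resolution order.)"""
    return config_uri or os.environ.get(env_var)

os.environ["MLFLOW_TRACKING_URI"] = "http://mlflow.internal:5000"
assert get_tracking_uri() == "http://mlflow.internal:5000"            # env var honored
assert get_tracking_uri("http://explicit:5000") == "http://explicit:5000"  # config wins
del os.environ["MLFLOW_TRACKING_URI"]
```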

Comment thread sdk/python/feast/ui_server.py Outdated
try:
    import mlflow

    tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
Contributor

@devin-ai-integration devin-ai-integration bot Apr 16, 2026


🔴 Second UI endpoint also ignores MLFLOW_TRACKING_URI env var

Same issue as in the /api/mlflow-runs endpoint: the /api/mlflow-feature-models endpoint at sdk/python/feast/ui_server.py:234 uses mlflow_cfg.tracking_uri or "http://127.0.0.1:5000" instead of mlflow_cfg.get_tracking_uri(), causing the MLFLOW_TRACKING_URI environment variable to be ignored.


Vperiodt added 12 commits April 20, 2026 14:51
Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 3 new potential issues.

View 22 additional findings in Devin Review.


Comment thread sdk/python/feast/feature_store.py Outdated
)

# Emit MLflow event for materialization (Phase 7)
_mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
Contributor


🟡 Undefined _retrieval_start variable in materialize_incremental causes bogus duration or NameError

In materialize_incremental, the code at line 2115 references _retrieval_start, which is never assigned in that method's scope. The guard '_retrieval_start' in dir() checks the names in the current local scope, so with the variable never assigned there, the check always fails and _mat_duration silently defaults to 0 (acceptable but meaningless). The guard is also fragile: if a local _retrieval_start were ever introduced for an unrelated purpose, the expression would compute time.monotonic() - _retrieval_start against that unrelated value, producing a wrong duration or a TypeError at runtime.

Suggested change
_mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
_mat_duration = 0
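If the duration is worth keeping at all, a simpler pattern is to assign the start time unconditionally in the same scope, so no scope-introspection guard is needed and the measured value is real. A minimal sketch (the function body is a stand-in for the materialization work):

```python
import time

def materialize_sketch():
    # Assign the start time up front in this scope; no
    # "'_retrieval_start' in dir()" guard is then required.
    _mat_start = time.monotonic()
    # ... materialization work would happen here ...
    return time.monotonic() - _mat_start

# A monotonic-clock difference is always non-negative.
assert materialize_sketch() >= 0
```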

Comment on lines +185 to +189
if tracking_uri:
    mlflow.set_tracking_uri(tracking_uri)

experiment_name = f"{project}{ops_experiment_suffix}"
mlflow.set_experiment(experiment_name)
Contributor


🔴 log_apply_to_mlflow and log_materialize_to_mlflow mutate global MLflow tracking URI and experiment, corrupting concurrent user runs

Both log_apply_to_mlflow (sdk/python/feast/mlflow_integration/logger.py:185-189) and log_materialize_to_mlflow (sdk/python/feast/mlflow_integration/logger.py:251-255) call mlflow.set_tracking_uri(tracking_uri) and mlflow.set_experiment(experiment_name) globally. These mutate process-wide global state. If a user has an active MLflow run in a different experiment (e.g. during feast apply inside a training script), these calls redirect subsequent MLflow operations to the ops experiment. While the functions try to restore mlflow.set_experiment(project) afterward (line 224, 274), the tracking URI is never restored. In the exception path, if mlflow.start_run() raises (line 202/258), the experiment remains set to the ops experiment. This corrupts any concurrent or subsequent user operations in the same process.

Prompt for agents
The functions log_apply_to_mlflow and log_materialize_to_mlflow in sdk/python/feast/mlflow_integration/logger.py mutate process-wide global state by calling mlflow.set_tracking_uri() and mlflow.set_experiment(). This corrupts any concurrent user MLflow runs in the same process.

The fix should save and restore the original tracking URI and experiment before/after the ops logging, or preferably use the MlflowClient API exclusively (which already takes tracking_uri as a parameter) instead of the global mlflow module-level functions. The MlflowClient.create_run() method allows creating runs in specific experiments without mutating global state. This pattern is already used in log_feature_retrieval_to_mlflow which correctly uses client = mlflow.MlflowClient(tracking_uri=tracking_uri) without touching global state.
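The save/restore half of that suggestion is a standard context-manager pattern. This sketch uses stand-in getter/setter functions in place of mlflow.get_tracking_uri / mlflow.set_tracking_uri; all names are illustrative:

```python
from contextlib import contextmanager

# Stand-in for process-wide MLflow state (tracking URI, active experiment).
_state = {"tracking_uri": "http://user:5000", "experiment": "training"}

def get_setting(key):
    return _state[key]

def set_setting(key, value):
    _state[key] = value

@contextmanager
def temporarily(key, value):
    """Set a process-global setting for the duration of the block,
    restoring the original value even if the body raises."""
    original = get_setting(key)
    set_setting(key, value)
    try:
        yield
    finally:
        set_setting(key, original)

with temporarily("experiment", "feast_ops"):
    assert get_setting("experiment") == "feast_ops"  # ops logging happens here
assert get_setting("experiment") == "training"       # user state restored
```

The `finally` clause is what covers the exception path the finding describes: even if the ops logging raises, the user's experiment is restored.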

Comment on lines +497 to +506
def mlflow(self):
    """Get the MLflow configuration."""
    if not self._mlflow:
        if isinstance(self.mlflow_config, Dict):
            from feast.mlflow_integration.config import MlflowConfig

            self._mlflow = MlflowConfig(**self.mlflow_config)
        elif self.mlflow_config:
            self._mlflow = self.mlflow_config
    return self._mlflow
Contributor


🟡 mlflow property returns falsy None on subsequent calls when MlflowConfig(enabled=False) is set

The mlflow property in RepoConfig (sdk/python/feast/repo_config.py:497-506) uses if not self._mlflow: as the guard. When the config is a valid MlflowConfig object with enabled=False, self._mlflow will be a truthy MlflowConfig instance (Pydantic models are truthy), so this works correctly on first access. However, if mlflow_config is None (the default), self._mlflow stays None forever and the property returns None -- which is fine. The real issue is: if mlflow_config is an empty dict {}, MlflowConfig(**{}) creates a valid config with enabled=False, and self._mlflow becomes truthy. But if mlflow_config is set to some falsy-like Pydantic object (unlikely but possible with custom subclasses), the if not self._mlflow guard would re-create it every time. This is the same pattern as openlineage property so it's consistent, but not actually a bug for standard usage.
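For standard usage the pattern holds, but an `is None` guard sidesteps the falsy-cached-value edge case entirely. A minimal sketch — `Repo`, `cfg`, and the dict standing in for `MlflowConfig(**raw)` are all illustrative:

```python
class Repo:
    """Lazy-cached config: `is None` distinguishes "never computed"
    from a cached falsy value, unlike a bare `if not self._cfg:` check."""
    def __init__(self, raw):
        self._raw = raw
        self._cfg = None
        self.builds = 0

    @property
    def cfg(self):
        if self._cfg is None and self._raw is not None:
            self.builds += 1
            self._cfg = dict(self._raw)  # stand-in for MlflowConfig(**raw)
        return self._cfg

r = Repo({})  # e.g. an empty mlflow: block in feature_store.yaml
r.cfg
r.cfg
assert r.builds == 1  # built once and cached, even though the config is falsy
```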


Comment thread docs/reference/mlflow.md Outdated
- **Online feature retrieval** -- `get_online_features()` tags the run with the same metadata
- **Entity DataFrame archival** -- optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
Member


not sure this is actually useful

Contributor Author

@Vperiodt Vperiodt Apr 20, 2026


Actually, this is not the updated version. This bullet described backtracking from an MLflow run to the workbench experiment, but that feature is now omitted.

Comment thread docs/reference/mlflow.md Outdated
- **Entity DataFrame archival** -- optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
Member


nice

Comment thread docs/reference/mlflow.md Outdated
- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
Member


👏

Comment thread docs/reference/mlflow.md Outdated
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
- **Training-to-prediction linkage** -- `FeastMlflowClient.load_model()` links prediction runs back to their training runs
Member


very nice

Comment thread docs/reference/mlflow.md Outdated
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
- **Training-to-prediction linkage** -- `FeastMlflowClient.load_model()` links prediction runs back to their training runs
- **Feast MLflow Client** -- a thin wrapper that eliminates direct `import mlflow` in user code
Member


makes sense

_consecutive_failures = 0


def log_feature_retrieval_to_mlflow(
Member


I guess if we have this enabled, why do we even need the wrapper? Can't we fully handle this for the user?

Member


I guess my question is, can't we just hide all of the mlflow client usage from the user?

Contributor Author


The auto-logging in logger.py is fully handled for users who already use mlflow.start_run() or client.start_run(). The FeastMlflowClient wrapper, by contrast, is for users who want to avoid import mlflow entirely and get additional Feast-specific behavior on top, e.g. logging a model, registering it, and loading it for inference. So FeastMlflowClient is for users who want the extra lineage features without touching MLflow directly.

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
devin-ai-integration[bot]

This comment was marked as resolved.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 10 additional findings in Devin Review.


Contributor


🟡 get_online_features_async missing MLflow auto-logging unlike its sync counterpart

The sync get_online_features method at sdk/python/feast/feature_store.py:2942-2997 includes MLflow auto-logging of feature retrieval metadata, but the async counterpart get_online_features_async at sdk/python/feast/feature_store.py:3001-3047 has no MLflow logging at all. Any feature retrieval through the async path will silently skip MLflow tracking, breaking the "every retrieval is logged" contract described in the docs. This is inconsistent since the methods serve the same purpose, and users switching between sync and async will lose telemetry.

(Refers to lines 3037-3047)

Prompt for agents
The get_online_features_async method at feature_store.py:3001-3047 is missing the MLflow auto-logging that was added to the sync get_online_features method at lines 2954-2997. The same pattern of logging feature refs, entity count, duration, and feature service name should be added after the await call in get_online_features_async. Note that the logging code itself is synchronous (just HTTP calls to MLflow), so it can be called directly after the await. Follow the same pattern as get_online_features: capture _retrieval_start before the provider call, then after the response, wrap the MLflow logging in a try/except.
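The pattern the prompt describes can be sketched without MLflow or Feast; `fetch_features`, the function name, and the tag names below are stand-ins:

```python
import asyncio
import time

async def fetch_features():
    await asyncio.sleep(0)  # stand-in for the async provider call
    return {"driver:acc_rate": [0.91]}

async def get_online_features_async_sketch():
    # Time the await, then run the (synchronous) telemetry logging in a
    # try/except so a logging failure can never break retrieval.
    start = time.monotonic()
    response = await fetch_features()
    duration = time.monotonic() - start
    telemetry = None
    try:
        telemetry = {
            "feast.retrieval_duration_s": duration,
            "feast.feature_count": len(response),
        }
    except Exception:
        pass  # swallow telemetry errors, as the sync path does
    return response, telemetry

response, telemetry = asyncio.run(get_online_features_async_sketch())
assert "driver:acc_rate" in response
assert telemetry["feast.feature_count"] == 1
```

Because the logging itself is synchronous HTTP, it can run directly after the await without any extra async machinery.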

Comment on lines +1753 to +1756
if self.config.mlflow.auto_log_entity_df:
    self._auto_log_entity_df_info(
        entity_df, start_date=start_date, end_date=end_date
    )
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Entity DataFrame metadata is only logged when auto_log_entity_df is True, contradicting documented behavior

The documentation at docs/reference/mlflow.md:116-124 explicitly states: "Regardless of auto_log_entity_df, the following metadata is logged when present" — listing feast.entity_df_type, feast.entity_df_rows, feast.entity_df_columns, etc. However, _auto_log_entity_df_info (which logs ALL entity df metadata including both lightweight tags/params and the heavier parquet artifact) is only called when self.config.mlflow.auto_log_entity_df is True at sdk/python/feast/feature_store.py:1753. With the default configuration (auto_log_entity_df: false), no entity df metadata is logged at all. The intent per the docs is that the lightweight metadata (type, row count, column names) should always be logged, and only the artifact upload should be gated by auto_log_entity_df.

Prompt for agents
The _auto_log_entity_df_info method at feature_store.py:314-370 handles both lightweight metadata logging (entity_df_type, entity_df_rows, entity_df_columns tags/params) and the heavier artifact upload (entity_df.parquet). Currently it is only called when auto_log_entity_df is True (line 1753), but the documentation states that metadata should always be logged regardless of auto_log_entity_df.

The fix should split the logic: always call the metadata logging portion when auto_log is enabled (i.e., move the metadata logging part outside the auto_log_entity_df guard), and only gate the artifact upload (parquet file) on auto_log_entity_df. One approach is to refactor _auto_log_entity_df_info to accept a flag controlling whether to upload the artifact, or split it into two methods: one for metadata (always called) and one for the artifact (called only when auto_log_entity_df is True).
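The proposed split can be sketched with a dict standing in for the MLflow run; the function and tag names here are illustrative:

```python
def log_entity_df(n_rows, columns, auto_log_entity_df, sink):
    # Lightweight metadata is always recorded, regardless of the flag...
    sink["tags"] = {
        "feast.entity_df_rows": n_rows,
        "feast.entity_df_columns": ",".join(columns),
    }
    # ...while only the heavy parquet upload is gated on auto_log_entity_df.
    if auto_log_entity_df:
        sink.setdefault("artifacts", []).append("entity_df.parquet")

sink = {}
log_entity_df(100, ["driver_id", "event_timestamp"], auto_log_entity_df=False, sink=sink)
assert "tags" in sink and "artifacts" not in sink  # metadata without the upload
```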
