Conversation
```python
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "entity_df.parquet")
    entity_df.to_parquet(path, index=False)
    mlflow.log_artifact(path)
```
🟡 Mixing global mlflow.log_artifact() with explicit-URI client causes artifact/metadata to target different servers
In _auto_log_entity_df_info, tags and params are logged via a locally-created MlflowClient(tracking_uri=tracking_uri) (lines 304, 309-318), but the entity DataFrame artifact is uploaded via the global mlflow.log_artifact(path) (line 327). The global function uses the tracking URI set by mlflow.set_tracking_uri(), not the explicit tracking_uri from the config. If the global tracking URI is changed by another library in the process, or if _init_mlflow_tracking failed silently (caught by except Exception at sdk/python/feast/feature_store.py:249), the artifact would be uploaded to a different server than where the tags/params were logged, splitting metadata across two servers.
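A minimal sketch of the fix, assuming `client` is the `MlflowClient(tracking_uri=tracking_uri)` already created in `_auto_log_entity_df_info`, `run_id` is the run the tags/params were written to, and `entity_df` comes from the surrounding method:

```python
import os
import tempfile

# Upload the artifact through the same explicit-URI client used for the
# tags/params, so artifact and metadata always land on the same server.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "entity_df.parquet")
    entity_df.to_parquet(path, index=False)
    client.log_artifact(run_id, path)  # MlflowClient method, not the global mlflow.log_artifact
```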
```python
try:
    import mlflow

    tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
```
🔴 UI server ignores MLFLOW_TRACKING_URI env var, falls back to hardcoded localhost
In ui_server.py, both the /api/mlflow-runs and /api/mlflow-feature-models endpoints resolve the MLflow tracking URI using mlflow_cfg.tracking_uri or "http://127.0.0.1:5000". This reads the raw tracking_uri field from the config and falls back to a hardcoded localhost URL, completely bypassing the MLFLOW_TRACKING_URI environment variable. In contrast, the rest of the codebase (feature_store.py:243, feature_store.py:325, feature_store.py:1696, feature_store.py:2885) correctly calls mlflow_cfg.get_tracking_uri() which checks the env var via sdk/python/feast/mlflow_integration/config.py:19-29. When a user sets MLFLOW_TRACKING_URI without setting tracking_uri in YAML (which is a very common deployment pattern, and documented in the PR's own docs at docs/reference/mlflow.md:51), the UI endpoints will incorrectly connect to http://127.0.0.1:5000 instead of the env-var-specified server.
```diff
- tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
+ tracking_uri = mlflow_cfg.get_tracking_uri() or "http://127.0.0.1:5000"
```
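For reference, a hedged sketch of what `get_tracking_uri()` is described as doing in `sdk/python/feast/mlflow_integration/config.py:19-29` (the actual precedence between the YAML field and the env var may differ):

```python
import os
from typing import Optional


class MlflowConfig:
    tracking_uri: Optional[str] = None

    def get_tracking_uri(self) -> Optional[str]:
        # Assumed behavior: use the explicit YAML field if set, otherwise
        # fall back to the MLFLOW_TRACKING_URI environment variable.
        return self.tracking_uri or os.environ.get("MLFLOW_TRACKING_URI")
```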
```python
try:
    import mlflow

    tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
```
🔴 Second UI endpoint also ignores MLFLOW_TRACKING_URI env var
Same issue as in the /api/mlflow-runs endpoint: the /api/mlflow-feature-models endpoint at sdk/python/feast/ui_server.py:234 uses mlflow_cfg.tracking_uri or "http://127.0.0.1:5000" instead of mlflow_cfg.get_tracking_uri(), causing the MLFLOW_TRACKING_URI environment variable to be ignored.
```python
)

# Emit MLflow event for materialization (Phase 7)
_mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
```
🟡 Undefined _retrieval_start in materialize_incremental means the logged duration is always zero
In materialize_incremental, the code at line 2115 references _retrieval_start, which is never assigned anywhere in that method. dir() without arguments returns the names in the current local scope, so '_retrieval_start' in dir() is always false here and _mat_duration silently falls back to 0 -- the duration is never actually measured. The guard is also fragile: a later refactor that introduces a local _retrieval_start for an unrelated purpose would make the check pass and silently subtract against that value, logging a wrong duration. Either measure the duration explicitly or drop the probe.
```diff
- _mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
+ _mat_duration = 0
```
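If the duration is actually wanted, a minimal sketch of measuring it explicitly instead (the placeholder comment stands in for whatever materialization work the method performs):

```python
import time

_mat_start = time.monotonic()
# ... perform the actual materialization work here ...
_mat_duration = time.monotonic() - _mat_start  # real elapsed seconds, no dir() probing
```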
```python
if tracking_uri:
    mlflow.set_tracking_uri(tracking_uri)

experiment_name = f"{project}{ops_experiment_suffix}"
mlflow.set_experiment(experiment_name)
```
🔴 log_apply_to_mlflow and log_materialize_to_mlflow mutate global MLflow tracking URI and experiment, corrupting concurrent user runs
Both log_apply_to_mlflow (sdk/python/feast/mlflow_integration/logger.py:185-189) and log_materialize_to_mlflow (sdk/python/feast/mlflow_integration/logger.py:251-255) call mlflow.set_tracking_uri(tracking_uri) and mlflow.set_experiment(experiment_name), which mutate process-wide global state. If a user has an active MLflow run in a different experiment (e.g. during feast apply inside a training script), these calls redirect subsequent MLflow operations to the ops experiment. While the functions try to restore mlflow.set_experiment(project) afterward (lines 224 and 274), the tracking URI is never restored. In the exception path, if mlflow.start_run() raises (lines 202 and 258), the experiment remains set to the ops experiment. This corrupts any concurrent or subsequent user operations in the same process.
Prompt for agents
The functions log_apply_to_mlflow and log_materialize_to_mlflow in sdk/python/feast/mlflow_integration/logger.py mutate process-wide global state by calling mlflow.set_tracking_uri() and mlflow.set_experiment(). This corrupts any concurrent user MLflow runs in the same process.
The fix should save and restore the original tracking URI and experiment before/after the ops logging, or preferably use the MlflowClient API exclusively (which already takes tracking_uri as a parameter) instead of the global mlflow module-level functions. The MlflowClient.create_run() method allows creating runs in specific experiments without mutating global state. This pattern is already used in log_feature_retrieval_to_mlflow which correctly uses client = mlflow.MlflowClient(tracking_uri=tracking_uri) without touching global state.
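A minimal sketch of the client-only approach, assuming `tracking_uri`, `experiment_name`, and `project` come from the surrounding function (the tag and param keys here are illustrative, not the PR's actual keys):

```python
import mlflow

client = mlflow.MlflowClient(tracking_uri=tracking_uri)

# Resolve the ops experiment without touching the global default experiment.
experiment = client.get_experiment_by_name(experiment_name)
experiment_id = (
    experiment.experiment_id if experiment else client.create_experiment(experiment_name)
)

# Create, populate, and close a run scoped to that experiment only; no
# process-wide tracking URI or experiment is ever mutated.
run = client.create_run(experiment_id=experiment_id, tags={"feast.operation": "apply"})
client.log_param(run.info.run_id, "project", project)
client.set_terminated(run.info.run_id)
```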
```python
def mlflow(self):
    """Get the MLflow configuration."""
    if not self._mlflow:
        if isinstance(self.mlflow_config, Dict):
            from feast.mlflow_integration.config import MlflowConfig

            self._mlflow = MlflowConfig(**self.mlflow_config)
        elif self.mlflow_config:
            self._mlflow = self.mlflow_config
    return self._mlflow
```
🟡 mlflow property guards its cache with truthiness instead of an identity check
The mlflow property in RepoConfig (sdk/python/feast/repo_config.py:497-506) uses if not self._mlflow: as the cache guard. For standard usage this works: a constructed MlflowConfig is truthy even with enabled=False (Pydantic models are truthy), so the config is built once and cached, and when mlflow_config is None (the default) the property simply keeps returning None. The guard only misbehaves if the cached value is itself falsy -- e.g. a custom MlflowConfig subclass overriding __bool__ -- in which case it would be re-created on every access. This matches the existing openlineage property, so it is consistent and not actually a bug for standard usage; still, if self._mlflow is None: would be the more robust guard.
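A sketch of the same property with an identity check as the guard (behavior otherwise unchanged):

```python
@property
def mlflow(self):
    """Get the MLflow configuration (built once, then cached)."""
    if self._mlflow is None and self.mlflow_config is not None:
        if isinstance(self.mlflow_config, dict):
            from feast.mlflow_integration.config import MlflowConfig

            self._mlflow = MlflowConfig(**self.mlflow_config)
        else:
            self._mlflow = self.mlflow_config
    return self._mlflow
```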
```markdown
- **Online feature retrieval** -- `get_online_features()` tags the run with the same metadata
- **Entity DataFrame archival** -- optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
```
Not sure this is actually useful.
Actually, this is not the updated one; I thought of it for backtracking from an MLflow run to the workbench experiment, but it is now omitted.
```markdown
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
- **Training-to-prediction linkage** -- `FeastMlflowClient.load_model()` links prediction runs back to their training runs
- **Feast MLflow Client** -- a thin wrapper that eliminates direct `import mlflow` in user code
```
```python
_consecutive_failures = 0


def log_feature_retrieval_to_mlflow(
```
I guess if we have this enabled, why do we even need the wrapper? Can't we fully handle this for the user?
I guess my question is: can't we just hide all of the MLflow client usage from the user?
The auto-logging in logger.py is fully handled for users who already use mlflow.start_run() or client.start_run(). The FeastMlflowClient wrapper is for users who want to avoid import mlflow entirely and get additional Feast-specific behavior on top, e.g. logging a model, registering it, or loading it for inference. So FeastMlflowClient is for users who want the extra lineage features without touching MLflow directly.
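A rough usage sketch of the two paths described above; the `FeastMlflowClient` import path and constructor are assumptions based on this thread, and `store`, `entity_df`, and `feature_refs` are presumed to exist:

```python
import mlflow

# Path 1: the user already drives MLflow directly -- the auto-logging in
# logger.py tags their active run with feast.* metadata automatically.
with mlflow.start_run():
    training_df = store.get_historical_features(
        entity_df=entity_df, features=feature_refs
    ).to_df()

# Path 2: the user never imports mlflow themselves and uses the wrapper
# for the extra Feast lineage behavior (log/register/load models).
from feast.mlflow_integration.client import FeastMlflowClient  # assumed module path

feast_client = FeastMlflowClient(store)  # assumed constructor
model = feast_client.load_model("models:/my_model/1")  # links prediction run back to its training run
```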
🟡 get_online_features_async missing MLflow auto-logging unlike its sync counterpart
The sync get_online_features method at sdk/python/feast/feature_store.py:2942-2997 includes MLflow auto-logging of feature retrieval metadata, but the async counterpart get_online_features_async at sdk/python/feast/feature_store.py:3001-3047 has no MLflow logging at all. Any feature retrieval through the async path will silently skip MLflow tracking, breaking the "every retrieval is logged" contract described in the docs. This is inconsistent since the methods serve the same purpose, and users switching between sync and async will lose telemetry.
(Refers to lines 3037-3047)
Prompt for agents
The get_online_features_async method at feature_store.py:3001-3047 is missing the MLflow auto-logging that was added to the sync get_online_features method at lines 2954-2997. The same pattern of logging feature refs, entity count, duration, and feature service name should be added after the await call in get_online_features_async. Note that the logging code itself is synchronous (just HTTP calls to MLflow), so it can be called directly after the await. Follow the same pattern as get_online_features: capture _retrieval_start before the provider call, then after the response, wrap the MLflow logging in a try/except.
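A minimal sketch of mirroring that pattern; the provider call and the keyword arguments to `log_feature_retrieval_to_mlflow` are assumptions standing in for the real signatures:

```python
import time

async def get_online_features_async(self, features, entity_rows, **kwargs):
    _retrieval_start = time.monotonic()
    response = await provider.get_online_features_async(features, entity_rows, **kwargs)
    try:
        # The logging helper is synchronous (plain HTTP calls to MLflow),
        # so it can run directly after the await.
        log_feature_retrieval_to_mlflow(
            feature_refs=features,
            entity_count=len(entity_rows),
            duration_seconds=time.monotonic() - _retrieval_start,
        )
    except Exception:
        pass  # telemetry must never break feature retrieval
    return response
```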
```python
if self.config.mlflow.auto_log_entity_df:
    self._auto_log_entity_df_info(
        entity_df, start_date=start_date, end_date=end_date
    )
```
🟡 Entity DataFrame metadata is only logged when auto_log_entity_df is True, contradicting documented behavior
The documentation at docs/reference/mlflow.md:116-124 explicitly states: "Regardless of auto_log_entity_df, the following metadata is logged when present" — listing feast.entity_df_type, feast.entity_df_rows, feast.entity_df_columns, etc. However, _auto_log_entity_df_info (which logs ALL entity df metadata including both lightweight tags/params and the heavier parquet artifact) is only called when self.config.mlflow.auto_log_entity_df is True at sdk/python/feast/feature_store.py:1753. With the default configuration (auto_log_entity_df: false), no entity df metadata is logged at all. The intent per the docs is that the lightweight metadata (type, row count, column names) should always be logged, and only the artifact upload should be gated by auto_log_entity_df.
Prompt for agents
The _auto_log_entity_df_info method at feature_store.py:314-370 handles both lightweight metadata logging (entity_df_type, entity_df_rows, entity_df_columns tags/params) and the heavier artifact upload (entity_df.parquet). Currently it is only called when auto_log_entity_df is True (line 1753), but the documentation states that metadata should always be logged regardless of auto_log_entity_df.
The fix should split the logic: always call the metadata logging portion when auto_log is enabled (i.e., move the metadata logging part outside the auto_log_entity_df guard), and only gate the artifact upload (parquet file) on auto_log_entity_df. One approach is to refactor _auto_log_entity_df_info to accept a flag controlling whether to upload the artifact, or split it into two methods: one for metadata (always called) and one for the artifact (called only when auto_log_entity_df is True).
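A sketch of the split at the call site; `_log_entity_df_metadata` and `_log_entity_df_artifact` are hypothetical names for the two refactored halves:

```python
# Always log the lightweight tags/params (entity_df type, row count, columns).
self._log_entity_df_metadata(entity_df, start_date=start_date, end_date=end_date)

# Gate only the heavier parquet artifact upload on the config flag.
if self.config.mlflow.auto_log_entity_df:
    self._log_entity_df_artifact(entity_df)
```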
What this PR does / why we need it:
Demo video: final_mlflow_demo.mp4

- **Auto-logging**: Feature retrieval metadata is tagged on the active MLflow run (`feast.feature_refs`, `feast.feature_views`, `feast.feature_service`, `feast.entity_count`, etc.)
- **Entity DataFrame archival**: Optionally saves the training entity DataFrame as an MLflow artifact (`entity_df.parquet`) for full reproducibility
- **Model-to-feature-service resolution**: `resolve_feature_service_from_model_uri()` maps any MLflow model URI back to its Feast feature service, enabling serving pipelines to auto-discover which features a model needs
- **Entity DataFrame reconstruction**: `get_entity_df_from_mlflow_run()` rebuilds the exact entity DataFrame from a past run's artifacts, enabling training reproducibility
- **Configuration**: Controlled entirely via `feature_store.yaml` under a new `mlflow:` block

Which issue(s) this PR fixes: