feat: Data Quality Monitoring added in feast UI #6422
Pull request overview
Adds Data Quality Monitoring (DQM) across the Feast stack, including a new Monitoring section in the Feast UI, new monitoring REST endpoints + CLI, and multi-backend offline-store support for computing/storing monitoring metrics (plus metrics/audit logging enhancements).
Changes:
- UI: Adds Monitoring pages (dashboard, feature detail, feature tab) and react-query hooks for monitoring endpoints.
- SDK/Backend: Adds monitoring compute/storage abstractions to `OfflineStore` and implements them for multiple backends; adds monitoring REST router and `feast monitor` CLI.
- Ops/Docs: Adds operator CRD + repo-config mapping for DQM config, expands metrics/audit logging, and adds monitoring docs + quickstart references.
Reviewed changes
Copilot reviewed 55 out of 59 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| ui/src/queries/useMonitoringApi.ts | New react-query hooks and fetch helpers for monitoring endpoints |
| ui/src/pages/Sidebar.tsx | Adds “Monitoring” item to the sidebar |
| ui/src/pages/monitoring/Index.tsx | Monitoring landing page with tabs + filters + compute action |
| ui/src/pages/monitoring/FeatureViewMetricsPanel.tsx | Feature-view aggregate metrics panel/table |
| ui/src/pages/monitoring/FeatureServiceMetricsPanel.tsx | Feature-service aggregate metrics panel/table |
| ui/src/pages/monitoring/FeatureMetricsTable.tsx | Feature metrics table with inline mini-histograms |
| ui/src/pages/monitoring/FeatureMetricsDetail.tsx | Feature-level detail view (distribution + stats + null-rate timeline) |
| ui/src/pages/monitoring/components/StatsPanel.tsx | Stats panel for a single feature metric (with baseline comparison) |
| ui/src/pages/monitoring/components/MetricsFilters.tsx | Filters UI for monitoring queries |
| ui/src/pages/monitoring/components/HistogramChart.tsx | SVG histogram rendering for numeric/categorical features |
| ui/src/pages/features/FeatureMonitoringTab.tsx | Adds Monitoring tab content on feature detail pages |
| ui/src/pages/features/FeatureInstance.tsx | Adds “Monitoring” tab to feature instance navigation/routes |
| ui/src/FeastUISansProviders.tsx | Wires Monitoring routes and Monitoring context into the UI app |
| ui/src/contexts/MonitoringContext.ts | New context for monitoring API base URL and enable flag |
| ui/package-lock.json | Updates UI package lock (including version bump) |
| sdk/python/tests/unit/monitoring/test_metrics_calculator.py | Unit tests for metrics calculator + NaN/Inf sanitization |
| sdk/python/tests/unit/monitoring/__init__.py | Adds unit test package init for monitoring |
| sdk/python/tests/integration/monitoring/__init__.py | Adds integration test package init for monitoring |
| sdk/python/feast/repo_config.py | Adds DqmConfig and dqm repo config field |
| sdk/python/feast/monitoring/monitoring_utils.py | Shared monitoring constants + helpers for normalization/aggregation |
| sdk/python/feast/monitoring/metrics_calculator.py | PyArrow/NumPy fallback metrics calculator |
| sdk/python/feast/monitoring/dqm_job_manager.py | DQM job persistence/status manager using offline store storage |
| sdk/python/feast/monitoring/__init__.py | Exposes monitoring public API symbols |
| sdk/python/feast/metrics.py | Adds offline retrieval metrics + structured audit logging helpers |
| sdk/python/feast/infra/offline_stores/offline_store.py | Adds monitoring compute/storage abstract methods; adds offline retrieval instrumentation |
| sdk/python/feast/infra/offline_stores/duckdb.py | Implements monitoring compute + parquet-backed storage for DuckDB |
| sdk/python/feast/infra/offline_stores/dask.py | Implements monitoring compute + parquet-backed storage for Dask |
| sdk/python/feast/infra/offline_stores/contrib/spark_offline_store/spark.py | Implements monitoring compute + SparkSQL storage for Spark |
| sdk/python/feast/infra/offline_stores/contrib/oracle_offline_store/oracle.py | Implements monitoring compute + Oracle storage via MERGE |
| sdk/python/feast/infra/feature_servers/base_config.py | Adds new metrics config flags: offline_features + audit_logging |
| sdk/python/feast/feature_server.py | Emits online audit logs around get-online-features calls |
| sdk/python/feast/cli/monitor.py | Adds `feast monitor run` CLI for batch/log monitoring compute |
| sdk/python/feast/cli/cli.py | Registers the new monitor CLI command group |
| sdk/python/feast/api/registry/rest/monitoring.py | Adds FastAPI router for monitoring compute/read endpoints |
| sdk/python/feast/api/registry/rest/__init__.py | Registers monitoring router with the registry REST API |
| Makefile | Avoids recreating .venv in CI install target |
| infra/feast-operator/internal/controller/services/services_types.go | Adds DQM YAML config struct to operator repo config |
| infra/feast-operator/internal/controller/services/repo_config.go | Maps operator DQM spec to repo config YAML |
| infra/feast-operator/internal/controller/services/repo_config_test.go | Tests operator repo config YAML includes dqm.auto_baseline |
| infra/feast-operator/docs/api/markdown/ref.md | Documents operator DQM config API fields |
| infra/feast-operator/dist/install.yaml | Updates CRD schema with spec.dqm.autoBaseline |
| infra/feast-operator/config/samples/v1_featurestore_serving.yaml | Documents new metrics flags in sample config |
| infra/feast-operator/config/crd/bases/feast.dev_featurestores.yaml | Updates CRD base schema with DQM config |
| infra/feast-operator/api/v1/zz_generated.deepcopy.go | Adds deepcopy support for DQM config |
| infra/feast-operator/api/v1/featurestore_types.go | Adds dqm field + type to operator API |
| docs/SUMMARY.md | Adds links to monitoring quickstart and how-to guide |
| docs/reference/feature-servers/python-feature-server.md | Documents offline retrieval metrics + audit logging |
| docs/how-to-guides/feature-monitoring.md | New how-to guide for feature monitoring |
| .secrets.baseline | Updates secrets baseline for new notebook content |
Files not reviewed (2)
- infra/feast-operator/api/v1/zz_generated.deepcopy.go: Language not supported
- ui/package-lock.json: Language not supported
Comments suppressed due to low confidence (3)
ui/src/FeastUISansProviders.tsx:161
- The routing JSX appears malformed (nested duplicate `/p/:projectName/*` `Route` blocks and inconsistent indentation), suggesting one of the `<Route>` elements isn’t being properly closed before sibling routes are declared. This will either fail compilation or produce an unexpected route hierarchy; please re-check the `<Route>` nesting and ensure each opened `<Route>` is closed before adding siblings like `data-set/`, `permissions/`, `monitoring/`, etc.
This issue also appears on line 221 of the same file.
ui/src/FeastUISansProviders.tsx:226
- The provider closing tags are unbalanced here: `</FeatureFlagsContext.Provider>` is present but there is no corresponding `<FeatureFlagsContext.Provider>` opening tag in this file, and `DataModeContext.Provider` (opened above) is never closed. This will break compilation and/or context propagation; please fix the provider nesting and ensure every opened provider is properly closed.
ui/src/queries/useMonitoringApi.ts:223
- `useComputeMetrics` POST to `/monitoring/compute` also ignores `fetchOptions`/credentials used elsewhere in the UI. If the registry server is protected via cookies or auth headers, the compute call may fail. Consider passing through the same headers/credentials strategy used by `restFetch` for consistency.
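One way to address the suppressed comment on `useComputeMetrics` is to build the POST request from the same options the other calls use. The sketch below is illustrative only: `MonitoringFetchOptions`, `ComputeRequestInit`, and `buildComputeRequest` are hypothetical names, and the real `restFetch` signature is not shown in this review, so the option shape is an assumption.

```typescript
// Hypothetical option shape; the PR's actual restFetch options are not shown here.
interface MonitoringFetchOptions {
  headers?: Record<string, string>;
  credentials?: "omit" | "same-origin" | "include";
}

// Minimal subset of fetch's init object, declared locally so the sketch
// does not depend on DOM typings.
interface ComputeRequestInit {
  method: "POST";
  headers: Record<string, string>;
  credentials?: "omit" | "same-origin" | "include";
  body: string;
}

// Builds the init object for the compute POST, carrying over any caller
// supplied headers and credentials mode.
function buildComputeRequest(
  payload: Record<string, unknown>,
  options: MonitoringFetchOptions = {},
): ComputeRequestInit {
  const init: ComputeRequestInit = {
    method: "POST",
    // Caller headers first, so the JSON content type is always set last.
    headers: { ...options.headers, "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
  if (options.credentials !== undefined) {
    init.credentials = options.credentials;
  }
  return init;
}
```

A caller could then pass the result straight to `fetch`, for example `fetch(baseUrl + "/monitoring/compute", buildComputeRequest(payload, fetchOptions))`, where `fetchOptions` stands for whatever the rest of the UI already forwards.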
      const qs = buildQueryString(params);
      const res = await fetch(`${baseUrl}${path}${qs}`);
      if (!res.ok) {
        throw new Error(`Failed to fetch ${path}: ${res.status} ${res.statusText}`);
      }
      const text = await res.text();
      const sanitized = text.replace(/:\s*NaN/g, ": null").replace(/:\s*Infinity/g, ": null").replace(/:\s*-Infinity/g, ": null");
      return JSON.parse(sanitized);
    };

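The regex chain above rewrites any `: NaN` or `: Infinity` sequence, including one that happens to sit inside a string value. A minimal sketch of a string-aware alternative follows; `sanitizeNonFiniteJson` is a hypothetical helper, not code from this PR, and it assumes the response is otherwise valid JSON.

```typescript
// Hypothetical helper: replaces bare NaN / Infinity / -Infinity tokens with
// null while copying string literals through untouched.
function sanitizeNonFiniteJson(text: string): string {
  // Longest token first so "-Infinity" is matched as a whole.
  const tokens = ["-Infinity", "Infinity", "NaN"];
  let out = "";
  let inString = false;
  let i = 0;
  while (i < text.length) {
    const ch = text.charAt(i);
    if (inString) {
      out += ch;
      if (ch === "\\") {
        // Copy the escaped character verbatim so \" does not end the string.
        out += text.charAt(i + 1);
        i += 2;
        continue;
      }
      if (ch === '"') {
        inString = false;
      }
      i += 1;
      continue;
    }
    if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
      continue;
    }
    let matched = "";
    for (const token of tokens) {
      if (text.slice(i, i + token.length) === token) {
        matched = token;
        break;
      }
    }
    if (matched !== "") {
      out += "null";
      i += matched.length;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}
```

The result can be passed to `JSON.parse` as before. Fixing the serialization on the server side would remove the need for any client-side rewriting; this sketch only narrows what the client touches.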
      {
        onSuccess: () => {
          queryClient.invalidateQueries("monitoring-features");
          queryClient.invalidateQueries("monitoring-feature-views");
          queryClient.invalidateQueries("monitoring-feature-services");
        },
      },

    const useBaselineMetrics = (
      project: string,
      featureViewName?: string,
      featureName?: string,
      dataSourceType?: string,
    ) => {
      const { apiBaseUrl, enabled } = useContext(MonitoringContext);
      return useQuery<FeatureMetric[]>(
        ["monitoring-baseline", project, featureViewName, featureName],
        () =>
          fetchMonitoring<FeatureMetric[]>(
            apiBaseUrl,
            "/monitoring/metrics/baseline",
            {
              project,
              feature_view_name: featureViewName,
              feature_name: featureName,
              data_source_type: dataSourceType,
            },
          ),
        { staleTime: STALE_TIME, enabled, retry: 1 },

      const hasError =
        featureQuery.isError && fvQuery.isError && fsQuery.isError;
      const hasData =
        (featureQuery.data && featureQuery.data.length > 0) ||
        (fvQuery.data && fvQuery.data.length > 0);

    <h4 style={{ fontSize: 14, fontWeight: 600, marginBottom: 8 }}>
      Null Rate Over Time
    </h4>
    <svg width={chartWidth} height={chartHeight + 20} role="img">
      <polyline
        points={polyline}

    if job_type == "auto_compute":
        result = monitoring_service.auto_compute(
            project=project,
            feature_view_name=job.get("feature_view_name"),
        )
    elif job_type == "baseline":
        result = monitoring_service.compute_baseline(
            project=project,
            feature_view_name=job.get("feature_view_name"),
            feature_names=params.get("feature_names"),
        )
    elif job_type == "compute":
        result = monitoring_service.compute_metrics(
            project=project,
            feature_view_name=job.get("feature_view_name"),
            feature_names=params.get("feature_names"),
            start_date=date.fromisoformat(params["start_date"])
            if params.get("start_date")
            else None,
            end_date=date.fromisoformat(params["end_date"])
            if params.get("end_date")
            else None,
            granularity=params.get("granularity", "daily"),
        )

    float_array = pc.cast(valid, pa.float64())
    result["mean"] = _safe_float(pc.mean(float_array).as_py())  # type: ignore[attr-defined]
    result["stddev"] = _safe_float(pc.stddev(float_array, ddof=1).as_py())  # type: ignore[attr-defined]

    min_max = pc.min_max(float_array)  # type: ignore[attr-defined]
    result["min_val"] = min_max["min"].as_py()
    result["max_val"] = min_max["max"].as_py()

    quantiles = pc.quantile(float_array, q=[0.50, 0.75, 0.90, 0.95, 0.99])  # type: ignore[attr-defined]
    q_values = quantiles.to_pylist()
    result["p50"] = q_values[0]
    result["p75"] = q_values[1]
    result["p90"] = q_values[2]
    result["p95"] = q_values[3]
    result["p99"] = q_values[4]

    @router.post("/monitoring/compute", tags=["Monitoring"])
    async def compute_metrics(request: ComputeMetricsRequest):
        """Submit a DQM job to compute and store metrics. Returns job_id."""
        if request.granularity not in VALID_GRANULARITIES:
            raise HTTPException(
                status_code=400,
                detail=f"Invalid granularity '{request.granularity}'. "
                f"Must be one of {VALID_GRANULARITIES}",
            )

        store = _get_store()
        if request.feature_view_name:
            fv = store.registry.get_feature_view(
                name=request.feature_view_name, project=request.project
            )
            assert_permissions(fv, actions=[AuthzedAction.UPDATE])

        svc = _get_monitoring_service()

    const fetchMonitoring = async <T>(
      baseUrl: string,
      path: string,
      params: Record<string, string | undefined>,
    ): Promise<T> => {
      const qs = buildQueryString(params);
      const res = await fetch(`${baseUrl}${path}${qs}`);
      if (!res.ok) {
        throw new Error(`Failed to fetch ${path}: ${res.status} ${res.statusText}`);
      }

      enabled: boolean;
    }

    const MonitoringContext = React.createContext<MonitoringConfig>({

All monitoring hooks fire even when DQM isn’t configured. Consider defaulting `enabled` to false and enabling it only when DQM is present. The Monitoring nav item is also always shown, with no gating on `monitoringConfig.enabled`; hide it when monitoring is disabled.
After discussion, we are keeping the Monitoring nav item always visible and showing an empty state that asks the user to enable monitoring when it is not enabled.
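The outcome of this thread (nav item always visible, empty state when monitoring is off, queries gated on `enabled`) could be expressed as a small pure function. This is a sketch under assumptions: `resolveMonitoringView`, the `MonitoringView` type, and the empty-state message are hypothetical; only `apiBaseUrl` and `enabled` come from the `MonitoringContext` usage shown in the diff.

```typescript
// Fields taken from the MonitoringContext usage in the diff.
interface MonitoringConfig {
  apiBaseUrl: string;
  enabled: boolean;
}

// Hypothetical view model: what the Monitoring page should render and
// whether the monitoring queries are allowed to run.
type MonitoringView =
  | { kind: "empty-state"; message: string; runQueries: false }
  | { kind: "dashboard"; apiBaseUrl: string; runQueries: true };

function resolveMonitoringView(config: MonitoringConfig): MonitoringView {
  if (!config.enabled) {
    // The page stays reachable from the sidebar, but no request is fired.
    return {
      kind: "empty-state",
      message: "Monitoring is not enabled. Enable it to see metrics.",
      runQueries: false,
    };
  }
  return { kind: "dashboard", apiBaseUrl: config.apiBaseUrl, runQueries: true };
}
```

Keeping the decision in one place means the sidebar never has to know about the flag, while every hook can read `runQueries` (or `enabled` directly) before issuing a request.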
d30a755 to a107a99
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
# [0.64.0](v0.63.0...v0.64.0) (2026-06-13)

### Bug Fixes

* Add async_supported property to RedisOnlineStore ([9b088fe](9b088fe))
* Add missing feast init templates to operator CRD and enhance persistence documentation ([1941d4d](1941d4d))
* Allow to publish from reference branch ([5458ec8](5458ec8))
* API calls list ([4203eb7](4203eb7))
* **bigquery:** Enable list inference for parquet loads in offline_write_batch ([9243497](9243497)), closes [#5845](#5845)
* Bump grpcio dependencies ([07b4782](07b4782))
* **compute-engine/local:** Honor field_mapping on join keys in dedup + join nodes ([#6395](#6395)) ([bd01824](bd01824))
* **dynamodb:** Avoid tag race condition by using diff-based tag updates ([#6479](#6479)) ([bad2b7d](bad2b7d)), closes [#6418](#6418)
* **dynamodb:** Fix mypy type for _build_projection_expression return ([217b4da](217b4da))
* Fix intermittent async test failures for DynamoDB and Redis ([63c5eb1](63c5eb1))
* Fix mongodb blog title ([57d28d4](57d28d4))
* Fix shared SQL registry crash - avoid unnecessary UDF deserialization in proto cache building ([ac588d7](ac588d7))
* Fix SparkRetrievalJob.persist() failing for SparkSource ([209d7cd](209d7cd))
* Fixed formatting and image for mongo blog ([#6377](#6377)) ([f8389fb](f8389fb))
* Fixes for ray source ([7f592a4](7f592a4))
* **go:** skip registry refresh when cache_ttl_seconds <= 0 ([97ed40c](97ed40c))
* Handle array of strings columns in Athena materialization ([#6324](#6324)) ([4ed0278](4ed0278))
* make milvus VARCHAR max_length configurable, remove hardcoded 512 limit ([3b98c22](3b98c22))
* **operator:** Set appProtocol: grpc on registry gRPC Service ([#6367](#6367)) ([c9ae2b4](c9ae2b4))
* PyJWT 2.10+ added validation that rejects empty HMAC keys ([e756ffe](e756ffe))
* RemoteOnlineStore sends all features in a single HTTP request ([8f187dd](8f187dd))
* Remove registry proto dump to enforce RBAC and add permission checks to Commit/Refresh RPCs ([328431f](328431f))
* Remove selector migration job - no longer needed ([51c325e](51c325e))
* replace broken .claude skill symlink with correct relative path ([4541690](4541690))
* Replace selector label strip patch with migration Job for upgrade-safe selector uniqueness ([00dea50](00dea50))
* Scope feature view name conflict check to current project in file-based registry ([#6369](#6369)) ([a4fde83](a4fde83)), closes [#6209](#6209)
* **snowflake:** Stop double-quoting connection identifiers ([#6462](#6462)) ([e914d59](e914d59))
* **spark:** S3/GCS PyArrow filesystem resolution for staging paths ([#6442](#6442)) ([ae50414](ae50414))
* **trino:** Clean up temporary entity tables after retrieval ([#6381](#6381)) ([d86b13d](d86b13d)), closes [#6306](#6306)
* Update go-feature-server base image to Go 1.25 and fix operator Dockerfile COPY permissions ([86ef0bc](86ef0bc))

### Features

* [Backend] Data Quality Monitoring with native compute, multi-backend support, REST API, CLI ([#6202](#6202)) ([5458c37](5458c37))
* Add apache flink compute engine ([#6476](#6476)) ([9636d6a](9636d6a))
* Add demo noteboooks for users ([e362173](e362173))
* Add enabled/disabled toggle for feature views ([#6401](#6401)) ([5f1fa0d](5f1fa0d)), closes [#6395](#6395)
* Add Label View to init template ([ec272d5](ec272d5))
* Add mTLS support to remote registry gRPC client ([#6474](#6474)) ([c9602d8](c9602d8))
* Add Prometheus gauges for FeatureStore installation telemetry ([#6354](#6354)) ([1b681b7](1b681b7))
* Adds registry REST API endpoints for managing entities, data sources, and feature views ([#6413](#6413)) ([f77bd1d](f77bd1d))
* Allow CRUD on entities, data sources, and feature views from UI ([#6412](#6412)) ([2321c07](2321c07))
* Allow default openlineage configuration ([#6467](#6467)) ([276b6df](276b6df))
* **bigquery:** Support DATE-type event timestamp columns ([#6362](#6362)) ([753dee5](753dee5)), closes [#2530](#2530)
* **cli:** Add `feast projects delete` command (closes [#5095](#5095)) ([#6318](#6318)) ([1a4b96c](1a4b96c))
* Data Quality Monitoring added in feast UI ([#6422](#6422)) ([fa271be](fa271be))
* **dynamodb:** Use ProjectionExpression when requested_features is set ([0adc906](0adc906)), closes [#6058](#6058)
* Enhance DataSource and FeatureView modals with error handling and submission states ([96d7169](96d7169))
* Expose registry endpoints on feature server for MCP access ([f77981c](f77981c))
* Feast First-Class LabelView Implementation ([#6292](#6292)) ([c0e7e5d](c0e7e5d))
* Feast-MLflow Integration ([#6235](#6235)) ([7279c75](7279c75))
* Operational metrics for offline store and SOX metrics for both ([#6340](#6340)) ([65b1b80](65b1b80))
* Pre-compute feature service ([8011550](8011550))
* REST API-backed UI for RBAC compatibility and per-page lazy loading ([#6414](#6414)) ([6ae80af](6ae80af))
* Support non-string map key types ([#6382](#6382)) ([#6383](#6383)) ([728aa2e](728aa2e))
* Update FeatureStore CRD with DRA Fields ([01241e4](01241e4))

### Performance Improvements

* Cache feature view resolution in get_online_features to reduce per-request overhead ([55c2f18](55c2f18))
* Optimize feature serving latency with batched async Redis, cached checks fix ([103809a](103809a))
* Replace MessageToDict with optimized custom dict builder ([#6015](#6015)) ([9902064](9902064))
What this PR does / why we need it:
Adds a Data Quality Monitoring UI to the Feast web interface. Users can view feature-level metrics (distributions, null rates, statistics), feature view aggregates, and feature service health — all from a new Monitoring sidebar section.
Key additions:
react-query) for all monitoring REST endpoints
Which issue(s) this PR fixes:
Part of the Feast monitoring initiative — provides the UI counterpart for the monitoring backend APIs.
Other PR that needs to be merged first
#6202
Screenshots
Checks
git commit -s)
Testing Strategy