Commit d0b45bb

docs: Monitoring User Guide
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
1 parent 567dfae commit d0b45bb

3 files changed

Lines changed: 1171 additions & 0 deletions

File tree

docs/SUMMARY.md

Lines changed: 2 additions & 0 deletions
@@ -56,6 +56,7 @@
5656
* [RAG Fine Tuning with Feast and Milvus](../examples/rag-retriever/README.md)
5757
* [MCP - AI Agent Example](../examples/mcp_feature_store/README.md)
5858
* [Feast-Powered AI Agent](../examples/agent_feature_store/README.md)
59+
* [Feature Quality Monitoring Quickstart](../examples/monitoring/monitoring-quickstart.ipynb)
5960

6061
## How-to Guides
6162

@@ -81,6 +82,7 @@
8182
* [Adding or reusing tests](how-to-guides/adding-or-reusing-tests.md)
8283
* [Starting Feast servers in TLS(SSL) Mode](how-to-guides/starting-feast-servers-tls-mode.md)
8384
* [Importing Features from dbt](how-to-guides/dbt-integration.md)
85+
* [Feature Quality Monitoring](how-to-guides/feature-monitoring.md)
8486

8587
## Reference
8688

Lines changed: 368 additions & 0 deletions
@@ -0,0 +1,368 @@
1+
# Feature Quality Monitoring
2+
3+
## Overview
4+
5+
Feast's data quality monitoring system computes, stores, and serves statistical metrics for every registered feature. It gives you visibility into feature health — distributions, null rates, percentiles, histograms — across batch data and feature serving logs.
6+
7+
This guide covers:
8+
9+
1. [Prerequisites](#1-prerequisites)
10+
2. [Auto-baseline on registration](#2-auto-baseline-on-registration)
11+
3. [Scheduled monitoring with the CLI](#3-scheduled-monitoring-with-the-cli)
12+
4. [Monitoring feature serving logs](#4-monitoring-feature-serving-logs)
13+
5. [Reading metrics via REST API](#5-reading-metrics-via-rest-api)
14+
6. [On-demand exploration (transient compute)](#6-on-demand-exploration)
15+
7. [Integrating with orchestrators](#7-integrating-with-orchestrators)
16+
8. [Supported backends](#8-supported-backends)
17+
18+
## 1. Prerequisites
19+
20+
Monitoring works with any supported offline store backend. No additional infrastructure or configuration is needed — monitoring tables are created automatically on first use.
21+
22+
**Minimum setup:**
23+
24+
- A Feast project with at least one feature view and a configured offline store
25+
- Feast SDK installed (`pip install feast`)
26+
27+
**For serving log monitoring:**
28+
29+
- At least one feature service with `logging_config` set (see [step 4](#4-monitoring-feature-serving-logs))
30+
31+
## 2. Auto-baseline on registration
32+
33+
When you run `feast apply` to register new features, Feast automatically queues baseline metric computation:
34+
35+
```bash
36+
$ feast apply
37+
Applying changes...
38+
Created feature view 'driver_stats' with 3 features
39+
→ Queued baseline metrics computation (DQM job: abc-123)
40+
Done!
41+
```
42+
43+
The baseline reads all available source data and stores the resulting statistics with `is_baseline=TRUE`. This serves as the reference distribution for future drift detection.
44+
45+
Baseline computation is:
46+
- **Non-blocking**: `feast apply` returns immediately; computation runs asynchronously
47+
- **Idempotent**: only features without existing baselines are computed; re-running `feast apply` won't recompute existing baselines
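
The idempotency rule above can be pictured as a simple filter over the registered features. The following is a minimal sketch, not Feast's actual implementation: the function name `features_needing_baseline`, the in-memory `existing_baselines` set, and the sample feature names other than `conv_rate` are all hypothetical.

```python
def features_needing_baseline(registered, existing_baselines):
    """Return the registered features that have no stored baseline yet, in order."""
    return [name for name in registered if name not in existing_baselines]


# Hypothetical state: three registered features, one baseline already stored.
registered = ["conv_rate", "acc_rate", "avg_daily_trips"]
existing_baselines = {"conv_rate"}

pending = features_needing_baseline(registered, existing_baselines)
print(pending)
```

Under this model, running the selection a second time after the pending baselines are stored yields an empty list, which is what makes a repeated `feast apply` a no-op for baselines.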
48+
49+
## 3. Scheduled monitoring with the CLI
50+
51+
### Auto mode (recommended for production)
52+
53+
Schedule a single daily job that computes all granularities automatically:
54+
55+
```bash
56+
feast monitor run
57+
```
58+
59+
This detects the latest event timestamp in the source data and computes metrics for 5 time windows:
60+
61+
| Granularity | Window |
62+
|-------------|--------|
63+
| `daily` | Last 1 day |
64+
| `weekly` | Last 7 days |
65+
| `biweekly` | Last 14 days |
66+
| `monthly` | Last 30 days |
67+
| `quarterly` | Last 90 days |
68+
69+
No date arguments needed. One scheduled job produces all granularities.
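
To make the window arithmetic concrete, here is a small sketch that derives the five date ranges from a latest event date. It assumes each window ends at the latest event date and starts N days earlier; the guide does not say whether Feast treats the boundaries as inclusive, so `window_bounds` is an illustration, not the actual logic.

```python
from datetime import date, timedelta

# Window lengths in days, taken from the granularity table above.
WINDOW_DAYS = {
    "daily": 1,
    "weekly": 7,
    "biweekly": 14,
    "monthly": 30,
    "quarterly": 90,
}


def window_bounds(latest_event_date):
    """Map each granularity to a (start, end) pair ending at the latest event date."""
    return {
        name: (latest_event_date - timedelta(days=days), latest_event_date)
        for name, days in WINDOW_DAYS.items()
    }


bounds = window_bounds(date(2025, 3, 26))
for name, (start, end) in bounds.items():
    print(f"{name:<10} {start} -> {end}")
```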
70+
71+
### Targeting a specific feature view
72+
73+
```bash
74+
feast monitor run --feature-view driver_stats
75+
```
76+
77+
### Explicit date range and granularity
78+
79+
```bash
feast monitor run \
  --feature-view driver_stats \
  --start-date 2025-01-01 \
  --end-date 2025-01-07 \
  --granularity weekly
```
86+
87+
### Setting a manual baseline
88+
89+
```bash
feast monitor run \
  --feature-view driver_stats \
  --start-date 2025-01-01 \
  --end-date 2025-03-31 \
  --granularity daily \
  --set-baseline
```
97+
98+
### CLI reference
99+
100+
```
Usage: feast monitor run [OPTIONS]

Options:
  -p, --project TEXT       Feast project name (defaults to feature_store.yaml)
  -v, --feature-view TEXT  Feature view name (omit for all)
  -f, --feature-name TEXT  Feature name(s), repeatable (omit for all)
  --start-date TEXT        Start date YYYY-MM-DD (omit for auto-detect)
  --end-date TEXT          End date YYYY-MM-DD (omit for auto-detect)
  -g, --granularity        One of: daily, weekly, biweekly, monthly, quarterly
  --set-baseline           Mark this computation as baseline
  --source-type            One of: batch, log, all (default: batch)
  --help                   Show this message and exit.
```
114+
115+
## 4. Monitoring feature serving logs
116+
117+
If your feature services have logging configured, you can compute metrics from the actual features served to models in production.
118+
119+
### Setting up feature service logging
120+
121+
In your feature definitions:
122+
123+
```python
from feast import FeatureService, LoggingConfig
from feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source import (
    PostgreSQLLoggingDestination,
)

# driver_stats_fv is a feature view defined elsewhere in your feature repo.
driver_service = FeatureService(
    name="driver_service",
    features=[driver_stats_fv],
    logging_config=LoggingConfig(
        # Table that receives the served feature values.
        destination=PostgreSQLLoggingDestination(table_name="feast_driver_logs"),
        # 1.0 logs every request; lower values log a sample.
        sample_rate=1.0,
    ),
)
```
138+
139+
### Computing log metrics
140+
141+
**Auto mode (all feature services with logging):**
142+
143+
```bash
144+
feast monitor run --source-type log
145+
```
146+
147+
**Specific feature service:**
148+
149+
```bash
150+
feast monitor run --source-type log --feature-view driver_service
151+
```
152+
153+
**Both batch and log in one run:**
154+
155+
```bash
156+
feast monitor run --source-type all
157+
```
158+
159+
Log metrics are stored with `data_source_type="log"` alongside batch metrics in the same monitoring tables. Feature names from the log schema (e.g., `driver_stats__conv_rate`) are automatically normalized back to their original names (`conv_rate`) and associated with the correct feature view — enabling batch-vs-log comparison and drift detection.
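
The name normalization described above can be sketched as a split on the double-underscore separator. This is an illustration under stated assumptions, not Feast's code: `normalize_log_feature_name` is a hypothetical helper, and it assumes the feature view name itself contains no `__`.

```python
def normalize_log_feature_name(log_name, separator="__"):
    """Split '<feature_view>__<feature>' into a (feature_view, feature) pair."""
    view, found, feature = log_name.partition(separator)
    if not found:
        # No separator present: treat the whole string as a bare feature name.
        return None, log_name
    return view, feature


view, feature = normalize_log_feature_name("driver_stats__conv_rate")
print(view, feature)
```

With the original name and feature view recovered, a log metric and a batch metric for the same feature share the same key, which is what makes the batch-vs-log comparison possible.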
160+
161+
### Via REST API
162+
163+
```bash
# Compute log metrics
POST /monitoring/compute/log
{
  "project": "my_project",
  "feature_service_name": "driver_service",
  "granularity": "daily"
}

# Auto-compute all log metrics
POST /monitoring/auto_compute/log
{
  "project": "my_project"
}
```
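
As a sketch of how a client might issue the first request above, the snippet below builds the POST with Python's standard library. The base URL is a placeholder (this guide does not state where the server listens), and `build_log_compute_request` is a hypothetical helper; the snippet only constructs the request and does not send it.

```python
import json
from urllib.request import Request

# Placeholder address: replace with wherever your Feast server is reachable.
BASE_URL = "http://localhost:8080"


def build_log_compute_request(project, feature_service_name, granularity):
    """Build (but do not send) a POST request for /monitoring/compute/log."""
    payload = {
        "project": project,
        "feature_service_name": feature_service_name,
        "granularity": granularity,
    }
    return Request(
        url=f"{BASE_URL}/monitoring/compute/log",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_log_compute_request("my_project", "driver_service", "daily")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; any authentication headers your deployment requires would need to be added first.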
178+
179+
## 5. Reading metrics via REST API
180+
181+
All read endpoints support cascading filters: `project` → `feature_service_name` → `feature_view_name` → `feature_name` → `granularity` → `data_source_type`.
182+
183+
### Per-feature metrics
184+
185+
```
186+
GET /monitoring/metrics/features?project=my_project&feature_view_name=driver_stats&granularity=daily
187+
```
188+
189+
**Response:**
190+
191+
```json
[
  {
    "project_id": "my_project",
    "feature_view_name": "driver_stats",
    "feature_name": "conv_rate",
    "feature_type": "numeric",
    "metric_date": "2025-03-26",
    "granularity": "daily",
    "data_source_type": "batch",
    "row_count": 15000,
    "null_count": 12,
    "null_rate": 0.0008,
    "mean": 0.523,
    "stddev": 0.189,
    "min_val": 0.001,
    "max_val": 0.998,
    "p50": 0.51,
    "p75": 0.68,
    "p90": 0.82,
    "p95": 0.89,
    "p99": 0.96,
    "histogram": {
      "bins": [0.0, 0.05, 0.1, "..."],
      "counts": [120, 340, 560, "..."],
      "bin_width": 0.05
    }
  }
]
```
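
A client can sanity-check a record like the one above before charting it. The sketch below assumes `null_rate` equals `null_count / row_count` (consistent with the sample values, 12 / 15000 = 0.0008, though the guide does not define it) and that percentiles should be non-decreasing; `check_metric` is a hypothetical helper.

```python
import json
import math

# A trimmed copy of the sample response above.
response_body = """
[{"feature_name": "conv_rate", "row_count": 15000, "null_count": 12,
  "null_rate": 0.0008, "p50": 0.51, "p75": 0.68, "p90": 0.82,
  "p95": 0.89, "p99": 0.96}]
"""


def check_metric(metric):
    """Return a list of consistency problems found in one per-feature record."""
    problems = []
    rows = metric["row_count"]
    expected_rate = metric["null_count"] / rows if rows else 0.0
    # Tolerance allows for rounding in the stored value.
    if not math.isclose(metric["null_rate"], expected_rate, rel_tol=1e-3, abs_tol=1e-9):
        problems.append("null_rate does not match null_count / row_count")
    percentiles = [metric[key] for key in ("p50", "p75", "p90", "p95", "p99")]
    if percentiles != sorted(percentiles):
        problems.append("percentiles are not non-decreasing")
    return problems


for record in json.loads(response_body):
    print(record["feature_name"], check_metric(record))
```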
221+
222+
### Per-feature-view aggregates
223+
224+
```
225+
GET /monitoring/metrics/feature_views?project=my_project&feature_view_name=driver_stats
226+
```
227+
228+
### Per-feature-service aggregates
229+
230+
```
231+
GET /monitoring/metrics/feature_services?project=my_project&feature_service_name=driver_service
232+
```
233+
234+
### Baseline
235+
236+
```
237+
GET /monitoring/metrics/baseline?project=my_project&feature_view_name=driver_stats
238+
```
239+
240+
### Time-series (for trend charts)
241+
242+
```
243+
GET /monitoring/metrics/timeseries?project=my_project&feature_name=conv_rate&granularity=daily&start_date=2025-01-01&end_date=2025-03-31
244+
```
245+
246+
### Filtering batch vs. log metrics
247+
248+
Add `data_source_type=batch` or `data_source_type=log` to any read endpoint:
249+
250+
```
251+
GET /monitoring/metrics/features?project=my_project&data_source_type=log
252+
```
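
The read URLs in this section can be assembled programmatically. The following sketch covers only the six cascading filters named at the start of this section and emits them in that order; `metrics_url` is a hypothetical helper, and endpoint-specific parameters such as `start_date` and `end_date` are deliberately left out.

```python
from urllib.parse import urlencode

# The cascading filters, in the order listed at the start of this section.
FILTER_ORDER = (
    "project",
    "feature_service_name",
    "feature_view_name",
    "feature_name",
    "granularity",
    "data_source_type",
)


def metrics_url(endpoint, **filters):
    """Build a /monitoring/metrics/<endpoint> path with the given filters."""
    unknown = set(filters) - set(FILTER_ORDER)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    params = [(key, filters[key]) for key in FILTER_ORDER if filters.get(key) is not None]
    return f"/monitoring/metrics/{endpoint}?{urlencode(params)}"


print(metrics_url("features", project="my_project", data_source_type="log"))
```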
253+
254+
### Full endpoint reference
255+
256+
| Method | Endpoint | Description |
257+
|--------|----------|-------------|
258+
| `POST` | `/monitoring/compute` | Submit batch DQM job |
259+
| `POST` | `/monitoring/auto_compute` | Auto-detect dates, all granularities |
260+
| `POST` | `/monitoring/compute/transient` | On-demand compute (not stored) |
261+
| `POST` | `/monitoring/compute/log` | Compute from serving logs |
262+
| `POST` | `/monitoring/auto_compute/log` | Auto-detect log dates, all granularities |
263+
| `GET` | `/monitoring/jobs/{job_id}` | DQM job status |
264+
| `GET` | `/monitoring/metrics/features` | Per-feature metrics |
265+
| `GET` | `/monitoring/metrics/feature_views` | Per-view aggregates |
266+
| `GET` | `/monitoring/metrics/feature_services` | Per-service aggregates |
267+
| `GET` | `/monitoring/metrics/baseline` | Baseline metrics |
268+
| `GET` | `/monitoring/metrics/timeseries` | Time-series data |
269+
270+
## 6. On-demand exploration
271+
272+
When you need metrics for an arbitrary date range (e.g., "show me the distribution for Jan 5 to Jan 20"), use the transient compute endpoint. It reads source data for the exact range, computes fresh statistics, and returns them directly without storing.
273+
274+
```bash
POST /monitoring/compute/transient
{
  "project": "my_project",
  "feature_view_name": "driver_stats",
  "feature_names": ["conv_rate"],
  "start_date": "2025-01-05",
  "end_date": "2025-01-20"
}
```
284+
285+
This is necessary because pre-computed histograms from different date ranges have different bin edges and cannot be merged losslessly.
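
The merge problem can be shown with two toy histograms. This sketch reuses the `bins` and `counts` keys from the response format and assumes `bins` holds the left edge of each bin; `merge_histograms` is a hypothetical helper, not a Feast function.

```python
def merge_histograms(first, second):
    """Add counts bin by bin; only valid when both histograms share bin edges."""
    if first["bins"] != second["bins"]:
        raise ValueError("bin edges differ; recompute from source data instead")
    merged_counts = [a + b for a, b in zip(first["counts"], second["counts"])]
    return {"bins": first["bins"], "counts": merged_counts}


week_1 = {"bins": [0.0, 0.5], "counts": [3, 7]}
week_2 = {"bins": [0.0, 0.5], "counts": [4, 6]}
week_3 = {"bins": [0.0, 0.25, 0.5, 0.75], "counts": [1, 2, 3, 4]}

print(merge_histograms(week_1, week_2))

try:
    merge_histograms(week_1, week_3)
except ValueError as error:
    print("cannot merge:", error)
```

Once values have been bucketed, the position of each value inside its bin is gone, so counts cannot be redistributed exactly onto a different set of edges. Recomputing from source data for the requested range avoids that loss.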
286+
287+
## 7. Integrating with orchestrators
288+
289+
### Airflow
290+
291+
```python
from airflow.operators.bash import BashOperator

monitor_task = BashOperator(
    task_id="feast_monitor",
    bash_command="feast monitor run",
    cwd="/path/to/feast/repo",
)
```
300+
301+
### Kubeflow Pipelines (KFP)
302+
303+
```python
from kfp import dsl

@dsl.component(base_image="feast-image:latest")
def monitor_features():
    import subprocess
    subprocess.run(["feast", "monitor", "run"], check=True, cwd="/feast/repo")
```
311+
312+
### Cron
313+
314+
```cron
315+
# Daily at 2:00 AM UTC
316+
0 2 * * * cd /path/to/feast/repo && feast monitor run >> /var/log/feast-monitor.log 2>&1
317+
```
318+
319+
### Monitoring both batch and log in one job
320+
321+
```bash
322+
feast monitor run --source-type all
323+
```
324+
325+
## 8. Supported backends
326+
327+
Monitoring works natively with all offline stores that serve as compute engines for Feast materialization:
328+
329+
| Backend | Compute | Storage |
330+
|---------|---------|---------|
331+
| PostgreSQL | SQL push-down | `INSERT ON CONFLICT` |
332+
| Snowflake | SQL push-down | `MERGE` with `VARIANT` JSON |
333+
| BigQuery | SQL push-down | `MERGE` into BQ tables |
334+
| Redshift | SQL push-down | `MERGE` via Data API |
335+
| Spark | SparkSQL push-down | Parquet tables |
336+
| Oracle | SQL via Ibis | `MERGE` from `DUAL` |
337+
| DuckDB | In-memory SQL | Parquet files |
338+
| Dask | PyArrow compute | Parquet files |
339+
340+
Backends not listed above fall back to Python-based computation — the offline store's `pull_all_from_table_or_query()` returns a PyArrow Table, and metrics are computed using `pyarrow.compute` and `numpy`.
341+
342+
## What metrics are computed
343+
344+
**Per-feature (full profile):**
345+
346+
| Metric | Numeric | Categorical |
347+
|--------|:-------:|:-----------:|
348+
| row_count, null_count, null_rate | Yes | Yes |
349+
| mean, stddev, min, max | Yes | N/A |
350+
| p50, p75, p90, p95, p99 | Yes | N/A |
351+
| histogram (JSONB) | Binned distribution | Top-N values with counts |
352+
353+
**Per-feature-view and per-feature-service (aggregate summaries):**
354+
355+
| Metric | Description |
356+
|--------|-------------|
357+
| total_row_count | Total rows in the view |
358+
| total_features | Number of features |
359+
| features_with_nulls | Count of features with any nulls |
360+
| avg_null_rate, max_null_rate | Aggregate null rate statistics |
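
The aggregate summaries can be thought of as a roll-up over per-feature records. The sketch below assumes `avg_null_rate` is the unweighted mean of the per-feature null rates, which this guide does not specify; `summarize_view` is a hypothetical helper, and `total_row_count` is omitted because its derivation is not described here.

```python
def summarize_view(feature_metrics):
    """Roll per-feature records up into view-level null statistics."""
    null_rates = [metric["null_rate"] for metric in feature_metrics]
    return {
        "total_features": len(feature_metrics),
        "features_with_nulls": sum(1 for m in feature_metrics if m["null_count"] > 0),
        "avg_null_rate": sum(null_rates) / len(null_rates) if null_rates else 0.0,
        "max_null_rate": max(null_rates, default=0.0),
    }


# Hypothetical per-feature records for one feature view.
sample = [
    {"feature_name": "conv_rate", "null_count": 0, "null_rate": 0.0},
    {"feature_name": "acc_rate", "null_count": 5, "null_rate": 0.5},
]
print(summarize_view(sample))
```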
361+
362+
## RBAC
363+
364+
Monitoring respects Feast's existing RBAC:
365+
366+
- **Compute operations** (`POST /monitoring/compute`, `/auto_compute`, `/compute/log`, `/auto_compute/log`) require `AuthzedAction.UPDATE`
367+
- **Transient compute** (`POST /monitoring/compute/transient`) requires `AuthzedAction.DESCRIBE`
368+
- **Read operations** (`GET /monitoring/metrics/*`) require `AuthzedAction.DESCRIBE`
