| 1 | +# Feature Quality Monitoring |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Feast's data quality monitoring system computes, stores, and serves statistical metrics for every registered feature. It gives you visibility into feature health — distributions, null rates, percentiles, histograms — across batch data and feature serving logs. |
| 6 | + |
| 7 | +This guide covers: |
| 8 | + |
| 9 | +1. [Prerequisites](#1-prerequisites) |
| 10 | +2. [Auto-baseline on registration](#2-auto-baseline-on-registration) |
| 11 | +3. [Scheduled monitoring with the CLI](#3-scheduled-monitoring-with-the-cli) |
| 12 | +4. [Monitoring feature serving logs](#4-monitoring-feature-serving-logs) |
| 13 | +5. [Reading metrics via REST API](#5-reading-metrics-via-rest-api) |
| 14 | +6. [On-demand exploration (transient compute)](#6-on-demand-exploration) |
| 15 | +7. [Integrating with orchestrators](#7-integrating-with-orchestrators) |
| 16 | +8. [Supported backends](#8-supported-backends) |
| 17 | + |
| 18 | +## 1. Prerequisites |
| 19 | + |
| 20 | +Monitoring works with any supported offline store backend. No additional infrastructure or configuration is needed — monitoring tables are created automatically on first use. |
| 21 | + |
| 22 | +**Minimum setup:** |
| 23 | + |
| 24 | +- A Feast project with at least one feature view and a configured offline store |
| 25 | +- Feast SDK installed (`pip install feast`) |
| 26 | + |
| 27 | +**For serving log monitoring:** |
| 28 | + |
| 29 | +- At least one feature service with `logging_config` set (see [step 4](#4-monitoring-feature-serving-logs)) |
| 30 | + |
| 31 | +## 2. Auto-baseline on registration |
| 32 | + |
| 33 | +When you run `feast apply` to register new features, Feast automatically queues baseline metric computation: |
| 34 | + |
| 35 | +```bash |
| 36 | +$ feast apply |
| 37 | +Applying changes... |
| 38 | +Created feature view 'driver_stats' with 3 features |
| 39 | + → Queued baseline metrics computation (DQM job: abc-123) |
| 40 | +Done! |
| 41 | +``` |
| 42 | + |
| 43 | +The baseline reads all available source data and stores the resulting statistics with `is_baseline=TRUE`. This serves as the reference distribution for future drift detection. |
| 44 | + |
| 45 | +Baseline computation is: |
| 46 | +- **Non-blocking** — `feast apply` returns immediately; computation runs asynchronously |
| 47 | +- **Idempotent** — only features without existing baselines are computed; re-running `feast apply` won't recompute existing baselines |
| 48 | + |
| 49 | +## 3. Scheduled monitoring with the CLI |
| 50 | + |
| 51 | +### Auto mode (recommended for production) |
| 52 | + |
| 53 | +Schedule a single daily job that computes all granularities automatically: |
| 54 | + |
| 55 | +```bash |
| 56 | +feast monitor run |
| 57 | +``` |
| 58 | + |
| 59 | +This detects the latest event timestamp in the source data and computes metrics for five time windows measured back from it: |
| 60 | + |
| 61 | +| Granularity | Window | |
| 62 | +|-------------|--------| |
| 63 | +| `daily` | Last 1 day | |
| 64 | +| `weekly` | Last 7 days | |
| 65 | +| `biweekly` | Last 14 days | |
| 66 | +| `monthly` | Last 30 days | |
| 67 | +| `quarterly` | Last 90 days | |
| 68 | + |
| 69 | +No date arguments are needed: one scheduled job produces all five granularities. |
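To make the window table concrete, here is a minimal sketch of how the five ranges could be derived from the latest event timestamp. The helper `auto_windows` is hypothetical and not part of Feast; it assumes each window is a plain look-back from that timestamp, and the real boundary handling (inclusive versus exclusive ends, truncation to whole days) may differ.

```python
from datetime import datetime, timedelta

# Window lengths in days, taken from the granularity table above.
WINDOW_DAYS = {
    "daily": 1,
    "weekly": 7,
    "biweekly": 14,
    "monthly": 30,
    "quarterly": 90,
}


def auto_windows(latest_event_ts: datetime) -> dict[str, tuple[datetime, datetime]]:
    """Return {granularity: (start, end)} with every window ending at latest_event_ts.

    Hypothetical helper: assumes simple look-back windows; Feast's actual
    boundary rules are not specified here.
    """
    return {
        name: (latest_event_ts - timedelta(days=days), latest_event_ts)
        for name, days in WINDOW_DAYS.items()
    }


windows = auto_windows(datetime(2025, 3, 26))
for name, (start, end) in windows.items():
    print(f"{name:<10} {start.date()} -> {end.date()}")
```

Under these assumptions, one detected timestamp is enough to produce every range, which is why the scheduled job takes no date arguments.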
| 70 | + |
| 71 | +### Targeting a specific feature view |
| 72 | + |
| 73 | +```bash |
| 74 | +feast monitor run --feature-view driver_stats |
| 75 | +``` |
| 76 | + |
| 77 | +### Explicit date range and granularity |
| 78 | + |
| 79 | +```bash |
| 80 | +feast monitor run \ |
| 81 | + --feature-view driver_stats \ |
| 82 | + --start-date 2025-01-01 \ |
| 83 | + --end-date 2025-01-07 \ |
| 84 | + --granularity weekly |
| 85 | +``` |
| 86 | + |
| 87 | +### Setting a manual baseline |
| 88 | + |
| 89 | +```bash |
| 90 | +feast monitor run \ |
| 91 | + --feature-view driver_stats \ |
| 92 | + --start-date 2025-01-01 \ |
| 93 | + --end-date 2025-03-31 \ |
| 94 | + --granularity daily \ |
| 95 | + --set-baseline |
| 96 | +``` |
| 97 | + |
| 98 | +### CLI reference |
| 99 | + |
| 100 | +``` |
| 101 | +Usage: feast monitor run [OPTIONS] |
| 102 | + |
| 103 | +Options: |
| 104 | + -p, --project TEXT Feast project name (defaults to feature_store.yaml) |
| 105 | + -v, --feature-view TEXT Feature view name (omit for all) |
| 106 | + -f, --feature-name TEXT Feature name(s), repeatable (omit for all) |
| 107 | + --start-date TEXT Start date YYYY-MM-DD (omit for auto-detect) |
| 108 | + --end-date TEXT End date YYYY-MM-DD (omit for auto-detect) |
| 109 | + -g, --granularity One of: daily, weekly, biweekly, monthly, quarterly |
| 110 | + --set-baseline Mark this computation as baseline |
| 111 | + --source-type One of: batch, log, all (default: batch) |
| 112 | + --help Show this message and exit. |
| 113 | +``` |
| 114 | + |
| 115 | +## 4. Monitoring feature serving logs |
| 116 | + |
| 117 | +If your feature services have logging configured, you can compute metrics from the actual features served to models in production. |
| 118 | + |
| 119 | +### Setting up feature service logging |
| 120 | + |
| 121 | +In your feature definitions: |
| 122 | + |
| 123 | +```python |
| 124 | +from feast import FeatureService, LoggingConfig |
| 125 | +from feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source import ( |
| 126 | + PostgreSQLLoggingDestination, |
| 127 | +) |
| 128 | + |
| 129 | +driver_service = FeatureService( |
| 130 | + name="driver_service", |
| 131 | + features=[driver_stats_fv], |
| 132 | + logging_config=LoggingConfig( |
| 133 | + destination=PostgreSQLLoggingDestination(table_name="feast_driver_logs"), |
| 134 | + sample_rate=1.0, |
| 135 | + ), |
| 136 | +) |
| 137 | +``` |
| 138 | + |
| 139 | +### Computing log metrics |
| 140 | + |
| 141 | +**Auto mode (all feature services with logging):** |
| 142 | + |
| 143 | +```bash |
| 144 | +feast monitor run --source-type log |
| 145 | +``` |
| 146 | + |
| 147 | +**Specific feature service** (pass the feature service name to `--feature-view`): |
| 148 | + |
| 149 | +```bash |
| 150 | +feast monitor run --source-type log --feature-view driver_service |
| 151 | +``` |
| 152 | + |
| 153 | +**Both batch and log in one run:** |
| 154 | + |
| 155 | +```bash |
| 156 | +feast monitor run --source-type all |
| 157 | +``` |
| 158 | + |
| 159 | +Log metrics are stored with `data_source_type="log"` alongside batch metrics in the same monitoring tables. Feature names from the log schema (e.g., `driver_stats__conv_rate`) are automatically normalized back to their original names (`conv_rate`) and associated with the correct feature view — enabling batch-vs-log comparison and drift detection. |
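The name normalization described above can be pictured with a small sketch. `split_logged_name` is a hypothetical helper, not Feast's implementation; it assumes the first double underscore separates the feature view name from the feature name, as in the `driver_stats__conv_rate` example.

```python
def split_logged_name(logged_name: str, separator: str = "__") -> tuple[str, str]:
    """Split a logged column name into (feature_view_name, feature_name).

    Hypothetical helper: assumes the first `separator` divides the view name
    from the feature name. View names that themselves contain the separator
    would be ambiguous under this rule.
    """
    view, sep, feature = logged_name.partition(separator)
    if not sep or not view or not feature:
        raise ValueError(f"not a '<view>{separator}<feature>' name: {logged_name!r}")
    return view, feature


view_name, feature_name = split_logged_name("driver_stats__conv_rate")
print(view_name, feature_name)
```

Once a log column maps back to the same `(feature view, feature)` pair as its batch counterpart, the two sets of metrics can be compared row for row.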
| 160 | + |
| 161 | +### Via REST API |
| 162 | + |
| 163 | +``` |
| 164 | +# Compute log metrics |
| 165 | +POST /monitoring/compute/log |
| 166 | +{ |
| 167 | + "project": "my_project", |
| 168 | + "feature_service_name": "driver_service", |
| 169 | + "granularity": "daily" |
| 170 | +} |
| 171 | + |
| 172 | +# Auto-compute all log metrics |
| 173 | +POST /monitoring/auto_compute/log |
| 174 | +{ |
| 175 | + "project": "my_project" |
| 176 | +} |
| 177 | +``` |
| 178 | + |
| 179 | +## 5. Reading metrics via REST API |
| 180 | + |
| 181 | +All read endpoints support cascading filters: `project` → `feature_service_name` → `feature_view_name` → `feature_name` → `granularity` → `data_source_type`. |
| 182 | + |
| 183 | +### Per-feature metrics |
| 184 | + |
| 185 | +``` |
| 186 | +GET /monitoring/metrics/features?project=my_project&feature_view_name=driver_stats&granularity=daily |
| 187 | +``` |
| 188 | + |
| 189 | +**Response:** |
| 190 | + |
| 191 | +```json |
| 192 | +[ |
| 193 | + { |
| 194 | + "project_id": "my_project", |
| 195 | + "feature_view_name": "driver_stats", |
| 196 | + "feature_name": "conv_rate", |
| 197 | + "feature_type": "numeric", |
| 198 | + "metric_date": "2025-03-26", |
| 199 | + "granularity": "daily", |
| 200 | + "data_source_type": "batch", |
| 201 | + "row_count": 15000, |
| 202 | + "null_count": 12, |
| 203 | + "null_rate": 0.0008, |
| 204 | + "mean": 0.523, |
| 205 | + "stddev": 0.189, |
| 206 | + "min_val": 0.001, |
| 207 | + "max_val": 0.998, |
| 208 | + "p50": 0.51, |
| 209 | + "p75": 0.68, |
| 210 | + "p90": 0.82, |
| 211 | + "p95": 0.89, |
| 212 | + "p99": 0.96, |
| 213 | + "histogram": { |
| 214 | + "bins": [0.0, 0.05, 0.1, "..."], |
| 215 | + "counts": [120, 340, 560, "..."], |
| 216 | + "bin_width": 0.05 |
| 217 | + } |
| 218 | + } |
| 219 | +] |
| 220 | +``` |
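A consumer of this response may want to sanity-check it before charting or alerting. The sketch below is illustrative only: `check_metric` is a hypothetical helper, the payload is a trimmed copy of the sample above, and the tolerance is an arbitrary choice.

```python
import json

# Trimmed copy of the sample response above (histogram and labels omitted).
response_body = """
[
  {
    "feature_name": "conv_rate",
    "row_count": 15000,
    "null_count": 12,
    "null_rate": 0.0008,
    "min_val": 0.001,
    "max_val": 0.998,
    "p50": 0.51, "p75": 0.68, "p90": 0.82, "p95": 0.89, "p99": 0.96
  }
]
"""


def check_metric(metric: dict, tol: float = 1e-4) -> list[str]:
    """Return a list of consistency problems found in one per-feature metric."""
    problems = []
    rows = metric["row_count"]
    expected_rate = metric["null_count"] / rows if rows else 0.0
    if abs(metric["null_rate"] - expected_rate) > tol:
        problems.append("null_rate does not match null_count / row_count")
    ordered = [metric[k] for k in ("min_val", "p50", "p75", "p90", "p95", "p99", "max_val")]
    if ordered != sorted(ordered):
        problems.append("min/percentiles/max are not in ascending order")
    return problems


for metric in json.loads(response_body):
    print(metric["feature_name"], check_metric(metric) or "ok")
```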
| 221 | + |
| 222 | +### Per-feature-view aggregates |
| 223 | + |
| 224 | +``` |
| 225 | +GET /monitoring/metrics/feature_views?project=my_project&feature_view_name=driver_stats |
| 226 | +``` |
| 227 | + |
| 228 | +### Per-feature-service aggregates |
| 229 | + |
| 230 | +``` |
| 231 | +GET /monitoring/metrics/feature_services?project=my_project&feature_service_name=driver_service |
| 232 | +``` |
| 233 | + |
| 234 | +### Baseline |
| 235 | + |
| 236 | +``` |
| 237 | +GET /monitoring/metrics/baseline?project=my_project&feature_view_name=driver_stats |
| 238 | +``` |
| 239 | + |
| 240 | +### Time-series (for trend charts) |
| 241 | + |
| 242 | +``` |
| 243 | +GET /monitoring/metrics/timeseries?project=my_project&feature_name=conv_rate&granularity=daily&start_date=2025-01-01&end_date=2025-03-31 |
| 244 | +``` |
| 245 | + |
| 246 | +### Filtering batch vs. log metrics |
| 247 | + |
| 248 | +Add `data_source_type=batch` or `data_source_type=log` to any read endpoint: |
| 249 | + |
| 250 | +``` |
| 251 | +GET /monitoring/metrics/features?project=my_project&data_source_type=log |
| 252 | +``` |
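When calling the read endpoints from code, it can help to build the query string from the filter names listed at the top of this section. The following is a rough sketch: `metrics_url` is a hypothetical helper, and the base URL is a placeholder rather than a real Feast default.

```python
from urllib.parse import urlencode

# Filter names in the cascading order given above, plus the date range
# parameters used by the time-series endpoint.
FILTER_ORDER = (
    "project",
    "feature_service_name",
    "feature_view_name",
    "feature_name",
    "granularity",
    "data_source_type",
    "start_date",
    "end_date",
)


def metrics_url(base_url: str, resource: str, **filters: str) -> str:
    """Build a GET URL for /monitoring/metrics/<resource> (hypothetical helper)."""
    unknown = set(filters) - set(FILTER_ORDER)
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    params = [(key, filters[key]) for key in FILTER_ORDER if key in filters]
    return f"{base_url.rstrip('/')}/monitoring/metrics/{resource}?{urlencode(params)}"


# Placeholder host; substitute the address of your own registry REST server.
url = metrics_url(
    "http://feast.example.com",
    "features",
    data_source_type="log",
    project="my_project",
)
print(url)
```

Rejecting unknown filter names up front catches typos such as `feature_view` before a request is sent.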
| 253 | + |
| 254 | +### Full endpoint reference |
| 255 | + |
| 256 | +| Method | Endpoint | Description | |
| 257 | +|--------|----------|-------------| |
| 258 | +| `POST` | `/monitoring/compute` | Submit batch DQM job | |
| 259 | +| `POST` | `/monitoring/auto_compute` | Auto-detect dates, all granularities | |
| 260 | +| `POST` | `/monitoring/compute/transient` | On-demand compute (not stored) | |
| 261 | +| `POST` | `/monitoring/compute/log` | Compute from serving logs | |
| 262 | +| `POST` | `/monitoring/auto_compute/log` | Auto-detect log dates, all granularities | |
| 263 | +| `GET` | `/monitoring/jobs/{job_id}` | DQM job status | |
| 264 | +| `GET` | `/monitoring/metrics/features` | Per-feature metrics | |
| 265 | +| `GET` | `/monitoring/metrics/feature_views` | Per-view aggregates | |
| 266 | +| `GET` | `/monitoring/metrics/feature_services` | Per-service aggregates | |
| 267 | +| `GET` | `/monitoring/metrics/baseline` | Baseline metrics | |
| 268 | +| `GET` | `/monitoring/metrics/timeseries` | Time-series data | |
| 269 | + |
| 270 | +## 6. On-demand exploration |
| 271 | + |
| 272 | +When you need metrics for an arbitrary date range (e.g., "show me the distribution for Jan 5 to Jan 20"), use the transient compute endpoint. It reads source data for the exact range, computes fresh statistics, and returns them directly without storing. |
| 273 | + |
| 274 | +``` |
| 275 | +POST /monitoring/compute/transient |
| 276 | +{ |
| 277 | + "project": "my_project", |
| 278 | + "feature_view_name": "driver_stats", |
| 279 | + "feature_names": ["conv_rate"], |
| 280 | + "start_date": "2025-01-05", |
| 281 | + "end_date": "2025-01-20" |
| 282 | +} |
| 283 | +``` |
| 284 | + |
| 285 | +This is necessary because pre-computed histograms from different date ranges have different bin edges and cannot be merged losslessly. |
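The bin-edge problem is easiest to see with toy numbers. The sketch below uses a simple equal-width histogram written for illustration (it is not Feast's binning code) and made-up values for two periods.

```python
def histogram(values, n_bins):
    """Equal-width histogram over [min(values), max(values)]; returns (edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value lands exactly on the last edge; keep it in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


week1 = [0, 20, 40, 60]     # made-up feature values for period 1
week2 = [0, 50, 80, 100]    # made-up feature values for period 2

edges1, counts1 = histogram(week1, 2)
edges2, counts2 = histogram(week2, 2)
edges_all, counts_all = histogram(week1 + week2, 2)

print("period 1:", edges1, counts1)
print("period 2:", edges2, counts2)
print("combined:", edges_all, counts_all)
```

Period 1's second bin covers 30 to 60 and holds two values, but the combined histogram has an edge at 50. From the stored counts alone there is no way to tell how many of those two values fall on each side of 50, so the combined counts can only be obtained by re-reading the source data.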
| 286 | + |
| 287 | +## 7. Integrating with orchestrators |
| 288 | + |
| 289 | +### Airflow |
| 290 | + |
| 291 | +```python |
| 292 | +from airflow.operators.bash import BashOperator |
| 293 | + |
| 294 | +monitor_task = BashOperator( |
| 295 | + task_id="feast_monitor", |
| 296 | + bash_command="feast monitor run", |
| 297 | + cwd="/path/to/feast/repo", |
| 298 | +) |
| 299 | +``` |
| 300 | + |
| 301 | +### Kubeflow Pipelines (KFP) |
| 302 | + |
| 303 | +```python |
| 304 | +from kfp import dsl |
| 305 | + |
| 306 | +@dsl.component(base_image="feast-image:latest") |
| 307 | +def monitor_features(): |
| 308 | + import subprocess |
| 309 | + subprocess.run(["feast", "monitor", "run"], check=True, cwd="/feast/repo") |
| 310 | +``` |
| 311 | + |
| 312 | +### Cron |
| 313 | + |
| 314 | +```cron |
| 315 | +# Daily at 2:00 AM UTC |
| 316 | +0 2 * * * cd /path/to/feast/repo && feast monitor run >> /var/log/feast-monitor.log 2>&1 |
| 317 | +``` |
| 318 | + |
| 319 | +### Monitoring both batch and log in one job |
| 320 | + |
| 321 | +```bash |
| 322 | +feast monitor run --source-type all |
| 323 | +``` |
| 324 | + |
| 325 | +## 8. Supported backends |
| 326 | + |
| 327 | +Monitoring works natively with all offline stores that serve as compute engines for Feast materialization: |
| 328 | + |
| 329 | +| Backend | Compute | Storage | |
| 330 | +|---------|---------|---------| |
| 331 | +| PostgreSQL | SQL push-down | `INSERT ON CONFLICT` | |
| 332 | +| Snowflake | SQL push-down | `MERGE` with `VARIANT` JSON | |
| 333 | +| BigQuery | SQL push-down | `MERGE` into BQ tables | |
| 334 | +| Redshift | SQL push-down | `MERGE` via Data API | |
| 335 | +| Spark | SparkSQL push-down | Parquet tables | |
| 336 | +| Oracle | SQL via Ibis | `MERGE` from `DUAL` | |
| 337 | +| DuckDB | In-memory SQL | Parquet files | |
| 338 | +| Dask | PyArrow compute | Parquet files | |
| 339 | + |
| 340 | +Backends not listed above fall back to Python-based computation — the offline store's `pull_all_from_table_or_query()` returns a PyArrow Table, and metrics are computed using `pyarrow.compute` and `numpy`. |
| 341 | + |
| 342 | +## What metrics are computed |
| 343 | + |
| 344 | +**Per-feature (full profile):** |
| 345 | + |
| 346 | +| Metric | Numeric | Categorical | |
| 347 | +|--------|:-------:|:-----------:| |
| 348 | +| row_count, null_count, null_rate | Yes | Yes | |
| 349 | +| mean, stddev, min, max (returned as `min_val`, `max_val`) | Yes | No | |
| 350 | +| p50, p75, p90, p95, p99 | Yes | No | |
| 351 | +| histogram (JSONB) | Binned distribution | Top-N values with counts | |
| 352 | + |
| 353 | +**Per-feature-view and per-feature-service (aggregate summaries):** |
| 354 | + |
| 355 | +| Metric | Description | |
| 356 | +|--------|-------------| |
| 357 | +| total_row_count | Total rows in the view | |
| 358 | +| total_features | Number of features | |
| 359 | +| features_with_nulls | Count of features with any nulls | |
| 360 | +| avg_null_rate, max_null_rate | Aggregate null rate statistics | |
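As a rough illustration of how the aggregate summaries relate to the per-feature rows, the sketch below derives four of them from a list of per-feature metrics. `summarize_features` is a hypothetical helper; it assumes `avg_null_rate` is an unweighted mean across features, and it leaves out `total_row_count` because the way that value is derived is not stated here. The feature names other than `conv_rate` and all of the numbers are made up.

```python
def summarize_features(feature_metrics: list[dict]) -> dict:
    """Roll per-feature metrics up into view-level null statistics.

    Hypothetical helper: avg_null_rate is computed as an unweighted mean,
    which is an assumption rather than documented behavior.
    """
    null_rates = [m["null_rate"] for m in feature_metrics]
    return {
        "total_features": len(feature_metrics),
        "features_with_nulls": sum(1 for m in feature_metrics if m["null_count"] > 0),
        "avg_null_rate": sum(null_rates) / len(null_rates) if null_rates else 0.0,
        "max_null_rate": max(null_rates, default=0.0),
    }


# Illustrative per-feature rows for one feature view.
rows = [
    {"feature_name": "conv_rate", "null_count": 12, "null_rate": 0.0008},
    {"feature_name": "acc_rate", "null_count": 0, "null_rate": 0.0},
    {"feature_name": "avg_daily_trips", "null_count": 30, "null_rate": 0.002},
]

summary = summarize_features(rows)
print(summary)
```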
| 361 | + |
| 362 | +## RBAC |
| 363 | + |
| 364 | +Monitoring respects Feast's existing RBAC: |
| 365 | + |
| 366 | +- **Compute operations** (`POST /monitoring/compute`, `/auto_compute`, `/compute/log`, `/auto_compute/log`) require `AuthzedAction.UPDATE` |
| 367 | +- **Transient compute** (`POST /monitoring/compute/transient`) requires `AuthzedAction.DESCRIBE` |
| 368 | +- **Read operations** (`GET /monitoring/metrics/*`) require `AuthzedAction.DESCRIBE` |