Commit 65b1b80

feat: Operational metrics for offline store and SOX metrics for both (#6340)
* feat: Operational metrics for offline store and SOX metrics for both
  Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
* fix: Resolve comments from review
  Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
* feat: System metrics API requests to prometheus
  Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
* fix: Reviewers comment fixed
  Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
* chore: Removed operational metrics GET request from prometheus
  Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
---------
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
1 parent 34d2d3e commit 65b1b80

8 files changed

Lines changed: 1568 additions & 617 deletions

docs/reference/feature-servers/python-feature-server.md

Lines changed: 71 additions & 1 deletion
@@ -352,11 +352,14 @@ feature_server:
     push: true # push request counters
     materialization: true # materialization counters & duration
     freshness: true # feature freshness gauges
+    offline_features: true # offline store retrieval counters & latency
+    audit_logging: false # structured JSON audit logs (see below)
 ```

 Any category set to `false` will emit no metrics and start no background
 threads (e.g., setting `freshness: false` prevents the registry polling
-thread from starting). All categories default to `true`.
+thread from starting). All categories default to `true` except
+`audit_logging`, which defaults to `false`.

 ### Available metrics

@@ -375,6 +378,9 @@ thread from starting). All categories default to `true`.
 | `feast_materialization_result_total` | Counter | `feature_view`, `status` | `materialization` | Materialization runs (success/failure) |
 | `feast_materialization_duration_seconds` | Histogram | `feature_view` | `materialization` | Materialization duration per feature view |
 | `feast_feature_freshness_seconds` | Gauge | `feature_view`, `project` | `freshness` | Seconds since last materialization |
+| `feast_offline_store_request_total` | Counter | `method`, `status` | `offline_features` | Total offline store retrieval requests |
+| `feast_offline_store_request_latency_seconds` | Histogram | `method` | `offline_features` | Latency of offline store retrieval operations |
+| `feast_offline_store_row_count` | Histogram | `method` | `offline_features` | Rows returned by offline store retrieval |

 ### Per-ODFV transformation metrics

@@ -405,6 +411,70 @@ The `odfv_name` label lets you filter or group by individual ODFV,
 and the `mode` label (`python`, `pandas`, `substrait`) lets you compare
 transformation engines.

+### Audit logging
+
+Feast can emit structured JSON audit log entries for every online and offline
+feature retrieval. These are written via the standard `feast.audit` Python
+logger, so you can route them to a dedicated file, SIEM, or log aggregator
+independently of application logs.
+
+Audit logging is **disabled by default**. Enable it in `feature_store.yaml`:
+
+```yaml
+feature_server:
+  type: local
+  metrics:
+    enabled: true
+    audit_logging: true
+```
+
+**Online audit log** (emitted per `/get-online-features` call):
+
+```json
+{
+  "event": "online_feature_request",
+  "timestamp": "2026-05-11T08:30:00.123456+00:00",
+  "requestor_id": "user@example.com",
+  "entity_keys": ["driver_id"],
+  "entity_count": 3,
+  "feature_views": ["driver_hourly_stats"],
+  "feature_count": 3,
+  "status": "success",
+  "latency_ms": 12.34
+}
+```
+
+**Offline audit log** (emitted per `RetrievalJob.to_arrow()` call):
+
+```json
+{
+  "event": "offline_feature_retrieval",
+  "timestamp": "2026-05-11T08:31:00.456789+00:00",
+  "method": "to_arrow",
+  "start_time": "2026-05-11T08:30:59.226789+00:00",
+  "end_time": "2026-05-11T08:31:00.456789+00:00",
+  "feature_views": ["driver_hourly_stats"],
+  "feature_count": 3,
+  "row_count": 500,
+  "status": "success",
+  "duration_ms": 1230.0
+}
+```
+
+The `requestor_id` field in online audit logs is populated from the
+security manager's current user when authentication is configured, and
+falls back to `"anonymous"` otherwise.
+
+To route audit logs to a separate file:
+
+```python
+import logging
+
+handler = logging.FileHandler("/var/log/feast/audit.log")
+handler.setFormatter(logging.Formatter("%(message)s"))
+logging.getLogger("feast.audit").addHandler(handler)
+```
+
 ### Scraping with Prometheus

 ```yaml
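
The routing snippet added to the docs attaches a plain `FileHandler`, which leaves consumers to parse one JSON object per record. A minimal sketch of that consuming side, assuming each `feast.audit` message is a complete JSON document as in the documented examples and that entries are emitted at `INFO` or above (the log level is not stated in this diff). `ListHandler` and the simulated entry are illustrative stand-ins, not part of Feast:

```python
import json
import logging


class ListHandler(logging.Handler):
    """Hypothetical in-memory sink, standing in for a file or SIEM forwarder."""

    def __init__(self) -> None:
        super().__init__()
        self.entries = []

    def emit(self, record: logging.LogRecord) -> None:
        # Assumes the message is a single JSON object, as in the docs examples.
        self.entries.append(json.loads(record.getMessage()))


audit_logger = logging.getLogger("feast.audit")
audit_logger.setLevel(logging.INFO)  # assumption: entries are INFO or higher
audit_logger.propagate = False       # keep audit entries out of application logs
sink = ListHandler()
audit_logger.addHandler(sink)

# Simulated entry; in a real deployment Feast would emit this itself.
audit_logger.info(
    json.dumps(
        {"event": "online_feature_request", "status": "success", "latency_ms": 12.34}
    )
)

# Example downstream filter: requests slower than 10 ms.
slow = [entry for entry in sink.entries if entry["latency_ms"] > 10]
```

Setting `propagate = False` is a design choice rather than a requirement: without it, the same entries would also reach any handlers on the root logger.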

infra/feast-operator/config/samples/v1_featurestore_serving.yaml

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ spec:
 push: true # push/write request counters
 materialization: true # materialization counters and duration histograms
 freshness: false # feature freshness gauges (can be expensive at scale)
-# Example: when a future SDK adds "registry_sync", enable it here
-# registry_sync: false
+offline_features: true # offline store retrieval counters, latency, row count
+audit_logging: false # structured JSON audit logs via the feast.audit logger
 offlinePushBatching:
 enabled: true
 batchSize: 1000 # max rows per offline write batch
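
The two flags toggled in this sample correspond to `StrictBool` fields on `MetricsConfig` in this commit, with `offline_features` defaulting to `True` and `audit_logging` to `False`. A strict boolean rejects values such as the string `"true"` instead of coercing them. The sketch below mimics that behaviour with a standard-library dataclass; `MetricsFlags` is a hypothetical stand-in, since the real class is a pydantic model:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MetricsFlags:
    """Hypothetical stand-in for the two new MetricsConfig fields."""

    offline_features: bool = True   # offline store counters, latency, row count
    audit_logging: bool = False     # opt-in structured audit logs

    def __post_init__(self) -> None:
        # Mimic strict-bool semantics: accept only real bool values.
        for f in fields(self):
            value = getattr(self, f.name)
            if type(value) is not bool:
                raise TypeError(f"{f.name} must be a bool, got {value!r}")


defaults = MetricsFlags()
opted_in = MetricsFlags(audit_logging=True)
```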

sdk/python/feast/feature_server.py

Lines changed: 66 additions & 12 deletions
@@ -152,28 +152,71 @@ class ChatRequest(BaseModel):
     messages: List[ChatMessage]


-def _resolve_feature_counts(
+def _parse_feature_info(
     features: Union[List[str], "feast.FeatureService"],
 ) -> tuple:
-    """Return (feature_count, feature_view_count) from the resolved features.
+    """Return ``(feature_view_names, feature_count)`` from resolved features.

     ``features`` is either a list of ``"feature_view:feature"`` strings or
     a ``FeatureService`` with ``feature_view_projections``.
+
+    Returns:
+        (fv_names, feat_count) where fv_names is a list of unique feature
+        view name strings and feat_count is the total number of features.
     """
     from feast.feature_service import FeatureService
+    from feast.utils import _parse_feature_ref

     if isinstance(features, FeatureService):
         projections = features.feature_view_projections
-        fv_count = len(projections)
+        fv_names = [p.name for p in projections]
         feat_count = sum(len(p.features) for p in projections)
     elif isinstance(features, list):
         feat_count = len(features)
-        fv_names = {ref.split(":")[0].split("@")[0] for ref in features if ":" in ref}
-        fv_count = len(fv_names)
+        fv_names = list({_parse_feature_ref(ref)[0] for ref in features if ":" in ref})
     else:
+        fv_names = []
         feat_count = 0
-        fv_count = 0
-    return str(feat_count), str(fv_count)
+    return fv_names, feat_count
+
+
+def _resolve_feature_counts(
+    features: Union[List[str], "feast.FeatureService"],
+) -> tuple:
+    """Return ``(feature_count_str, feature_view_count_str)`` for Prometheus labels."""
+    fv_names, feat_count = _parse_feature_info(features)
+    return str(feat_count), str(len(fv_names))
+
+
+def _emit_online_audit(
+    request: GetOnlineFeaturesRequest,
+    features: Union[List[str], "feast.FeatureService"],
+    entity_count: int,
+    status: str,
+    latency_ms: float,
+):
+    """Best-effort audit log emission for online feature requests."""
+    try:
+        from feast.permissions.security_manager import get_security_manager
+
+        requestor_id = "anonymous"
+        sm = get_security_manager()
+        if sm and sm.current_user:
+            requestor_id = sm.current_user.username or "anonymous"
+
+        fv_names, feat_count = _parse_feature_info(features)
+
+        feast_metrics.emit_online_audit_log(
+            requestor_id=requestor_id,
+            entity_keys=list(request.entities.keys()),
+            entity_count=entity_count,
+            feature_views=fv_names,
+            feature_count=feat_count,
+            status=status,
+            latency_ms=latency_ms,
+        )
+    except Exception:
+        logger.warning("Failed to emit online audit log", exc_info=True)


 async def _get_features(
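
In the list branch, `_parse_feature_info` deduplicates feature view names through a set, so the order of `feature_views` in an audit entry is not guaranteed to follow the request. A minimal sketch of an order-preserving variant, assuming references look like `view:feature` with an optional `@version` suffix on the view name (which is what the removed `split(":")[0].split("@")[0]` line handled). `view_name` and `unique_view_names` are hypothetical helpers, not Feast functions:

```python
from typing import List


def view_name(ref: str) -> str:
    """Hypothetical stand-in for Feast's reference parsing: 'view@v2:feat' -> 'view'."""
    return ref.split(":", 1)[0].split("@", 1)[0]


def unique_view_names(refs: List[str]) -> List[str]:
    # dict.fromkeys keeps first-seen order, unlike list(set(...)).
    # References without ':' carry no view name and are skipped.
    return list(dict.fromkeys(view_name(r) for r in refs if ":" in r))


refs = [
    "driver_stats:conv_rate",
    "driver_stats@v2:acc_rate",
    "customer:age",
    "bare_feature",
]
names = unique_view_names(refs)
```

Stable ordering mainly helps when audit entries are diffed or compared in tests; the counts are the same either way.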
@@ -390,11 +433,22 @@ async def get_online_features(request: GetOnlineFeaturesRequest) -> Any:
             include_feature_view_version_metadata=request.include_feature_view_version_metadata,
         )

-        if store._get_provider().async_supported.online.read:
-            response = await store.get_online_features_async(**read_params)  # type: ignore
-        else:
-            response = await run_in_threadpool(
-                lambda: store.get_online_features(**read_params)  # type: ignore
+        audit_start_ms = time.monotonic() * 1000
+        audit_status = "success"
+        try:
+            if store._get_provider().async_supported.online.read:
+                response = await store.get_online_features_async(**read_params)  # type: ignore
+            else:
+                response = await run_in_threadpool(
+                    lambda: store.get_online_features(**read_params)  # type: ignore
+                )
+        except Exception:
+            audit_status = "error"
+            raise
+        finally:
+            audit_latency_ms = time.monotonic() * 1000 - audit_start_ms
+            _emit_online_audit(
+                request, features, entity_count, audit_status, audit_latency_ms
             )

         response_dict = await run_in_threadpool(
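
The hunk above wraps the online read in `try`/`except`/`finally` so that one audit entry is emitted whether the read succeeds or raises, with latency taken from a monotonic clock. The same pattern in isolation, as a sketch: `call_with_audit` and `record` are hypothetical names, and the nested `try` around the emit call mirrors the best-effort behaviour that `_emit_online_audit` implements internally:

```python
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def call_with_audit(fn: Callable[[], T], emit: Callable[[str, float], None]) -> T:
    """Run fn, then report (status, latency_ms) exactly once."""
    start_ms = time.monotonic() * 1000
    status = "success"
    try:
        return fn()
    except Exception:
        status = "error"
        raise  # the caller still sees the original failure
    finally:
        try:
            emit(status, time.monotonic() * 1000 - start_ms)
        except Exception:
            pass  # best effort: auditing must not mask the real outcome


events: List[Tuple[str, float]] = []


def record(status: str, latency_ms: float) -> None:
    events.append((status, latency_ms))


ok = call_with_audit(lambda: 42, record)
try:
    call_with_audit(lambda: 1 / 0, record)
except ZeroDivisionError:
    pass
```

`time.monotonic()` is used for the interval because it cannot jump backwards when the system clock is adjusted.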

sdk/python/feast/infra/feature_servers/base_config.py

Lines changed: 11 additions & 0 deletions
@@ -82,6 +82,17 @@ class MetricsConfig(FeastConfigBaseModel):
     """Emit per-feature-view freshness gauges
     (feast_feature_freshness_seconds)."""

+    offline_features: StrictBool = True
+    """Emit offline store retrieval metrics
+    (feast_offline_store_request_total,
+    feast_offline_store_request_latency_seconds,
+    feast_offline_store_row_count)."""
+
+    audit_logging: StrictBool = False
+    """Emit structured JSON audit log entries for online and offline
+    feature requests via the ``feast.audit`` logger. Captures requestor
+    identity, entity keys, feature views, row counts, and latency."""
+

 class BaseFeatureServerConfig(FeastConfigBaseModel):
     """Base Feature Server config that should be extended"""

sdk/python/feast/infra/offline_stores/offline_store.py

Lines changed: 65 additions & 2 deletions
@@ -11,9 +11,11 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import logging
+import time
 import warnings
 from abc import ABC
-from datetime import datetime
+from datetime import datetime, timedelta, timezone
 from pathlib import Path
 from typing import (
     TYPE_CHECKING,
@@ -70,6 +72,23 @@ def __init__(
         self.max_event_timestamp = max_event_timestamp


+def _extract_retrieval_metadata(job: "RetrievalJob") -> tuple:
+    """Return ``(feature_view_names, feature_count)`` from a RetrievalJob's metadata."""
+    from feast.utils import _parse_feature_ref
+
+    try:
+        meta = job.metadata
+        if meta:
+            feature_count = len(meta.features)
+            feature_views = list(
+                {_parse_feature_ref(ref)[0] for ref in meta.features if ":" in ref}
+            )
+            return feature_views, feature_count
+    except (NotImplementedError, AttributeError):
+        pass
+    return [], 0
+
+
 class RetrievalJob(ABC):
     """A RetrievalJob manages the execution of a query to retrieve data from the offline store."""

@@ -152,7 +171,51 @@ def to_arrow(
             validation_reference (optional): The validation to apply against the retrieved dataframe.
             timeout (optional): The query timeout if applicable.
         """
-        features_table = self._to_arrow_internal(timeout=timeout)
+        start_wall = time.monotonic()
+        status_label = "success"
+        row_count = 0
+        try:
+            features_table = self._to_arrow_internal(timeout=timeout)
+            row_count = features_table.num_rows
+        except Exception:
+            status_label = "error"
+            raise
+        finally:
+            try:
+                from feast import metrics as feast_metrics
+
+                elapsed = time.monotonic() - start_wall
+
+                if feast_metrics._config.offline_features:
+                    feast_metrics.offline_store_request_total.labels(
+                        method="to_arrow", status=status_label
+                    ).inc()
+                    feast_metrics.offline_store_request_latency_seconds.labels(
+                        method="to_arrow"
+                    ).observe(elapsed)
+                    feast_metrics.offline_store_row_count.labels(
+                        method="to_arrow"
+                    ).observe(row_count)
+
+                if feast_metrics._config.audit_logging:
+                    feature_views, feature_count = _extract_retrieval_metadata(self)
+                    end_dt = datetime.now(tz=timezone.utc)
+                    start_dt = end_dt - timedelta(seconds=elapsed)
+                    feast_metrics.emit_offline_audit_log(
+                        method="to_arrow",
+                        feature_views=feature_views,
+                        feature_count=feature_count,
+                        row_count=row_count,
+                        status=status_label,
+                        start_time=start_dt.isoformat(),
+                        end_time=end_dt.isoformat(),
+                        duration_ms=elapsed * 1000,
+                    )
+            except Exception:
+                logging.getLogger(__name__).debug(
+                    "Failed to record offline store metrics", exc_info=True
+                )
+
         if self.on_demand_feature_views:
             # Build a mapping of ODFV name to requested feature names
             # This ensures we only return the features that were explicitly requested
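
`to_arrow` times the retrieval with a monotonic clock and then derives the wall-clock `start_time` by subtracting that duration from the current UTC time, rather than reading the wall clock twice. A minimal sketch of that derivation, assuming only the standard library; `audit_window` is a hypothetical helper, and the summation stands in for the real query:

```python
import time
from datetime import datetime, timedelta, timezone


def audit_window(elapsed_s: float) -> dict:
    """Build start/end timestamps from a monotonic duration (hypothetical helper)."""
    end_dt = datetime.now(tz=timezone.utc)
    # Back-date the start so end - start matches the measured duration.
    start_dt = end_dt - timedelta(seconds=elapsed_s)
    return {
        "start_time": start_dt.isoformat(),
        "end_time": end_dt.isoformat(),
        "duration_ms": elapsed_s * 1000,
    }


t0 = time.monotonic()
sum(range(100_000))  # stand-in for the offline retrieval
window = audit_window(time.monotonic() - t0)

# Round-trip check: the ISO timestamps differ by the reported duration.
start = datetime.fromisoformat(window["start_time"])
end = datetime.fromisoformat(window["end_time"])
delta_ms = (end - start).total_seconds() * 1000
```

With this approach `end_time` reflects when the entry was built, not the exact instant the query returned, so the window is shifted by however long the metric updates took. The reported duration itself is unaffected.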
