docs/user-guide/feature-retrieval.md: 32 additions & 0 deletions
@@ -107,6 +107,38 @@ Feast can retrieve features from any amount of feature sets, as long as they occ
Point-in-time-correct joins also prevent feature leakage by trying to accurately reconstruct the state of the world at a single point in time, instead of just joining features based on the nearest timestamps.
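
To make the difference from a nearest-timestamp join concrete, here is a minimal pure-Python sketch. It is not Feast's implementation; the function name `point_in_time_join` and the sample rows are made up for illustration. For each entity row, it picks the most recent feature value whose timestamp is not later than the entity row's timestamp.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """Illustrative only: join each (entity_id, event_ts) with the latest
    feature value observed at or before event_ts."""
    # Group feature rows per entity, ordered by timestamp.
    history = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        history.setdefault(entity_id, []).append((ts, value))

    joined = []
    for entity_id, event_ts in entity_rows:
        rows = history.get(entity_id, [])
        timestamps = [ts for ts, _ in rows]
        # Number of feature rows with timestamp <= event_ts.
        idx = bisect_right(timestamps, event_ts)
        value = rows[idx - 1][1] if idx else None
        joined.append((entity_id, event_ts, value))
    return joined


# Hypothetical sample data.
feature_rows = [
    ("driver_1", datetime(2020, 1, 1, 10, 0), 0.50),
    ("driver_1", datetime(2020, 1, 1, 12, 0), 0.90),
]
entity_rows = [
    # The nearest feature timestamp is 12:00, but that value is from the future.
    ("driver_1", datetime(2020, 1, 1, 11, 55)),
    # No feature value had been observed yet at 09:00.
    ("driver_1", datetime(2020, 1, 1, 9, 0)),
]

result = point_in_time_join(entity_rows, feature_rows)
print(result[0][2])  # 0.5
print(result[1][2])  # None
```

A nearest-timestamp join would have returned 0.90 for the first row, leaking a value that was not yet known at 11:55.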

### **Computing statistics over retrieved data**

Feast can compute [TFDV](https://tensorflow.google.cn/tfx/tutorials/data_validation/tfdv_basic)-compatible statistics over data retrieved from historical stores. The statistics can be used with feature schemas and TFDV to verify the integrity of your retrieved dataset, or passed to [Facets](https://github.com/PAIR-code/facets) to visualize the feature distributions.

Statistics are not computed by default. To have Feast compute them for a given historical retrieval request, pass `compute_statistics=True` to `get_batch_features`.

```python
# `client`, `features`, and `entity_df` are assumed to be defined earlier.
dataset = client.get_batch_features(
    feature_refs=features,
    entity_rows=entity_df,
    compute_statistics=True
)

# Fetch the statistics computed for the retrieved dataset.
stats = dataset.statistics()
```
If a schema is already defined over the feature sets in question, TFDV can be used to detect anomalies in the dataset.
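
As a rough sketch of that check, assume `stats` is the TFDV-compatible statistics object returned by `dataset.statistics()` and `schema` is a TFDV schema proto you have already obtained for the feature sets (how it is exported from Feast is not shown here). TFDV's `validate_statistics` can then compare the two. The helper name `detect_anomalies` is hypothetical, and the import is deferred so that `tensorflow_data_validation` is only needed when the check actually runs.

```python
def detect_anomalies(stats, schema):
    """Compare retrieved-dataset statistics against a schema using TFDV.

    Assumes `stats` is a TFDV-compatible statistics proto and `schema`
    is a TFDV schema proto; both are supplied by the caller.
    """
    # Deferred import: TFDV is only required when the check is run.
    import tensorflow_data_validation as tfdv

    # Returns an anomalies proto describing features that violate the schema.
    return tfdv.validate_statistics(statistics=stats, schema=schema)
```

In a notebook, the returned anomalies can be passed to `tfdv.display_anomalies` for a tabular view.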
Online feature retrieval works in much the same way as batch retrieval, with one important distinction: Online stores only maintain the current state of features. No historical data is served.