File: docs/reference/feature-repository/registration-inferencing.md
Lines changed: 2 additions & 2 deletions
@@ -2,6 +2,6 @@
 
 ## Overview
 
-* FeatureView - When the `features` parameter is left out of the feature view definition, upon a `feast apply` call, Feast will automatically consider every column in the data source as a feature to be registered other than the specific timestamp columns associated with the underlying data source definition (e.g. event_timestamp_column) and the columns associated with the feature view's entities.
-* DataSource - When the `event_timestamp_column` parameter is left out of the data source definition, upon a 'feast apply' call, Feast will automatically find the sole timestamp column in the table underlying the data source and use that as the `event_timestamp_column`. If there are no columns of timestamp type or multiple columns of timestamp type, `feast apply` will throw an exception.
+* FeatureView - When the `features` parameter is left out of the feature view definition, upon a `feast apply` call, Feast will automatically consider every column in the data source as a feature to be registered other than the specific timestamp columns associated with the underlying data source definition (e.g. timestamp_field) and the columns associated with the feature view's entities.
+* DataSource - When the `timestamp_field` parameter is left out of the data source definition, upon a 'feast apply' call, Feast will automatically find the sole timestamp column in the table underlying the data source and use that as the `timestamp_field`. If there are no columns of timestamp type or multiple columns of timestamp type, `feast apply` will throw an exception.
 * Entity - When the `value_type` parameter is left out of the entity definition, upon a `feast apply` call, Feast will automatically find the column corresponding with the entity's `join_key` and take that column's data type to be the `value_type`. If the column doesn't exist, `feast apply` will throw an exception.
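
Reviewer note: the DataSource rule described in the bullets above (use the sole timestamp column, fail on zero or several) can be sketched in plain Python. This is an illustration only, not Feast's implementation: the function name `infer_timestamp_field` and the `schema` mapping of column names to type names are assumptions made for the sketch.

```python
from typing import Dict, Optional


def infer_timestamp_field(
    schema: Dict[str, str], timestamp_field: Optional[str] = None
) -> str:
    """Return the timestamp column, inferring it when it was not given.

    `schema` maps column names to type names. This shape is hypothetical
    and chosen only to keep the sketch self-contained.
    """
    # An explicitly configured field always wins; nothing is inferred.
    if timestamp_field:
        return timestamp_field

    # Collect every column whose type is a timestamp.
    candidates = [name for name, dtype in schema.items() if dtype == "timestamp"]

    # The rule only works when exactly one candidate exists.
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one timestamp column, found {len(candidates)}: {candidates}"
        )
    return candidates[0]


# One timestamp column, so it is picked automatically.
print(infer_timestamp_field({"taxi_id": "string", "day": "timestamp"}))
```

Raising on ambiguity rather than picking the first match mirrors the behavior the bullet describes: an ambiguous source is a configuration error that `feast apply` should surface.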
File: docs/tutorials/validating-historical-features.md
Lines changed: 22 additions & 22 deletions
@@ -1,12 +1,12 @@
 # Validating historical features with Great Expectations
 
-In this tutorial, we will use the public dataset of Chicago taxi trips to present data validation capabilities of Feast.
-- The original dataset is stored in BigQuery and consists of raw data for each taxi trip (one row per trip) since 2013.
+In this tutorial, we will use the public dataset of Chicago taxi trips to present data validation capabilities of Feast.
+- The original dataset is stored in BigQuery and consists of raw data for each taxi trip (one row per trip) since 2013.
 - We will generate several training datasets (aka historical features in Feast) for different periods and evaluate expectations made on one dataset against another.
 
 Types of features we're ingesting and generating:
-- Features that aggregate raw data with daily intervals (eg, trips per day, average fare or speed for a specific day, etc.).
-- Features using SQL while pulling data from BigQuery (like total trips time or total miles travelled).
+- Features that aggregate raw data with daily intervals (eg, trips per day, average fare or speed for a specific day, etc.).
+- Features using SQL while pulling data from BigQuery (like total trips time or total miles travelled).
 - Features calculated on the fly when requested using Feast's on-demand transformations
 
 Our plan:
@@ -31,7 +31,7 @@ Install Feast Python SDK and great expectations:
 ```
 
 
-### 1. Dataset preparation (Optional)
+### 1. Dataset preparation (Optional)
 
 **You can skip this step if you don't have GCP account. Please use parquet files that are coming with this tutorial instead**
 
@@ -56,15 +56,15 @@ Running some basic aggregations while pulling data from BigQuery. Grouping by ta
 
 
 ```python
-data_query ="""SELECT
+data_query ="""SELECT
 taxi_id,
 TIMESTAMP_TRUNC(trip_start_timestamp, DAY) as day,
 SUM(trip_miles) as total_miles_travelled,
 SUM(trip_seconds) as total_trip_seconds,
 SUM(fare) as total_earned,
 COUNT(*) as trip_count
-FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
-WHERE
+FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
+WHERE
 trip_miles > 0 AND trip_seconds > 60 AND
 trip_start_timestamp BETWEEN '2019-01-01' and '2020-12-31' AND
-FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
+FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
 WHERE
 trip_miles > 0 AND trip_seconds > 0 AND
 trip_start_timestamp BETWEEN '{year}-01-01' and '{year}-12-31'
@@ -120,7 +120,7 @@ from google.protobuf.duration_pb2 import Duration
 
 ```python
 batch_source = FileSource(
-event_timestamp_column="day",
+timestamp_field="day",
 path="trips_stats.parquet", # using parquet file that we created on previous step
 file_format=ParquetFormat()
 )
@@ -141,7 +141,7 @@ trips_stats_fv = FeatureView(
 Feature("total_trip_seconds", ValueType.DOUBLE),
 Feature("total_earned", ValueType.DOUBLE),
 Feature("trip_count", ValueType.INT64),
-
+
 ],
 ttl=Duration(seconds=86400),
 batch_source=batch_source,
@@ -317,8 +317,8 @@ store.create_saved_dataset(
 
 Dataset profiler is a function that accepts dataset and generates set of its characteristics. This charasteristics will be then used to evaluate (validate) next datasets.
 
-**Important: datasets are not compared to each other!
-Feast use a reference dataset and a profiler function to generate a reference profile.
+**Important: datasets are not compared to each other!
+Feast use a reference dataset and a profiler function to generate a reference profile.
 This profile will be then used during validation of the tested dataset.**
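
Reviewer note: the profile-then-validate flow described in the hunk above can be sketched without any library. The tutorial itself builds the profile with Great Expectations. The version below is a plain-Python stand-in, and `profile_dataset`, `validate`, and the min/max profile shape are hypothetical names and choices made only for illustration. The sample rows reuse the `trip_count` and `total_earned` column names from the query above, with made-up values.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]
Profile = Dict[str, Tuple[float, float]]


def profile_dataset(rows: List[Row]) -> Profile:
    """Build a reference profile: the observed (min, max) of each column."""
    columns = rows[0].keys()
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def validate(rows: List[Row], profile: Profile) -> List[str]:
    """Return one message per value that falls outside the reference profile."""
    failures = []
    for i, row in enumerate(rows):
        for column, (low, high) in profile.items():
            if not low <= row[column] <= high:
                failures.append(
                    f"row {i}: {column}={row[column]} outside [{low}, {high}]"
                )
    return failures


# Made-up sample data for the sketch.
reference = [
    {"trip_count": 10, "total_earned": 120.0},
    {"trip_count": 25, "total_earned": 310.5},
]
tested = [
    {"trip_count": 12, "total_earned": 150.0},
    {"trip_count": 40, "total_earned": 200.0},
]

# The tested rows are checked against the profile, never against the
# reference rows themselves.
reference_profile = profile_dataset(reference)
print(validate(tested, reference_profile))
```

The tested dataset only ever meets the profile, which is the point the bolded note makes: the reference rows are consumed once to produce the profile and are not needed afterwards.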