Commit a5e75b4

Explain Spark feature
Signed-off-by: Danny Chiao <danny@tecton.ai>
1 parent 541ebd6 commit a5e75b4

1 file changed

+8
-1
lines changed

module_1/feature_repo/module_1.ipynb

Lines changed: 8 additions & 1 deletion
@@ -440,7 +440,14 @@
 "- These features can then be further post-processed and combined with other features or request data in on demand transforms.\n",
 "- An example might be to push in the last 5 transactions, and in on demand transforms generate the average of those transactions.\n",
 "\n",
-"Feast will help manage both batch and streaming sources for you. You can run `feast materialize-incremental` as well as ingest streaming features to the same online store."
+"Feast will help manage both batch and streaming sources for you. You can run `feast materialize-incremental` as well as ingest streaming features to the same online store.\n",
+"\n",
+"**Below, what's happening:**\n",
+"- We use Spark to compute a sliding window aggregate feature that computes `daily_miles_driven` using the `miles_driven` column in the event.\n",
+"- Triggers every 30 seconds\n",
+" - In this case, because we’re just reading from the `driver_stats.parquet`, we could have multiple windows of data coming in. Thus, in this code, we filter for the latest (`driver_id`, `window`) feature\n",
+" - If you have a larger watermark and can get events across multiple windows, you’ll want to have the latest window too.\n",
+"- Rename `end` to `event_timestamp`, otherwise Feast will throw a validation error since it doesn’t match the schema of the `FeatureView`"
 ]
 },
 {
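
The steps described in the added bullets can be sketched in plain Python. This is not Spark and not the notebook's code; it only illustrates the logic under stated assumptions: sum `miles_driven` per (`driver_id`, window) over sliding windows, keep the latest window per driver, and emit the window `end` under the name `event_timestamp`. The function names, the default window and slide lengths, and the sample events are hypothetical, since the commit does not state them.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def sliding_window_sums(events, window=timedelta(days=1), slide=timedelta(hours=1)):
    """Sum miles_driven per (driver_id, window_start, window_end).

    Windows are half-open [start, end) and their starts are aligned to
    multiples of `slide`. The default lengths are assumptions.
    """
    sums = defaultdict(float)
    for event in events:
        ts = event["event_timestamp"]
        # Latest aligned window start that is <= ts.
        start = ts - ((ts - EPOCH) % slide)
        # Walk back through every window that still contains ts.
        while start > ts - window:
            sums[(event["driver_id"], start, start + window)] += event["miles_driven"]
            start -= slide
    return sums


def latest_window_rows(sums):
    """Keep the latest window per driver; expose its end as event_timestamp."""
    latest = {}
    for (driver_id, _start, end), total in sums.items():
        current = latest.get(driver_id)
        if current is None or end > current["event_timestamp"]:
            latest[driver_id] = {
                "driver_id": driver_id,
                "event_timestamp": end,  # renamed from the window's `end`
                "daily_miles_driven": total,
            }
    return list(latest.values())


if __name__ == "__main__":
    # Hypothetical sample events for illustration only.
    sample = [
        {"driver_id": 1001, "event_timestamp": datetime(2022, 1, 1, 10, 30), "miles_driven": 10.0},
        {"driver_id": 1001, "event_timestamp": datetime(2022, 1, 1, 11, 15), "miles_driven": 5.0},
    ]
    for row in latest_window_rows(sliding_window_sums(sample)):
        print(row)
```

Because one event falls into several overlapping windows, a single batch can yield many (`driver_id`, window) rows. Keeping only the row with the greatest window end gives one value per driver, and the renamed column lines up with the timestamp field the bullets say the `FeatureView` schema expects.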
