<h1>Module 2: On demand transformations</h1>

In this module, we introduce the concept of on demand transforms: transformations that execute on the fly and accept other feature views or request data as input.
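As a rough pandas-only sketch of the idea (the name `transformed_conv_rate` comes from this workshop's feature repo, but the body and column names here are illustrative), an on demand transform is just a row-level function over a dataframe that can mix feature view columns with request-time data:

```python
import pandas as pd

def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    # Combine a precomputed feature (conv_rate) with a value that is
    # only known at request time (val_to_add).
    out = pd.DataFrame()
    out["conv_rate_plus_val"] = inputs["conv_rate"] + inputs["val_to_add"]
    return out

# "inputs" mixes columns from a feature view with request data columns
inputs = pd.DataFrame({"conv_rate": [0.5, 0.25], "val_to_add": [1, 2]})
result = transformed_conv_rate(inputs)
```

Feast wraps functions like this in an `OnDemandFeatureView` so the same transformation runs during both offline and online retrieval.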

TODO:
- add architecture
- Define request data
- Define on demand transforms
- Note that this can also transform pushed features (e.g. stream features)
- Note that this can combine multiple feature views and request data

<h2>Table of Contents</h2>

- [Workshop](#workshop)
  - [Step 1: Install Feast](#step-1-install-feast)
  - [Step 2: Look at the data we have](#step-2-look-at-the-data-we-have)
  - [Step 3: Apply features](#step-3-apply-features)
  - [Step 4: Materialize batch features](#step-4-materialize-batch-features)
  - [Step 5: Test retrieve features](#step-5-test-retrieve-features)
- [Conclusion](#conclusion)

# Workshop
## Step 1: Install Feast

First, we install Feast as well as `pygeohash`, a geohash library we'll use:
```bash
pip install feast
pip install pygeohash
```

## Step 2: Look at the data we have
We used `data/gen_lat_lon.py` to append randomly generated latitudes and longitudes to the original driver stats dataset.

```python
import pandas as pd

# Inspect the driver stats dataset with the appended lat / lon columns
pd.read_parquet("data/driver_stats_lat_lon.parquet")
```

## Step 3: Apply features
```console
$ feast apply

Created entity driver
Created feature view driver_daily_features
Created feature view driver_hourly_stats
Created on demand feature view transformed_conv_rate
Created on demand feature view avg_hourly_miles_driven
Created on demand feature view location_features_from_push
Created feature service model_v3
Created feature service model_v2
Created feature service model_v1

Created sqlite table feast_demo_odfv_driver_daily_features
Created sqlite table feast_demo_odfv_driver_hourly_stats
```

## Step 4: Materialize batch features
```console
$ feast materialize-incremental $(date +%Y-%m-%d)

Materializing 2 feature views to 2022-05-17 12:41:18-04:00 into the sqlite online store.

driver_hourly_stats from 1748-08-01 16:41:20-04:56:02 to 2022-05-17 12:41:18-04:00:
100%|████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 495.03it/s]
driver_daily_features from 1748-08-01 16:41:20-04:56:02 to 2022-05-17 12:41:18-04:00:
100%|███████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1274.48it/s]
```
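Conceptually, incremental materialization reads each feature view's rows between the last materialized timestamp and the given end date and writes the latest value per entity key into the online store. A minimal pandas sketch of that window-and-dedupe step (column names follow the driver stats dataset; this is not Feast's actual implementation):

```python
import pandas as pd

def materialize_window(df: pd.DataFrame, start, end) -> pd.DataFrame:
    # Select feature rows in the (start, end] window and keep only the
    # newest row per driver -- roughly what materialization writes online.
    window = df[(df["event_timestamp"] > start) & (df["event_timestamp"] <= end)]
    return (
        window.sort_values("event_timestamp")
        .groupby("driver_id", as_index=False)
        .last()
    )

df = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(["2022-05-15", "2022-05-16", "2022-05-16"]),
    "conv_rate": [0.1, 0.2, 0.9],
})
latest = materialize_window(df, pd.Timestamp("2022-05-14"), pd.Timestamp("2022-05-17"))
```

The online store only ever serves these freshest-per-entity values; historical rows remain queryable offline.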

## Step 5: Test retrieve features
Now we'll see how these transformations are executed offline at `get_historical_features` time and online at `get_online_features` time. We'll also see how an `OnDemandFeatureView` interacts with request data, regular feature views, and streaming / push features.

Try out the Jupyter notebook in [client/module_2_client.ipynb](client/module_2_client.ipynb). This is in a separate directory that contains just a `feature_store.yaml`.
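Outside the notebook, online retrieval with an on demand transform looks roughly like this (a sketch, not the notebook's exact code: the feature references and the `val_to_add` request-data key are illustrative and depend on how the repo's views are defined):

```python
from feast import FeatureStore

# Run from the directory containing feature_store.yaml
store = FeatureStore(repo_path=".")

# Request data ("val_to_add") is passed alongside the entity key and is
# consumed by the on demand feature view at serving time.
online = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "transformed_conv_rate:conv_rate_plus_val",
    ],
    entity_rows=[{"driver_id": 1001, "val_to_add": 1}],
).to_dict()
print(online)
```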

# Conclusion
TODO