A quick explanation of what's happening here:

- Custom offline stores, online stores, and providers are generally supported and can be plugged in.
  - For example, see [adding a new offline store](https://docs.feast.dev/how-to-guides/adding-a-new-offline-store) and [adding a new online store](https://docs.feast.dev/how-to-guides/adding-support-for-a-new-online-store).
- **Project**
  - Users can only request features from a single project.
- **Provider**
  - Defaults can be easily overridden in `feature_store.yaml`.
    - For example, one can use the `aws` provider and specify Snowflake as the offline store.
- **Offline Store**
  - We recommend using data warehouses or Spark as the offline store for performant training dataset generation.
    - Here, we use file sources for instructional purposes. These read directly from files (local or remote) and use Dask to execute point-in-time joins.
  - A project can only support one type of offline store (for example, you cannot mix Snowflake and file sources).
  - Each offline store has its own configuration, which maps to YAML (e.g. see [BigQueryOfflineStoreConfig](https://rtd.feast.dev/en/master/index.html#feast.infra.offline_stores.bigquery.BigQueryOfflineStoreConfig)).
- **Online Store**
  - If you don't need to power real-time models with fresh features, you don't need an online store.
    - If you are precomputing predictions in batch ("batch scoring"), the online store is optional; you should use the offline store and run `feature_store.get_historical_features` instead.
  - Each online store has its own configuration, which maps to YAML (e.g. see [RedisOnlineStoreConfig](https://rtd.feast.dev/en/master/feast.infra.online_stores.html#feast.infra.online_stores.redis.RedisOnlineStoreConfig)).

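To make the override pattern from the bullets above concrete, here is a minimal sketch of a `feature_store.yaml` that keeps the `aws` provider but specifies Snowflake as the offline store and Redis as the online store. It is an untested illustration, not a recommended setup. The project name, bucket, and every Snowflake and Redis value are hypothetical placeholders. The exact keys each store accepts depend on your Feast version, so check them against the `SnowflakeOfflineStoreConfig` and `RedisOnlineStoreConfig` references.

```yaml
# Hypothetical example: project name, bucket, and credentials are placeholders.
project: feast_demo_aws
provider: aws
registry: s3://[YOUR BUCKET]/registry.pb
offline_store:
  # Override the aws provider's default offline store with Snowflake
  type: snowflake.offline
  account: "[YOUR ACCOUNT]"
  user: "[YOUR USER]"
  password: "[YOUR PASSWORD]"
  role: "[YOUR ROLE]"
  warehouse: "[YOUR WAREHOUSE]"
  database: "[YOUR DATABASE]"
online_store:
  # Override the aws provider's default online store with Redis
  type: redis
  connection_string: "localhost:6379"
```

Because a project supports only one type of offline store, choosing Snowflake here means the project's batch sources would all need to be Snowflake sources rather than files.
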
With the `feature_store.yaml` setup, you can now run `feast apply` to create & populate the registry.
### Step 2: Adding the feature repo to version control
You may have noticed that the example above uses a `to_df()` method to load the training dataset into memory, and may be wondering how this scales to very large datasets.

`get_historical_features` actually returns a `RetrievalJob` object that lazily executes the point-in-time join. Each offline store extends the `RetrievalJob` class to allow flushing results to, e.g., a data warehouse or data lake.

Let's look at an example with BigQuery as the offline store.
```yaml
project: feast_demo_gcp
provider: gcp
registry: gs://[YOUR BUCKET]/registry.pb
offline_store:
  type: bigquery
  location: EU  # GCP location used by the BigQuery offline store
# Feature flags: enable alpha features, here on-demand transforms
flags:
  alpha_features: true
  on_demand_transforms: true
```

Retrieving the data with `get_historical_features` gives a `BigQueryRetrievalJob` object ([reference](https://rtd.feast.dev/en/master/index.html#feast.infra.offline_stores.bigquery.BigQueryRetrievalJob)) which exposes a `to_bigquery()` method. Thus, you can do: