
Commit 09f0562 (parent 9b7b2e1)

Add reference to querying by RepoConfig

Signed-off-by: Danny Chiao <danny@tecton.ai>

File tree

1 file changed: +23 −7 lines

module_0/README.md

Lines changed: 23 additions & 7 deletions
@@ -26,8 +26,9 @@ We focus on a specific example (that does not include online features + models):
 - [Step 2e (optional): Merge a sample PR in your fork](#step-2e-optional-merge-a-sample-pr-in-your-fork)
 - [Other best practices](#other-best-practices)
 - [User group 2: ML Engineers](#user-group-2-ml-engineers)
-- [Step 1: Fetch features for batch scoring](#step-1-fetch-features-for-batch-scoring)
-- [Step 2 (optional): Scaling to large datasets](#step-2-optional-scaling-to-large-datasets)
+- [Step 1: Fetch features for batch scoring (method 1)](#step-1-fetch-features-for-batch-scoring-method-1)
+- [Step 2: Fetch features for batch scoring (method 2)](#step-2-fetch-features-for-batch-scoring-method-2)
+- [Step 3 (optional): Scaling to large datasets](#step-3-optional-scaling-to-large-datasets)
 - [User group 3: Data Scientists](#user-group-3-data-scientists)
 - [Conclusion](#conclusion)

@@ -391,7 +392,7 @@ training_df = store.get_historical_features(
 predictions = model.predict(training_df)
 ```
 
-### Step 1: Fetch features for batch scoring
+### Step 1: Fetch features for batch scoring (method 1)
 First, go into the `module_0/client` directory and change the `feature_store.yaml` to use your S3 bucket.
 
 Then, run `python test_fetch.py`, which runs the above code (printing out the dataframe instead of the model):
@@ -406,7 +407,25 @@ $ python test_fetch.py
 359  1001 2022-05-15 20:46:00.308163+00:00  0.404588  0.407571
 1444 1004 2022-05-15 20:46:00.308163+00:00  0.977276  0.051582
 ```
-### Step 2 (optional): Scaling to large datasets
+
+### Step 2: Fetch features for batch scoring (method 2)
+You can also skip the `feature_store.yaml` entirely and instantiate the `FeatureStore` directly in Python via a `RepoConfig`. See the `module_0/client_no_yaml` directory for an example of this. The output of `python test_fetch.py` will be identical to the previous step.
+
+A quick snippet of the code:
+```python
+repo_config = RepoConfig(
+    registry=RegistryConfig(path="s3://[YOUR BUCKET]/registry.pb"),
+    project="feast_demo_aws",
+    provider="aws",
+    offline_store="file",  # Could also be an OfflineStoreConfig, e.g. FileOfflineStoreConfig
+    online_store="null",  # Could also be an OnlineStoreConfig, e.g. RedisOnlineStoreConfig
+)
+store = FeatureStore(config=repo_config)
+...
+training_df = store.get_historical_features(...).to_df()
+```
+
+### Step 3 (optional): Scaling to large datasets
 You may note that the above example uses a `to_df()` method to load the training dataset into memory and may be wondering how this scales if you have very large datasets.
 
 `get_historical_features` actually returns a `RetrievalJob` object that lazily executes the point-in-time join. The `RetrievalJob` class is extended by each offline store to allow flushing results to e.g. the data warehouse or data lakes.
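The lazy-execution pattern described here can be sketched in plain Python. The following is an illustrative stand-in, not Feast's actual implementation; the class, column names, and sample data are all hypothetical, with `pd.merge_asof` approximating the point-in-time join:

```python
import pandas as pd

class LazyRetrievalJob:
    """Illustrative stand-in for a RetrievalJob: nothing executes until
    a sink method such as to_df() or to_parquet() is called."""

    def __init__(self, entity_df: pd.DataFrame, feature_df: pd.DataFrame):
        self.entity_df = entity_df
        self.feature_df = feature_df

    def _execute(self) -> pd.DataFrame:
        # Point-in-time join: for each entity row, take the latest feature
        # row at or before that entity's event timestamp.
        return pd.merge_asof(
            self.entity_df.sort_values("event_timestamp"),
            self.feature_df.sort_values("event_timestamp"),
            on="event_timestamp",
            by="driver_id",
        )

    def to_df(self) -> pd.DataFrame:
        # Materializes the result in local memory (fine for small data).
        return self._execute()

    def to_parquet(self, path: str) -> None:
        # A real offline store would instead flush to a warehouse or lake.
        self._execute().to_parquet(path)

entities = pd.DataFrame(
    {"driver_id": [1001, 1004],
     "event_timestamp": pd.to_datetime(["2022-05-15 20:46:00"] * 2)}
)
features = pd.DataFrame(
    {"driver_id": [1001, 1004],
     "event_timestamp": pd.to_datetime(["2022-05-15 20:00:00"] * 2),
     "conv_rate": [0.404588, 0.977276]}
)
job = LazyRetrievalJob(entities, features)  # no work happens yet
training_df = job.to_df()                   # the join executes here
```

The point of the pattern is that constructing the job is cheap; the choice of sink method decides where the joined data ends up.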
@@ -419,9 +438,6 @@ registry: gs://[YOUR BUCKET]/registry.pb
 offline_store:
   type: bigquery
   location: EU
-flags:
-  alpha_features: true
-  on_demand_transforms: true
 ```
 
 Retrieving the data with `get_historical_features` gives a `BigQueryRetrievalJob` object ([reference](https://rtd.feast.dev/en/master/index.html#feast.infra.offline_stores.bigquery.BigQueryRetrievalJob)) which exposes a `to_bigquery()` method. Thus, you can do:
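To make the "flush to the warehouse instead of local memory" idea concrete without needing GCP credentials, here is a hedged sketch using `sqlite3` as a stand-in warehouse; the table and column names are hypothetical, and in Feast itself `to_bigquery()` on the retrieval job plays this role:

```python
import sqlite3

# sqlite3 stands in for the data warehouse here; Feast's
# BigQueryRetrievalJob.to_bigquery() similarly lands the join result in a
# destination table instead of pulling rows into client memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE driver_hourly_stats "
    "(driver_id INTEGER, event_timestamp TEXT, conv_rate REAL)"
)
conn.executemany(
    "INSERT INTO driver_hourly_stats VALUES (?, ?, ?)",
    [(1001, "2022-05-15 20:00:00", 0.404588),
     (1004, "2022-05-15 20:00:00", 0.977276)],
)

# The query executes inside the warehouse and writes to a new table;
# nothing large is materialized on the client.
conn.execute(
    "CREATE TABLE scoring_features AS "
    "SELECT driver_id, event_timestamp, conv_rate FROM driver_hourly_stats"
)
row_count = conn.execute("SELECT COUNT(*) FROM scoring_features").fetchone()[0]
```

The design choice is the same either way: keep the heavy join and its output inside the storage engine, and only pull small results (or nothing) back to the client.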

0 commit comments
