@@ -15,9 +15,10 @@ We focus on a specific example (that does not include online features or realtim
  - [A quick primer on feature views](#a-quick-primer-on-feature-views)
  - [User groups](#user-groups)
  - [User group 1: ML Platform Team](#user-group-1-ml-platform-team)
- - [Step 0: Setup S3 bucket for registry and file sources](#step-0-setup-s3-bucket-for-registry-and-file-sources)
+ - [Step 0 (AWS): Setup S3 bucket for registry and file sources](#step-0-aws-setup-s3-bucket-for-registry-and-file-sources)
+ - [Step 0 (GCP): Setup GCS bucket for registry and file sources](#step-0-gcp-setup-gcs-bucket-for-registry-and-file-sources)
  - [Step 1: Setup the feature repo](#step-1-setup-the-feature-repo)
- - [Step 1a: Use your configured S3 bucket](#step-1a-use-your-configured-s3-bucket)
+ - [Step 1a: Use your configured bucket](#step-1a-use-your-configured-bucket)
  - [Some further notes and gotchas](#some-further-notes-and-gotchas)
  - [Step 1b: Run `feast plan`](#step-1b-run-feast-plan)
  - [Step 1c: Run `feast apply`](#step-1c-run-feast-apply)
@@ -43,11 +44,15 @@ We focus on a specific example (that does not include online features or realtim
  - [Can I call `get_historical_features` without an entity dataframe? I want features for all entities.](#can-i-call-get_historical_features-without-an-entity-dataframe-i-want-features-for-all-entities)

  # Installing Feast
- Before we get started, first install Feast with AWS dependencies:
-
- ```bash
- pip install "feast[aws]"
- ```
+ Before we get started, first install Feast with AWS or GCP dependencies:
+ - AWS
+ ```bash
+ pip install "feast[aws]"
+ ```
+ - GCP
+ ```bash
+ pip install "feast[gcp]"
+ ```

  # Exploring the data
  We've made some dummy data for this workshop in `infra/driver_stats.parquet`. Let's dive into what the data looks like. You can follow along in [explore_data.ipynb](explore_data.ipynb):
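To confirm that the extra you installed actually pulled in a cloud SDK, you can check whether its package is importable. This is a minimal sketch, assuming `feast[aws]` brings in `boto3` and `feast[gcp]` brings in `google-cloud-storage` (imported as `google.cloud.storage`); the helper names are hypothetical and not part of Feast.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` is importable (parent packages get imported, the module itself does not)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages, so a missing `google` raises here.
        return False


def cloud_sdks_available() -> dict:
    """Hypothetical check: which cloud SDKs are importable in this environment.

    Assumes feast[aws] installs boto3 and feast[gcp] installs google-cloud-storage.
    """
    return {
        "aws": module_available("boto3"),
        "gcp": module_available("google.cloud.storage"),
    }


if __name__ == "__main__":
    print(cloud_sdks_available())
```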
@@ -107,20 +112,17 @@ There are three user groups here worth considering. The ML platform team, the ML
  ## User group 1: ML Platform Team
  The team here sets up the centralized Feast feature repository and CI/CD in GitHub. This is what's seen in `feature_repo_aws/`.

- ### Step 0: Setup S3 bucket for registry and file sources
+ ### Step 0 (AWS): Setup S3 bucket for registry and file sources
  This assumes you have an AWS account and Terraform set up. If you don't:
  - Set up an AWS account, install the AWS CLI, and set up your credentials with `aws configure` as per the [AWS credential quickstart](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-creds)
We've made a simple Terraform project to help with GCS bucket creation (and uploading the test data we need). We'll also need a service account here which the GitHub CI can later use as well to manage resources.
+ 1. Create a project in GCP (https://console.cloud.google.com/; we use `feast-workshop` as the project name below)
+ 2. Set up credentials (e.g. install `gcloud`, then run):
+ ```bash
+ gcloud config configurations create personal
+ gcloud config set project feast-workshop
+ gcloud auth login
+ ```
+ 3. Create a service account and let Terraform know where your credentials are:
+ ```console
+ $ gcloud iam service-accounts create feast-workshop-sa --display-name "Terraform and GitHub CI account"
+
+ Created service account [feast-workshop-sa].
+ ```
+ 4. Grant the service account access to your project:
$ gcloud iam service-accounts keys create ~/.config/gcloud/feast-workshop.json --iam-account feast-workshop-sa@feast-workshop.iam.gserviceaccount.com
+
+ created key [YOUR PRIVATE KEY] of type [json] as [/Users/dannychiao/.config/gcloud/feast-workshop.json] for [feast-workshop-sa@feast-workshop.iam.gserviceaccount.com]
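The service account email passed to `--iam-account` follows the `<account>@<project>.iam.gserviceaccount.com` pattern visible in the output above. A minimal sketch that builds the two `gcloud` invocations from the account name and project; the helper names are hypothetical, and the commands are only constructed and printed, not executed.

```python
import os
import shlex


def service_account_email(account: str, project: str) -> str:
    """Email format seen in the `keys create` output above."""
    return f"{account}@{project}.iam.gserviceaccount.com"


def gcloud_sa_commands(account: str, project: str, key_path: str) -> list:
    """Hypothetical helper: argv lists mirroring the two gcloud calls above.

    Nothing is executed; pass each list to subprocess.run(argv, check=True) to run it.
    """
    # subprocess runs without a shell, so expand "~" ourselves.
    key_file = os.path.expanduser(key_path)
    return [
        ["gcloud", "iam", "service-accounts", "create", account,
         "--display-name", "Terraform and GitHub CI account"],
        ["gcloud", "iam", "service-accounts", "keys", "create", key_file,
         "--iam-account", service_account_email(account, project)],
    ]


if __name__ == "__main__":
    for argv in gcloud_sa_commands(
        "feast-workshop-sa", "feast-workshop", "~/.config/gcloud/feast-workshop.json"
    ):
        print(shlex.join(argv))
```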
The first thing a platform team needs to do is setup a `feature_store.yaml` file within a version controlled repo like GitHub. `feature_store.yaml` is the primary way to configure an overall Feast project. We've setup a sample feature repository in `feature_repo_aws/`
+ The first thing a platform team needs to do is set up a `feature_store.yaml` file within a version-controlled repo like GitHub. `feature_store.yaml` is the primary way to configure an overall Feast project. We've set up a sample feature repository in `feature_repo_aws/` or `feature_repo_gcp/`.

- #### Step 1a: Use your configured S3 bucket
- There are two files in `feature_repo_aws` you need to change to point to your S3 bucket:
+ #### Step 1a: Use your configured bucket
+ There are two files in `feature_repo_aws` or `feature_repo_gcp` you need to change to point to your bucket:

  **data_sources.py**
  ```python
+ # AWS version
  driver_stats = FileSource(
      name="driver_stats_source",
      path="s3://[INSERT YOUR BUCKET]/driver_stats.parquet",
@@ -153,10 +222,21 @@ driver_stats = FileSource(
      description="A table describing the stats of a driver based on hourly logs",
      owner="test2@gmail.com",
  )
+
+ # GCP version
+ driver_stats = FileSource(
+     name="driver_stats_source",
+     path="gs://feast-workshop-danny/driver_stats.parquet",  # TODO: Replace with your bucket
+     timestamp_field="event_timestamp",
+     created_timestamp_column="created",
+     description="A table describing the stats of a driver based on hourly logs",
+     owner="test2@gmail.com",
+ )
  ```

  **feature_store.yaml**
  ```yaml
+ # AWS version
  project: feast_demo_aws
  provider: aws
  registry: s3://[INSERT YOUR BUCKET]/registry.pb
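The AWS and GCP `FileSource` definitions differ only in the bucket URI (`s3://` vs `gs://`). A minimal sketch of a hypothetical helper that builds that `path` from a cloud name and bucket, so one `data_sources.py` could serve both; it does not import Feast, and the cloud-to-scheme mapping is an assumption drawn from the two versions above.

```python
# Assumed mapping, taken from the AWS (s3://) and GCP (gs://) versions above.
BUCKET_SCHEMES = {"aws": "s3", "gcp": "gs"}


def driver_stats_path(cloud: str, bucket: str) -> str:
    """Hypothetical helper: URI of driver_stats.parquet in your bucket."""
    if cloud not in BUCKET_SCHEMES:
        raise ValueError(f"unsupported cloud {cloud!r}; expected one of {sorted(BUCKET_SCHEMES)}")
    bucket = bucket.strip("/")
    if not bucket:
        raise ValueError("bucket name must not be empty")
    return f"{BUCKET_SCHEMES[cloud]}://{bucket}/driver_stats.parquet"


if __name__ == "__main__":
    print(driver_stats_path("aws", "my-feast-bucket"))
    print(driver_stats_path("gcp", "my-feast-bucket"))
```

In `data_sources.py` you would then pass `path=driver_stats_path("gcp", "<your bucket>")` to `FileSource` and keep the remaining arguments unchanged.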
@@ -166,6 +246,14 @@ offline_store:
  flags:
    alpha_features: true
    on_demand_transforms: true
+
+ # GCP version
+ project: feast_demo_gcp
+ provider: gcp
+ registry: gs://feast-workshop-danny/registry.pb # TODO: Replace with your bucket
+ online_store: null
+ offline_store:
+   type: file
  ```

  A quick explanation of what's happening in this `feature_store.yaml`:
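As far as we know, Feast picks the registry store from the URI scheme of `registry` (`s3://` for S3, `gs://` for GCS), so a path such as `gcs://...` is rejected as an unsupported scheme once Feast tries to load the registry. A minimal, stdlib-only sketch of a pre-flight check for a single-cloud `feature_store.yaml`; it reads only simple top-level `key: value` lines rather than parsing full YAML, and the helper names are hypothetical.

```python
# Assumed mapping from provider to the registry URI scheme Feast expects.
EXPECTED_REGISTRY_SCHEME = {"aws": "s3", "gcp": "gs"}


def top_level_values(yaml_text: str) -> dict:
    """Collect simple top-level `key: value` pairs (not a full YAML parser)."""
    values = {}
    for line in yaml_text.splitlines():
        # Skip blank lines, comments and indented (nested) keys.
        if not line.strip() or line.startswith("#") or line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.split(" #", 1)[0].strip()  # drop inline comment
    return values


def check_registry_scheme(yaml_text: str) -> None:
    """Raise ValueError if `registry` uses the wrong URI scheme for `provider`."""
    values = top_level_values(yaml_text)
    expected = EXPECTED_REGISTRY_SCHEME.get(values.get("provider", ""))
    if expected is None:
        return  # other providers (e.g. local) are not checked
    registry = values.get("registry", "")
    scheme = registry.split("://", 1)[0] if "://" in registry else ""
    if scheme != expected:
        raise ValueError(
            f"provider {values['provider']!r} expects a {expected}:// registry, got {registry!r}"
        )


if __name__ == "__main__":
    sample = "project: feast_demo_gcp\nprovider: gcp\nregistry: gs://my-bucket/registry.pb\n"
    check_registry_scheme(sample)
    print("registry scheme matches provider")
```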
@@ -300,7 +388,7 @@ We recommend automatically running `feast plan` on incoming PRs to describe what
  - This is useful for helping PR reviewers understand the effects of a change.
  - This can prevent breaking models in production (e.g. catching PRs that would change features used by an existing model version (`FeatureService`)).

- An example GitHub workflow that runs `feast plan` on PRs (See [feast_plan.yml](../.github/workflows/feast_plan.yml), which is setup in this workshop repo)
+ An example GitHub workflow that runs `feast plan` on PRs (see [feast_plan_aws.yml](../.github/workflows/feast_plan_aws.yml) or [feast_plan_gcp.yml](../.github/workflows/feast_plan_gcp.yml), which are set up in this workshop repo):

  ```yaml
  name: Feast plan
@@ -351,7 +439,9 @@ jobs:
  ${{ steps.feast_plan.outputs.body }}
  ```

- You'll notice the above logic reference two secrets in GitHub corresponding to your AWS credentials. To make this workflow work, create GitHub secrets with your own `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+ You'll notice the above logic references two GitHub secrets corresponding to your cloud credentials. To make this workflow work, create the secrets for your cloud:
+ - For AWS, create GitHub secrets with your own `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
+ - For GCP, create `GCP_PROJECT_ID` and `GCP_SA_KEY` (the service account key).

  See the result on a PR opened in this repo: https://github.com/feast-dev/feast-workshop/pull/3

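Before running `feast plan` in CI, it can help to fail fast when a secret is missing. A minimal sketch, assuming the workflow exposes each GitHub secret as an environment variable of the same name (for example through an `env:` block); `missing_secrets` is a hypothetical helper, and the secret names are the ones listed above.

```python
import os
from typing import List, Mapping, Optional

# Secret names listed above, grouped by cloud.
REQUIRED_SECRETS = {
    "aws": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "gcp": ("GCP_PROJECT_ID", "GCP_SA_KEY"),
}


def missing_secrets(cloud: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required secret names that are unset or empty for `cloud`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS[cloud] if not env.get(name)]


if __name__ == "__main__":
    for cloud in REQUIRED_SECRETS:
        missing = missing_secrets(cloud)
        print(f"{cloud}: " + (f"missing {', '.join(missing)}" if missing else "all secrets set"))
```

In a real CI step you would likely exit non-zero when the list for your cloud is non-empty, so the job stops before `feast plan` runs.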
@@ -360,7 +450,7 @@ See the result on a PR opened in this repo: https://github.com/feast-dev/feast-w
  #### Step 3b: Automatically run `feast apply` when pull requests are merged
  When a pull request is merged to change the repo, we configure CI/CD to automatically run `feast apply`.

- An example GitHub workflow which runs `feast apply` on PR merge (See [feast_apply.yml](../.github/workflows/feast_apply.yml), which is setup in this workshop repo)
+ An example GitHub workflow which runs `feast apply` on PR merge (see [feast_apply_aws.yml](../.github/workflows/feast_apply_aws.yml), which is set up in this workshop repo):

  ```yaml
  name: Feast apply
@@ -460,7 +550,7 @@ A **feature service** is the recommended way to version a model's feature depend
  - Data lineage. Feature services give visibility into what features are depended on in production. Thus, we can prevent accidental changes.

  ### Step 1: Fetch features for batch scoring (method 1)
- Go into the `module_0/client` directory and change the `feature_store.yaml` to use your S3 bucket.
+ Go into the `module_0/client_aws` or `module_0/client_gcp` directory and change the `feature_store.yaml` to use your S3 or GCS bucket.

  Then, run `python test_fetch.py`, which runs the above code (printing out the dataframe instead of the model):

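Pointing a client at your bucket only requires changing the bucket in the `registry:` URI. A minimal sketch of a hypothetical helper that rewrites that one line, handling both the `[INSERT YOUR BUCKET]` placeholder and an already filled-in bucket; it assumes the client `feature_store.yaml` has exactly one top-level `registry:` line with an `s3://` or `gs://` URI.

```python
import re
from pathlib import Path

# A top-level `registry: s3://<bucket>/...` or `registry: gs://<bucket>/...` line.
REGISTRY_LINE = re.compile(r"^(registry:\s*(?:s3|gs)://)([^/\n]+)(/.*)$", re.MULTILINE)


def point_registry_at_bucket(yaml_text: str, bucket: str) -> str:
    """Hypothetical helper: replace only the bucket part of the registry URI."""
    new_text, count = REGISTRY_LINE.subn(
        lambda m: m.group(1) + bucket + m.group(3), yaml_text
    )
    if count != 1:
        raise ValueError(f"expected one s3:// or gs:// registry line, found {count}")
    return new_text


def update_feature_store_yaml(path: str, bucket: str) -> None:
    """Rewrite the registry bucket of a feature_store.yaml file in place."""
    p = Path(path)
    p.write_text(point_registry_at_bucket(p.read_text(), bucket))


if __name__ == "__main__":
    sample = "provider: aws\nregistry: s3://[INSERT YOUR BUCKET]/registry.pb\n"
    print(point_registry_at_bucket(sample, "my-feast-bucket"), end="")
```

Using a replacement function instead of a `\1`-style template keeps bucket names with unusual characters from being interpreted as regex escapes.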
@@ -476,9 +566,11 @@ $ python test_fetch.py
  ```

  ### Step 2: Fetch features for batch scoring (method 2)
- You can also not have a `feature_store.yaml` and directly instantiate a `RepoConfig` object in Python (which is the in memory representation of the contents of `feature_store.yaml`). See the `module_0/client_no_yaml` directory for an example of this. The output of `python test_fetch.py` will be identical to the previous step.
+ You can also skip `feature_store.yaml` and directly instantiate a `RepoConfig` object in Python (the in-memory representation of the contents of `feature_store.yaml`). See the `module_0/client_no_yaml` directory for an example of this.

- A quick snippet of the code:
+ Run one of the two Python files, depending on whether you're on GCP or AWS. The output will be identical to the previous step.
+
+ A quick snippet of the AWS code:
  ```python
  from feast import FeatureStore, RepoConfig
  from feast.repo_config import RegistryConfig
@@ -495,6 +587,17 @@ store = FeatureStore(config=repo_config)
### Step 3 (optional): Scaling `get_historical_features` to large datasets
  You may note that the above example uses a `to_df()` method to load the training dataset into memory and may be wondering how this scales if you have very large datasets.

@@ -536,7 +639,7 @@ We don't need to do anything new here since data scientists will be doing many o

  There are two ways data scientists can use Feast:
  - Use Feast primarily as a way of pulling production-ready features.
- - See the `client/` or `client_no_yaml` folders for examples of how users can pull features by only having a `feature_store.yaml` or instantiating a `RepoConfig` object
+ - See the `client_aws/`, `client_gcp/`, or `client_no_yaml` folders for examples of how users can pull features by only having a `feature_store.yaml` or by instantiating a `RepoConfig` object.
  - This is **not recommended** since data scientists cannot register feature services to indicate they depend on certain features in production.
  - **[Recommended]** Have a local copy of the feature repository (e.g. `git clone`) and author / iterate / re-use features.
  - Data scientists can:
@@ -549,7 +652,7 @@ There are two ways data scientists can use Feast:
  Data scientists can also investigate other models and their dependent features / data sources / on demand transformations through the repository or through the Web UI (by running `feast ui`).

  # Exercise: merge a sample PR in your fork
- In your own fork of the `feast-workshop` project, with the above setup (i.e. you've made GitHub secrets with your own `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`), try making a change with a pull request! And then merge that pull request to see the change propagate in the registry.
+ In your own fork of the `feast-workshop` project, with the above setup (i.e. you've made GitHub secrets with your own `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, or `GCP_PROJECT_ID` and `GCP_SA_KEY`), try making a change with a pull request! Then merge that pull request to see the change propagate in the registry.

  Some ideas for what to try:
  - Changing metadata (owner, description, tags) on an existing `FeatureView`