
Commit 7d2cb80

Add GCP support in module 0

Signed-off-by: Danny Chiao <danny@tecton.ai>
1 parent 82193a5

22 files changed: +361 −29 lines changed
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-name: Feast apply
+name: Feast apply (AWS)
 
 on:
   push:
```
Lines changed: 37 additions & 0 deletions

```diff
@@ -0,0 +1,37 @@
+name: Feast apply (GCP)
+
+on:
+  push:
+    branches:
+      - main
+
+jobs:
+  feast_apply:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Setup Python
+        id: setup-python
+        uses: actions/setup-python@v2
+        with:
+          python-version: "3.7"
+          architecture: x64
+      - name: Set up Cloud SDK
+        uses: google-github-actions/setup-gcloud@v0
+        with:
+          project_id: ${{ secrets.GCP_PROJECT_ID }}
+          service_account_key: ${{ secrets.GCP_SA_KEY }}
+          export_default_credentials: true
+      - name: Use gcloud CLI
+        run: gcloud info
+
+      # Run `feast apply`
+      - uses: actions/checkout@v2
+      - name: Install feast
+        run: pip install "feast[gcp]"
+      - name: Run feast apply
+        env:
+          FEAST_USAGE: "False"
+          IS_TEST: "True"
+        run: |
+          cd module_0/feature_repo_gcp
+          feast apply
```
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-name: Feast plan
+name: Feast plan (AWS)
 
 on: [pull_request]
 
```
Lines changed: 48 additions & 0 deletions

```diff
@@ -0,0 +1,48 @@
+name: Feast plan (GCP)
+
+on: [pull_request]
+
+jobs:
+  feast_plan:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Setup Python
+        id: setup-python
+        uses: actions/setup-python@v2
+        with:
+          python-version: "3.7"
+          architecture: x64
+      - name: Set up Cloud SDK
+        uses: google-github-actions/setup-gcloud@v0
+        with:
+          project_id: ${{ secrets.GCP_PROJECT_ID }}
+          service_account_key: ${{ secrets.GCP_SA_KEY }}
+          export_default_credentials: true
+      - name: Use gcloud CLI
+        run: gcloud info
+
+      # Run `feast plan`
+      - uses: actions/checkout@v2
+      - name: Install feast
+        run: pip install "feast[gcp]"
+      - name: Capture `feast plan` in a variable
+        id: feast_plan
+        env:
+          FEAST_USAGE: "False"
+          FEAST_FORCE_USAGE_UUID: None
+          IS_TEST: "True"
+        run: |
+          body=$(cd module_0/feature_repo_gcp; feast plan)
+          body="${body//'%'/'%25'}"
+          body="${body//$'\n'/'%0A'}"
+          body="${body//$'\r'/'%0D'}"
+          echo "::set-output name=body::$body"
+
+      # Post a comment on the PR with the results of `feast plan`
+      - name: Create comment
+        uses: peter-evans/create-or-update-comment@v1
+        if: ${{ steps.feast_plan.outputs.body }}
+        with:
+          issue-number: ${{ github.event.pull_request.number }}
+          body: |
+            ${{ steps.feast_plan.outputs.body }}
```
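The `feast plan` output is multiline, which the line-oriented `::set-output` command can't carry directly, so the workflow percent-escapes `%`, newlines, and carriage returns before emitting it. The same transformation, sketched as a hypothetical Python helper (not part of the workshop repo) for clarity:

```python
def escape_for_set_output(body: str) -> str:
    """Escape a multiline string for GitHub Actions' legacy ::set-output
    command, mirroring the bash substitutions in the workflow above."""
    body = body.replace("%", "%25")   # escape the escape character first
    body = body.replace("\n", "%0A")  # then newlines
    body = body.replace("\r", "%0D")  # then carriage returns
    return body

# Example: a two-line plan summary collapses to a single escaped line
escaped = escape_for_set_output("feast plan:\nno changes")
```

Note that GitHub has since deprecated `::set-output` in favor of appending to the `$GITHUB_OUTPUT` file, but the escaping idea is the same for any line-oriented channel.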

module_0/README.md

Lines changed: 129 additions & 26 deletions
```diff
@@ -15,9 +15,10 @@ We focus on a specific example (that does not include online features or realtim
 - [A quick primer on feature views](#a-quick-primer-on-feature-views)
 - [User groups](#user-groups)
   - [User group 1: ML Platform Team](#user-group-1-ml-platform-team)
-    - [Step 0: Setup S3 bucket for registry and file sources](#step-0-setup-s3-bucket-for-registry-and-file-sources)
+    - [Step 0 (AWS): Setup S3 bucket for registry and file sources](#step-0-aws-setup-s3-bucket-for-registry-and-file-sources)
+    - [Step 0 (GCP): Setup GCS bucket for registry and file sources](#step-0-gcp-setup-gcs-bucket-for-registry-and-file-sources)
     - [Step 1: Setup the feature repo](#step-1-setup-the-feature-repo)
-      - [Step 1a: Use your configured S3 bucket](#step-1a-use-your-configured-s3-bucket)
+      - [Step 1a: Use your configured bucket](#step-1a-use-your-configured-bucket)
       - [Some further notes and gotchas](#some-further-notes-and-gotchas)
       - [Step 1b: Run `feast plan`](#step-1b-run-feast-plan)
       - [Step 1c: Run `feast apply`](#step-1c-run-feast-apply)
```
````diff
@@ -43,11 +44,15 @@ We focus on a specific example (that does not include online features or realtim
   - [Can I call `get_historical_features` without an entity dataframe? I want features for all entities.](#can-i-call-get_historical_features-without-an-entity-dataframe-i-want-features-for-all-entities)
 
 # Installing Feast
-Before we get started, first install Feast with AWS dependencies:
-
-```bash
-pip install "feast[aws]"
-```
+Before we get started, first install Feast with AWS or GCP dependencies:
+- AWS
+  ```bash
+  pip install "feast[aws]"
+  ```
+- GCP
+  ```bash
+  pip install "feast[gcp]"
+  ```
 
 # Exploring the data
 We've made some dummy data for this workshop in `infra/driver_stats.parquet`. Let's dive into what the data looks like. You can follow along in [explore_data.ipynb](explore_data.ipynb):
````
````diff
@@ -107,20 +112,17 @@ There are three user groups here worth considering. The ML platform team, the ML
 ## User group 1: ML Platform Team
 The team here sets up the centralized Feast feature repository and CI/CD in GitHub. This is what's seen in `feature_repo_aws/`.
 
-### Step 0: Setup S3 bucket for registry and file sources
+### Step 0 (AWS): Setup S3 bucket for registry and file sources
 This assumes you have an AWS account & Terraform setup. If you don't:
 - Set up an AWS account, install the AWS CLI, and setup your credentials with `aws configure` as per the [AWS credential quickstart](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-creds)
 - Install [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform)
 
 We've made a simple Terraform project to help with S3 bucket creation (and uploading the test data we need):
-```bash
-cd infra/
-terraform init
-terraform apply
-```
+```console
+$ cd infra/aws
+$ terraform init
+$ terraform apply
 
-Output:
-```bash
 aws_s3_bucket.feast_bucket: Creating...
 aws_s3_bucket.feast_bucket: Creation complete after 3s [id=feast-workshop-danny]
 aws_s3_bucket_acl.feast_bucket_acl: Creating...
````
````diff
@@ -136,14 +138,81 @@ project_bucket = "s3://feast-workshop-danny"
 project_name = "danny"
 ```
 
+### Step 0 (GCP): Setup GCS bucket for registry and file sources
+This assumes you have a GCP account & Terraform setup. If you don't:
+- Set up a GCP account and install the `gcloud` CLI
+- Install [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform)
+
+We've made a simple Terraform project to help with GCS bucket creation (and uploading the test data we need). We'll also need a service account here, which the GitHub CI can later use to manage resources.
+1. Create a project in GCP (https://console.cloud.google.com/, we use `feast-workshop` as the name below)
+2. Setup credentials (e.g. install `gcloud`, then run):
+   ```
+   gcloud config configurations create personal
+   gcloud config set project feast-workshop
+   gcloud auth login
+   ```
+3. Create a service account:
+   ```console
+   $ gcloud iam service-accounts create feast-workshop-sa --display-name "Terraform and GitHub CI account"
+
+   Created service account [feast-workshop-sa].
+   ```
+4. Grant the service account access to your project:
+   ```console
+   $ gcloud projects add-iam-policy-binding feast-workshop --member serviceAccount:feast-workshop-sa@feast-workshop.iam.gserviceaccount.com --role roles/editor
+
+   Updated IAM policy for project [feast-workshop].
+   ```
+5. Generate keys:
+   ```console
+   $ gcloud iam service-accounts keys create ~/.config/gcloud/feast-workshop.json --iam-account feast-workshop-sa@feast-workshop.iam.gserviceaccount.com
+
+   created key [YOUR PRIVATE KEY] of type [json] as [/Users/dannychiao/.config/gcloud/feast-workshop.json] for [feast-workshop-sa@feast-workshop.iam.gserviceaccount.com]
+   ```
+6. Let Terraform know where your credentials are:
+   ```bash
+   export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/feast-workshop.json
+   ```
+7. Run the Terraform project in `infra/gcp`:
+   ```console
+   $ cd infra/gcp
+   $ terraform init
+   $ terraform apply
+
+   var.gcp_project
+     The GCP project id
+
+     Enter a value: feast-workshop
+
+   var.project_name
+     The project identifier is used to uniquely namespace resources
+
+     Enter a value: danny
+
+   ...
+
+   google_storage_bucket.feast_bucket: Creating...
+   google_storage_bucket.feast_bucket: Creation complete after 1s [id=feast-workshop-danny]
+   google_storage_bucket_object.driver_stats_upload: Creating...
+   google_storage_bucket_object.driver_stats_upload: Creation complete after 0s [id=feast-workshop-danny-driver_stats.parquet]
+
+   Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
+
+   Outputs:
+
+   project_bucket = "gs://feast-workshop-danny"
+   project_name = "danny"
+   ```
+
````
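The JSON key generated above is what the GitHub CI consumes (e.g. pasted into a `GCP_SA_KEY` secret). Before uploading it, you can sanity-check that the file parses and carries the fields client libraries expect. A minimal sketch, assuming the standard service-account key layout; this helper is illustrative and not part of the workshop repo:

```python
import json
import os

# Standard fields present in a GCP service-account key file
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def looks_like_sa_key(path: str) -> bool:
    """Heuristic check that a file parses as a service-account key JSON
    with the fields Google client libraries expect."""
    try:
        with open(os.path.expanduser(path)) as f:
            key = json.load(f)
    except (OSError, ValueError):
        return False  # missing file or invalid JSON
    return key.get("type") == "service_account" and REQUIRED_FIELDS <= key.keys()
```

Usage: `looks_like_sa_key("~/.config/gcloud/feast-workshop.json")` should return `True` for a key produced by `gcloud iam service-accounts keys create`.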
````diff
 ### Step 1: Setup the feature repo
-The first thing a platform team needs to do is setup a `feature_store.yaml` file within a version controlled repo like GitHub. `feature_store.yaml` is the primary way to configure an overall Feast project. We've setup a sample feature repository in `feature_repo_aws/`
+The first thing a platform team needs to do is set up a `feature_store.yaml` file within a version-controlled repo like GitHub. `feature_store.yaml` is the primary way to configure an overall Feast project. We've set up sample feature repositories in `feature_repo_aws/` and `feature_repo_gcp/`.
 
-#### Step 1a: Use your configured S3 bucket
-There are two files in `feature_repo_aws` you need to change to point to your S3 bucket:
+#### Step 1a: Use your configured bucket
+There are two files in `feature_repo_aws` or `feature_repo_gcp` you need to change to point to your bucket:
 
 **data_sources.py**
 ```python
+# AWS version
 driver_stats = FileSource(
     name="driver_stats_source",
     path="s3://[INSERT YOUR BUCKET]/driver_stats.parquet",
````
````diff
@@ -153,10 +222,21 @@ driver_stats = FileSource(
     description="A table describing the stats of a driver based on hourly logs",
     owner="test2@gmail.com",
 )
+
+# GCP version
+driver_stats = FileSource(
+    name="driver_stats_source",
+    path="gs://feast-workshop-danny/driver_stats.parquet",  # TODO: Replace with your bucket
+    timestamp_field="event_timestamp",
+    created_timestamp_column="created",
+    description="A table describing the stats of a driver based on hourly logs",
+    owner="test2@gmail.com",
+)
 ```
 
 **feature_store.yaml**
 ```yaml
+# AWS version
 project: feast_demo_aws
 provider: aws
 registry: s3://[INSERT YOUR BUCKET]/registry.pb
````
````diff
@@ -166,6 +246,14 @@ offline_store:
 flags:
   alpha_features: true
   on_demand_transforms: true
+
+# GCP version
+project: feast_demo_gcp
+provider: gcp
+registry: gs://feast-workshop-danny/registry.pb  # TODO: Replace with your bucket
+online_store: null
+offline_store:
+  type: file
 ```
 
 A quick explanation of what's happening in this `feature_store.yaml`:
````
````diff
@@ -300,7 +388,7 @@ We recommend automatically running `feast plan` on incoming PRs to describe what
 - This is useful for helping PR reviewers understand the effects of a change.
 - This can prevent breaking models in production (e.g. catching PRs that would change features used by an existing model version (`FeatureService`)).
 
-An example GitHub workflow that runs `feast plan` on PRs (see [feast_plan.yml](../.github/workflows/feast_plan.yml), which is set up in this workshop repo):
+An example GitHub workflow that runs `feast plan` on PRs (see [feast_plan_aws.yml](../.github/workflows/feast_plan_aws.yml) or [feast_plan_gcp.yml](../.github/workflows/feast_plan_gcp.yml), both set up in this workshop repo):
 
 ```yaml
 name: Feast plan
````
````diff
@@ -351,7 +439,9 @@ jobs:
             ${{ steps.feast_plan.outputs.body }}
 ```
 
-You'll notice the above logic references two secrets in GitHub corresponding to your AWS credentials. To make this workflow work, create GitHub secrets with your own `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+You'll notice the above logic references secrets in GitHub corresponding to your cloud credentials.
+- For AWS, create GitHub secrets with your own `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+- For GCP, you'll need a `GCP_PROJECT_ID` and a `GCP_SA_KEY` (the service account key) secret
 
 See the result on a PR opened in this repo: https://github.com/feast-dev/feast-workshop/pull/3

````diff
@@ -360,7 +450,7 @@ See the result on a PR opened in this repo: https://github.com/feast-dev/feast-w
 #### Step 3b: Automatically run `feast apply` when pull requests are merged
 When a pull request is merged to change the repo, we configure CI/CD to automatically run `feast apply`.
 
-An example GitHub workflow which runs `feast apply` on PR merge (see [feast_apply.yml](../.github/workflows/feast_apply.yml), which is set up in this workshop repo):
+An example GitHub workflow which runs `feast apply` on PR merge (see [feast_apply_aws.yml](../.github/workflows/feast_apply_aws.yml), which is set up in this workshop repo):
 
 ```yaml
 name: Feast apply
````
```diff
@@ -460,7 +550,7 @@ A **feature service** is the recommended way to version a model's feature depend
 - Data lineage. Feature services give visibility into what features are depended on in production. Thus, we can prevent accidental changes
 
 ### Step 1: Fetch features for batch scoring (method 1)
-Go into the `module_0/client` directory and change the `feature_store.yaml` to use your S3 bucket.
+Go into the `module_0/client_aws` or `module_0/client_gcp` directory and change the `feature_store.yaml` to use your S3/GCS bucket.
 
 Then, run `python test_fetch.py`, which runs the above code (printing out the dataframe instead of the model):
```

````diff
@@ -476,9 +566,11 @@ $ python test_fetch.py
 ```
 
 ### Step 2: Fetch features for batch scoring (method 2)
-You can also skip `feature_store.yaml` and directly instantiate a `RepoConfig` object in Python (the in-memory representation of the contents of `feature_store.yaml`). See the `module_0/client_no_yaml` directory for an example of this. The output of `python test_fetch.py` will be identical to the previous step.
+You can also skip `feature_store.yaml` and directly instantiate a `RepoConfig` object in Python (the in-memory representation of the contents of `feature_store.yaml`). See the `module_0/client_no_yaml` directory for an example of this.
 
-A quick snippet of the code:
+Run one of the two Python files depending on whether you're on AWS or GCP. The output will be identical to the previous step.
+
+A quick snippet of the AWS code:
 ```python
 from feast import FeatureStore, RepoConfig
 from feast.repo_config import RegistryConfig
````
````diff
@@ -495,6 +587,17 @@ store = FeatureStore(config=repo_config)
 training_df = store.get_historical_features(...).to_df()
 ```
 
+```console
+$ python test_fetch_aws.py
+
+      driver_id                  event_timestamp  conv_rate  acc_rate
+720        1002 2022-05-15 20:46:00.308163+00:00   0.465875  0.315721
+1805       1005 2022-05-15 20:46:00.308163+00:00   0.394072  0.046118
+1083       1003 2022-05-15 20:46:00.308163+00:00   0.869917  0.779562
+359        1001 2022-05-15 20:46:00.308163+00:00   0.404588  0.407571
+1444       1004 2022-05-15 20:46:00.308163+00:00   0.977276  0.051582
+```
+
 ### Step 3 (optional): Scaling `get_historical_features` to large datasets
 You may note that the above example uses a `to_df()` method to load the training dataset into memory and may be wondering how this scales if you have very large datasets.
````
500603

@@ -536,7 +639,7 @@ We don't need to do anything new here since data scientists will be doing many o
536639

537640
There are two ways data scientists can use Feast:
538641
- Use Feast primarily as a way of pulling production ready features.
539-
- See the `client/` or `client_no_yaml` folders for examples of how users can pull features by only having a `feature_store.yaml` or instantiating a `RepoConfig` object
642+
- See the `client_aws/` or `client_no_yaml` folders for examples of how users can pull features by only having a `feature_store.yaml` or instantiating a `RepoConfig` object
540643
- This is **not recommended** since data scientists cannot register feature services to indicate they depend on certain features in production.
541644
- **[Recommended]** Have a local copy of the feature repository (e.g. `git clone`) and author / iterate / re-use features.
542645
- Data scientist can:
```diff
@@ -549,7 +652,7 @@ There are two ways data scientists can use Feast:
 Data scientists can also investigate other models and their dependent features / data sources / on demand transformations through the repository or through the Web UI (by running `feast ui`)
 
 # Exercise: merge a sample PR in your fork
-In your own fork of the `feast-workshop` project, with the above setup (i.e. you've made GitHub secrets with your own `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`), try making a change with a pull request! And then merge that pull request to see the change propagate in the registry.
+In your own fork of the `feast-workshop` project, with the above setup (i.e. you've made GitHub secrets with your own `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, or `GCP_PROJECT_ID` and `GCP_SA_KEY`), try making a change with a pull request, then merge it to see the change propagate to the registry.
 
 Some ideas for what to try:
 - Changing metadata (owner, description, tags) on an existing `FeatureView`
```
Lines changed: 6 additions & 0 deletions

```diff
@@ -0,0 +1,6 @@
+project: feast_demo_gcp
+provider: gcp
+registry: gs://feast-workshop-danny/registry.pb  # TODO: Replace with your bucket
+online_store: null
+offline_store:
+  type: file
```

module_0/client_gcp/test_fetch.py

Lines changed: 16 additions & 0 deletions

```diff
@@ -0,0 +1,16 @@
+import pandas as pd
+
+from feast import FeatureStore
+
+store = FeatureStore(repo_path=".")
+
+# Get the latest feature values for unique entities
+entity_df = pd.DataFrame.from_dict({"driver_id": [1001, 1002, 1003, 1004, 1005]})
+entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)
+training_df = store.get_historical_features(
+    entity_df=entity_df, features=store.get_feature_service("model_v2"),
+).to_df()
+
+# Make batch predictions
+# predictions = model.predict(training_df)
+print(training_df)
```
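For intuition, the retrieval above asks for each driver's most recent feature values as of `event_timestamp`. A much-simplified, pure-Python sketch of that point-in-time semantics on toy data (this is illustrative, not Feast's actual implementation, which also accounts for TTLs and created timestamps):

```python
from datetime import datetime, timezone

# Toy stand-in for the driver_stats source: (driver_id, event_timestamp, conv_rate)
feature_rows = [
    (1001, datetime(2022, 5, 1, tzinfo=timezone.utc), 0.40),
    (1001, datetime(2022, 5, 10, tzinfo=timezone.utc), 0.45),
    (1002, datetime(2022, 5, 3, tzinfo=timezone.utc), 0.30),
]

def latest_value(driver_id, as_of):
    """Most recent conv_rate for driver_id at or before as_of --
    the per-entity, point-in-time lookup get_historical_features performs."""
    candidates = [
        (ts, v) for d, ts, v in feature_rows if d == driver_id and ts <= as_of
    ]
    # max over (timestamp, value) pairs picks the newest row at or before as_of
    return max(candidates)[1] if candidates else None
```

For driver 1001 as of May 15, this returns the May 10 value (0.45); rows newer than `as_of` are never used, which is what prevents feature leakage in training data.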
