Commit 0904638 (parent a1823b7) — jparthasarthy authored, gitbook-bot committed
GitBook: [master] one page and 6 assets modified
docs/quickstart.md: 209 additions, 0 deletions

# Quickstart

Welcome to the Feast quickstart! This quickstart is intended to get you up and running with Feast in your local environment. It covers the following workflows:

1. Setting up Feast
2. Registering features
3. Constructing training datasets from offline data
4. Materializing feature data to the online feature store
5. Fetching feature vectors for inference

The quickstart uses some example data about a ride-hailing app to walk through Feast. Let's get into it!

## 1. Setting up Feast

A Feast installation includes a Python SDK and a CLI. Feast can be installed with `pip`:

```bash
pip install feast
```

You can test your installation by running `feast version` from your command line:

```bash
$ feast version

> "0.10"
```

## 2. Registering features to Feast

We can bootstrap a feature repository using the `feast init` command:

```bash
$ feast init

> Generated feature_store.yaml and example features in example_repo.py
Now try running `feast apply` to apply and `feast materialize` to
sync data to the online store
```

This command generates two files. Let's take a look at `feature_store.yaml`:

```yaml
project: happy_ant
registry: data/registry.db
provider: local
online_store:
    local:
        path: data/online_store.db
```

This file defines how the feature store is configured to run. The most important option here is `provider`, which specifies the environment that Feast will run in. We've initialized `provider: local`, indicating that Feast will run the feature store on our local machine. See [Repository Config](reference/repository-config.md) for more details.
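
For comparison, and as a hedged sketch only (the exact keys depend on your Feast version; the bucket path below is made up), a `feature_store.yaml` configured for a cloud deployment might look like:

```yaml
project: happy_ant
registry: gs://my-feast-bucket/registry.db  # hypothetical GCS path
provider: gcp
```

See [Repository Config](reference/repository-config.md) for the authoritative list of options.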

Next, take a look at `example.py`. This file defines some example features:

```python
# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/Users/jay/Projects/feast-10-test-2/hehe/data/driver_stats.parquet",
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id")

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    input=driver_hourly_stats,
    tags={},
)
```

This file defines three objects:

* A `DataSource`, which is a pointer to persistent feature data. In this example, we're using a `FileSource`, which points to a set of parquet files on our local machine.
* An `Entity`, which is a metadata object that is used to organize and join features. In this example, our entity is `driver_id`, indicating that our features are modeling attributes of drivers.
* A `FeatureView`, which defines a group of features. In this example, our features are statistics about drivers, like their conversion rate and average daily trips.

Feature definitions in Feast work similarly to Terraform: local definitions don't actually affect what's running in production until we explicitly register them with Feast. At this point, we have a set of feature definitions, but we haven't registered them with Feast yet.

We can register our features by running `feast apply` from the CLI:

```bash
$ feast apply

> Processing /Users/jay/Projects/feast-10-test-2/hehe/example.py as example
> Done!
```

After this command completes, our features have been registered to Feast, and they're now ready for offline retrieval and materialization.

## 3. Generating training data

Feast generates point-in-time accurate training data. In our example, we are using statistics about drivers to predict the likelihood of a booking completing. When we generate training data, we want to know what the features of the drivers were _at the time of prediction_.

![](.gitbook/assets/ride-hailing.png)
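
To build intuition for what point-in-time correctness means, here is a small self-contained pandas sketch (not Feast code; the data below is made up) of a backward as-of join, which is essentially what Feast performs when building a training dataset:

```python
from datetime import datetime

import pandas as pd

# Hypothetical hourly feature rows for one driver.
features = pd.DataFrame({
    "event_timestamp": [datetime(2021, 4, 12, 8), datetime(2021, 4, 12, 10)],
    "driver_id": [1001, 1001],
    "conv_rate": [0.2, 0.5],
})

# Rows we want to predict on, each with its own prediction time.
entity_df = pd.DataFrame({
    "event_timestamp": [datetime(2021, 4, 12, 9, 30), datetime(2021, 4, 12, 11, 0)],
    "driver_id": [1001, 1001],
})

# For each entity row, take the latest feature row at or before its
# timestamp: a backward as-of join.
training = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
)
print(training["conv_rate"].tolist())  # [0.2, 0.5]
```

The 9:30 prediction sees only the 8:00 feature value, never the "future" 10:00 one; Feast applies the same rule per entity key and also enforces each view's `ttl`.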

Generating training datasets is a workflow best done from an interactive computing environment, like a Jupyter notebook. You can start a Jupyter notebook by running `jupyter notebook` from the command line. Then, run the following code to generate an _entity DataFrame_:

```python
import pandas as pd
from datetime import datetime

# entity_df generally comes from upstream systems
entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12)
    ]
})

entity_df.head()
```

![](.gitbook/assets/feast-landing-page-blog-post-page-5.png)

This DataFrame contains the entity keys and the timestamps for which we want point-in-time feature values. We can pass this entity DataFrame into Feast, and Feast will fetch point-in-time correct features for each row:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()

training_df.head()
```

![](.gitbook/assets/feast-landing-page-blog-post-feature-df.png)

This DataFrame contains all the signals needed to train a model, except for labels, which are typically managed outside of Feast. Before you can train a model, you'll need to join labels from your external systems onto this DataFrame.
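
One common pattern is a plain DataFrame join on the entity key and event timestamp. A hedged sketch, where the `booking_completed` labels and the stand-in `training_df` are made up for illustration:

```python
from datetime import datetime

import pandas as pd

timestamps = [
    datetime(2021, 4, 12, 10, 59, 42),
    datetime(2021, 4, 12, 8, 12, 10),
    datetime(2021, 4, 12, 16, 40, 26),
    datetime(2021, 4, 12, 15, 1, 12),
]

# Stand-in for the training_df returned by get_historical_features().
training_df = pd.DataFrame({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": timestamps,
    "conv_rate": [0.1, 0.2, 0.3, 0.4],  # placeholder feature values
})

# Hypothetical labels from an upstream system: did the booking complete?
labels_df = pd.DataFrame({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": timestamps,
    "booking_completed": [1, 0, 1, 1],
})

# Join on the same keys used to build the entity DataFrame.
labeled_df = training_df.merge(labels_df, on=["driver_id", "event_timestamp"])
print(labeled_df["booking_completed"].tolist())  # [1, 0, 1, 1]
```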

## 4. Materializing features to the online store

We just used Feast to generate a training dataset from the offline store. To serve the latest feature values at low latency for online inference, we first need to load, or _materialize_, feature values into the online store. Using the `local` provider, the online store is a SQLite database. To materialize features, run the following command from the CLI:

```bash
# Materialize feature values up until the current time
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```

We've just populated the online store with the most up-to-date features from the offline store. Our feature values are now ready for real-time fetching.
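
As an aside, the `$(date -u +"%Y-%m-%dT%H:%M:%S")` argument above simply expands to the current UTC time. If you'd rather compute that end timestamp in Python, the same string can be produced like this:

```python
from datetime import datetime, timezone

# Current UTC time in the same format as: date -u +"%Y-%m-%dT%H:%M:%S"
end_ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
print(end_ts)
```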

## 5. Fetching feature vectors for inference

After we materialize our features, we can use `store.get_online_features()` to fetch the _latest_ feature values for real-time inference. Run the following code in your notebook to fetch online features:

```python
from pprint import pprint

feature_vector = store.get_online_features(
    feature_refs=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

pprint(feature_vector)
```

```text
{'driver_hourly_stats__acc_rate': [None],
 'driver_hourly_stats__avg_daily_trips': [None],
 'driver_hourly_stats__conv_rate': [None],
 'driver_id': ['1001']}
```

This feature vector can be used for real-time inference, for example, in a model-serving microservice.
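
As a sketch of that last step, here is one way to arrange the returned dictionary into a model input. The `feature_vector` values and the feature ordering below are made up, and `model.predict` stands in for whatever model you serve:

```python
import pandas as pd

# Stand-in for get_online_features(...).to_dict(); values are hypothetical.
feature_vector = {
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49],
    "driver_hourly_stats__acc_rate": [0.91],
    "driver_hourly_stats__avg_daily_trips": [26],
}

# Columns in the order the (hypothetical) model was trained on.
feature_order = [
    "driver_hourly_stats__conv_rate",
    "driver_hourly_stats__acc_rate",
    "driver_hourly_stats__avg_daily_trips",
]
X = pd.DataFrame(feature_vector)[feature_order]

# prediction = model.predict(X)  # model serving would happen here
print(X.shape)  # (1, 3)
```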

## Next steps

This quickstart covered the essential workflows of using Feast in your local environment. The next step is to set `provider: gcp` in your `feature_store.yaml` file and deploy your feature store to production. You can also use the `feast init -t gcp` command in the CLI to initialize a feature repository with example features in the GCP environment.

* See [Create a feature repository](how-to-guides/create-a-feature-repository.md) for more information on the workflows we covered.
* Join our [Slack group](https://slack.com) to talk to other Feast users and the maintainers!