Welcome to the Feast quickstart! This quickstart is intended to get you up and running with Feast in your local environment. It covers the following workflows:
- Setting up Feast
- Registering features
- Constructing training datasets from offline data
- Materializing feature data to the online feature store
- Fetching feature vectors for real-time inference
This quickstart uses some example data about a ride-hailing app to walk through Feast. Let's get into it!
A Feast installation includes a Python SDK and a CLI. Both can be installed from pip:
{% code title="CLI" %}
pip install feast
{% endcode %}
You can test your installation by running feast version from your command line:
{% code title="CLI" %}
feast version
# 0.10
{% endcode %}
We can bootstrap a feature repository using the feast init command:
{% code title="CLI" %}
feast init feature_repo
# Creating a new Feast repository in <cwd>/feature_repo.
{% endcode %}
This command generates an example repository containing the following files:
{% code title="CLI" %}
tree
# .
# └── feature_repo
#     ├── data
#     │   └── driver_stats.parquet
#     ├── example.py
#     └── feature_store.yaml
{% endcode %}
Now, let's take a look at these files. First, cd into the feature repository:
{% code title="CLI" %}
cd feature_repo
{% endcode %}
Next, take a look at the feature_store.yaml file, which configures how the feature store runs:
{% code title="feature_store.yaml" %}
project: feature_repo
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db
{% endcode %}
An important field to be aware of is provider, which specifies the environment that Feast will run in. We've initialized provider=local, indicating that Feast will run the feature store on our local machine. See Repository Config for more details.
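For example, switching the provider to gcp tells Feast to run against Google Cloud services (BigQuery as the offline store and Datastore as the online store) instead of local files. A minimal sketch, assuming a GCS bucket you control for the registry:
{% code title="feature_store.yaml" %}
project: feature_repo
registry: gs://my-feast-bucket/registry.db
provider: gcp
{% endcode %}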
Next, take a look at example.py, which defines some example features:
{% code title="example.py" %}
# This is an example feature definition file
from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource
# Read data from parquet files. Parquet is convenient for local development. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/<cwd>/feature_repo/data/driver_stats.parquet",
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)
# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id")
# Our parquet files contain sample data that includes a driver_id column, timestamps, and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    input=driver_hourly_stats,
    tags={},
)
{% endcode %}
There are three objects defined in this file:
- A DataSource, which is a pointer to persistent feature data. In this example, we're using a FileSource, which points to a set of parquet files on our local machine.
- An Entity, which is a metadata object that is used to organize and join features. In this example, our entity is driver_id, indicating that our features are modeling attributes of drivers.
- A FeatureView, which defines a group of features. In this example, our features are statistics about drivers, like their conversion rate and average daily trips.
Feature definitions in Feast work similarly to Terraform: local definitions don't actually affect what's running in production until we explicitly register them with Feast. At this point, we have a set of feature definitions, but we haven't registered them with Feast yet.
We can register our features by running feast apply from the CLI:
{% code title="CLI" %}
feast apply
# Processing <cwd>/feature_repo/example.py as example
# Done!
{% endcode %}
This command has registered our features to Feast. They're now ready for offline retrieval and materialization.
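To confirm what was registered, you can inspect the registry from the Python SDK. A minimal sketch, assuming your SDK version exposes list_feature_views on the FeatureStore object:
{% code title="jupyter notebook" %}
from feast import FeatureStore
store = FeatureStore(repo_path=".")
# Print the name of each feature view that feast apply registered
for fv in store.list_feature_views():
    print(fv.name)
{% endcode %}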
Feast generates point-in-time correct training data. In our ride-hailing example, we are using statistics about drivers to predict the likelihood of a booking completion. When we generate training data, we want to know what the features of the drivers were at the time of prediction (in the past).
Generating training datasets is a workflow best done from an interactive computing environment, like a Jupyter notebook. You can start a Jupyter notebook by running jupyter notebook from the command line. Then, run the following code to generate an entity DataFrame:
{% code title="jupyter notebook" %}
import pandas as pd
from datetime import datetime
# entity_df generally comes from upstream systems
entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12),
    ]
})
entity_df.head()
{% endcode %}
This DataFrame represents the entity keys and timestamps that we want feature values for. We can pass this Entity DataFrame into Feast, and Feast will fetch point-in-time correct features for each row:
{% code title="jupyter notebook" %}
from feast import FeatureStore
store = FeatureStore(repo_path=".")
training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()
training_df.head()
{% endcode %}
Feast has joined on the correct feature values for the drivers we specified, as of the timestamps we specified.
This DataFrame contains the signals needed to train a model, excluding labels, which are typically managed outside of Feast. Before you can train a model, you'll need to join on labels from external systems, as sketched below.
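As a minimal sketch, assume a hypothetical booking_completed label keyed by the same driver_id and event_timestamp columns used to build training_df; the join is then an ordinary pandas merge:
{% code title="jupyter notebook" %}
import pandas as pd
from datetime import datetime
# Hypothetical labels from an upstream system (e.g., a bookings table).
# The column name and values below are illustrative only.
label_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12),
    ],
    "booking_completed": [1, 0, 1, 1],
})
# Join labels onto the training DataFrame on the same entity key and timestamp
labeled_df = training_df.merge(label_df, on=["driver_id", "event_timestamp"])
labeled_df.head()
{% endcode %}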
We have just seen how we can use Feast in the model training workflow. Now, we'll see how Feast fits into the model inference workflow.
When running inference on Feast features, the first step is to populate the online store to make our features available for real-time inference. When using the local provider, the online store is a SQLite database.
To materialize features, run the following command from the CLI:
{% code title="CLI" %}
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
# Materializing feature view driver_hourly_stats from 2021-04-13 23:50:05.754655-04:00
# to 2021-04-14 23:50:04-04:00 done!
{% endcode %}
We've just populated the online store with the most recent features from the offline store. Our feature values are now ready for real-time retrieval.
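If you want to materialize a specific time window rather than everything up to the current time, the feast materialize command accepts explicit start and end timestamps (the dates below are illustrative):
{% code title="CLI" %}
feast materialize 2021-04-01T00:00:00 2021-04-14T00:00:00
{% endcode %}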
After we materialize our features, we can use store.get_online_features() to fetch the latest feature values for real-time inference:
{% code title="jupyter notebook" %}
from pprint import pprint
from feast import FeatureStore
store = FeatureStore(repo_path=".")
feature_vector = store.get_online_features(
    feature_refs=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()
pprint(feature_vector)
{% endcode %}
{% code title="output" %}
{
    'driver_id': [1001],
    'driver_hourly_stats__conv_rate': [0.49274],
    'driver_hourly_stats__acc_rate': [0.92743],
    'driver_hourly_stats__avg_daily_trips': [72]
}
{% endcode %}
This feature vector can be used for real-time inference, for example, in a model serving microservice.
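As a minimal sketch, a feature-fetching endpoint might look like the following. This assumes Flask is installed, that the service runs from within the feature repository directory, and that model is a trained model you load yourself; none of these are part of Feast.
{% code title="serve.py" %}
from flask import Flask, jsonify, request
from feast import FeatureStore

app = Flask(__name__)
# Assumes the service is started from within the feature repository
store = FeatureStore(repo_path=".")

@app.route("/predict")
def predict():
    driver_id = int(request.args["driver_id"])
    # Fetch the latest feature values for this driver from the online store
    feature_vector = store.get_online_features(
        feature_refs=[
            'driver_hourly_stats:conv_rate',
            'driver_hourly_stats:acc_rate',
            'driver_hourly_stats:avg_daily_trips'
        ],
        entity_rows=[{"driver_id": driver_id}]
    ).to_dict()
    # score = model.predict(feature_vector)  # hypothetical trained model
    return jsonify(feature_vector)
{% endcode %}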
This quickstart covered the essential workflows of using Feast in your local environment. The next step is to set provider: gcp in your feature_store.yaml file and deploy your feature store to production. You can also use the feast init -t gcp command in the CLI to initialize a feature repository with example features in the GCP environment, as shown below.
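For instance, a minimal invocation (the repository name feature_repo_gcp is an arbitrary choice):
{% code title="CLI" %}
feast init feature_repo_gcp -t gcp
{% endcode %}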
- See Create a feature repository for more information on the workflows we covered.
- Join our Slack group to talk to other Feast users and the maintainers!


