# Quickstart

Welcome to the Feast quickstart! This quickstart is intended to get you up and running with Feast in your local environment. It covers the following workflows:

1. Setting up Feast
2. Registering features
3. Constructing training datasets from offline data
4. Materializing feature data to the online feature store
5. Fetching feature vectors for inference

The quickstart uses some example data about a ride-hailing app to walk through Feast. Let's get into it!

## 1. Setting up Feast

A Feast installation includes a Python SDK and a CLI. Feast can be installed from `pip`:

```bash
pip install feast
```

You can test your installation by running `feast version` from your command line:

```bash
$ feast version

> "0.10"
```

## 2. Registering features to Feast

We can bootstrap a feature repository using the `feast init` command:

```bash
$ feast init

> Generated feature_store.yaml and example features in example.py
  Now try running `feast apply` to apply and `feast materialize` to
  sync data to the online store
```

This command generates two files. Let's take a look at `feature_store.yaml`:

```yaml
project: happy_ant
registry: data/registry.db
provider: local
online_store:
  local:
    path: data/online_store.db
```

This file defines how the feature store is configured to run. The most important option here is `provider`, which specifies the environment that Feast will run in. We've initialized `provider=local`, indicating that Feast will run the feature store on our local machine. See [Repository Config](reference/repository-config.md) for more details.
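For reference, here is the same file again with each key annotated (the comments are ours, not generated by `feast init`):

```yaml
project: happy_ant              # unique name for this feature store project
registry: data/registry.db      # where feature definitions are persisted on `feast apply`
provider: local                 # environment to run in (e.g. local, gcp)
online_store:
  local:
    path: data/online_store.db  # SQLite file backing the local online store
```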

Next, take a look at `example.py`. This file defines some example features:

```python
# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/Users/jay/Projects/feast-10-test-2/hehe/data/driver_stats.parquet",
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    input=driver_hourly_stats,
    tags={},
)
```

This file defines three objects:

* A `DataSource`, which is a pointer to persistent feature data. In this example, we're using a `FileSource`, which points to a set of parquet files on our local machine.
* An `Entity`, which is a metadata object used to organize and join features. In this example, our entity is `driver_id`, indicating that our features model attributes of drivers.
* A `FeatureView`, which defines a group of features. In this example, our features are statistics about drivers, like their conversion rate and average daily trips.

Feature definitions in Feast work similarly to Terraform: local definitions don't affect what's running in production until we explicitly register them with Feast. At this point, we have a set of feature definitions, but we haven't registered them with Feast yet.

We can register our features by running `feast apply` from the CLI:

```bash
$ feast apply

> Processing /Users/jay/Projects/feast-10-test-2/hehe/example.py as example
> Done!
```

After this command completes, our features have been registered to Feast, and they're now ready for offline retrieval and materialization.

## 3. Generating training data

Feast generates point-in-time accurate training data. In our example, we are using statistics about drivers to predict the likelihood of a booking completing. When we generate training data, we want to know what the features of the drivers were _at the time of prediction_.
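To build intuition for what "point-in-time correct" means, here is a small pandas sketch (illustration only, not Feast code): for each prediction timestamp we take the most recent feature value at or before that time, so feature values "from the future" never leak into training data.

```python
import pandas as pd

# Feature values observed over time for one driver.
features = pd.DataFrame({
    "driver_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2021-04-12 08:00", "2021-04-12 10:00"]),
    "conv_rate": [0.2, 0.5],
})

# The moment we made a prediction for that driver.
predictions = pd.DataFrame({
    "driver_id": [1001],
    "event_timestamp": pd.to_datetime(["2021-04-12 09:30"]),
})

# merge_asof picks, for each prediction row, the latest feature row
# whose timestamp is at or before the prediction timestamp.
joined = pd.merge_asof(
    predictions.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
)
print(joined["conv_rate"].tolist())  # -> [0.2]: the 10:00 value is in the future
```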

Generating training datasets is a workflow best done from an interactive computing environment, like a Jupyter notebook. You can start a Jupyter notebook by running `jupyter notebook` from the command line. Then, run the following code to generate an _entity DataFrame_:

```python
import pandas as pd
from datetime import datetime

# entity_df generally comes from upstream systems
entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12)
    ]
})

entity_df.head()
```

This DataFrame represents the entity keys and timestamps for which we want feature values. We can pass this entity DataFrame into Feast, and Feast will fetch point-in-time correct features for each row:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()

training_df.head()
```

This DataFrame contains all the signals needed to train a model, except labels, which are typically managed outside of Feast. Before you can train a model, you'll need to join on labels from external systems.
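Joining on labels is usually a plain DataFrame merge on the entity key and timestamp. A minimal sketch (the `labels_df` source and the `trip_completed` label are hypothetical, not part of Feast):

```python
import pandas as pd
from datetime import datetime

# Stand-in for the training frame returned by get_historical_features().
training_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": [datetime(2021, 4, 12, 10, 59, 42),
                        datetime(2021, 4, 12, 8, 12, 10)],
    "conv_rate": [0.5, 0.3],
})

# Hypothetical labels pulled from an external system.
labels_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": [datetime(2021, 4, 12, 10, 59, 42),
                        datetime(2021, 4, 12, 8, 12, 10)],
    "trip_completed": [1, 0],
})

# Join labels onto the feature rows by entity key and timestamp.
labeled_df = training_df.merge(labels_df, on=["driver_id", "event_timestamp"])
print(labeled_df["trip_completed"].tolist())  # -> [1, 0]
```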

## 4. Materializing features to the online store

We just used Feast to generate training data from offline feature data. To serve the latest feature values for real-time inference, we first need to load, or _materialize_, them into the online store. Using the `local` provider, the online store is a SQLite database. To materialize features, run the following command from the CLI:

```bash
# Materialize feature values up until the current time
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```
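The `$(date -u ...)` substitution just produces the current UTC time as an ISO-8601 timestamp. If you prefer driving this step from Python, the SDK also exposes an incremental materialization method; a sketch (the SDK call is commented out because it needs a feature repo on disk):

```python
from datetime import datetime, timezone

# Build the same ISO-8601 UTC timestamp the shell command passes.
end_ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
print(end_ts)  # e.g. 2021-04-12T16:40:26

# Equivalent SDK call (requires a feature repo in the current directory):
# from feast import FeatureStore
# store = FeatureStore(repo_path=".")
# store.materialize_incremental(end_date=datetime.now(timezone.utc))
```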

We've just populated the online store with the most up-to-date features from the offline store. Our feature values are now ready for real-time fetching.

## 5. Fetching feature vectors for inference

After we materialize our features, we can use `store.get_online_features()` to fetch the _latest_ feature values for real-time inference. Run the following code in your notebook to fetch online features:

```python
from pprint import pprint

feature_vector = store.get_online_features(
    feature_refs=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

pprint(feature_vector)
```

```text
{'driver_hourly_stats__acc_rate': [None],
 'driver_hourly_stats__avg_daily_trips': [None],
 'driver_hourly_stats__conv_rate': [None],
 'driver_id': [1001]}
```

This feature vector can be used for real-time inference, for example, in a model-serving microservice.
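In a serving microservice, you typically flatten the response dict into a fixed-order input row for your model. A minimal sketch (the `FEATURE_ORDER` list and the zero fallback for missing values are our own choices, not Feast APIs):

```python
# Fixed feature ordering so the model always sees the same input shape.
FEATURE_ORDER = [
    "driver_hourly_stats__conv_rate",
    "driver_hourly_stats__acc_rate",
    "driver_hourly_stats__avg_daily_trips",
]

def to_model_input(feature_vector, row=0, default=0.0):
    # Pull one entity's values in a stable order, replacing missing (None)
    # values with a default so downstream models never see nulls.
    return [
        feature_vector[name][row] if feature_vector[name][row] is not None else default
        for name in FEATURE_ORDER
    ]

# Example response dict shaped like the output above (values are illustrative).
feature_vector = {
    "driver_hourly_stats__conv_rate": [0.5],
    "driver_hourly_stats__acc_rate": [None],
    "driver_hourly_stats__avg_daily_trips": [250],
    "driver_id": [1001],
}
print(to_model_input(feature_vector))  # -> [0.5, 0.0, 250]
```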

## Next steps

This quickstart covered the essential workflows of using Feast in your local environment. The next step is to set `provider: gcp` in your `feature_store.yaml` file and deploy your feature store to production. You can also run `feast init -t gcp` in the CLI to initialize a feature repository with example features in the GCP environment.

* See [Create a feature repository](how-to-guides/create-a-feature-repository.md) for more information on the workflows we covered.
* Join our [Slack group](https://slack.com) to talk to other Feast users and the maintainers!