Merged
44 commits
21e47d2
test
mavysavydav Jun 12, 2021
06e0c77
refactored existing tests to test full_feature_names feature on data …
Mwad22 Jun 16, 2021
4b7dd18
removed full_feature_names usage from quickstart and README to have m…
Mwad22 Jun 16, 2021
579e08f
Update CHANGELOG for Feast v0.10.8
Jun 17, 2021
462da43
GitBook: [master] 2 pages modified
achals Jun 17, 2021
df95ee8
Schema Inferencing should happen at apply time (#1646)
mavysavydav Jun 18, 2021
e383575
GitBook: [master] 80 pages modified
woop Jun 19, 2021
dd25ad6
GitBook: [master] 80 pages modified
woop Jun 20, 2021
cef2869
Provide descriptive error on invalid table reference (#1627)
codyjlin Jun 21, 2021
c2e2b4d
Refactor OnlineStoreConfig classes into owning modules (#1649)
achals Jun 21, 2021
d2cda24
Possibility to specify a project for BigQuery queries (#1656)
MattDelac Jun 21, 2021
4ab4c60
Refactor OfflineStoreConfig classes into their owning modules (#1657)
achals Jun 22, 2021
64a2cb5
Run python unit tests in parallel (#1652)
achals Jun 22, 2021
9e4c907
Rename telemetry to usage (#1660)
Jun 22, 2021
b951282
resolved final comments on PR (variable renaming, refactor tests)
Mwad22 Jun 23, 2021
a68b12b
reformatted after merge conflict
Mwad22 Jun 23, 2021
094dbf3
Update CHANGELOG for Feast v0.11.0
woop Jun 24, 2021
0a148f9
Update charts README (#1659)
szalai1 Jun 25, 2021
0ce8210
Added Redis to list of online stores for local provider in providers …
nels Jun 25, 2021
d71e4c5
Grouped inferencing statements together in apply methods for easier r…
mavysavydav Jun 25, 2021
c14023f
Add RedshiftDataSource (#1669)
Jun 28, 2021
d138648
Provide the user with more options for setting the to_bigquery config…
codyjlin Jun 28, 2021
c02b9eb
Add streaming sources to the FeatureView API (#1664)
achals Jun 28, 2021
12dbbea
Add to_table() to RetrievalJob object (#1663)
MattDelac Jun 29, 2021
d0fe0a9
Rename to_table to to_arrow (#1671)
MattDelac Jun 29, 2021
6e8670e
Cancel BigQuery job if timeout hits (#1672)
MattDelac Jun 29, 2021
5314024
Fix Feature References example (#1674)
GregKuhlmann Jun 30, 2021
eb1da5e
Allow strings for online/offline store instead of dicts (#1673)
achals Jun 30, 2021
183a0b9
Remove default list from the FeatureView constructor (#1679)
achals Jul 1, 2021
b714a12
made changes requested by @tsotnet
Mwad22 Jul 2, 2021
c78894f
Fix unit tests that got broken by Pandas 1.3.0 release (#1683)
Jul 3, 2021
20c9461
Add support for DynamoDB and S3 registry (#1483)
leonid133 Jul 3, 2021
d36d1a0
Parallelize integration tests (#1684)
Jul 4, 2021
651bce3
BQ exception should be raised first before we check the timedout (#1675)
MattDelac Jul 5, 2021
f3b92c3
Update sdk/python/feast/infra/provider.py
Mwad22 Jul 5, 2021
f400d65
Update sdk/python/feast/feature_store.py
Mwad22 Jul 5, 2021
082fca7
made error logic/messages more descriptive
Mwad22 Jul 5, 2021
3aca976
made error logic/messages more descriptive.
Mwad22 Jul 5, 2021
79aa736
Simplified error messages
Mwad22 Jul 6, 2021
d7d08ef
ran formatter, issue in errors.py
Mwad22 Jul 7, 2021
2ab8eea
Merge branch 'master' into mwad22-1618-PR
Mwad22 Jul 7, 2021
650340d
python linter issues resolved
Mwad22 Jul 7, 2021
5d582a6
removed unnecessary default assignment in get_historical_features. de…
Mwad22 Jul 8, 2021
8724e0b
added error message assertion for feature name collisions, and other …
Mwad22 Jul 8, 2021
GitBook: [master] 80 pages modified
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>
woop authored and Mwad22 committed Jul 7, 2021
commit dd25ad6d4f1c6921a39c5e279b534072f5459fa9
14 changes: 7 additions & 7 deletions docs/SUMMARY.md
@@ -18,30 +18,30 @@
* [Overview](concepts/overview.md)
* [Feature view](concepts/feature-view.md)
* [Data model](concepts/data-model-and-concepts.md)
-* [Online Store](concepts/online-store.md)
-* [Offline Store](concepts/offline-store.md)
+* [Online store](concepts/online-store.md)
+* [Offline store](concepts/offline-store.md)
* [Provider](concepts/provider.md)
* [Architecture](concepts/architecture-and-components.md)

## Reference

-* [Data Sources](reference/data-sources/README.md)
+* [Data sources](reference/data-sources/README.md)
  * [BigQuery](reference/data-sources/bigquery.md)
  * [File](reference/data-sources/file.md)
-* [Offline stores](reference/offline-stores/README.md)
-  * [File](reference/offline-stores/file.md)
-  * [BigQuery](reference/offline-stores/untitled.md)
* [Online stores](reference/online-stores/README.md)
  * [SQLite](reference/online-stores/sqlite.md)
  * [Redis](reference/online-stores/redis.md)
  * [Datastore](reference/online-stores/datastore.md)
+* [Offline stores](reference/offline-stores/README.md)
+  * [File](reference/offline-stores/file.md)
+  * [BigQuery](reference/offline-stores/untitled.md)
* [Providers](reference/providers/README.md)
  * [Local](reference/providers/local.md)
  * [Google Cloud Platform](reference/providers/google-cloud-platform.md)
-* [Feast CLI reference](reference/feast-cli-commands.md)
* [Feature repository](reference/feature-repository/README.md)
  * [feature\_store.yaml](reference/feature-repository/feature-store-yaml.md)
  * [.feastignore](reference/feature-repository/feast-ignore.md)
+* [Feast CLI reference](reference/feast-cli-commands.md)
* [Python API reference](http://rtd.feast.dev/)
* [Telemetry](reference/telemetry.md)

14 changes: 7 additions & 7 deletions docs/concepts/offline-store.md
@@ -1,15 +1,15 @@
-# Offline Store
+# Offline store

-An offline store is a storage and compute system where historic feature data can be stored or accessed for building training datasets or for sourcing data for materialization into the online store.
+Feast uses offline stores as storage and compute systems. Offline stores store historic time-series feature values. Feast does not generate these features, but instead uses the offline store as the interface for querying existing features in your organization.

Offline stores are used primarily for two reasons

-1. Building training datasets
-2. Querying data sources for feature data in order to load these features into your online store
+1. Building training datasets from time-series features.
+2. Materializing \(loading\) features from the offline store into an online store in order to serve those features at low latency for prediction.

-Feast does not actively manage your offline store. Instead, you are asked to select an offline store \(like `BigQuery` or the `File` offline store\) and then to introduce batch sources from these stores using [data sources](data-model-and-concepts.md#data-source) inside feature views.
+Offline stores are configured through the [feature\_store.yaml](../reference/offline-stores/). When building training datasets or materializing features into an online store, Feast will use the configured offline store along with the data sources you have defined as part of feature views to execute the necessary data operations.

-Feast will use your offline store to query these sources. It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a `File` offline store, nor is it possible for a `BigQuery` offline store to query files in your local file system.
+It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a `File` offline store, nor is it possible for a `BigQuery` offline store to query files from your local file system.

-Please see [feature\_store.yaml](../reference/feature-repository/feature-store-yaml.md#overview) for configuring your offline store.
+Please see the [Offline Stores](../reference/offline-stores/) reference for more details on configuring offline stores.
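To make the configuration described above concrete: a `feature_store.yaml` selecting a BigQuery offline store might look like the following sketch. It is based on Feast docs of this era, but the exact field names (e.g. `dataset`) should be treated as assumptions, not authoritative.

```yaml
project: feature_repo
registry: data/registry.db
provider: gcp
offline_store:
    type: bigquery
    dataset: feast_dataset
```

With `type: file` (the default for the local provider), the same section would instead point Feast at Parquet files on disk.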

4 changes: 1 addition & 3 deletions docs/concepts/online-store.md
@@ -1,4 +1,4 @@
-# Online Store
+# Online store

The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the `materialize` command.

@@ -12,5 +12,3 @@ Once the above data source is materialized into Feast \(using `feast materialize`\)

![](../.gitbook/assets/image%20%285%29.png)

-###

235 changes: 53 additions & 182 deletions docs/quickstart.md
@@ -1,227 +1,103 @@
# Quickstart

Welcome to the Feast quickstart! This quickstart is intended to get you up and running with Feast in your local environment. It covers the following workflows:
In this tutorial, we will:

1. Setting up Feast
2. Registering features
3. Constructing training datasets from offline data
4. Materializing feature data to the online feature store
5. Fetching feature vectors for real-time inference
1. Deploy a local feature store with a **Parquet file offline store** and **SQLite online store**.
2. Build a training dataset using our time-series features from our **Parquet files**.
3. Materialize feature values from the offline store into the online store.
4. Read the latest features from the online store for inference.

This quickstart uses some example data about a ride-hailing app to walk through Feast. Let's get into it!
### Install Feast

## 1. Setting up Feast

A Feast installation includes a Python SDK and a CLI. Both can be installed from `pip`:
Install the Feast SDK and CLI using pip:

```bash
pip install feast
```

You can test your installation by running `feast version` from your command line:

```bash
$ feast version

# 0.10
```

## 2. Registering features to Feast
### Create a feature repository

We can bootstrap a feature repository using the `feast init` command:

```bash
feast init feature_repo

# Creating a new Feast repository in <cwd>/feature_repo.
```

This command generates an example repository containing the following files.

{% code title="CLI" %}
```bash
tree

# .
# └── feature_repo
#     ├── data
#     │   └── driver_stats.parquet
#     ├── example.py
#     └── feature_store.yaml
```
{% endcode %}
Bootstrap a new feature repository using `feast init` from the command line:

Now, let's take a look at these files. First, `cd` into the feature repository:

{% code title="CLI" %}
```text
feast init feature_repo
cd feature_repo
```
{% endcode %}

Next, take a look at the `feature_store.yaml` file, which configures how the feature store runs:

{% code title="feature\_store.yaml" %}
```yaml
project: feature_repo
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db
```
{% endcode %}

An important field to be aware of is `provider`, which specifies the environment that Feast will run in. We've initialized `provider=local`, indicating that Feast will run the feature store on our local machine. See [Repository Config](reference/feature-repository/feature-store-yaml.md) for more details.

Next, take a look at `example.py`, which defines some example features:

{% code title="example.py" %}
```python
# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/<cwd>/feature_repo/data/driver_stats.parquet",
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    input=driver_hourly_stats,
    tags={},
)
```

```text
Creating a new Feast repository in /home/Jovyan/feature_repo.
```
{% endcode %}

There are three objects defined in this file:

* A `DataSource`, which is a pointer to persistent feature data. In this example, we're using a `FileSource`, which points to a set of parquet files on our local machine.
* An `Entity`, which is a metadata object that is used to organize and join features. In this example, our entity is `driver_id`, indicating that our features are modeling attributes of drivers.
* A `FeatureView`, which defines a group of features. In this example, our features are statistics about drivers, like their conversion rate and average daily trips.

Feature definitions in Feast work similarly to Terraform: local definitions don't actually affect what's running in production until we explicitly register them with Feast. At this point, we have a set of feature definitions, but we haven't registered them with Feast yet.
### Register feature definitions and deploy your feature store

We can register our features by running `feast apply` from the CLI:
The `apply` command registers all the objects in your feature repository and deploys a feature store:

{% code title="CLI" %}
```bash
feast apply
```
{% endcode %}

```text
Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats
```

This command has registered our features to Feast. They're now ready for offline retrieval and materialization.
### Generating training data

## 3. Generating training data
Next, we generate a training dataset based on the time-series features defined in the feature repository:

Feast generates point-in-time accurate training data. In our ride-hailing example, we are using statistics about drivers to predict the likelihood of a booking completion. When we generate training data, we want to know what the features of the drivers were _at the time of prediction_ \(in the past.\)

![](.gitbook/assets/ride-hailing.png)

Generating training datasets is a workflow best done from an interactive computing environment, like a Jupyter notebook. You can start a Jupyter notebook by running `jupyter notebook` from the command line. Then, run the following code to generate an _entity DataFrame_:

{% code title="jupyter notebook" %}
```python
import pandas as pd
from datetime import datetime

# entity_df generally comes from upstream systems
entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12)
    ]
})

entity_df.head()
```
{% endcode %}

![](.gitbook/assets/feast-landing-page-blog-post-page-5%20%281%29%20%281%29%20%281%29%20%282%29%20%282%29%20%285%29%20%287%29%20%287%29%20%283%29%20%287%29.png)

This DataFrame represents the entity keys and timestamps that we want feature values for. We can pass this entity DataFrame into Feast, and Feast will fetch point-in-time correct features for each row:

{% code title="jupyter notebook" %}
```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 15, 1, 12),
        ],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print(training_df.head())
```
{% endcode %}

![\(These feature values are non-deterministic, by the way.\)](.gitbook/assets/feast-landing-page-blog-post-feature-df.png)

Feast has joined the correct feature values for the drivers we specified, as of the timestamps we specified.

This DataFrame contains all the necessary signals needed to train a model, excluding labels, which are typically managed outside of Feast. Before you can train a model, you'll need to join on labels from external systems.
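The label join mentioned above could look like the following minimal pandas sketch. The `labels_df` source, its `trip_completed` column, and the recreated `training_df` values are hypothetical stand-ins, not part of Feast:

```python
import pandas as pd

# Hypothetical label source: booking outcomes recorded by an upstream system.
labels_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            pd.Timestamp("2021-04-12 10:59:42"),
            pd.Timestamp("2021-04-12 08:12:10"),
        ],
        "trip_completed": [1, 0],
    }
)

# Stand-in for the DataFrame returned by store.get_historical_features(...).to_df();
# recreated here so the sketch is self-contained.
training_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            pd.Timestamp("2021-04-12 10:59:42"),
            pd.Timestamp("2021-04-12 08:12:10"),
        ],
        "conv_rate": [0.448272, 0.328245],
    }
)

# Join labels onto the point-in-time feature rows on entity key + timestamp.
full_training_df = training_df.merge(
    labels_df, on=["driver_id", "event_timestamp"], how="inner"
)
print(full_training_df)
```

The join key here is the entity key plus the event timestamp, so each feature row is matched with the label observed for that same entity at that same point in time.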

## 4. Materializing features to the online store

We have just seen how we can use Feast in the model training workflow. Now, we'll see how Feast fits into the model inference workflow.
```bash
event_timestamp driver_id driver_hourly_stats__conv_rate driver_hourly_stats__acc_rate driver_hourly_stats__avg_daily_trips
2021-04-12 1002 0.328245 0.993218 329
2021-04-12 1001 0.448272 0.873785 767
2021-04-12 1004 0.822571 0.571790 673
2021-04-12 1003 0.556326 0.605357 335
```

When running inference on Feast features, the first step is to populate the online store to make our features available for real-time inference. When using the `local` provider, the online store is a SQLite database.
### Load features into your online store

To materialize features, run the following command from the CLI:
The `materialize` command loads the latest feature values from your feature views into your online store:

{% code title="CLI" %}
```bash
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME

# Materializing feature view driver_hourly_stats from 2021-04-13 23:50:05.754655-04:00
# to 2021-04-14 23:50:04-04:00 done!
```
{% endcode %}

We've just populated the online store with the most recent features from the offline store. Our feature values are now ready for real-time retrieval.

## 5. Fetching feature vectors for inference
### Fetching feature vectors for inference

After we materialize our features, we can use `store.get_online_features()` to fetch the latest feature values for real-time inference:

{% code title="jupyter notebook" %}
```python
from pprint import pprint
from feast import FeatureStore
store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    feature_refs=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
).to_dict()

pprint(feature_vector)
```
{% endcode %}

```python
{
    'driver_id': [1001],
    'conv_rate': [0.49274],
    ...
}
```

This feature vector can be used for real-time inference, for example, in a model serving microservice.
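As an illustration of that last point, a serving endpoint might turn the fetched dict into a model input row. The `feature_vector` values beyond `driver_id` and `conv_rate`, the feature ordering, and the weighted-sum "model" below are all hypothetical stand-ins, not part of Feast:

```python
import numpy as np

# Stand-in for the dict returned by store.get_online_features(...).to_dict();
# recreated here (with made-up values) so the sketch is self-contained.
feature_vector = {
    "driver_id": [1001],
    "conv_rate": [0.49274],
    "acc_rate": [0.92743],
    "avg_daily_trips": [72],
}

# Assemble the model input in the fixed feature order the (hypothetical)
# trained model expects.
feature_order = ["conv_rate", "acc_rate", "avg_daily_trips"]
X = np.array([[feature_vector[f][0] for f in feature_order]])

# Stand-in for a real model: score = weighted sum of the features.
weights = np.array([0.5, 0.3, 0.001])
score = float(X @ weights)
print(score)
```

In a real microservice, the `get_online_features` call and this scoring step would run inside the request handler, keeping end-to-end latency low.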

## Next steps

This quickstart covered the essential workflows of using Feast in your local environment. The next step is to `pip install "feast[gcp]"` and set `provider="gcp"` in your `feature_store.yaml` file and push your work to production deployment. You can also use the `feast init -t gcp` command in the CLI to initialize a feature repository with example features in the GCP environment.
### Next steps

* See [Create a feature repository](getting-started/create-a-feature-repository.md) for more information on the workflows we covered.
* Join our [Slack group](https://slack.feast.dev) to talk to other Feast users and the maintainers!
* Follow our [Getting Started](getting-started/) guide for a hands-on tutorial on using Feast
* Join other Feast users and contributors in [Slack](https://slack.feast.dev/) and become part of the community!

2 changes: 1 addition & 1 deletion docs/reference/data-sources/README.md
@@ -1,4 +1,4 @@
-# Data Sources
+# Data sources

Please see [Data Source](../../concepts/feature-view.md#data-source) for an explanation of data sources.

4 changes: 0 additions & 4 deletions docs/reference/feast-cli-commands.md
@@ -165,7 +165,3 @@ Print the current Feast version
feast version
```



-##
