Merged
44 commits
21e47d2
test
mavysavydav Jun 12, 2021
06e0c77
refactored existing tests to test full_feature_names feature on data …
Mwad22 Jun 16, 2021
4b7dd18
removed full_feature_names usage from quickstart and README to have m…
Mwad22 Jun 16, 2021
579e08f
Update CHANGELOG for Feast v0.10.8
Jun 17, 2021
462da43
GitBook: [master] 2 pages modified
achals Jun 17, 2021
df95ee8
Schema Inferencing should happen at apply time (#1646)
mavysavydav Jun 18, 2021
e383575
GitBook: [master] 80 pages modified
woop Jun 19, 2021
dd25ad6
GitBook: [master] 80 pages modified
woop Jun 20, 2021
cef2869
Provide descriptive error on invalid table reference (#1627)
codyjlin Jun 21, 2021
c2e2b4d
Refactor OnlineStoreConfig classes into owning modules (#1649)
achals Jun 21, 2021
d2cda24
Possibility to specify a project for BigQuery queries (#1656)
MattDelac Jun 21, 2021
4ab4c60
Refactor OfflineStoreConfig classes into their owning modules (#1657)
achals Jun 22, 2021
64a2cb5
Run python unit tests in parallel (#1652)
achals Jun 22, 2021
9e4c907
Rename telemetry to usage (#1660)
Jun 22, 2021
b951282
resolved final comments on PR (variable renaming, refactor tests)
Mwad22 Jun 23, 2021
a68b12b
reformatted after merge conflict
Mwad22 Jun 23, 2021
094dbf3
Update CHANGELOG for Feast v0.11.0
woop Jun 24, 2021
0a148f9
Update charts README (#1659)
szalai1 Jun 25, 2021
0ce8210
Added Redis to list of online stores for local provider in providers …
nels Jun 25, 2021
d71e4c5
Grouped inferencing statements together in apply methods for easier r…
mavysavydav Jun 25, 2021
c14023f
Add RedshiftDataSource (#1669)
Jun 28, 2021
d138648
Provide the user with more options for setting the to_bigquery config…
codyjlin Jun 28, 2021
c02b9eb
Add streaming sources to the FeatureView API (#1664)
achals Jun 28, 2021
12dbbea
Add to_table() to RetrievalJob object (#1663)
MattDelac Jun 29, 2021
d0fe0a9
Rename to_table to to_arrow (#1671)
MattDelac Jun 29, 2021
6e8670e
Cancel BigQuery job if timeout hits (#1672)
MattDelac Jun 29, 2021
5314024
Fix Feature References example (#1674)
GregKuhlmann Jun 30, 2021
eb1da5e
Allow strings for online/offline store instead of dicts (#1673)
achals Jun 30, 2021
183a0b9
Remove default list from the FeatureView constructor (#1679)
achals Jul 1, 2021
b714a12
made changes requested by @tsotnet
Mwad22 Jul 2, 2021
c78894f
Fix unit tests that got broken by Pandas 1.3.0 release (#1683)
Jul 3, 2021
20c9461
Add support for DynamoDB and S3 registry (#1483)
leonid133 Jul 3, 2021
d36d1a0
Parallelize integration tests (#1684)
Jul 4, 2021
651bce3
BQ exception should be raised first before we check the timedout (#1675)
MattDelac Jul 5, 2021
f3b92c3
Update sdk/python/feast/infra/provider.py
Mwad22 Jul 5, 2021
f400d65
Update sdk/python/feast/feature_store.py
Mwad22 Jul 5, 2021
082fca7
made error logic/messages more descriptive
Mwad22 Jul 5, 2021
3aca976
made error logic/messages more descriptive.
Mwad22 Jul 5, 2021
79aa736
Simplified error messages
Mwad22 Jul 6, 2021
d7d08ef
ran formatter, issue in errors.py
Mwad22 Jul 7, 2021
2ab8eea
Merge branch 'master' into mwad22-1618-PR
Mwad22 Jul 7, 2021
650340d
python linter issues resolved
Mwad22 Jul 7, 2021
5d582a6
removed unnecessary default assignment in get_historical_features. de…
Mwad22 Jul 8, 2021
8724e0b
added error message assertion for feature name collisions, and other …
Mwad22 Jul 8, 2021
GitBook: [master] 80 pages modified
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>
woop authored and Mwad22 committed Jul 7, 2021
commit dd25ad6d4f1c6921a39c5e279b534072f5459fa9
14 changes: 7 additions & 7 deletions docs/SUMMARY.md
@@ -18,30 +18,30 @@
* [Overview](concepts/overview.md)
* [Feature view](concepts/feature-view.md)
* [Data model](concepts/data-model-and-concepts.md)
-* [Online Store](concepts/online-store.md)
-* [Offline Store](concepts/offline-store.md)
+* [Online store](concepts/online-store.md)
+* [Offline store](concepts/offline-store.md)
* [Provider](concepts/provider.md)
* [Architecture](concepts/architecture-and-components.md)

## Reference

-* [Data Sources](reference/data-sources/README.md)
+* [Data sources](reference/data-sources/README.md)
  * [BigQuery](reference/data-sources/bigquery.md)
  * [File](reference/data-sources/file.md)
-* [Offline stores](reference/offline-stores/README.md)
-  * [File](reference/offline-stores/file.md)
-  * [BigQuery](reference/offline-stores/untitled.md)
* [Online stores](reference/online-stores/README.md)
  * [SQLite](reference/online-stores/sqlite.md)
  * [Redis](reference/online-stores/redis.md)
  * [Datastore](reference/online-stores/datastore.md)
+* [Offline stores](reference/offline-stores/README.md)
+  * [File](reference/offline-stores/file.md)
+  * [BigQuery](reference/offline-stores/untitled.md)
* [Providers](reference/providers/README.md)
  * [Local](reference/providers/local.md)
  * [Google Cloud Platform](reference/providers/google-cloud-platform.md)
-* [Feast CLI reference](reference/feast-cli-commands.md)
* [Feature repository](reference/feature-repository/README.md)
  * [feature\_store.yaml](reference/feature-repository/feature-store-yaml.md)
  * [.feastignore](reference/feature-repository/feast-ignore.md)
+* [Feast CLI reference](reference/feast-cli-commands.md)
* [Python API reference](http://rtd.feast.dev/)
* [Telemetry](reference/telemetry.md)

14 changes: 7 additions & 7 deletions docs/concepts/offline-store.md
@@ -1,15 +1,15 @@
-# Offline Store
+# Offline store

-An offline store is a storage and compute system where historic feature data can be stored or accessed for building training datasets or for sourcing data for materialization into the online store.
+Feast uses offline stores as storage and compute systems. Offline stores store historic time-series feature values. Feast does not generate these features, but instead uses the offline store as the interface for querying existing features in your organization.

Offline stores are used primarily for two reasons

-1. Building training datasets
-2. Querying data sources for feature data in order to load these features into your online store
+1. Building training datasets from time-series features.
+2. Materializing \(loading\) features from the offline store into an online store in order to serve those features at low latency for prediction.

-Feast does not actively manage your offline store. Instead, you are asked to select an offline store \(like `BigQuery` or the `File` offline store\) and then to introduce batch sources from these stores using [data sources](data-model-and-concepts.md#data-source) inside feature views.
+Offline stores are configured through the [feature\_store.yaml](../reference/offline-stores/). When building training datasets or materializing features into an online store, Feast will use the configured offline store along with the data sources you have defined as part of feature views to execute the necessary data operations.

-Feast will use your offline store to query these sources. It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a `File` offline store, nor is it possible for a `BigQuery` offline store to query files in your local file system.
+It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a `File` offline store, nor is it possible for a `BigQuery` offline store to query files from your local file system.

-Please see [feature\_store.yaml](../reference/feature-repository/feature-store-yaml.md#overview) for configuring your offline store.
+Please see the [Offline Stores](../reference/offline-stores/) reference for more details on configuring offline stores.
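To make the configuration described above concrete: a `feature_store.yaml` selecting a BigQuery offline store might look like the following sketch. It is based on Feast docs of this era, but the exact field names (e.g. `dataset`) should be treated as assumptions, not authoritative.

```yaml
project: feature_repo
registry: data/registry.db
provider: gcp
offline_store:
    type: bigquery
    dataset: feast_dataset
```

With `type: file` (the default for the local provider), the same section would instead point Feast at Parquet files on disk.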

4 changes: 1 addition & 3 deletions docs/concepts/online-store.md
@@ -1,4 +1,4 @@
-# Online Store
+# Online store

The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the `materialize` command.

@@ -12,5 +12,3 @@ Once the above data source is materialized into Feast \(using `feast materialize`\)

![](../.gitbook/assets/image%20%285%29.png)

-###

235 changes: 53 additions & 182 deletions docs/quickstart.md
@@ -1,227 +1,103 @@
# Quickstart

Welcome to the Feast quickstart! This quickstart is intended to get you up and running with Feast in your local environment. It covers the following workflows:
In this tutorial, we will:

1. Setting up Feast
2. Registering features
3. Constructing training datasets from offline data
4. Materializing feature data to the online feature store
5. Fetching feature vectors for real-time inference
1. Deploy a local feature store with a **Parquet file offline store** and **SQLite online store**.
2. Build a training dataset using our time-series features from our **Parquet files**.
3. Materialize feature values from the offline store into the online store.
4. Read the latest features from the online store for inference.

This quickstart uses some example data about a ride-hailing app to walk through Feast. Let's get into it!
### Install Feast

## 1. Setting up Feast

A Feast installation includes a Python SDK and a CLI. Both can be installed from `pip`:
Install the Feast SDK and CLI using pip:

```bash
pip install feast
```

You can test your installation by running `feast version` from your command line:

```bash
$ feast version

# 0.10
```

## 2. Registering features to Feast
### Create a feature repository

We can bootstrap a feature repository using the `feast init` command:

```bash
feast init feature_repo

# Creating a new Feast repository in <cwd>/feature_repo.
```

This command generates an example repository containing the following files.

{% code title="CLI" %}
```bash
tree

# .
# └── feature_repo
#     ├── data
#     │   └── driver_stats.parquet
#     ├── example.py
#     └── feature_store.yaml
```
{% endcode %}
Bootstrap a new feature repository using `feast init` from the command line:

Now, let's take a look at these files. First, `cd` into the feature repository:

{% code title="CLI" %}
```text
feast init feature_repo
cd feature_repo
```
{% endcode %}

Next, take a look at the `feature_store.yaml` file, which configures how the feature store runs:

{% code title="feature\_store.yaml" %}
```yaml
project: feature_repo
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db
```
{% endcode %}

An important field to be aware of is `provider`, which specifies the environment that Feast will run in. We've initialized `provider=local`, indicating that Feast will run the feature store on our local machine. See [Repository Config](reference/feature-repository/feature-store-yaml.md) for more details.

Next, take a look at `example.py`, which defines some example features:

{% code title="example.py" %}
```python
# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/<cwd>/feature_repo/data/driver_stats.parquet",
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    input=driver_hourly_stats,
    tags={},
)
```

```text
Creating a new Feast repository in /home/Jovyan/feature_repo.
```
{% endcode %}

There are three objects defined in this file:

* A `DataSource`, which is a pointer to persistent feature data. In this example, we're using a `FileSource`, which points to a set of parquet files on our local machine.
* An `Entity`, which is a metadata object that is used to organize and join features. In this example, our entity is `driver_id`, indicating that our features are modeling attributes of drivers.
* A `FeatureView`, which defines a group of features. In this example, our features are statistics about drivers, like their conversion rate and average daily trips.

Feature definitions in Feast work similarly to Terraform: local definitions don't actually affect what's running in production until we explicitly register them with Feast. At this point, we have a set of feature definitions, but we haven't registered them with Feast yet.
### Register feature definitions and deploy your feature store

We can register our features by running `feast apply` from the CLI:
The `apply` command registers all the objects in your feature repository and deploys a feature store:

{% code title="CLI" %}
```bash
feast apply
```
{% endcode %}

```text
Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats
```

This command has registered our features to Feast. They're now ready for offline retrieval and materialization.
### Generating training data

## 3. Generating training data
Next, we generate a training dataset based on the time-series features defined in the feature repository:

Feast generates point-in-time accurate training data. In our ride-hailing example, we are using statistics about drivers to predict the likelihood of a booking completion. When we generate training data, we want to know what the features of the drivers were _at the time of prediction_ \(in the past.\)

![](.gitbook/assets/ride-hailing.png)

Generating training datasets is a workflow best done from an interactive computing environment, like a Jupyter notebook. You can start a Jupyter notebook by running `jupyter notebook` from the command line. Then, run the following code to generate an _entity DataFrame_:

{% code title="jupyter notebook" %}
```python
import pandas as pd
from datetime import datetime

# entity_df generally comes from upstream systems
entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12)
    ]
})

entity_df.head()
```
{% endcode %}

![](.gitbook/assets/feast-landing-page-blog-post-page-5%20%281%29%20%281%29%20%281%29%20%282%29%20%282%29%20%285%29%20%287%29%20%287%29%20%283%29%20%287%29.png)

This DataFrame represents the entity keys and timestamps that we want feature values for. We can pass this entity DataFrame into Feast, and Feast will fetch point-in-time correct features for each row:

{% code title="jupyter notebook" %}
```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 15, 1, 12),
        ],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print(training_df.head())
```
{% endcode %}

![\(These feature values are non-deterministic, by the way.\)](.gitbook/assets/feast-landing-page-blog-post-feature-df.png)

Feast has joined the correct feature values for the drivers we specified, as of the timestamps we specified.

This DataFrame contains all the necessary signals needed to train a model, excluding labels, which are typically managed outside of Feast. Before you can train a model, you'll need to join on labels from external systems.
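The label join mentioned above could look like the following minimal pandas sketch. The `labels_df` source, its `trip_completed` column, and the recreated `training_df` values are hypothetical stand-ins, not part of Feast:

```python
import pandas as pd

# Hypothetical label source: booking outcomes recorded by an upstream system.
labels_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            pd.Timestamp("2021-04-12 10:59:42"),
            pd.Timestamp("2021-04-12 08:12:10"),
        ],
        "trip_completed": [1, 0],
    }
)

# Stand-in for the DataFrame returned by store.get_historical_features(...).to_df();
# recreated here so the sketch is self-contained.
training_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            pd.Timestamp("2021-04-12 10:59:42"),
            pd.Timestamp("2021-04-12 08:12:10"),
        ],
        "conv_rate": [0.448272, 0.328245],
    }
)

# Join labels onto the point-in-time feature rows on entity key + timestamp.
full_training_df = training_df.merge(
    labels_df, on=["driver_id", "event_timestamp"], how="inner"
)
print(full_training_df)
```

The join key here is the entity key plus the event timestamp, so each feature row is matched with the label observed for that same entity at that same point in time.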

## 4. Materializing features to the online store

We have just seen how we can use Feast in the model training workflow. Now, we'll see how Feast fits into the model inference workflow.
```bash
event_timestamp driver_id driver_hourly_stats__conv_rate driver_hourly_stats__acc_rate driver_hourly_stats__avg_daily_trips
2021-04-12 1002 0.328245 0.993218 329
2021-04-12 1001 0.448272 0.873785 767
2021-04-12 1004 0.822571 0.571790 673
2021-04-12 1003 0.556326 0.605357 335
```

When running inference on Feast features, the first step is to populate the online store to make our features available for real-time inference. When using the `local` provider, the online store is a SQLite database.
### Load features into your online store

To materialize features, run the following command from the CLI:
The `materialize` command loads the latest feature values from your feature views into your online store:

{% code title="CLI" %}
```bash
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME

# Materializing feature view driver_hourly_stats from 2021-04-13 23:50:05.754655-04:00
# to 2021-04-14 23:50:04-04:00 done!
```
{% endcode %}

We've just populated the online store with the most recent features from the offline store. Our feature values are now ready for real-time retrieval.

## 5. Fetching feature vectors for inference
### Fetching feature vectors for inference

After we materialize our features, we can use `store.get_online_features()` to fetch the latest feature values for real-time inference:

{% code title="jupyter notebook" %}
```python
from pprint import pprint
from feast import FeatureStore
store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    feature_refs=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
).to_dict()

pprint(feature_vector)
```
{% endcode %}

```python
{
    'driver_id': [1001],
    'conv_rate': [0.49274],
    ...
}
```

This feature vector can be used for real-time inference, for example, in a model serving microservice.
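As an illustration of that last point, a serving endpoint might turn the fetched dict into a model input row. The `feature_vector` values beyond `driver_id` and `conv_rate`, the feature ordering, and the weighted-sum "model" below are all hypothetical stand-ins, not part of Feast:

```python
import numpy as np

# Stand-in for the dict returned by store.get_online_features(...).to_dict();
# recreated here (with made-up values) so the sketch is self-contained.
feature_vector = {
    "driver_id": [1001],
    "conv_rate": [0.49274],
    "acc_rate": [0.92743],
    "avg_daily_trips": [72],
}

# Assemble the model input in the fixed feature order the (hypothetical)
# trained model expects.
feature_order = ["conv_rate", "acc_rate", "avg_daily_trips"]
X = np.array([[feature_vector[f][0] for f in feature_order]])

# Stand-in for a real model: score = weighted sum of the features.
weights = np.array([0.5, 0.3, 0.001])
score = float(X @ weights)
print(score)
```

In a real microservice, the `get_online_features` call and this scoring step would run inside the request handler, keeping end-to-end latency low.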

## Next steps

This quickstart covered the essential workflows of using Feast in your local environment. The next step is to `pip install "feast[gcp]"` and set `provider="gcp"` in your `feature_store.yaml` file and push your work to production deployment. You can also use the `feast init -t gcp` command in the CLI to initialize a feature repository with example features in the GCP environment.
### Next steps

* See [Create a feature repository](getting-started/create-a-feature-repository.md) for more information on the workflows we covered.
* Join our [Slack group](https://slack.feast.dev) to talk to other Feast users and the maintainers!
* Follow our [Getting Started](getting-started/) guide for a hands-on tutorial on using Feast
* Join other Feast users and contributors in [Slack](https://slack.feast.dev/) and become part of the community!

2 changes: 1 addition & 1 deletion docs/reference/data-sources/README.md
@@ -1,4 +1,4 @@
-# Data Sources
+# Data sources

Please see [Data Source](../../concepts/feature-view.md#data-source) for an explanation of data sources.

4 changes: 0 additions & 4 deletions docs/reference/feast-cli-commands.md
@@ -165,7 +165,3 @@ Print the current Feast version
feast version
```



-##
