feat: Remote offline Store #4262
Merged
jeremyary
merged 41 commits into
feast-dev:master
from
RHEcosystemAppEng:remote_offline
Jun 13, 2024
47a9cfb
feat: Added offline store remote deployment functionly using arrow fl…
redhatHameed 088d739
Initial functional commit for remote get_historical_features
dmartinol 241fe50
remote offline store example
dmartinol e276704
removing unneeded test code and fixinf impotrts
dmartinol cdd8a4e
call do_put only once, postpone the invocation of do_put and simplifi…
dmartinol 54f6061
added primitive parameters to the command descriptor
dmartinol be91d23
removed redundant param
dmartinol f074044
Initial skeleton of unit test for offline server
dmartinol 06fe80d
added unit test for offline store remote client
redhatHameed 49857da
testing all offlinestore APIs
dmartinol a30c666
integrated comments
dmartinol 2c8d677
Updated remote offline server readme with the capability to init with…
tmihalac 1bca528
added RemoteOfflineStoreDataSourceCreator,
dmartinol f035430
added missing CI requirement
dmartinol 632a4c0
fixed linter
dmartinol 94f927b
fixed multiprocess CI requirement
dmartinol 36b3479
feat: Added offline store remote deployment functionly using arrow fl…
redhatHameed 799ae07
fix test errors
dmartinol 3029881
managing feature view aliases and restored skipped tests
dmartinol f6c481b
fixced linter issue
dmartinol 871e5d4
fixed broken test
dmartinol ddc3600
added supported deployment modes using helm chart for online (defaul…
redhatHameed 9a34251
updated the document for offline remote server
redhatHameed efeeeae
added the document for remote offline server
redhatHameed 7077bee
rebase and fix conflicts
redhatHameed 5697056
feat: Added offline store remote deployment functionly using arrow fl…
redhatHameed f9ca13b
added unit test for offline store remote client
redhatHameed 46fa3d5
added RemoteOfflineStoreDataSourceCreator,
dmartinol b36105a
feat: Added offline store remote deployment functionly using arrow fl…
redhatHameed 2f4a5ba
Added missing remote offline store apis implementation
tmihalac 52b0156
Fixed tests
tmihalac 32600fc
Implemented PR change proposal
tmihalac d2e012c
Implemented PR change proposal
tmihalac 301021f
updated example readme file
redhatHameed e34b070
Implemented PR change proposal
tmihalac dec05c9
fixing the integration tests
redhatHameed 143ef18
Fixed OfflineServer teardown
tmihalac bd9cd79
updated the document for remote offline feature server and client
redhatHameed bdf0150
Merge pull request #15 from redhatHameed/doc-change
redhatHameed 17c7e72
Implemented PR change proposal
tmihalac 11a3dae
Merge pull request #16 from tmihalac/implement-remaining-offline-methods
tmihalac
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| # Remote Offline Store | ||
|
|
||
| ## Description | ||
|
|
||
| The Python offline server is an Apache Arrow Flight Server that uses the gRPC communication protocol to exchange data. | ||
| This server wraps calls to existing offline store implementations and exposes interfaces as Arrow Flight endpoints. | ||
|
|
||
|
|
||
| ## CLI | ||
|
|
||
| There is a CLI command that starts the remote offline server: `feast serve_offline`. By default, the remote offline server uses port 8815; the port can be overridden with the `--port` flag. | ||
|
|
||
| ## Deploying as a service on Kubernetes | ||
|
|
||
| The remote offline server can be deployed with the Feast [helm chart](https://github.com/feast-dev/feast/blob/master/infra/charts/feast-feature-server). | ||
|
|
||
| Users need to set `feast_mode=offline` when installing the offline server, as shown in the helm command below: | ||
|
|
||
| ``` | ||
| helm install feast-offline-server feast-charts/feast-feature-server --set feast_mode=offline --set feature_store_yaml_base64=$(base64 < feature_store.yaml) | ||
| ``` | ||
|
|
||
| ## Client Example | ||
|
|
||
| Users need to create a client-side `feature_store.yaml` file, set the `offline_store` type to `remote`, and provide the server connection configuration, | ||
| including the host and the port (default is 8815) that the Arrow Flight client needs to connect to the Arrow Flight server. | ||
|
|
||
| {% code title="feature_store.yaml" %} | ||
| ```yaml | ||
| offline_store: | ||
| type: remote | ||
| host: localhost | ||
| port: 8815 | ||
| ``` | ||
| {% endcode %} | ||
|
|
||
| The complete example can be found under [remote-offline-store-example](../../../examples/remote-offline-store). | ||
|
|
||
| ## Functionality Matrix | ||
|
|
||
| The set of functionalities supported by remote offline stores is the same as those supported by offline stores with the SDK, which are described in detail [here](../offline-stores/overview.md#functionality). | ||
|
|
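The helm command in the docs above passes the contents of `feature_store.yaml` as a base64 string. As a minimal sketch of that encoding step, assuming you would rather produce the value from Python than from the shell, the hypothetical helper below (`encode_feature_store_yaml` is not part of Feast) reads a file and returns a single-line base64 string suitable for a `--set feature_store_yaml_base64=...` argument.

```python
import base64
from pathlib import Path


def encode_feature_store_yaml(path: str) -> str:
    """Return the file's contents as a single-line base64 string.

    Hypothetical helper for illustration; it is not a Feast API.
    """
    raw = Path(path).read_bytes()
    # b64encode does not insert line breaks, so the result can be passed
    # as one command-line value.
    return base64.b64encode(raw).decode("ascii")
```

Decoding the returned string with `base64.b64decode` gives back the original file bytes, which is a quick way to check the value before installing the chart.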
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| # Feast Remote Offline Store Server | ||
|
|
||
| This example demonstrates how to use an [Arrow Flight](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) server/client as the remote Feast offline store. | ||
|
|
||
| ## Launch the offline server locally | ||
|
|
||
| 1. **Create Feast Project**: Use the `feast init` command to create a project. For example, the [offline_server](./offline_server) folder contains a sample Feast repository. | ||
|
|
||
| 2. **Start Remote Offline Server**: Use the `feast serve_offline` command to start serving remote offline requests. This command will: | ||
| - Spin up an `Arrow Flight` server at the default port 8815. | ||
|
|
||
| 3. **Initialize Offline Server**: The offline server can be initialized by providing the `feature_store.yaml` file via an environment variable named `FEATURE_STORE_YAML_BASE64`. A temporary directory will be created containing the provided YAML file, named `feature_store.yaml`. | ||
|
|
||
| Example | ||
|
|
||
| ```console | ||
| cd offline_server | ||
| feast -c feature_repo apply | ||
| ``` | ||
|
|
||
| ```console | ||
| feast -c feature_repo serve_offline | ||
| ``` | ||
|
|
||
| Sample output: | ||
| ```console | ||
| Serving on grpc+tcp://127.0.0.1:8815 | ||
| ``` | ||
|
|
||
| ## Launch a remote offline client | ||
|
|
||
| The [offline_client](./offline_client) folder includes a test Python function that uses an offline store of type `remote`, leveraging the remote server as the | ||
| actual data provider. | ||
|
|
||
|
|
||
| The test class is located under [offline_client](./offline_client/) and uses a remote configuration of the offline store to delegate the actual | ||
| implementation to the offline store server: | ||
| ```yaml | ||
| offline_store: | ||
| type: remote | ||
| host: localhost | ||
| port: 8815 | ||
| ``` | ||
|
|
||
| The test code in [test.py](./offline_client/test.py) initializes the store from the local configuration and then fetches the historical features | ||
| from the store like any other Feast client, but the actual implementation is delegated to the offline server: | ||
| ```py | ||
| store = FeatureStore(repo_path=".") | ||
| training_df = store.get_historical_features(entity_df, features).to_df() | ||
| ``` | ||
|
|
||
|
|
||
| Run the client: | ||
| `cd offline_client; python test.py` | ||
|
|
||
| Sample output: | ||
|
|
||
| ```console | ||
| config.offline_store is <class 'feast.infra.offline_stores.remote.RemoteOfflineStoreConfig'> | ||
| ----- Feature schema ----- | ||
|
|
||
| <class 'pandas.core.frame.DataFrame'> | ||
| RangeIndex: 3 entries, 0 to 2 | ||
| Data columns (total 10 columns): | ||
| # Column Non-Null Count Dtype | ||
| --- ------ -------------- ----- | ||
| 0 driver_id 3 non-null int64 | ||
| 1 event_timestamp 3 non-null datetime64[ns, UTC] | ||
| 2 label_driver_reported_satisfaction 3 non-null int64 | ||
| 3 val_to_add 3 non-null int64 | ||
| 4 val_to_add_2 3 non-null int64 | ||
| 5 conv_rate 3 non-null float32 | ||
| 6 acc_rate 3 non-null float32 | ||
| 7 avg_daily_trips 3 non-null int32 | ||
| 8 conv_rate_plus_val1 3 non-null float64 | ||
| 9 conv_rate_plus_val2 3 non-null float64 | ||
| dtypes: datetime64[ns, UTC](1), float32(2), float64(2), int32(1), int64(4) | ||
| memory usage: 332.0 bytes | ||
| None | ||
|
|
||
| ----- Features ----- | ||
|
|
||
| driver_id event_timestamp label_driver_reported_satisfaction ... avg_daily_trips conv_rate_plus_val1 conv_rate_plus_val2 | ||
| 0 1001 2021-04-12 10:59:42+00:00 1 ... 590 1.022378 10.022378 | ||
| 1 1002 2021-04-12 08:12:10+00:00 5 ... 974 2.762213 20.762213 | ||
| 2 1003 2021-04-12 16:40:26+00:00 3 ... 127 3.419828 30.419828 | ||
|
|
||
| [3 rows x 10 columns] | ||
| ------training_df---- | ||
| driver_id event_timestamp label_driver_reported_satisfaction ... avg_daily_trips conv_rate_plus_val1 conv_rate_plus_val2 | ||
| 0 1001 2021-04-12 10:59:42+00:00 1 ... 590 1.022378 10.022378 | ||
| 1 1002 2021-04-12 08:12:10+00:00 5 ... 974 2.762213 20.762213 | ||
| 2 1003 2021-04-12 16:40:26+00:00 3 ... 127 3.419828 30.419828 | ||
|
|
||
| [3 rows x 10 columns] | ||
| ``` | ||
|
|
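Step 3 of the README above says the server can be initialized from the `FEATURE_STORE_YAML_BASE64` environment variable, with the YAML written into a temporary directory. The following is a minimal sketch of that behavior using only the standard library. It is not Feast's actual implementation: the function name `materialize_repo_from_env` and the `feast_repo_` directory prefix are hypothetical, and the output file name `feature_store.yaml` is an assumption.

```python
import base64
import os
import tempfile
from pathlib import Path

ENV_VAR = "FEATURE_STORE_YAML_BASE64"  # variable name taken from the README


def materialize_repo_from_env(env=os.environ) -> Path:
    """Decode the base64 YAML held in `env` into a new temporary directory.

    Hypothetical helper for illustration; it is not a Feast API.
    """
    encoded = env[ENV_VAR]  # raises KeyError if the variable is not set
    repo_dir = Path(tempfile.mkdtemp(prefix="feast_repo_"))
    # Assumed file name; the decoded bytes are written unchanged.
    (repo_dir / "feature_store.yaml").write_bytes(base64.b64decode(encoded))
    return repo_dir
```

The returned directory could then be passed as the repository path, in the same way `feature_repo` is passed with `-c` in the console examples.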
10 changes: 10 additions & 0 deletions
examples/remote-offline-store/offline_client/feature_store.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| project: offline_server | ||
| # By default, the registry is a file (but can be turned into a more scalable SQL-backed registry) | ||
| registry: ../offline_server/feature_repo/data/registry.db | ||
| # The provider primarily specifies default offline / online stores & storing the registry in a given cloud | ||
| provider: local | ||
| offline_store: | ||
| type: remote | ||
| host: localhost | ||
| port: 8815 | ||
| entity_key_serialization_version: 2 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| from datetime import datetime | ||
| from feast import FeatureStore | ||
| import pandas as pd | ||
|
|
||
| entity_df = pd.DataFrame.from_dict( | ||
| { | ||
| "driver_id": [1001, 1002, 1003], | ||
| "event_timestamp": [ | ||
| datetime(2021, 4, 12, 10, 59, 42), | ||
| datetime(2021, 4, 12, 8, 12, 10), | ||
| datetime(2021, 4, 12, 16, 40, 26), | ||
| ], | ||
| "label_driver_reported_satisfaction": [1, 5, 3], | ||
| "val_to_add": [1, 2, 3], | ||
| "val_to_add_2": [10, 20, 30], | ||
| } | ||
| ) | ||
|
|
||
| features = [ | ||
| "driver_hourly_stats:conv_rate", | ||
| "driver_hourly_stats:acc_rate", | ||
| "driver_hourly_stats:avg_daily_trips", | ||
| "transformed_conv_rate:conv_rate_plus_val1", | ||
| "transformed_conv_rate:conv_rate_plus_val2", | ||
| ] | ||
|
|
||
| store = FeatureStore(repo_path=".") | ||
|
|
||
| training_df = store.get_historical_features(entity_df, features).to_df() | ||
|
|
||
| print("----- Feature schema -----\n") | ||
| print(training_df.info()) | ||
|
|
||
| print() | ||
| print("----- Features -----\n") | ||
| print(training_df.head()) | ||
|
|
||
| print("------training_df----") | ||
|
|
||
| print(training_df) |
Binary file added
BIN
+34.3 KB
examples/remote-offline-store/offline_server/feature_repo/data/driver_stats.parquet
Binary file not shown.
Binary file added
BIN
+28 KB
examples/remote-offline-store/offline_server/feature_repo/data/online_store.db
Binary file not shown.
140 changes: 140 additions & 0 deletions
examples/remote-offline-store/offline_server/feature_repo/example_repo.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| # This is an example feature definition file | ||
|
|
||
| from datetime import timedelta | ||
|
|
||
| import pandas as pd | ||
| import os | ||
|
|
||
| from feast import ( | ||
| Entity, | ||
| FeatureService, | ||
| FeatureView, | ||
| Field, | ||
| FileSource, | ||
| PushSource, | ||
| RequestSource, | ||
| ) | ||
| from feast.on_demand_feature_view import on_demand_feature_view | ||
| from feast.types import Float32, Float64, Int64 | ||
|
|
||
| # Define an entity for the driver. You can think of an entity as a primary key used to | ||
| # fetch features. | ||
| driver = Entity(name="driver", join_keys=["driver_id"]) | ||
|
|
||
| # Read data from parquet files. Parquet is convenient for local development mode. For | ||
| # production, you can use your favorite DWH, such as BigQuery. See Feast documentation | ||
| # for more info. | ||
| driver_stats_source = FileSource( | ||
| name="driver_hourly_stats_source", | ||
| path=f"{os.path.dirname(os.path.abspath(__file__))}/data/driver_stats.parquet", | ||
| timestamp_field="event_timestamp", | ||
| created_timestamp_column="created", | ||
| ) | ||
|
|
||
| # Our parquet files contain sample data that includes a driver_id column, timestamps and | ||
| # three feature columns. Here we define a Feature View that will allow us to serve this | ||
| # data to our model online. | ||
| driver_stats_fv = FeatureView( | ||
| # The unique name of this feature view. Two feature views in a single | ||
| # project cannot have the same name | ||
| name="driver_hourly_stats", | ||
| entities=[driver], | ||
| ttl=timedelta(days=1), | ||
| # The list of features defined below acts as a schema, both for materializing | ||
| # features into a store and as references during retrieval when building | ||
| # a training dataset or serving features | ||
| schema=[ | ||
| Field(name="conv_rate", dtype=Float32), | ||
| Field(name="acc_rate", dtype=Float32), | ||
| Field(name="avg_daily_trips", dtype=Int64, description="Average daily trips"), | ||
| ], | ||
| online=True, | ||
| source=driver_stats_source, | ||
| # Tags are user defined key/value pairs that are attached to each | ||
| # feature view | ||
| tags={"team": "driver_performance"}, | ||
| ) | ||
|
|
||
| # Define a request data source which encodes features / information only | ||
| # available at request time (e.g. part of the user initiated HTTP request) | ||
| input_request = RequestSource( | ||
| name="vals_to_add", | ||
| schema=[ | ||
| Field(name="val_to_add", dtype=Int64), | ||
| Field(name="val_to_add_2", dtype=Int64), | ||
| ], | ||
| ) | ||
|
|
||
|
|
||
| # Define an on demand feature view which can generate new features based on | ||
| # existing feature views and RequestSource features | ||
| @on_demand_feature_view( | ||
| sources=[driver_stats_fv, input_request], | ||
| schema=[ | ||
| Field(name="conv_rate_plus_val1", dtype=Float64), | ||
| Field(name="conv_rate_plus_val2", dtype=Float64), | ||
| ], | ||
| ) | ||
| def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame: | ||
| df = pd.DataFrame() | ||
| df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"] | ||
| df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"] | ||
| return df | ||
|
|
||
|
|
||
| # This groups features into a model version | ||
| driver_activity_v1 = FeatureService( | ||
| name="driver_activity_v1", | ||
| features=[ | ||
| driver_stats_fv[["conv_rate"]], # Sub-selects a feature from a feature view | ||
| transformed_conv_rate, # Selects all features from the feature view | ||
| ], | ||
| ) | ||
| driver_activity_v2 = FeatureService( | ||
| name="driver_activity_v2", features=[driver_stats_fv, transformed_conv_rate] | ||
| ) | ||
|
|
||
| # Defines a way to push data (to be available offline, online or both) into Feast. | ||
| driver_stats_push_source = PushSource( | ||
| name="driver_stats_push_source", | ||
| batch_source=driver_stats_source, | ||
| ) | ||
|
|
||
| # Defines a slightly modified version of the feature view from above, where the source | ||
| # has been changed to the push source. This allows fresh features to be directly pushed | ||
| # to the online store for this feature view. | ||
| driver_stats_fresh_fv = FeatureView( | ||
| name="driver_hourly_stats_fresh", | ||
| entities=[driver], | ||
| ttl=timedelta(days=1), | ||
| schema=[ | ||
| Field(name="conv_rate", dtype=Float32), | ||
| Field(name="acc_rate", dtype=Float32), | ||
| Field(name="avg_daily_trips", dtype=Int64), | ||
| ], | ||
| online=True, | ||
| source=driver_stats_push_source, # Changed from above | ||
| tags={"team": "driver_performance"}, | ||
| ) | ||
|
|
||
|
|
||
| # Define an on demand feature view which can generate new features based on | ||
| # existing feature views and RequestSource features | ||
| @on_demand_feature_view( | ||
| sources=[driver_stats_fresh_fv, input_request], # relies on fresh version of FV | ||
| schema=[ | ||
| Field(name="conv_rate_plus_val1", dtype=Float64), | ||
| Field(name="conv_rate_plus_val2", dtype=Float64), | ||
| ], | ||
| ) | ||
| def transformed_conv_rate_fresh(inputs: pd.DataFrame) -> pd.DataFrame: | ||
| df = pd.DataFrame() | ||
| df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"] | ||
| df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"] | ||
| return df | ||
|
|
||
|
|
||
| driver_activity_v3 = FeatureService( | ||
| name="driver_activity_v3", | ||
| features=[driver_stats_fresh_fv, transformed_conv_rate_fresh], | ||
| ) |
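To make the on-demand transformation in the file above concrete, here is a minimal sketch of the same arithmetic for a single row in plain Python, without pandas or Feast. The function name `transformed_conv_rate_row` and the input values are illustrative assumptions, not data from the example repository.

```python
def transformed_conv_rate_row(
    conv_rate: float, val_to_add: int, val_to_add_2: int
) -> dict:
    """Apply the two additions from `transformed_conv_rate` to one row."""
    return {
        "conv_rate_plus_val1": conv_rate + val_to_add,
        "conv_rate_plus_val2": conv_rate + val_to_add_2,
    }


# Illustrative inputs: conv_rate comes from the feature view, the two
# val_to_add fields come from the request source.
row = transformed_conv_rate_row(conv_rate=0.5, val_to_add=1, val_to_add_2=10)
print(row)  # {'conv_rate_plus_val1': 1.5, 'conv_rate_plus_val2': 10.5}
```

The pandas version in the file does the same thing column-wise, which is why each output column in the client's sample output equals `conv_rate` plus the corresponding request value.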
9 changes: 9 additions & 0 deletions
examples/remote-offline-store/offline_server/feature_repo/feature_store.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| project: offline_server | ||
| # By default, the registry is a file (but can be turned into a more scalable SQL-backed registry) | ||
| registry: data/registry.db | ||
| # The provider primarily specifies default offline / online stores & storing the registry in a given cloud | ||
| provider: local | ||
| online_store: | ||
| type: sqlite | ||
| path: data/online_store.db | ||
| entity_key_serialization_version: 2 |