Merged
Update docs
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
kevjumba committed Jul 22, 2022
commit b70c64b8d96fd74cc5cd349caf65001585c57277
51 changes: 44 additions & 7 deletions docs/how-to-guides/adding-a-new-offline-store.md
@@ -6,9 +6,9 @@ Feast makes adding support for a new offline store (database) easy. Developers c

In this guide, we will show you how to extend the existing File offline store and use it in a feature repo. While we will be implementing a specific store, this guide should be representative of adding support for any new offline store.

The full working code for this guide can be found at [feast-dev/feast-custom-offline-store-demo](https://github.com/feast-dev/feast-custom-offline-store-demo).
The full working code for this guide can be found at [feast-dev/feast-custom-offline-store-demo](https://github.com/feast-dev/feast-custom-offline-store-demo) and an example of a custom offline store that was contributed by developers can be found [here](https://github.com/feast-dev/feast/pull/2401).
Collaborator

I don't know if I would use this as an example. It's a really large PR that also includes online store and registry store components.

Would be better if we created some example PR (similar to what is in this guide)

Collaborator Author

Hm, I can link the Spark PR instead. I think the example guide for the Feast custom offline store is fine at a high level, but I don't see a clear way of implementing dummy procedures with more clarity than the Feast custom offline store demo. Referencing an actual real-world PR would help readers work out the little kinks in implementation. I agree that the Postgres one is a little too large, but I think the Spark one is relevant and useful.

Collaborator

Could you convert the dummy example in the guide into an actual PR so they can see?

You can also just point users to the directory that has the offline / online store implementations too


The process for using a custom offline store consists of 4 steps:
The process for using a custom offline store consists of 6 steps:

1. Defining an `OfflineStore` class.
2. Defining an `OfflineStoreConfig` class.
Comment thread
adchia marked this conversation as resolved.
Outdated
@@ -30,6 +30,8 @@ There are two methods that deal with reading data from the offline stores`get_hi
* `pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method). This method pulls data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
* `get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
* `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date.
Collaborator

might call out that this method is optional since it is only used to work with SavedDatasets

Collaborator Author

I think you commented on the wrong method; you mean `write_logged_features`, right?

Collaborator

nope. pull_all_from_table_or_query right now in our logic is only exposed for SavedDatasets afaict

* `write_logged_features` is a method that takes a pyarrow table or a path that points to a parquet file and writes the data to the destination defined by `LoggingSource` and `LoggingConfig`.
* `offline_write_batch` is a method that supports directly pushing a pyarrow table to a feature view. Given a feature view with a specific schema, this function should write the pyarrow table to the batch source defined for that feature view. More details about the push API can be found [here](docs/reference/data-sources/push.md).
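
The read path described in the bullets above can be illustrated without Feast. The following is a minimal sketch, assuming rows are plain dictionaries; `pull_latest`, `join_key`, and `timestamp_field` are hypothetical names, and this is not the actual signature of `pull_latest_from_table_or_query`. It keeps, for each entity, the most recent row inside a time window, which is the kind of result materialization needs before writing to the online store.

```python
from datetime import datetime
from typing import Any, Dict, List


def pull_latest(
    rows: List[Dict[str, Any]],
    join_key: str,
    timestamp_field: str,
    start_date: datetime,
    end_date: datetime,
) -> List[Dict[str, Any]]:
    """Return the newest row per entity within [start_date, end_date]."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        ts = row[timestamp_field]
        if not (start_date <= ts <= end_date):
            continue  # outside the requested window
        current = latest.get(row[join_key])
        if current is None or ts > current[timestamp_field]:
            latest[row[join_key]] = row
    return list(latest.values())


# Toy data: two drivers, one row of driver 2 falls outside the window.
rows = [
    {"driver_id": 1, "event_timestamp": datetime(2022, 7, 1), "conv_rate": 0.1},
    {"driver_id": 1, "event_timestamp": datetime(2022, 7, 3), "conv_rate": 0.3},
    {"driver_id": 2, "event_timestamp": datetime(2022, 7, 2), "conv_rate": 0.5},
    {"driver_id": 2, "event_timestamp": datetime(2022, 8, 1), "conv_rate": 0.9},
]
latest_rows = pull_latest(
    rows, "driver_id", "event_timestamp", datetime(2022, 7, 1), datetime(2022, 7, 31)
)
```

A real offline store would usually push this logic down into a query against the backing database rather than iterate in Python.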

{% code title="feast_custom_offline_store/file.py" %}
Comment thread
adchia marked this conversation as resolved.
Outdated
```python
Comment thread
adchia marked this conversation as resolved.
Outdated
@@ -128,6 +130,8 @@ Custom offline stores may need to implement their own instances of the `Retrieva

The `RetrievalJob` interface exposes two methods, `to_df` and `to_arrow`. The retrieval job is expected to return the rows read from the offline store as a pandas DataFrame or as an Arrow table, respectively.

Users who want to have their offline store support batch materialization (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to a distributed framework. If this functionality is not needed, the `RetrievalJob` will default to local materialization.
Collaborator

Suggested change
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to a distributed framework. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).

Collaborator Author

Fixed.


{% code title="feast_custom_offline_store/file.py" %}
```python
class CustomFileRetrievalJob(RetrievalJob):
@@ -215,7 +219,7 @@ class CustomFileDataSource(FileSource):

After implementing these classes, the custom offline store can be used by referencing it in a feature repo's `feature_store.yaml` file, specifically in the `offline_store` field. The value specified should be the fully qualified class name of the OfflineStore.

As long as your OfflineStore class is available in your Python environment, it will be imported by Feast dynamically at runtime.
As long as your OfflineStore class is available in your Python environment, it will be imported by Feast dynamically at runtime. It is crucial to specify the type as the fully qualified path of the class (package, module, and class name) so that Feast can import it.
Comment thread
kevjumba marked this conversation as resolved.
Outdated
Collaborator

Might be more clear if you left this as a comment in the example feature_store.yaml

Collaborator Author

Done.
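
To make the "imported dynamically at runtime" point concrete, here is a minimal sketch of how a fully qualified class name can be resolved with the standard library. `import_class` is a hypothetical helper written for illustration; it is an assumption about the general mechanism, not Feast's own loader.

```python
import importlib
from typing import Any


def import_class(fully_qualified_name: str) -> Any:
    """Import `package.module.ClassName` and return the class object."""
    module_name, _, class_name = fully_qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {fully_qualified_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class stands in for a custom store class here.
ordered_dict_cls = import_class("collections.OrderedDict")
```

This is why the `type` value has to be importable from the Python environment in which Feast runs: if the package is not installed there, `importlib.import_module` raises `ModuleNotFoundError`.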


To use our custom file offline store, we can use the following `feature_store.yaml`:

@@ -260,23 +264,56 @@ driver_hourly_stats_view = FeatureView(

## 6. Testing the OfflineStore class

### Contrib
Collaborator

This doesn't seem like it logically belongs under "testing" ?

Collaborator Author

Moved to Defining an OfflineStore class


Generally, new offline stores should go in the contrib folder and have alpha functionality tags. The contrib folder is designated for community contributions that are not fully maintained by Feast maintainers and that may be unstable or subject to API changes. It is recommended to warn users that the offline store functionality is still in alpha development and is not fully stable. To be classified as fully stable and moved to the main offline store folder, the offline store should integrate with the full integration test suite in Feast and pass all of the test cases.
Collaborator

alpha functionality tags is ambiguous imo.

Maybe say our standard is to print messages stating it's alpha status? This is also where a sample simple PR we provide is useful

Collaborator Author

Fixed.
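
One way to surface alpha status to users is to emit a warning whenever the store is used. The sketch below is an assumption about how that could look: `CustomOfflineStore`, the method body, and the message wording are hypothetical and are not taken from this PR.

```python
import warnings


class CustomOfflineStore:
    """Hypothetical contrib store used only to illustrate the alpha warning."""

    def get_historical_features(self) -> str:
        warnings.warn(
            "The custom offline store is an experimental feature in alpha "
            "development. Some functionality may still be unstable.",
            RuntimeWarning,
        )
        return "retrieval-job-placeholder"


# Capture the warning to show that it is raised on every call.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    CustomOfflineStore().get_historical_features()
```

Using `warnings.warn` rather than `print` lets users filter or escalate the message with the standard `warnings` machinery.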


### Integrating with the integration test suite and unit test suite.

Even if you have created the `OfflineStore` class in a separate repo, you can still test your implementation against the Feast test suite, as long as you have Feast as a submodule in your repo. In the Feast submodule, we can run all the unit tests with:

In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `create_saved_dataset_destination`. `create_data_source` creates a data source for testing from the given dataframe: it registers the dataframe with your offline store and returns a data source pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a data warehouse data source creator. **Saved datasets** are special datasets used for data validation and are Feast's way of snapshotting your data for future data validation.
Member

Suggested change
In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `create_saved_dataset_destination`. `create_data_source` should create a data source based on the dataframe passed in. It may be implemented by uploading the contents of the dataframe into the offline store and returning a data source object pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a data source creator. **Saved datasets** are special datasets used for data validation and are Feast's way of snapshotting your data for future data validation.

Collaborator Author

Fixed
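
The "upload the data, then return a source pointing at it" idea can be sketched with the standard library alone. Everything here is a hypothetical stand-in: `ToyDataSource` and `ToyDataSourceCreator` do not subclass Feast's `DataSourceCreator`, rows are plain dictionaries instead of a dataframe, and a temporary CSV file plays the role of the offline store.

```python
import csv
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToyDataSource:
    """Hypothetical stand-in for a Feast data source object."""

    path: str
    timestamp_field: str


class ToyDataSourceCreator:
    """Hypothetical creator; it does not subclass Feast's DataSourceCreator."""

    def __init__(self, project_name: str):
        self.project_name = project_name
        self.tmp_dir = tempfile.mkdtemp(prefix=f"{project_name}_")

    def create_data_source(
        self, rows: List[Dict[str, Any]], destination_name: str, timestamp_field: str
    ) -> ToyDataSource:
        # "Upload" the test data to storage, then return a source pointing at it.
        path = os.path.join(self.tmp_dir, f"{destination_name}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        return ToyDataSource(path=path, timestamp_field=timestamp_field)


creator = ToyDataSourceCreator("demo")
source = creator.create_data_source(
    [{"driver_id": 1, "event_timestamp": "2022-07-22T00:00:00", "conv_rate": 0.5}],
    destination_name="driver_stats",
    timestamp_field="event_timestamp",
)
```

A real creator would write to the store under test (a table, bucket, or similar) and would also clean up what it created once the tests finish.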


```
make test
make test-python
```
This should run the unit tests and the unit tests should all pass. Please add unit tests for your data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.
Member

Suggested change
This command runs the Python unit tests. All unit tests are required to pass for contributed components.
Contributors should add unit tests for the contributed data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.

Member

test out basic functionality of reading and writing to and from the datasource

This actually is not clear at all. Do you mean the test should write from say bigquery? How is that a unit test then?

Collaborator Author

Hm, I guess most datasources will not have unit tests then. I guess as long as the offline source passes the integration tests, we can trust the store.


The universal tests, which are integration tests specifically intended to test offline and online stores, can be run with:
The universal tests, which are integration tests specifically intended to test offline and online stores, will be run against Feast to ensure that the Feast APIs work with your offline store. The universal tests can be run with the following commands:

```
make test-python-integration-container
Collaborator

I don't think this is actually a recommended practice. The author is going to run into issues around authentication for different clouds. What they actually only need is just their offline store, mapped against the stubbed out online stores.

Collaborator Author

fixed.

make test-python-universal
```

The unit tests should succeed, but the universal tests will likely fail. The tests are parametrized based on the `FULL_REPO_CONFIGS` variable defined in `sdk/python/tests/integration/feature_repos/repo_configuration.py`. To overwrite these configurations, you can simply create your own file that contains a `FULL_REPO_CONFIGS`, and point Feast to that file by setting the environment variable `FULL_REPO_CONFIGS_MODULE` to point to that file. The main challenge there will be to write a `DataSourceCreator` for the offline store. In this repo, the file that overwrites `FULL_REPO_CONFIGS` is `feast_custom_offline_store/feast_tests.py`, so you would run
The unit tests should succeed, but the universal tests will likely fail. The tests are parametrized based on the `FULL_REPO_CONFIGS` variable defined in `sdk/python/tests/integration/feature_repos/repo_configuration.py`. To overwrite these configurations, you can simply create your own file that contains a `FULL_REPO_CONFIGS`, and point Feast to that file by setting the environment variable `FULL_REPO_CONFIGS_MODULE` to point to that file. The module should add new `IntegrationTestRepoConfig` classes to the `AVAILABLE_OFFLINE_STORES` by defining an offline and online store. In general, use the sqlite online store to test your offline store so that you don't need to set up any other online store containers. The main challenge there will be to write a `DataSourceCreator` for the offline store. In this repo, the file that overwrites `FULL_REPO_CONFIGS` is `feast_custom_offline_store/feast_tests.py`, so you would run
Collaborator

We probably should not recommend a path that we expect to fail. Instead, we should ask them to define this FULL_REPO_CONFIGS_MODULE first

Collaborator

Can we, similarly to the rest of the guide, also show a code snippet of what this looks like for this example?

Collaborator Author

done.
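
The override through `FULL_REPO_CONFIGS_MODULE` can be pictured as an environment-variable lookup followed by a module import. The sketch below illustrates that mechanism only: `load_full_repo_configs`, `DEFAULT_FULL_REPO_CONFIGS`, and the in-memory `demo_feast_tests` module are hypothetical, and the string entries stand in for real `IntegrationTestRepoConfig` objects. It is not a copy of the logic in `repo_configuration.py`.

```python
import importlib
import os
import sys
import types
from typing import Any, List

# Placeholder for the configurations a test suite would use by default.
DEFAULT_FULL_REPO_CONFIGS: List[Any] = ["default-config"]


def load_full_repo_configs(env_var: str = "FULL_REPO_CONFIGS_MODULE") -> List[Any]:
    """Return FULL_REPO_CONFIGS from the module named by env_var, else the default."""
    module_name = os.environ.get(env_var)
    if module_name is None:
        return DEFAULT_FULL_REPO_CONFIGS
    module = importlib.import_module(module_name)
    return getattr(module, "FULL_REPO_CONFIGS")


# Register a throwaway in-memory module so the sketch runs without extra files.
demo_module = types.ModuleType("demo_feast_tests")
demo_module.FULL_REPO_CONFIGS = ["custom-offline-store-config"]
sys.modules["demo_feast_tests"] = demo_module

os.environ["FULL_REPO_CONFIGS_MODULE"] = "demo_feast_tests"
overridden = load_full_repo_configs()
del os.environ["FULL_REPO_CONFIGS_MODULE"]
fallback = load_full_repo_configs()
```

The practical consequence is that the module named in the variable must be importable and must define a top-level `FULL_REPO_CONFIGS`.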


```
export FULL_REPO_CONFIGS_MODULE='feast_custom_offline_store.feast_tests'
make test-python-universal
```

to test the offline store against the Feast universal tests. You should notice that some of the tests actually fail; this indicates that there is a mistake in the implementation of this offline store!
to test the offline store against the Feast universal tests. You should notice that some of the tests actually fail. If you have configured your offline store and only that offline store is used in `FULL_REPO_CONFIGS`, `make test-python-integration-container` should work, as it tests the offline store together with the online stores that are containerized in Docker. All of the containerized integration tests should pass; if they don't, this indicates that there is a mistake in the implementation of this offline store!

Remember to add your datasource to `repo_config.py` similar to how we added `spark`, `trino`, etc, to the dictionary `OFFLINE_STORE_CLASS_FOR_TYPE` and add the necessary configuration to `repo_configuration.py`. Namely, `AVAILABLE_OFFLINE_STORES` should load your repo configuration module.

### Dependencies

Finally, if your offline store requires special packages, add them to our `sdk/python/setup.py` under a new `<OFFLINE>_STORE_REQUIRED` list with the packages and add it to the setup script so that if your offline store is needed, users can install the necessary python packages.
Collaborator

should clarify it should go as an extra so it isn't installed by users by default

Collaborator Author

fixed
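
To show what "an extra" means in practice, here is a minimal sketch of the relevant `setup.py` fragment. The extra name `custom`, the list name `CUSTOM_STORE_REQUIRED`, and the requirement string are placeholders chosen for illustration and are not taken from Feast's `setup.py`.

```python
# Hypothetical names: "custom" and CUSTOM_STORE_REQUIRED are placeholders.
CUSTOM_STORE_REQUIRED = [
    "some-database-driver>=1.0",  # placeholder requirement, not a real pin
]

# Passed to setuptools.setup(extras_require=...) so that these packages are
# installed only on request, e.g. `pip install "feast[custom]"`.
extras_require = {
    "custom": CUSTOM_STORE_REQUIRED,
}
```

Because the packages live under `extras_require` rather than `install_requires`, a plain `pip install feast` does not pull them in.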

You will need to regenerate our requirements files. To do this, create separate pyenv environments for python 3.8, 3.9, and 3.10. In each environment, run the following commands:

```
export PYTHON=<version>
make lock-python-ci-dependencies
make lock-python-dependencies
Collaborator

the normal lock-python-dependencies shouldn't be needed right?

Collaborator Author

Yup thx for the catch

```


### Documentation

Remember to update the documentation for your offline store. This can be found in `docs/reference/offline-stores/` and `docs/reference/data-sources/`. You should also add a reference in `docs/reference/data-sources/README.md` and `docs/SUMMARY.md`. Add a new markdown documentation and document the functions similar to how the other offline stores are documented. Be sure to cover how to create the datasource and most importantly, what configuration is needed in the `feature_store.yaml` file in order to create the datasource and also make sure to flag that the datasource is in alpha development.
Collaborator

nit: "add documentation" instead of update.

There's also additional work around adding python code docs (i.e. call make build-sphinx)

Collaborator Author

added.



An example of a full pull request for adding a custom offline store can be found [here](https://github.com/feast-dev/feast/pull/2401).
Collaborator

again would not use this as an example because it's too large and hard to parse

Collaborator Author

addressed above.



71 changes: 66 additions & 5 deletions docs/how-to-guides/adding-support-for-a-new-online-store.md
@@ -6,9 +6,10 @@ Feast makes adding support for a new online store (database) easy. Developers ca

In this guide, we will show you how to integrate with MySQL as an online store. While we will be implementing a specific store, this guide should be representative for adding support for any new online store.

The full working code for this guide can be found at [feast-dev/feast-custom-online-store-demo](https://github.com/feast-dev/feast-custom-online-store-demo).
The full working code for this guide can be found at [feast-dev/feast-custom-online-store-demo](https://github.com/feast-dev/feast-custom-online-store-demo) and an example of a custom online store that was contributed by developers can be found [here](https://github.com/feast-dev/feast/pull/2401).

The process of using a custom online store consists of 3 steps:

The process of using a custom online store consists of 4 steps:

1. Defining the `OnlineStore` class.
2. Defining the `OnlineStoreConfig` class.
@@ -79,7 +80,7 @@ def teardown(
entities: Sequence[Entity],
):
"""

"""
conn = self._get_conn(config)
cur = conn.cursor(buffered=True)
@@ -243,7 +244,7 @@ To use our MySQL online store, we can use the following `feature_store.yaml`:
project: test_custom
registry: data/registry.db
provider: local
online_store:
online_store:
type: feast_custom_online_store.mysql.MySQLOnlineStore
user: foo
password: bar
@@ -263,12 +264,22 @@ online_store: feast_custom_online_store.mysql.MySQLOnlineStore

## 4. Testing the OnlineStore class

### Contrib

Generally, new online stores should go in the contrib folder and have alpha functionality tags. The contrib folder is designated for community contributions that are not fully maintained by Feast maintainers and that may be unstable or subject to API changes. It is recommended to warn users that the online store functionality is still in alpha development and is not fully stable. To be classified as fully stable and moved to the main online store folder, the online store should integrate with the full integration test suite in Feast and pass all of the test cases.

### Integrating with the integration test suite and unit test suite.

Even if you have created the `OnlineStore` class in a separate repo, you can still test your implementation against the Feast test suite, as long as you have Feast as a submodule in your repo. In the Feast submodule, we can run all the unit tests with:

```
make test
make test-python
```

This should run the unit tests and the unit tests should all pass. Please add unit tests for your online store that test out basic functionality of reading and writing to and from the online store. This should just be class level functionality that ensures that the methods you implemented for the OnlineStore work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the OnlineStore works as intended.


Comment thread
adchia marked this conversation as resolved.
Outdated

The universal tests, which are integration tests specifically intended to test offline and online stores, can be run with:

```
@@ -283,3 +294,53 @@ make test-python-universal
```

to test the MySQL online store against the Feast universal tests. You should notice that some of the tests actually fail; this indicates that there is a mistake in the implementation of this online store!


In order to test your online store, overwrite the `AVAILABLE_ONLINE_STORES` dictionary and add a reference to your online store. If you are planning to start the online store up locally (e.g., spin up a local Redis instance), then the dictionary entry should be something like:

```
{
    "sqlite": ({"type": "sqlite"}, None),
}
```

The key is the name of the online store, and the first element of the tuple is the online store config represented as a dictionary. For a custom store, make sure that the config's `type` is the fully qualified class path so that Feast can import the online store dynamically at runtime. The `None` in the tuple can be replaced by an `OnlineStoreCreator`, which is only useful for online stores that have containerized Docker images. If you create a containerized Docker image for testing, developers who want to test with your online store will not have to spin up their own instance of the online store (e.g., a Redis or DynamoDB instance) in order to perform testing. An example of an `OnlineStoreCreator` is shown below:

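```python
from typing import Dict

from testcontainers.core.container import DockerContainer
from testcontainers.core.waiting_utils import wait_for_logs

# Assumed location of OnlineStoreCreator in Feast's test tree.
from tests.integration.feature_repos.universal.online_store_creator import (
    OnlineStoreCreator,
)


class RedisOnlineStoreCreator(OnlineStoreCreator):
    def __init__(self, project_name: str, **kwargs):
        super().__init__(project_name)
        # Run the Redis image and expose its default port.
        self.container = DockerContainer("redis").with_exposed_ports("6379")

    def create_online_store(self) -> Dict[str, str]:
        self.container.start()
        # Block until Redis reports that it is ready.
        log_string_to_wait_for = "Ready to accept connections"
        wait_for_logs(
            container=self.container, predicate=log_string_to_wait_for, timeout=10
        )
        exposed_port = self.container.get_exposed_port("6379")
        return {"type": "redis", "connection_string": f"localhost:{exposed_port},db=0"}

    def teardown(self):
        self.container.stop()
```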
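
The tuple layout described above (config dictionary plus an optional creator) can be sketched as follows. This is an illustration under stated assumptions: the `mysql` entry is hypothetical and reuses the class path and credentials from this guide's `feature_store.yaml`, and `resolve_online_store_config` is a made-up helper that shows one way a test harness could interpret the tuple, not Feast's own code.

```python
from typing import Any, Callable, Dict, Optional, Tuple

StoreEntry = Tuple[Dict[str, Any], Optional[Callable[..., Any]]]

# Store name -> (static config, optional creator).
AVAILABLE_ONLINE_STORES: Dict[str, StoreEntry] = {
    "sqlite": ({"type": "sqlite"}, None),
    "mysql": (
        {
            # Fully qualified class path so the store can be imported at runtime.
            "type": "feast_custom_online_store.mysql.MySQLOnlineStore",
            "user": "foo",
            "password": "bar",
        },
        None,
    ),
}


def resolve_online_store_config(name: str, project_name: str) -> Dict[str, Any]:
    """Use the creator when one is given, otherwise the static config."""
    config, creator = AVAILABLE_ONLINE_STORES[name]
    if creator is None:
        return config
    return creator(project_name).create_online_store()


mysql_config = resolve_online_store_config("mysql", "test_custom")
```

With a creator such as `RedisOnlineStoreCreator` in the second slot, the config would instead come from the running container, which is what removes the need for a manually started instance.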


### Dependencies

Finally, if your online store requires special packages, add them to our `sdk/python/setup.py` under a new `<ONLINE>_STORE_REQUIRED` list with the packages and add it to the setup script so that if your online store is needed, users can install the necessary python packages.
You will need to regenerate our requirements files. To do this, create separate pyenv environments for python 3.8, 3.9, and 3.10. In each environment, run the following commands:

```
export PYTHON=<version>
make lock-python-ci-dependencies
make lock-python-dependencies
```


### Documentation

Remember to update the documentation for your online store. This can be found in `docs/reference/online-stores/`. You should also add a reference in `docs/reference/online-stores/README.md` and `docs/SUMMARY.md`. Add a new markdown documentation and document the functions similar to how the other online stores are documented. Be sure to cover how to create the datasource and most importantly, what configuration is needed in the `feature_store.yaml` file in order to create the datasource and also make sure to flag that the online store is in alpha development.
Collaborator

we might additionally want to cover what the data model is

Collaborator Author

fixed.



An example of a full pull request for adding a custom online store can be found [here](https://github.com/feast-dev/feast/pull/2401).