From df26658f39eeef760d23986d65c6a6a521b2fff5 Mon Sep 17 00:00:00 2001 From: Felix Wang Date: Mon, 8 Aug 2022 18:03:32 -0700 Subject: [PATCH 1/2] Codebase structure docs Signed-off-by: Felix Wang --- docs/SUMMARY.md | 1 + docs/reference/codebase-structure.md | 147 +++++++++++++++++++++++++++ 2 files changed, 148 insertions(+) create mode 100644 docs/reference/codebase-structure.md diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index fa1a49bc1f..bdfe9555dd 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -59,6 +59,7 @@ ## Reference +* [Codebase Structure](reference/codebase-structure.md) * [Data sources](reference/data-sources/README.md) * [File](reference/data-sources/file.md) * [Snowflake](reference/data-sources/snowflake.md) diff --git a/docs/reference/codebase-structure.md b/docs/reference/codebase-structure.md new file mode 100644 index 0000000000..12349b5465 --- /dev/null +++ b/docs/reference/codebase-structure.md @@ -0,0 +1,147 @@ +# Codebase structure + +Let's examine the Feast codebase. + +``` +$ tree -L 1 -d +. +├── docs +├── examples +├── go +├── infra +├── java +├── protos +├── sdk +└── ui +``` + +## Python SDK + +The Python SDK lives in `sdk/python/feast`. + +``` +$ cd sdk/python/feast +$ tree -L 1 +. +├── __init__.py +├── aggregation.py +├── base_feature_view.py +├── batch_feature_view.py +├── cli.py +├── constants.py +├── data_format.py +├── data_source.py +├── diff +├── dqm +├── driver_test_data.py +├── embedded_go +├── entity.py +├── errors.py +├── feast_object.py +├── feature.py +├── feature_logging.py +├── feature_server.py +├── feature_service.py +├── feature_store.py +├── feature_view.py +├── feature_view_projection.py +├── field.py +├── flags_helper.py +├── importer.py +├── inference.py +├── infra +├── loaders +├── names.py +├── on_demand_feature_view.py +├── online_response.py +├── project_metadata.py +├── proto_json.py +├── py.typed +├── registry.py +├── registry_store.py +├── repo_config.py +├── repo_contents.py +├── repo_operations.py +├── repo_upgrade.py +├── request_feature_view.py +├── saved_dataset.py +├── stream_feature_view.py +├── templates +├── transformation_server.py +├── type_map.py +├── types.py +├── ui +├── ui_server.py +├── usage.py +├── utils.py +├── value_type.py +├── version.py +└── wait.py +``` + +The majority of Feast logic lives in these Python files: +* The core Feast objects ([entities](../getting-started/concepts/entity.md), [feature views](../getting-started/concepts/feature-view.md), [data sources](../getting-started/concepts/dataset.md), etc.) are defined in their respective Python files, such as `entity.py`, `feature_view.py`, and `data_source.py`. +* The feature store object is defined in `feature_store.py` and the associated configuration objects are defined in `repo_config.py`. +* The registry is defined in `registry.py`. +* The CLI and other core feature store logic are defined in `cli.py` and `repo_operations.py`. +* The type system that is used to manage conversion between Feast types and external typing systems is managed in `type_map.py`. +* The Python feature server is defined in `feature_server.py`. + +There are also several important submodules: +* `infra/` contains all the infrastructure components, such as offline and online stores. +* `dqm/` covers data quality monitoring. +* `diff/` covers the logic for creating plans. +* `embedded_go/` covers the Go feature server. +* `ui/` contains the embedded Web UI, to be launched on the `feast ui` command. + +Of these submodules, `infra/` is the most important. +It contains the interfaces for the [provider](getting-started/architecture-and-components/provider.md), [offline store](getting-started/architecture-and-components/offline-store.md), [online store](getting-started/architecture-and-components/online-store.md), and [batch materialization engine](getting-started/architecture-and-components/batch-materialization-engine.md), as well as all of their individual implementations. + +``` +$ tree -L 1 infra +. +├── __init__.py +├── __pycache__ +├── aws.py +├── contrib +├── feature_servers +├── gcp.py +├── infra_object.py +├── key_encoding_utils.py +├── local.py +├── materialization +├── offline_stores +├── online_stores +├── passthrough_provider.py +├── provider.py +├── registry_stores +├── transformation_servers +└── utils +``` + +The tests for the Python SDK are contained in `sdk/python/tests`. +For more details, see this [overview](../how-to-guides/adding-or-reusing-tests.md#test-suite-overview) of the test suite. + +## Java SDK + +The `java/` directory contains the Java serving component. +See [here](https://github.com/feast-dev/feast/blob/master/java/CONTRIBUTING.md) for more details on how the repo is structured. + +## Go feature server + +The `go/` directory contains the Go feature server. +Within `go/`, the `internal/feast/` directory contains most of the core logic: +* `onlineserving/` covers the core serving logic. +* `model/` contains the implementations of the Feast objects (entity, feature view, etc.). +* `registry/` covers the registry. +* `onlinestore/` covers the online stores (currently only Redis and SQLite are supported). + +## Protobufs + +Feast uses [protobuf](https://github.com/protocolbuffers/protobuf) to store serialized versions of the core Feast objects. +The protobuf definitions are stored in `protos/feast`. + +## Web UI + +The `ui/` directory contains the Web UI. +See [here](https://github.com/feast-dev/feast/blob/master/ui/CONTRIBUTING.md) for more details on the structure of the Web UI. From c2ab2a6c3862b99ec43a24ee1e395f145e08166f Mon Sep 17 00:00:00 2001 From: Felix Wang Date: Wed, 10 Aug 2022 18:45:12 -0700 Subject: [PATCH 2/2] Address code review Signed-off-by: Felix Wang --- CONTRIBUTING.md | 2 + docs/reference/codebase-structure.md | 144 ++++++++++++--------------- 2 files changed, 66 insertions(+), 80 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d16aa23d89..6d5b21d4cc 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -48,6 +48,8 @@ the main Feast repository: - [Feast Java Serving](#feast-java-serving) - [Feast Go Client](#feast-go-client) +Please see [this page](https://docs.feast.dev/reference/codebase-structure) for more details on the structure of the entire codebase. + ## Community See [Contribution process](https://docs.feast.dev/project/contributing) and [Community](https://docs.feast.dev/community) for details on how to get more involved in the community. diff --git a/docs/reference/codebase-structure.md b/docs/reference/codebase-structure.md index 12349b5465..b75227860b 100644 --- a/docs/reference/codebase-structure.md +++ b/docs/reference/codebase-structure.md @@ -1,6 +1,7 @@ # Codebase structure Let's examine the Feast codebase. +This analysis is accurate as of Feast 0.23. ``` $ tree -L 1 -d @@ -18,110 +19,90 @@ $ tree -L 1 -d ## Python SDK The Python SDK lives in `sdk/python/feast`. - -``` -$ cd sdk/python/feast -$ tree -L 1 -. -├── __init__.py -├── aggregation.py -├── base_feature_view.py -├── batch_feature_view.py -├── cli.py -├── constants.py -├── data_format.py -├── data_source.py -├── diff -├── dqm -├── driver_test_data.py -├── embedded_go -├── entity.py -├── errors.py -├── feast_object.py -├── feature.py -├── feature_logging.py -├── feature_server.py -├── feature_service.py -├── feature_store.py -├── feature_view.py -├── feature_view_projection.py -├── field.py -├── flags_helper.py -├── importer.py -├── inference.py -├── infra -├── loaders -├── names.py -├── on_demand_feature_view.py -├── online_response.py -├── project_metadata.py -├── proto_json.py -├── py.typed -├── registry.py -├── registry_store.py -├── repo_config.py -├── repo_contents.py -├── repo_operations.py -├── repo_upgrade.py -├── request_feature_view.py -├── saved_dataset.py -├── stream_feature_view.py -├── templates -├── transformation_server.py -├── type_map.py -├── types.py -├── ui -├── ui_server.py -├── usage.py -├── utils.py -├── value_type.py -├── version.py -└── wait.py -``` - The majority of Feast logic lives in these Python files: * The core Feast objects ([entities](../getting-started/concepts/entity.md), [feature views](../getting-started/concepts/feature-view.md), [data sources](../getting-started/concepts/dataset.md), etc.) are defined in their respective Python files, such as `entity.py`, `feature_view.py`, and `data_source.py`. -* The feature store object is defined in `feature_store.py` and the associated configuration objects are defined in `repo_config.py`. -* The registry is defined in `registry.py`. +* The `FeatureStore` class is defined in `feature_store.py` and the associated configuration object (the Python representation of the `feature_store.yaml` file) are defined in `repo_config.py`. * The CLI and other core feature store logic are defined in `cli.py` and `repo_operations.py`. * The type system that is used to manage conversion between Feast types and external typing systems is managed in `type_map.py`. -* The Python feature server is defined in `feature_server.py`. +* The Python feature server (the server that is started through the `feast serve` command) is defined in `feature_server.py`. There are also several important submodules: -* `infra/` contains all the infrastructure components, such as offline and online stores. -* `dqm/` covers data quality monitoring. -* `diff/` covers the logic for creating plans. +* `infra/` contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry. +* `dqm/` covers data quality monitoring, such as the dataset profiler. +* `diff/` covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of `feast plan` and `feast apply`). * `embedded_go/` covers the Go feature server. * `ui/` contains the embedded Web UI, to be launched on the `feast ui` command. Of these submodules, `infra/` is the most important. -It contains the interfaces for the [provider](getting-started/architecture-and-components/provider.md), [offline store](getting-started/architecture-and-components/offline-store.md), [online store](getting-started/architecture-and-components/online-store.md), and [batch materialization engine](getting-started/architecture-and-components/batch-materialization-engine.md), as well as all of their individual implementations. +It contains the interfaces for the [provider](getting-started/architecture-and-components/provider.md), [offline store](getting-started/architecture-and-components/offline-store.md), [online store](getting-started/architecture-and-components/online-store.md), [batch materialization engine](getting-started/architecture-and-components/batch-materialization-engine.md), and [registry](getting-started/architecture-and-components/registry.md), as well as all of their individual implementations. ``` -$ tree -L 1 infra -. -├── __init__.py -├── __pycache__ -├── aws.py +$ tree --dirsfirst -L 1 infra +infra ├── contrib ├── feature_servers +├── materialization +├── offline_stores +├── online_stores +├── registry +├── transformation_servers +├── utils +├── __init__.py +├── aws.py ├── gcp.py ├── infra_object.py ├── key_encoding_utils.py ├── local.py -├── materialization -├── offline_stores -├── online_stores ├── passthrough_provider.py -├── provider.py -├── registry_stores -├── transformation_servers -└── utils +└── provider.py ``` The tests for the Python SDK are contained in `sdk/python/tests`. For more details, see this [overview](../how-to-guides/adding-or-reusing-tests.md#test-suite-overview) of the test suite. +### Example flow: `feast apply` + +Let's walk through how `feast apply` works by tracking its execution across the codebase. + +1. All CLI commands are in `cli.py`. + Most of these commands are backed by methods in `repo_operations.py`. + The `feast apply` command triggers `apply_total_command`, which then calls `apply_total` in `repo_operations.py`. +2. With a `FeatureStore` object (from `feature_store.py`) that is initialized based on the `feature_store.yaml` in the current working directory, `apply_total` first parses the feature repo with `parse_repo` and then calls either `FeatureStore.apply` or `FeatureStore._apply_diffs` to apply those changes to the feature store. +3. Let's examine `FeatureStore.apply`. + It splits the objects based on class (e.g. `Entity`, `FeatureView`, etc.) and then calls the appropriate registry method to apply or delete the object. + For example, it might call `self._registry.apply_entity` to apply an entity. + If the default file-based registry is used, this logic can be found in `infra/registry/registry.py`. +4. Then the feature store must update its cloud infrastructure (e.g. online store tables) to match the new feature repo, so it calls `Provider.update_infra`, which can be found in `infra/provider.py`. +5. Assuming the provider is a built-in provider (e.g. one of the local, GCP, or AWS providers), it will call `PassthroughProvider.update_infra` in `infra/passthrough_provider.py`. +6. This delegates to the online store and batch materialization engine. + For example, if the feature store is configured to use the Redis online store then the `update` method from `infra/online_stores/redis.py` will be called. + And if the local materialization engine is configured then the `update` method from `infra/materialization/local_engine.py` will be called. + +At this point, the `feast apply` command is complete. + +### Example flow: `feast materialize` + +Let's walk through how `feast materialize` works by tracking its execution across the codebase. + +1. The `feast materialize` command triggers `materialize_command` in `cli.py`, which then calls `FeatureStore.materialize` from `feature_store.py`. +2. This then calls `Provider.materialize_single_feature_view`, which can be found in `infra/provider.py`. +3. As with `feast apply`, the provider is most likely backed by the passthrough provider, in which case `PassthroughProvider.materialize_single_feature_view` will be called. +4. This delegates to the underlying batch materialization engine. + Assuming that the local engine has been configured, `LocalMaterializationEngine.materialize` from `infra/materialization/local_engine.py` will be called. +5. Since materialization involves reading features from the offline store and writing them to the online store, the local engine will delegate to both the offline store and online store. + Specifically, it will call `OfflineStore.pull_latest_from_table_or_query` and `OnlineStore.online_write_batch`. + These two calls will be routed to the offline store and online store that have been configured. + +### Example flow: `get_historical_features` + +Let's walk through how `get_historical_features` works by tracking its execution across the codebase. + +1. We start with `FeatureStore.get_historical_features` in `feature_store.py`. + This method does some internal preparation, and then delegates the actual execution to the underlying provider by calling `Provider.get_historical_features`, which can be found in `infra/provider.py`. +2. As with `feast apply`, the provider is most likely backed by the passthrough provider, in which case `PassthroughProvider.get_historical_features` will be called. +3. That call simply delegates to `OfflineStore.get_historical_features`. + So if the feature store is configured to use Snowflake as the offline store, `SnowflakeOfflineStore.get_historical_features` will be executed. + ## Java SDK The `java/` directory contains the Java serving component. @@ -130,10 +111,13 @@ See [here](https://github.com/feast-dev/feast/blob/master/java/CONTRIBUTING.md) ## Go feature server The `go/` directory contains the Go feature server. +Most of the files here have logic to help with reading features from the online store. Within `go/`, the `internal/feast/` directory contains most of the core logic: * `onlineserving/` covers the core serving logic. * `model/` contains the implementations of the Feast objects (entity, feature view, etc.). + * For example, `entity.go` is the Go equivalent of `entity.py`. It contains a very simple Go implementation of the entity object. * `registry/` covers the registry. + * Currently only the file-based registry supported (the sql-based registry is unsupported). Additionally, the file-based registry only supports a file-based registry store, not the GCS or S3 registry stores. * `onlinestore/` covers the online stores (currently only Redis and SQLite are supported). ## Protobufs