Commit 9f7e557

kevjumba and adchia authored
feat: Contrib azure provider with synapse/mssql offline store and Azure registry store (feast-dev#3072)
* Broken state Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * working state Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix the lint issues Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Semi working state Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fremove print Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Run build-sphinx Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Add tutorials Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix? Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Begin configuring tests Signed-off-by: Danny Chiao <danny@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Working version Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix azure Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint and address issues Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix integration tests Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint and address issues Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix 
Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Revert Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix pyarrow Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * Fix lint Signed-off-by: Kevin Zhang <kzhang@tecton.ai> * add requirements files Signed-off-by: Danny Chiao <danny@tecton.ai> * fix name of docs Signed-off-by: Danny Chiao <danny@tecton.ai> * fix offline store readme Signed-off-by: Danny Chiao <danny@tecton.ai> * fix offline store readme Signed-off-by: Danny Chiao <danny@tecton.ai> * fix Signed-off-by: Danny Chiao <danny@tecton.ai> * fix Signed-off-by: Danny Chiao <danny@tecton.ai> Signed-off-by: Kevin Zhang <kzhang@tecton.ai> Signed-off-by: Danny Chiao <danny@tecton.ai> Co-authored-by: Danny Chiao <danny@tecton.ai>
1 parent 4310ed7 commit 9f7e557

65 files changed

Lines changed: 3806 additions & 87 deletions

Makefile

Lines changed: 29 additions & 6 deletions
@@ -81,7 +81,8 @@ test-python-integration-local:
 python -m pytest -n 8 --integration \
 -k "not gcs_registry and \
 not s3_registry and \
-not test_lambda_materialization" \
+not test_lambda_materialization and \
+not test_snowflake" \
 sdk/python/tests \
 ) || echo "This script uses Docker, and it isn't running - please start the Docker Daemon and try again!";

@@ -113,7 +114,8 @@ test-python-universal-spark:
 not test_push_features_to_offline_store.py and \
 not gcs_registry and \
 not s3_registry and \
-not test_universal_types" \
+not test_universal_types and \
+not test_snowflake" \
 sdk/python/tests

 test-python-universal-trino:
@@ -136,9 +138,27 @@ test-python-universal-trino:
 not test_push_features_to_offline_store.py and \
 not gcs_registry and \
 not s3_registry and \
-not test_universal_types" \
+not test_universal_types and \
+not test_snowflake" \
 sdk/python/tests

+
+# Note: to use this, you'll need to have Microsoft ODBC 17 installed.
+# See https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/install-microsoft-odbc-driver-sql-server-macos?view=sql-server-ver15#17
+test-python-universal-mssql:
+PYTHONPATH='.' \
+FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.offline_stores.contrib.mssql_repo_configuration \
+PYTEST_PLUGINS=feast.infra.offline_stores.contrib.mssql_offline_store.tests \
+FEAST_USAGE=False IS_TEST=True \
+FEAST_LOCAL_ONLINE_CONTAINER=True \
+python -m pytest -n 8 --integration \
+-k "not gcs_registry and \
+not s3_registry and \
+not test_lambda_materialization and \
+not test_snowflake" \
+sdk/python/tests
+
+
 #To use Athena as an offline store, you need to create an Athena database and an S3 bucket on AWS. https://docs.aws.amazon.com/athena/latest/ug/getting-started.html
 #Modify environment variables ATHENA_DATA_SOURCE, ATHENA_DATABASE, ATHENA_S3_BUCKET_NAME if you want to change the data source, database, and bucket name of S3 to use.
 #If tests fail with the pytest -n 8 option, change the number to 1.
@@ -161,7 +181,8 @@ test-python-universal-athena:
 not test_historical_features_persisting and \
 not test_historical_retrieval_fails_on_validation and \
 not gcs_registry and \
-not s3_registry" \
+not s3_registry and \
+not test_snowflake" \
 sdk/python/tests

 test-python-universal-postgres-offline:
@@ -203,7 +224,8 @@ test-python-universal-postgres-online:
 not test_push_features_to_offline_store and \
 not gcs_registry and \
 not s3_registry and \
-not test_universal_types" \
+not test_universal_types and \
+not test_snowflake" \
 sdk/python/tests

 test-python-universal-cassandra:
@@ -230,7 +252,8 @@ test-python-universal-cassandra-no-cloud-providers:
 not test_apply_data_source_integration and \
 not test_nullable_online_store and \
 not gcs_registry and \
-not s3_registry" \
+not s3_registry and \
+not test_snowflake" \
 sdk/python/tests

 test-python-universal:
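
Note on the `-k` changes in the Makefile: each touched target appends `not test_snowflake` to its pytest `-k` expression. As a rough illustration of the effect, the sketch below approximates how plain `-k` terms deselect tests by case-insensitive substring match. The helper `select_tests` and the test ids are hypothetical; real pytest also matches against markers and other keywords and evaluates full boolean expressions.

```python
from typing import Iterable, List

# Terms excluded by the `-k` expression in the new test-python-universal-mssql target.
EXCLUDED_TERMS = [
    "gcs_registry",
    "s3_registry",
    "test_lambda_materialization",
    "test_snowflake",
]


def select_tests(test_ids: Iterable[str], excluded_terms: Iterable[str]) -> List[str]:
    """Keep only the test ids that contain none of the excluded terms.

    Approximates `-k "not a and not b ..."` using case-insensitive
    substring matching on the test id.
    """
    terms = [term.lower() for term in excluded_terms]
    return [
        test_id
        for test_id in test_ids
        if not any(term in test_id.lower() for term in terms)
    ]


# Hypothetical test ids, used only for illustration.
candidate_ids = [
    "test_universal_e2e.py::test_e2e_local",
    "test_snowflake_source.py::test_snowflake_query",
    "test_registry.py::test_apply_entity[s3_registry]",
    "test_historical.py::test_historical_features",
]

selected = select_tests(candidate_ids, EXCLUDED_TERMS)
for test_id in selected:
    print(test_id)
```

Only the ids that mention none of the excluded terms remain, which is why adding `not test_snowflake` keeps Snowflake-specific tests out of runs that target other stores.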

README.md

Lines changed: 2 additions & 2 deletions
@@ -152,7 +152,7 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Redshift source](https://docs.feast.dev/reference/data-sources/redshift)
 * [x] [BigQuery source](https://docs.feast.dev/reference/data-sources/bigquery)
 * [x] [Parquet file source](https://docs.feast.dev/reference/data-sources/file)
-* [x] [Synapse source (community plugin)](https://github.com/Azure/feast-azure)
+* [x] [Azure Synapse + Azure SQL source (contrib plugin)](https://docs.feast.dev/reference/data-sources/mssql)
 * [x] [Hive (community plugin)](https://github.com/baineng/feast-hive)
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/data-sources/postgres)
 * [x] [Spark (contrib plugin)](https://docs.feast.dev/reference/data-sources/spark)
@@ -161,7 +161,7 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Snowflake](https://docs.feast.dev/reference/offline-stores/snowflake)
 * [x] [Redshift](https://docs.feast.dev/reference/offline-stores/redshift)
 * [x] [BigQuery](https://docs.feast.dev/reference/offline-stores/bigquery)
-* [x] [Synapse (community plugin)](https://github.com/Azure/feast-azure)
+* [x] [Azure Synapse + Azure SQL (contrib plugin)](https://docs.feast.dev/reference/offline-stores/mssql.md)
 * [x] [Hive (community plugin)](https://github.com/baineng/feast-hive)
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/offline-stores/postgres)
 * [x] [Trino (contrib plugin)](https://github.com/Shopify/feast-trino)

docs/SUMMARY.md

Lines changed: 4 additions & 0 deletions
@@ -71,6 +71,7 @@
 * [Spark (contrib)](reference/data-sources/spark.md)
 * [PostgreSQL (contrib)](reference/data-sources/postgres.md)
 * [Trino (contrib)](reference/data-sources/trino.md)
+* [Azure Synapse + Azure SQL (contrib)](reference/data-sources/mssql.md)
 * [Offline stores](reference/offline-stores/README.md)
 * [Overview](reference/offline-stores/overview.md)
 * [File](reference/offline-stores/file.md)
@@ -80,17 +81,20 @@
 * [Spark (contrib)](reference/offline-stores/spark.md)
 * [PostgreSQL (contrib)](reference/offline-stores/postgres.md)
 * [Trino (contrib)](reference/offline-stores/trino.md)
+* [Azure Synapse + Azure SQL (contrib)](reference/offline-stores/mssql.md)
 * [Online stores](reference/online-stores/README.md)
 * [SQLite](reference/online-stores/sqlite.md)
 * [Snowflake](reference/online-stores/snowflake.md)
 * [Redis](reference/online-stores/redis.md)
 * [Datastore](reference/online-stores/datastore.md)
 * [DynamoDB](reference/online-stores/dynamodb.md)
 * [PostgreSQL (contrib)](reference/online-stores/postgres.md)
+* [Cassandra + Astra DB (contrib)](reference/online-stores/cassandra.md)
 * [Providers](reference/providers/README.md)
 * [Local](reference/providers/local.md)
 * [Google Cloud Platform](reference/providers/google-cloud-platform.md)
 * [Amazon Web Services](reference/providers/amazon-web-services.md)
+* [Azure](reference/providers/azure.md)
 * [Feature repository](reference/feature-repository/README.md)
 * [feature\_store.yaml](reference/feature-repository/feature-store-yaml.md)
 * [.feastignore](reference/feature-repository/feast-ignore.md)

docs/getting-started/concepts/registry.md

Lines changed: 11 additions & 11 deletions
@@ -1,15 +1,15 @@
 # Registry

-Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). The registry exposes
+Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). The registry exposes
 methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations.

 ### Options for registry implementations

 #### File-based registry
-By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as
-a serialized file. This registry file can be stored in a local file system, or in cloud storage (in, say, S3 or GCS).
+By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as
+a serialized file. This registry file can be stored in a local file system, or in cloud storage (in, say, S3 or GCS, or Azure).

-The quickstart guides that use `feast init` will use a registry on a local file system. To allow Feast to configure
+The quickstart guides that use `feast init` will use a registry on a local file system. To allow Feast to configure
 a remote file registry, you need to create a GCS / S3 bucket that Feast can understand:
 {% tabs %}
 {% tab title="Example S3 file registry" %}
@@ -35,9 +35,9 @@ offline_store:
 {% endtab %}
 {% endtabs %}

-However, there are inherent limitations with a file-based registry, since changing a single field in the registry
-requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or
-bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for
+However, there are inherent limitations with a file-based registry, since changing a single field in the registry
+requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or
+bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for
 multiple feature views or time ranges concurrently).

 #### SQL Registry
@@ -47,14 +47,14 @@ This supports any SQLAlchemy compatible database as a backend. The exact schema

 ### Updating the registry

-We recommend users store their Feast feature definitions in a version controlled repository, which then via CI/CD
-automatically stays synced with the registry. Users will often also want multiple registries to correspond to
-different environments (e.g. dev vs staging vs prod), with staging and production registries with locked down write
+We recommend users store their Feast feature definitions in a version controlled repository, which then via CI/CD
+automatically stays synced with the registry. Users will often also want multiple registries to correspond to
+different environments (e.g. dev vs staging vs prod), with staging and production registries with locked down write
 access since they can impact real user traffic. See [Running Feast in Production](../../how-to-guides/running-feast-in-production.md#1.-automatically-deploying-changes-to-your-feature-definitions) for details on how to set this up.

 ### Accessing the registry from clients

-Users can specify the registry through a `feature_store.yaml` config file, or programmatically. We often see teams
+Users can specify the registry through a `feature_store.yaml` config file, or programmatically. We often see teams
 preferring the programmatic approach because it makes notebook driven development very easy:

 #### Option 1: programmatically specifying the registry
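
Note on the file-based registry limitation described in this file: because every change rewrites the whole registry file, two concurrent writers that start from the same snapshot can silently drop each other's changes. The sketch below reproduces that lost update with a JSON file standing in for the serialized protobuf. The file layout and the feature view names are assumptions made for illustration, not Feast's actual registry format.

```python
import json
import os
import tempfile


def read_registry(path: str) -> dict:
    """Load the whole registry file into memory."""
    with open(path) as f:
        return json.load(f)


def write_registry(path: str, registry: dict) -> None:
    """Overwrite the whole registry file with the given contents."""
    with open(path, "w") as f:
        json.dump(registry, f)


# JSON stands in for the serialized registry; the structure is hypothetical.
registry_path = os.path.join(tempfile.mkdtemp(), "registry.json")
write_registry(registry_path, {"feature_views": []})

# Two writers read the same snapshot before either one writes back.
snapshot_a = read_registry(registry_path)
snapshot_b = read_registry(registry_path)

snapshot_a["feature_views"].append("driver_stats")
snapshot_b["feature_views"].append("customer_stats")

# Each writer rewrites the full file; the second write replaces the first.
write_registry(registry_path, snapshot_a)
write_registry(registry_path, snapshot_b)

final_registry = read_registry(registry_path)
print(final_registry["feature_views"])
```

The entry added by the first writer is gone after the second write, which is the data-loss risk the documentation attributes to multiple concurrent writers.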

docs/how-to-guides/adding-or-reusing-tests.md

Lines changed: 2 additions & 1 deletion
@@ -241,7 +241,8 @@ def test_historical_features(environment, universal_data_sources, full_feature_n
 validate_dataframes(
     expected_df,
     table_from_df_entities,
-    keys=[event_timestamp, "order_id", "driver_id", "customer_id"],
+    sort_by=[event_timestamp, "order_id", "driver_id", "customer_id"],
+    event_timestamp = event_timestamp,
 )
 # ... more test code
 ```

docs/reference/data-sources/README.md

Lines changed: 6 additions & 2 deletions
@@ -35,9 +35,13 @@ Please see [Data Source](../../getting-started/concepts/data-ingestion.md) for a
 {% endcontent-ref %}

 {% content-ref url="postgres.md" %}
-[postgres.md]([postgres].md)
+[postgres.md](postgres.md)
 {% endcontent-ref %}

 {% content-ref url="trino.md" %}
-[trino.md]([trino].md)
+[trino.md](trino.md)
+{% endcontent-ref %}
+
+{% content-ref url="mssql.md" %}
+[mssql.md](mssql.md)
 {% endcontent-ref %}
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# MsSQL source (contrib)
+
+## Description
+
+MsSQL data sources are Microsoft sql table sources.
+These can be specified either by a table reference or a SQL query.
+
+## Disclaimer
+
+The MsSQL data source does not achieve full test coverage.
+Please do not assume complete stability.
+
+## Examples
+
+Defining a MsSQL source:
+
+```python
+from feast.infra.offline_stores.contrib.mssql_offline_store.mssqlserver_source import (
+    MsSqlServerSource,
+)
+
+driver_hourly_table = "driver_hourly"
+
+driver_source = MsSqlServerSource(
+    table_ref=driver_hourly_table,
+    event_timestamp_column="datetime",
+    created_timestamp_column="created",
+)
+```
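
Note on "either by a table reference or a SQL query": one common way to treat the two forms uniformly is to turn either into a fragment usable in a `FROM` clause. The sketch below shows that idea with a hypothetical `SourceSpec` class; it is not the `MsSqlServerSource` API, and the alias name `source_subquery` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical stand-in for a source given as a table reference or a query."""

    table_ref: Optional[str] = None
    query: Optional[str] = None

    def from_clause(self) -> str:
        """Return text that can follow FROM in a SELECT statement."""
        # Exactly one of the two forms must be provided.
        if (self.table_ref is None) == (self.query is None):
            raise ValueError("Specify exactly one of table_ref or query")
        if self.table_ref is not None:
            return self.table_ref
        # A derived table needs an alias in T-SQL.
        return f"({self.query}) AS source_subquery"


by_table = SourceSpec(table_ref="driver_hourly")
by_query = SourceSpec(query="SELECT * FROM driver_hourly WHERE conv_rate > 0")

print(f"SELECT * FROM {by_table.from_clause()}")
print(f"SELECT * FROM {by_query.from_clause()}")
```

Rejecting a spec that sets both fields, or neither, keeps the ambiguity out of the generated SQL.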

docs/reference/offline-stores/README.md

Lines changed: 4 additions & 0 deletions
@@ -35,3 +35,7 @@ Please see [Offline Store](../../getting-started/architecture-and-components/off
 {% content-ref url="trino.md" %}
 [trino.md](trino.md)
 {% endcontent-ref %}
+
+{% content-ref url="mssql.md" %}
+[mssql.md](mssql.md)
+{% endcontent-ref %}
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+# MsSQL/Synapse offline store (contrib)
+
+## Description
+
+The MsSQL offline store provides support for reading [MsSQL Sources](../data-sources/mssql.md). Specifically, it is developed to read from [Synapse SQL](https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/overview-features) on Microsoft Azure
+
+* Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe.
+
+## Disclaimer
+
+The MsSQL offline store does not achieve full test coverage.
+Please do not assume complete stability.
+
+## Example
+
+{% code title="feature_store.yaml" %}
+```yaml
+registry:
+    registry_store_type: AzureRegistryStore
+    path: ${REGISTRY_PATH} # Environment Variable
+project: production
+provider: azure
+online_store:
+    type: redis
+    connection_string: ${REDIS_CONN} # Environment Variable
+offline_store:
+    type: mssql
+    connection_string: ${SQL_CONN} # Environment Variable
+```
+{% endcode %}
+
+## Functionality Matrix
+
+The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
+Below is a matrix indicating which functionality is supported by the Spark offline store.
+
+| | MsSql |
+| :-------------------------------- | :-- |
+| `get_historical_features` (point-in-time correct join) | yes |
+| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
+| `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
+| `offline_write_batch` (persist dataframes to offline store) | no |
+| `write_logged_features` (persist logged features to offline store) | no |
+
+Below is a matrix indicating which functionality is supported by `MsSqlServerRetrievalJob`.
+
+| | MsSql |
+| --------------------------------- | --- |
+| export to dataframe | yes |
+| export to arrow table | yes |
+| export to arrow batches | no |
+| export to SQL | no |
+| export to data lake (S3, GCS, etc.) | no |
+| export to data warehouse | no |
+| local execution of Python-based on-demand transforms | no |
+| remote execution of Python-based on-demand transforms | no |
+| persist results in the offline store | yes |
+
+To compare this set of functionality against other offline stores, please see the full [functionality matrix](overview.md#functionality-matrix).
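
Note on the `${...}` placeholders in the `feature_store.yaml` example: the inline comments mark them as environment variables. The sketch below assumes the placeholders are expanded from the process environment before the YAML is parsed, and uses Python's `os.path.expandvars` to show the effect. It is an illustration rather than a copy of Feast's config loader, and all values are hypothetical placeholders.

```python
import os

# Trimmed copy of the example config, kept as raw text before YAML parsing.
CONFIG_TEMPLATE = """\
registry:
    registry_store_type: AzureRegistryStore
    path: ${REGISTRY_PATH}
online_store:
    type: redis
    connection_string: ${REDIS_CONN}
offline_store:
    type: mssql
    connection_string: ${SQL_CONN}
"""

# Hypothetical values; real deployments would set these outside the program.
os.environ["REGISTRY_PATH"] = "placeholder-registry-path"
os.environ["SQL_CONN"] = "placeholder-sql-connection-string"
# Leave REDIS_CONN unset to show what happens to an undefined variable.
os.environ.pop("REDIS_CONN", None)

expanded = os.path.expandvars(CONFIG_TEMPLATE)
print(expanded)
```

With `os.path.expandvars`, an undefined variable is left in place as literal `${REDIS_CONN}`, so a missing secret would surface later as a malformed connection string rather than as an immediate error. Checking that the required variables are set before loading the config avoids that.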

docs/reference/online-stores/README.md

Lines changed: 1 addition & 0 deletions
@@ -29,3 +29,4 @@ Please see [Online Store](../../getting-started/architecture-and-components/onli
 {% content-ref url="cassandra.md" %}
 [cassandra.md](cassandra.md)
 {% endcontent-ref %}
+
