Commit 1bbd5dc

Merge branch 'master' into feat/dynamo_db_online_write_read

2 parents fb6eacb + 71d7ae2

14 files changed: 88 additions & 57 deletions

CONTRIBUTING.md

Lines changed: 23 additions & 8 deletions

@@ -20,12 +20,12 @@ A quick list of things to keep in mind as you're making changes:
 - When you make the PR
   - Make a pull request from the forked repo you made
   - Ensure you add a GitHub **label** (i.e. a kind tag to the PR (e.g. `kind/bug` or `kind/housekeeping`)) or else checks will fail.
-  - Ensure you leave a release note for any user facing changes in the PR. There is a field automatically generated in the PR request. You can write `NONE` in that field if there are no user facing changes.
+  - Ensure you leave a release note for any user facing changes in the PR. There is a field automatically generated in the PR request. You can write `NONE` in that field if there are no user facing changes.
 - Please run tests locally before submitting a PR (e.g. for Python, the [local integration tests](#local-integration-tests))
 - Try to keep PRs smaller. This makes them easier to review.

 ### Forking the repo
-Fork the Feast Github repo and clone your fork locally. Then make changes to a local branch to the fork.
+Fork the Feast Github repo and clone your fork locally. Then make changes to a local branch to the fork.

 See [Creating a pull request from a fork](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork)

@@ -40,10 +40,10 @@ pre-commit install --hook-type pre-commit --hook-type pre-push
 3. On push, the pre-commit hook will run. This runs `make format` and `make lint`.

 ### Signing off commits
-> :warning: Warning: using the default integrations with IDEs like VSCode or IntelliJ will not sign commits.
+> :warning: Warning: using the default integrations with IDEs like VSCode or IntelliJ will not sign commits.
 > When you submit a PR, you'll have to re-sign commits to pass the DCO check.

-Use git signoffs to sign your commits. See
+Use git signoffs to sign your commits. See
 https://docs.github.com/en/github/authenticating-to-github/managing-commit-signature-verification for details

 Then, you can sign off commits with the `-s` flag:

@@ -121,15 +121,15 @@ There are two sets of tests you can run:
 To get local integration tests running, you'll need to have Redis setup:

 Redis
-1. Install Redis: [Quickstart](https://redis.io/topics/quickstart)
-2. Run `redis-server`
+1. Install Redis: [Quickstart](https://redis.io/topics/quickstart)
+2. Run `redis-server`

 Now run `make test-python-universal-local`

 #### Full integration tests
 To test across clouds, on top of setting up Redis, you also need GCP / AWS / Snowflake setup.

-> Note: you can manually control what tests are run today by inspecting
+> Note: you can manually control what tests are run today by inspecting
 > [RepoConfiguration](https://github.com/feast-dev/feast/blob/master/sdk/python/tests/integration/feature_repos/repo_configuration.py)
 > and commenting out tests that are added to `DEFAULT_FULL_REPO_CONFIGS`

@@ -187,4 +187,19 @@ go vet
 Unit tests for the Feast Go Client can be run as follows:
 ```sh
 go test
-```
+```
+
+### Testing with Github Actions workflows
+* Update your current master on your forked branch and make a pull request against your own forked master.
+* Enable workflows by going to actions and clicking `Enable Workflows`.
+* Pushes will now run your edited workflow yaml file against your test code.
+* Unfortunately, in order to test any github workflow changes, you must push the code to the branch and see the output in the actions tab.
+
+## Issues
+* pr-integration-tests workflow is skipped
+  * Add `ok-to-test` github label.
+* pr-integration-tests errors out with `Error: fatal: invalid refspec '+refs/pull//merge:refs/remotes/pull//merge'`
+  * This is because github actions cannot pull the branch version for some reason, so just find your PR number in your pull request header and hard code it into the `uses: actions/checkout@v2` section (i.e. replace `refs/pull/${{ github.event.pull_request.number }}/merge` with `refs/pull/<pr number>/merge`)
+* AWS/GCP workflow
+  * Currently still cannot test GCP/AWS workflow without setting up secrets in a forked repository.
+
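For context on the refspec workaround in the added troubleshooting section: the change amounts to pinning the checkout step's ref in the workflow yaml. A rough sketch of just that step (the real pr-integration-tests step has more inputs, and `<pr number>` stays a placeholder you fill in from your pull request header):

```yaml
# Sketch of the hardcoded-checkout workaround described above; not the full
# pr-integration-tests definition, only the relevant step.
- uses: actions/checkout@v2
  with:
    # Replace <pr number> with the number from your pull request header.
    ref: refs/pull/<pr number>/merge
```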

docs/how-to-guides/adding-or-reusing-tests.md

Lines changed: 0 additions & 1 deletion

@@ -202,4 +202,3 @@ Starting 6006
 * You should be able to run the integration tests and have the redis cluster tests pass.
 * If you would like to run your own redis cluster, you can run the above commands with your own specified ports and connect to the newly configured cluster.
 * To stop the cluster, run `./create-cluster stop` and then `./create-cluster clean`.
-

docs/reference/data-sources/spark.md

Lines changed: 9 additions & 3 deletions

@@ -13,7 +13,9 @@ The spark data source API allows for the retrieval of historical feature values
 Using a table reference from SparkSession (for example, either in memory or a Hive Metastore)

 ```python
-from feast import SparkSource
+from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
+    SparkSource,
+)

 my_spark_source = SparkSource(
     table="FEATURE_TABLE",
@@ -23,7 +25,9 @@ my_spark_source = SparkSource(
 Using a query

 ```python
-from feast import SparkSource
+from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
+    SparkSource,
+)

 my_spark_source = SparkSource(
     query="SELECT timestamp as ts, created, f1, f2 "
@@ -34,7 +38,9 @@ my_spark_source = SparkSource(
 Using a file reference

 ```python
-from feast import SparkSource
+from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
+    SparkSource,
+)

 my_spark_source = SparkSource(
     path=f"{CURRENT_DIR}/data/driver_hourly_stats",

sdk/python/feast/__init__.py

Lines changed: 0 additions & 4 deletions

@@ -3,9 +3,6 @@
 from pkg_resources import DistributionNotFound, get_distribution

 from feast.infra.offline_stores.bigquery_source import BigQuerySource
-from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
-    SparkSource,
-)
 from feast.infra.offline_stores.file_source import FileSource
 from feast.infra.offline_stores.redshift_source import RedshiftSource
 from feast.infra.offline_stores.snowflake_source import SnowflakeSource
@@ -50,5 +47,4 @@
     "RedshiftSource",
     "RequestFeatureView",
     "SnowflakeSource",
-    "SparkSource",
 ]

sdk/python/feast/feature_store.py

Lines changed: 0 additions & 2 deletions

@@ -42,7 +42,6 @@
 from feast.data_source import DataSource
 from feast.diff.infra_diff import InfraDiff, diff_infra_protos
 from feast.diff.registry_diff import RegistryDiff, apply_diff_to_registry, diff_between
-from feast.dqm.profilers.ge_profiler import GEProfiler
 from feast.entity import Entity
 from feast.errors import (
     EntityNotFoundException,
@@ -881,7 +880,6 @@ def create_saved_dataset(
         storage: SavedDatasetStorage,
         tags: Optional[Dict[str, str]] = None,
         feature_service: Optional[FeatureService] = None,
-        profiler: Optional[GEProfiler] = None,
     ) -> SavedDataset:
         """
         Execute provided retrieval job and persist its outcome in given storage.
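With the `profiler` parameter removed, `create_saved_dataset` persists a retrieval job's output with no validation hook. A hedged sketch of a call against the trimmed signature (the entity dataframe and output path are hypothetical, and `SavedDatasetFileStorage` assumes the file offline store is in use):

```python
from feast import FeatureStore
from feast.infra.offline_stores.file_source import SavedDatasetFileStorage

store = FeatureStore(repo_path=".")

# `entity_df` and the feature refs are placeholders for a real repo's data;
# get_historical_features returns the RetrievalJob passed as `from_` below.
job = store.get_historical_features(entity_df=entity_df, features=["fv:f1"])

# Note: no `profiler=` argument anymore; dataset validation happens elsewhere.
dataset = store.create_saved_dataset(
    from_=job,
    name="my_training_dataset",
    storage=SavedDatasetFileStorage(path="my_training_dataset.parquet"),
)
```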

sdk/python/feast/inference.py

Lines changed: 4 additions & 3 deletions

@@ -8,7 +8,6 @@
     FileSource,
     RedshiftSource,
     SnowflakeSource,
-    SparkSource,
 )
 from feast.data_source import DataSource, RequestDataSource
 from feast.errors import RegistryInferenceFailure
@@ -87,8 +86,10 @@ def update_data_sources_with_inferred_event_timestamp_col(
     ):
         # prepare right match pattern for data source
         ts_column_type_regex_pattern = ""
-        if isinstance(data_source, FileSource) or isinstance(
-            data_source, SparkSource
+        # TODO(adchia): Move Spark source inference out of this logic
+        if (
+            isinstance(data_source, FileSource)
+            or "SparkSource" == data_source.__class__.__name__
         ):
             ts_column_type_regex_pattern = r"^timestamp"
         elif isinstance(data_source, BigQuerySource):
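The rewritten condition matches Spark sources by class name rather than by type, so `feast.inference` no longer imports the contrib module at all. A self-contained sketch of the pattern and its trade-off (the classes here are stand-ins, not Feast's):

```python
class FileSource:  # stand-in for feast's FileSource
    pass

class SparkSource:  # stand-in for the contrib class
    pass

def ts_pattern_for(data_source) -> str:
    # Comparing the class name avoids a hard import dependency on the
    # contrib package, but it only matches the exact name: a subclass of
    # SparkSource called anything else would fall through.
    if isinstance(data_source, FileSource) or (
        data_source.__class__.__name__ == "SparkSource"
    ):
        return r"^timestamp"
    return ""

assert ts_pattern_for(SparkSource()) == r"^timestamp"
assert ts_pattern_for(object()) == ""
```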

sdk/python/feast/infra/aws.py

Lines changed: 4 additions & 4 deletions

@@ -119,6 +119,8 @@ def _deploy_feature_server(self, project: str, image_uri: str):
         lambda_client = boto3.client("lambda")
         api_gateway_client = boto3.client("apigatewayv2")
         function = aws_utils.get_lambda_function(lambda_client, resource_name)
+        _logger.debug("Using function name: %s", resource_name)
+        _logger.debug("Found function: %s", function)

         if function is None:
             # If the Lambda function does not exist, create it.
@@ -309,7 +311,7 @@ def _create_or_get_repository_uri(self, ecr_client):

 def _get_lambda_name(project: str):
     lambda_prefix = AWS_LAMBDA_FEATURE_SERVER_REPOSITORY
-    lambda_suffix = f"{project}-{_get_docker_image_version()}"
+    lambda_suffix = f"{project}-{_get_docker_image_version().replace('.', '_')}"
     # AWS Lambda name can't have the length greater than 64 bytes.
     # This usually occurs during integration tests where feast version is long
     if len(lambda_prefix) + len(lambda_suffix) >= 63:
@@ -338,7 +340,7 @@ def _get_docker_image_version() -> str:
     else:
         version = get_version()
         if "dev" in version:
-            version = version[: version.find("dev") - 1].replace(".", "_")
+            version = version[: version.find("dev") - 1]
             _logger.warning(
                 "You are trying to use AWS Lambda feature server while Feast is in a development mode. "
                 f"Feast will use a docker image version {version} derived from Feast SDK "
@@ -347,8 +349,6 @@ def _get_docker_image_version() -> str:
                 "> git fetch --all --tags\n"
                 "> pip install -e sdk/python"
             )
-        else:
-            version = version.replace(".", "_")
     return version

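The three aws.py hunks relocate the dot-to-underscore substitution: `_get_docker_image_version()` now returns the dotted version (dots are legal in a docker image tag), and only `_get_lambda_name`, where AWS naming rules disallow dots, sanitizes it. A small standalone sketch of the resulting behavior (version strings illustrative):

```python
def docker_image_version(version: str) -> str:
    # Post-change behavior: dev builds fall back to the last release tag,
    # and dots are preserved so the value works as a docker image tag.
    if "dev" in version:
        version = version[: version.find("dev") - 1]
    return version

def lambda_suffix(project: str, version: str) -> str:
    # Lambda function names cannot contain dots, so the substitution now
    # lives at the naming call site instead of in the version helper.
    return f"{project}-{docker_image_version(version).replace('.', '_')}"

assert docker_image_version("0.18.1") == "0.18.1"
assert docker_image_version("0.18.1.dev25") == "0.18.1"
assert lambda_suffix("myproj", "0.18.1") == "myproj-0_18_1"
```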

sdk/python/feast/infra/offline_stores/bigquery_source.py

Lines changed: 10 additions & 6 deletions

@@ -3,7 +3,7 @@

 from feast import type_map
 from feast.data_source import DataSource
-from feast.errors import DataSourceNoNameException, DataSourceNotFoundException
+from feast.errors import DataSourceNotFoundException
 from feast.protos.feast.core.DataSource_pb2 import DataSource as DataSourceProto
 from feast.protos.feast.core.SavedDataset_pb2 import (
     SavedDatasetStorage as SavedDatasetStorageProto,
@@ -16,19 +16,18 @@
 class BigQuerySource(DataSource):
     def __init__(
         self,
-        name: Optional[str] = None,
         event_timestamp_column: Optional[str] = "",
         table: Optional[str] = None,
         table_ref: Optional[str] = None,
         created_timestamp_column: Optional[str] = "",
         field_mapping: Optional[Dict[str, str]] = None,
         date_partition_column: Optional[str] = "",
         query: Optional[str] = None,
+        name: Optional[str] = None,
     ):
         """Create a BigQuerySource from an existing table or query.

         Args:
-            name (optional): Name for the source. Defaults to the table_ref if not specified.
             table (optional): The BigQuery table where features can be found.
             table_ref (optional): (Deprecated) The BigQuery table where features can be found.
             event_timestamp_column: Event timestamp column used for point in time joins of feature values.
@@ -37,13 +36,13 @@ def __init__(
                 or view. Only used for feature columns, not entities or timestamp columns.
             date_partition_column (optional): Timestamp column used for partitioning.
             query (optional): SQL query to execute to generate data for this data source.
-
+            name (optional): Name for the source. Defaults to the table_ref if not specified.
         Example:
             >>> from feast import BigQuerySource
             >>> my_bigquery_source = BigQuerySource(table="gcp_project:bq_dataset.bq_table")
         """
         if table is None and table_ref is None and query is None:
-            raise ValueError('No "table" argument provided.')
+            raise ValueError('No "table" or "query" argument provided.')
         if not table and table_ref:
             warnings.warn(
                 (
@@ -63,7 +62,12 @@ def __init__(
         elif table_ref:
             _name = table_ref
         else:
-            raise DataSourceNoNameException()
+            warnings.warn(
+                (
+                    "Starting in Feast 0.21, Feast will require either a name for a data source (if using query) or `table`."
+                ),
+                DeprecationWarning,
+            )

         super().__init__(
             _name if _name else "",
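The behavioral shift in the last hunk, sketched in isolation: a query-only `BigQuerySource` without a `name` previously raised `DataSourceNoNameException` and now just warns, falling back to an empty name until Feast 0.21 makes a name mandatory. `RedshiftSource` below gets the identical treatment. A minimal standalone illustration of the mechanics (not Feast code):

```python
import warnings

def resolve_source_name(name=None, table=None, table_ref=None) -> str:
    # Mirrors the fallback chain above: explicit name, then table, then
    # the deprecated table_ref; otherwise warn instead of raising.
    for candidate in (name, table, table_ref):
        if candidate:
            return candidate
    warnings.warn(
        "Starting in Feast 0.21, Feast will require either a name for a "
        "data source (if using query) or `table`.",
        DeprecationWarning,
    )
    return ""

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert resolve_source_name(table="bq_dataset.bq_table") == "bq_dataset.bq_table"
    assert resolve_source_name() == ""  # query-only: warns, no exception
    assert caught[-1].category is DeprecationWarning
```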

sdk/python/feast/infra/offline_stores/file_source.py

Lines changed: 2 additions & 2 deletions

@@ -20,19 +20,18 @@ class FileSource(DataSource):
     def __init__(
         self,
         path: str,
-        name: Optional[str] = "",
         event_timestamp_column: Optional[str] = "",
         file_format: Optional[FileFormat] = None,
         created_timestamp_column: Optional[str] = "",
         field_mapping: Optional[Dict[str, str]] = None,
         date_partition_column: Optional[str] = "",
         s3_endpoint_override: Optional[str] = None,
+        name: Optional[str] = "",
     ):
         """Create a FileSource from a file containing feature data. Only Parquet format supported.

         Args:

-            name (optional): Name for the file source. Defaults to the path.
             path: File path to file containing feature data. Must contain an event_timestamp column, entity columns and
                 feature columns.
             event_timestamp_column: Event timestamp column used for point in time joins of feature values.
@@ -42,6 +41,7 @@ def __init__(
                 or view. Only used for feature columns, not entities or timestamp columns.
             date_partition_column (optional): Timestamp column used for partitioning.
             s3_endpoint_override (optional): Overrides AWS S3 enpoint with custom S3 storage
+            name (optional): Name for the file source. Defaults to the path.

         Examples:
             >>> from feast import FileSource

sdk/python/feast/infra/offline_stores/redshift_source.py

Lines changed: 11 additions & 10 deletions

@@ -1,12 +1,9 @@
+import warnings
 from typing import Callable, Dict, Iterable, Optional, Tuple

 from feast import type_map
 from feast.data_source import DataSource
-from feast.errors import (
-    DataSourceNoNameException,
-    DataSourceNotFoundException,
-    RedshiftCredentialsError,
-)
+from feast.errors import DataSourceNotFoundException, RedshiftCredentialsError
 from feast.protos.feast.core.DataSource_pb2 import DataSource as DataSourceProto
 from feast.protos.feast.core.SavedDataset_pb2 import (
     SavedDatasetStorage as SavedDatasetStorageProto,
@@ -19,20 +16,19 @@
 class RedshiftSource(DataSource):
     def __init__(
         self,
-        name: Optional[str] = None,
         event_timestamp_column: Optional[str] = "",
         table: Optional[str] = None,
         schema: Optional[str] = None,
         created_timestamp_column: Optional[str] = "",
         field_mapping: Optional[Dict[str, str]] = None,
         date_partition_column: Optional[str] = "",
         query: Optional[str] = None,
+        name: Optional[str] = None,
     ):
         """
         Creates a RedshiftSource object.

         Args:
-            name (optional): Name for the source. Defaults to the table_ref if not specified.
             event_timestamp_column (optional): Event timestamp column used for point in
                 time joins of feature values.
             table (optional): Redshift table where the features are stored.
@@ -43,6 +39,7 @@ def __init__(
                 source to column names in a feature table or view.
             date_partition_column (optional): Timestamp column used for partitioning.
             query (optional): The query to be executed to obtain the features.
+            name (optional): Name for the source. Defaults to the table_ref if not specified.
         """
         if table is None and query is None:
             raise ValueError('No "table" argument provided.')
@@ -51,11 +48,15 @@ def __init__(
         if table:
             _name = table
         else:
-            raise DataSourceNoNameException()
+            warnings.warn(
+                (
+                    "Starting in Feast 0.21, Feast will require either a name for a data source (if using query) or `table`."
+                ),
+                DeprecationWarning,
+            )

-        # TODO(adchia): figure out what to do if user uses the query to start
         super().__init__(
-            _name,
+            _name if _name else "",
             event_timestamp_column,
             created_timestamp_column,
             field_mapping,
