
Commit 2e0113e

Authored by: Mwad22, mavysavydav, Tsotne Tabidze, achals, woop
Set default feature naming to not include feature view name. Add option to include feature view name in feature naming. (#1641)
* test (David Y Liu)
* Refactored existing tests to cover full_feature_names on data retrieval; added new tests (Mwad22)
* Removed full_feature_names usage from the quickstart and README to keep the examples simple; resolved failing tests (Mwad22)
* Update CHANGELOG for Feast v0.10.8 (Mwad22)
* GitBook: [master] 2 pages modified (Mwad22)
* Schema inferencing should happen at apply time (#1646): moved the inference methods into repo_operations apply_total, renamed the inference method and changed it to a void in-place update, improved test coverage, and made DataSource event_timestamp_column optional (David Y Liu)
* GitBook: [master] 80 pages modified (Mwad22)
* Provide descriptive error on invalid table reference (#1627): raise DataSourceNotFoundException for both a missing file and a missing BigQuery source, keep the validation logic in data_source, and mark tests that access BigQuery with pytest.integration (Cody Lin)
* Refactor OnlineStoreConfig classes into owning modules (#1649), including the Redis config; removed the redis provider reference (Achal Shah)
* Possibility to specify a project for BigQuery queries (#1656) (Matt Delacour)
* Refactor OfflineStoreConfig classes into their owning modules (#1657), with a generic error class and support for the fully qualified name of the OnlineStore (Achal Shah)
* Run python unit tests in parallel (#1652) (Achal Shah)
* Rename telemetry to usage (#1660): updated docs, .prow, and infra; re-added telemetry.md for backwards compatibility (Tsotne Tabidze)
* Resolved final comments on the PR (variable renaming, refactored tests); reformatted after a merge conflict (Mwad22)
* Update CHANGELOG for Feast v0.11.0 (Willem Pienaar)
* Update charts README (#1659): add the Feast Jupyter link and fix the helm 'feast-serving' name in the aws/azure terraform (Peter Szalai)
* Added Redis to the list of online stores for the local provider in the providers reference doc (#1668) (Nel Swanepoel)
* Grouped inferencing statements together in apply methods for easier readability (#1667) (David Y Liu)
* Add RedshiftDataSource (#1669) (Tsotne Tabidze)
* Provide the user with more options for setting the to_bigquery config (#1661): fix the default job_config when none is given; add docstrings and typing (Cody Lin)
* Add streaming sources to the FeatureView API (#1664). This only updates the API; it is currently up to the providers to actually use this information to spin up resources to consume events from the stream sources (Achal Shah)
* Add to_table() to the RetrievalJob object (#1663), reusing RetrievalJob instead of creating a new OfflineJob object (Matt Delacour)
* Rename to_table to to_arrow (#1671) (Matt Delacour)
* Cancel BigQuery job if timeout hits (#1672) (Matt Delacour)
* Fix Feature References example (#1674) by passing entity_rows to get_online_features() (Mwad22)
* Allow strings for online/offline store instead of dicts (#1673) (Achal Shah)
* Remove default list from the FeatureView constructor (#1679) (Achal Shah)
* Fix unit tests that got broken by the Pandas 1.3.0 release (#1683) (Tsotne Tabidze)
* Add support for DynamoDB and S3 registry (#1483): RCU and WCU as parameters of the DynamoDB online store, AWS dependency moved to extras, FEAST_S3_ENDPOINT_URL support, and a configurable default AWS region (lblokhin, Tsotne Tabidze)
* Parallelize integration tests (#1684) (Tsotne Tabidze)
* BQ exception should be raised first before we check the timeout (#1675) (Matt Delacour)
* Made error logic/messages more descriptive, then simplified the error messages; resolved formatter and linter issues (Mwad22)
* Removed unnecessary default assignment in get_historical_features; the default is now set only in feature_store.py (Mwad22)
* Added error message assertion for feature name collisions, plus other nitpick changes (Mwad22)

Co-authored-by: David Y Liu, Tsotne Tabidze, Achal Shah, Willem Pienaar, codyjlin, Matt Delacour, Peter Szalai, Nel Swanepoel, Greg Kuhlmann, Leonid (lblokhin)
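The behavior change at the heart of this commit can be sketched in isolation: a feature reference of the form "feature_view:feature" now maps to the bare feature name by default, and to the "feature_view__feature" form only when full_feature_names=True. The helper below is a hypothetical illustration, not Feast code:

```python
def output_name(feature_ref: str, full_feature_names: bool = False) -> str:
    """Map a "feature_view:feature" reference to its output column name.

    Illustrative sketch of the naming rule introduced in this commit:
    the prefixed form is now opt-in rather than the default.
    """
    view_name, feature_name = feature_ref.split(":")
    return f"{view_name}__{feature_name}" if full_feature_names else feature_name


print(output_name("driver_hourly_stats:conv_rate"))        # conv_rate
print(output_name("driver_hourly_stats:conv_rate", True))  # driver_hourly_stats__conv_rate
```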
1 parent 6a09d49 commit 2e0113e

20 files changed: +356 −139 lines

README.md

Lines changed: 5 additions & 5 deletions

@@ -75,11 +75,11 @@ print(training_df.head())
 # model = ml.fit(training_df)
 ```
 ```commandline
-event_timestamp          driver_id  driver_hourly_stats__conv_rate  driver_hourly_stats__acc_rate
-2021-04-12 08:12:10      1002       0.497279                        0.357702
-2021-04-12 10:59:42      1001       0.979747                        0.008166
-2021-04-12 15:01:12      1004       0.151432                        0.551748
-2021-04-12 16:40:26      1003       0.951506                        0.753572
+  event_timestamp            driver_id  conv_rate  acc_rate  avg_daily_trips
+0 2021-04-12 08:12:10+00:00  1002       0.713465   0.597095  531
+1 2021-04-12 10:59:42+00:00  1001       0.072752   0.044344  11
+2 2021-04-12 15:01:12+00:00  1004       0.658182   0.079150  220
+3 2021-04-12 16:40:26+00:00  1003       0.162092   0.309035  959
 
 ```
 

docs/quickstart.md

Lines changed: 3 additions & 3 deletions

@@ -119,9 +119,9 @@ pprint(feature_vector)
 ```python
 {
     'driver_id': [1001],
-    'driver_hourly_stats__conv_rate': [0.49274],
-    'driver_hourly_stats__acc_rate': [0.92743],
-    'driver_hourly_stats__avg_daily_trips': [72],
+    'conv_rate': [0.49274],
+    'acc_rate': [0.92743],
+    'avg_daily_trips': [72],
 }
 ```
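The quickstart change above boils down to the response dictionary losing its "<feature_view>__" key prefixes by default. A minimal sketch of the renaming, using the quickstart's own values:

```python
# Old-style response keys, prefixed with the feature view name.
feature_vector = {
    'driver_id': [1001],
    'driver_hourly_stats__conv_rate': [0.49274],
    'driver_hourly_stats__acc_rate': [0.92743],
    'driver_hourly_stats__avg_daily_trips': [72],
}

# Strip the "<feature_view>__" prefix to obtain the new default key format.
# Keys without a "__" separator (like 'driver_id') pass through unchanged.
renamed = {k.split("__")[-1]: v for k, v in feature_vector.items()}
print(renamed)
# {'driver_id': [1001], 'conv_rate': [0.49274], 'acc_rate': [0.92743], 'avg_daily_trips': [72]}
```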

sdk/python/feast/errors.py

Lines changed: 22 additions & 1 deletion

@@ -1,4 +1,4 @@
-from typing import Set
+from typing import List, Set
 
 from colorama import Fore, Style
 
@@ -88,6 +88,27 @@ def __init__(self, offline_store_name: str, data_source_name: str):
         )
 
 
+class FeatureNameCollisionError(Exception):
+    def __init__(self, feature_refs_collisions: List[str], full_feature_names: bool):
+        if full_feature_names:
+            collisions = [ref.replace(":", "__") for ref in feature_refs_collisions]
+            error_message = (
+                "To resolve this collision, please ensure that the features in question "
+                "have different names."
+            )
+        else:
+            collisions = [ref.split(":")[1] for ref in feature_refs_collisions]
+            error_message = (
+                "To resolve this collision, either use the full feature name by setting "
+                "'full_feature_names=True', or ensure that the features in question have different names."
+            )
+
+        feature_names = ", ".join(set(collisions))
+        super().__init__(
+            f"Duplicate features named {feature_names} found.\n{error_message}"
+        )
+
+
 class FeastOnlineStoreInvalidName(Exception):
     def __init__(self, online_store_class_name: str):
         super().__init__(
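The new exception can be exercised standalone. The copy below mirrors the class added in the diff (without the surrounding colorama-based module), to show how the collision names and hint are derived from the colliding feature references:

```python
from typing import List


class FeatureNameCollisionError(Exception):
    """Standalone copy of the class added to sdk/python/feast/errors.py."""

    def __init__(self, feature_refs_collisions: List[str], full_feature_names: bool):
        if full_feature_names:
            # With prefixes enabled, the colliding output name is "view__feature".
            collisions = [ref.replace(":", "__") for ref in feature_refs_collisions]
            error_message = (
                "To resolve this collision, please ensure that the features in question "
                "have different names."
            )
        else:
            # Without prefixes, only the bare feature name collides.
            collisions = [ref.split(":")[1] for ref in feature_refs_collisions]
            error_message = (
                "To resolve this collision, either use the full feature name by setting "
                "'full_feature_names=True', or ensure that the features in question have different names."
            )

        feature_names = ", ".join(set(collisions))
        super().__init__(
            f"Duplicate features named {feature_names} found.\n{error_message}"
        )


# Two feature views both exposing a "trips" feature collide under the new default.
err = FeatureNameCollisionError(["fv_a:trips", "fv_b:trips"], full_feature_names=False)
print(str(err).splitlines()[0])  # Duplicate features named trips found.
```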

sdk/python/feast/feature_store.py

Lines changed: 70 additions & 37 deletions

@@ -12,8 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
-import sys
-from collections import OrderedDict, defaultdict
+from collections import Counter, OrderedDict, defaultdict
 from datetime import datetime, timedelta
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple, Union
@@ -24,7 +23,7 @@
 
 from feast import utils
 from feast.entity import Entity
-from feast.errors import FeastProviderLoginError, FeatureViewNotFoundException
+from feast.errors import FeatureNameCollisionError, FeatureViewNotFoundException
 from feast.feature_view import FeatureView
 from feast.inference import (
     update_data_sources_with_inferred_event_timestamp_col,
@@ -230,9 +229,11 @@ def apply(
         update_entities_with_inferred_types_from_feature_views(
             entities_to_update, views_to_update, self.config
         )
+
         update_data_sources_with_inferred_event_timestamp_col(
             [view.input for view in views_to_update], self.config
         )
+
         for view in views_to_update:
             view.infer_features_from_input_source(self.config)
 
@@ -255,7 +256,10 @@ def apply(
 
     @log_exceptions_and_usage
     def get_historical_features(
-        self, entity_df: Union[pd.DataFrame, str], feature_refs: List[str],
+        self,
+        entity_df: Union[pd.DataFrame, str],
+        feature_refs: List[str],
+        full_feature_names: bool = False,
     ) -> RetrievalJob:
         """Enrich an entity dataframe with historical feature values for either training or batch scoring.
 
@@ -277,6 +281,9 @@ def get_historical_features(
                 SQL query. The query must be of a format supported by the configured offline store (e.g., BigQuery)
             feature_refs: A list of features that should be retrieved from the offline store. Feature references are of
                 the format "feature_view:feature", e.g., "customer_fv:daily_transactions".
+            full_feature_names: A boolean that provides the option to add the feature view prefixes to the feature names,
+                changing them from the format "feature" to "feature_view__feature" (e.g., "daily_transactions" changes to
+                "customer_fv__daily_transactions"). By default, this value is set to False.
 
         Returns:
             RetrievalJob which can be used to materialize the results.
@@ -289,32 +296,29 @@
             >>> fs = FeatureStore(config=RepoConfig(provider="gcp"))
             >>> retrieval_job = fs.get_historical_features(
             >>>     entity_df="SELECT event_timestamp, order_id, customer_id from gcp_project.my_ds.customer_orders",
-            >>>     feature_refs=["customer:age", "customer:avg_orders_1d", "customer:avg_orders_7d"]
-            >>> )
+            >>>     feature_refs=["customer:age", "customer:avg_orders_1d", "customer:avg_orders_7d"],
+            >>> )
             >>> feature_data = retrieval_job.to_df()
             >>> model.fit(feature_data) # insert your modeling framework here.
         """
-
         all_feature_views = self._registry.list_feature_views(project=self.project)
-        try:
-            feature_views = _get_requested_feature_views(
-                feature_refs, all_feature_views
-            )
-        except FeatureViewNotFoundException as e:
-            sys.exit(e)
+
+        _validate_feature_refs(feature_refs, full_feature_names)
+        feature_views = list(
+            view for view, _ in _group_feature_refs(feature_refs, all_feature_views)
+        )
 
         provider = self._get_provider()
-        try:
-            job = provider.get_historical_features(
-                self.config,
-                feature_views,
-                feature_refs,
-                entity_df,
-                self._registry,
-                self.project,
-            )
-        except FeastProviderLoginError as e:
-            sys.exit(e)
+
+        job = provider.get_historical_features(
+            self.config,
+            feature_views,
+            feature_refs,
+            entity_df,
+            self._registry,
+            self.project,
+            full_feature_names,
+        )
 
         return job
 
@@ -480,7 +484,10 @@ def tqdm_builder(length):
 
     @log_exceptions_and_usage
     def get_online_features(
-        self, feature_refs: List[str], entity_rows: List[Dict[str, Any]],
+        self,
+        feature_refs: List[str],
+        entity_rows: List[Dict[str, Any]],
+        full_feature_names: bool = False,
     ) -> OnlineResponse:
         """
         Retrieves the latest online feature data.
@@ -548,7 +555,8 @@
             project=self.project, allow_cache=True
         )
 
-        grouped_refs = _group_refs(feature_refs, all_feature_views)
+        _validate_feature_refs(feature_refs, full_feature_names)
+        grouped_refs = _group_feature_refs(feature_refs, all_feature_views)
         for table, requested_features in grouped_refs:
             entity_keys = _get_table_entity_keys(
                 table, union_of_entity_keys, entity_name_to_join_key_map
@@ -565,13 +573,21 @@
 
             if feature_data is None:
                 for feature_name in requested_features:
-                    feature_ref = f"{table.name}__{feature_name}"
+                    feature_ref = (
+                        f"{table.name}__{feature_name}"
+                        if full_feature_names
+                        else feature_name
+                    )
                     result_row.statuses[
                         feature_ref
                     ] = GetOnlineFeaturesResponse.FieldStatus.NOT_FOUND
             else:
                 for feature_name in feature_data:
-                    feature_ref = f"{table.name}__{feature_name}"
+                    feature_ref = (
+                        f"{table.name}__{feature_name}"
+                        if full_feature_names
+                        else feature_name
+                    )
                     if feature_name in requested_features:
                         result_row.fields[feature_ref].CopyFrom(
                             feature_data[feature_name]
@@ -599,7 +615,31 @@ def _entity_row_to_field_values(
     return result
 
 
-def _group_refs(
+def _validate_feature_refs(feature_refs: List[str], full_feature_names: bool = False):
+    collided_feature_refs = []
+
+    if full_feature_names:
+        collided_feature_refs = [
+            ref for ref, occurrences in Counter(feature_refs).items() if occurrences > 1
+        ]
+    else:
+        feature_names = [ref.split(":")[1] for ref in feature_refs]
+        collided_feature_names = [
+            ref
+            for ref, occurrences in Counter(feature_names).items()
+            if occurrences > 1
+        ]
+
+        for feature_name in collided_feature_names:
+            collided_feature_refs.extend(
+                [ref for ref in feature_refs if ref.endswith(":" + feature_name)]
+            )
+
+    if len(collided_feature_refs) > 0:
+        raise FeatureNameCollisionError(collided_feature_refs, full_feature_names)
+
+
+def _group_feature_refs(
    feature_refs: List[str], all_feature_views: List[FeatureView]
 ) -> List[Tuple[FeatureView, List[str]]]:
     """ Get list of feature views and corresponding feature names based on feature references"""
@@ -612,6 +652,7 @@
 
     for ref in feature_refs:
         view_name, feat_name = ref.split(":")
+
         if view_name not in view_index:
             raise FeatureViewNotFoundException(view_name)
         views_features[view_name].append(feat_name)
@@ -622,14 +663,6 @@
     return result
 
 
-def _get_requested_feature_views(
-    feature_refs: List[str], all_feature_views: List[FeatureView]
-) -> List[FeatureView]:
-    """Get list of feature views based on feature references"""
-    # TODO: Get rid of this function. We only need _group_refs
-    return list(view for view, _ in _group_refs(feature_refs, all_feature_views))
-
-
 def _get_table_entity_keys(
     table: FeatureView, entity_keys: List[EntityKeyProto], join_key_map: Dict[str, str],
 ) -> List[EntityKeyProto]:
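The collision check added as _validate_feature_refs is easy to exercise on its own. The sketch below mirrors its Counter-based logic as a pure function (returning the colliding refs instead of raising, so the behavior is easy to inspect); the name find_collisions is hypothetical:

```python
from collections import Counter
from typing import List


def find_collisions(feature_refs: List[str], full_feature_names: bool = False) -> List[str]:
    """Return the feature refs whose output names would collide.

    Mirrors the logic of _validate_feature_refs in feature_store.py:
    with full names, only exact duplicate refs collide; with bare names,
    any two refs sharing a feature name collide.
    """
    if full_feature_names:
        return [ref for ref, n in Counter(feature_refs).items() if n > 1]

    names = [ref.split(":")[1] for ref in feature_refs]
    duplicated = [name for name, n in Counter(names).items() if n > 1]
    collided = []
    for name in duplicated:
        collided.extend(ref for ref in feature_refs if ref.endswith(":" + name))
    return collided


# Same feature name in two views: collides by default, fine with full names.
print(find_collisions(["fv1:rate", "fv2:rate"]))        # ['fv1:rate', 'fv2:rate']
print(find_collisions(["fv1:rate", "fv2:rate"], True))  # []
```

This is why the new default needs the validation step at all: names that were always distinct under the old "view__feature" scheme can now clash once the prefix is dropped.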

sdk/python/feast/infra/aws.py

Lines changed: 2 additions & 0 deletions

@@ -129,6 +129,7 @@ def get_historical_features(
         entity_df: Union[pandas.DataFrame, str],
         registry: Registry,
         project: str,
+        full_feature_names: bool,
     ) -> RetrievalJob:
         job = self.offline_store.get_historical_features(
             config=config,
@@ -137,5 +138,6 @@
             entity_df=entity_df,
             registry=registry,
             project=project,
+            full_feature_names=full_feature_names,
         )
         return job

sdk/python/feast/infra/gcp.py

Lines changed: 2 additions & 0 deletions

@@ -131,6 +131,7 @@ def get_historical_features(
         entity_df: Union[pandas.DataFrame, str],
         registry: Registry,
         project: str,
+        full_feature_names: bool,
     ) -> RetrievalJob:
         job = self.offline_store.get_historical_features(
             config=config,
@@ -139,5 +140,6 @@
             entity_df=entity_df,
             registry=registry,
             project=project,
+            full_feature_names=full_feature_names,
        )
         return job

sdk/python/feast/infra/local.py

Lines changed: 2 additions & 0 deletions

@@ -130,6 +130,7 @@ def get_historical_features(
         entity_df: Union[pd.DataFrame, str],
         registry: Registry,
         project: str,
+        full_feature_names: bool,
     ) -> RetrievalJob:
         return self.offline_store.get_historical_features(
             config=config,
@@ -138,6 +139,7 @@
             entity_df=entity_df,
             registry=registry,
             project=project,
+            full_feature_names=full_feature_names,
         )
 

sdk/python/feast/infra/offline_stores/bigquery.py

Lines changed: 12 additions & 3 deletions

@@ -95,6 +95,7 @@ def get_historical_features(
         entity_df: Union[pandas.DataFrame, str],
         registry: Registry,
         project: str,
+        full_feature_names: bool = False,
     ) -> RetrievalJob:
         # TODO: Add entity_df validation in order to fail before interacting with BigQuery
         assert isinstance(config.offline_store, BigQueryOfflineStoreConfig)
@@ -121,7 +122,11 @@
 
         # Build a query context containing all information required to template the BigQuery SQL query
         query_context = get_feature_view_query_context(
-            feature_refs, feature_views, registry, project
+            feature_refs,
+            feature_views,
+            registry,
+            project,
+            full_feature_names=full_feature_names,
         )
 
         # TODO: Infer min_timestamp and max_timestamp from entity_df
@@ -132,6 +137,7 @@
             max_timestamp=datetime.now() + timedelta(days=1),
             left_table_query_string=str(table.reference),
             entity_df_event_timestamp_col=entity_df_event_timestamp_col,
+            full_feature_names=full_feature_names,
         )
 
         job = BigQueryRetrievalJob(query=query, client=client, config=config)
@@ -373,6 +379,7 @@ def get_feature_view_query_context(
     feature_views: List[FeatureView],
     registry: Registry,
     project: str,
+    full_feature_names: bool = False,
 ) -> List[FeatureViewQueryContext]:
     """Build a query context containing all information required to template a BigQuery point-in-time SQL query"""
 
@@ -432,6 +439,7 @@ def build_point_in_time_query(
     max_timestamp: datetime,
     left_table_query_string: str,
     entity_df_event_timestamp_col: str,
+    full_feature_names: bool = False,
 ):
     """Build point-in-time query between each feature view table and the entity dataframe"""
     template = Environment(loader=BaseLoader()).from_string(
@@ -448,6 +456,7 @@
             [entity for fv in feature_view_query_contexts for entity in fv.entities]
         ),
         "featureviews": [asdict(context) for context in feature_view_query_contexts],
+        "full_feature_names": full_feature_names,
     }
 
     query = template.render(template_context)
@@ -521,7 +530,7 @@ def _get_bigquery_client(project: Optional[str] = None):
     {{ featureview.created_timestamp_column ~ ' as created_timestamp,' if featureview.created_timestamp_column else '' }}
     {{ featureview.entity_selections | join(', ')}},
     {% for feature in featureview.features %}
-        {{ feature }} as {{ featureview.name }}__{{ feature }}{% if loop.last %}{% else %}, {% endif %}
+        {{ feature }} as {% if full_feature_names %}{{ featureview.name }}__{{feature}}{% else %}{{ feature }}{% endif %}{% if loop.last %}{% else %}, {% endif %}
     {% endfor %}
     FROM {{ featureview.table_subquery }}
 ),
@@ -614,7 +623,7 @@
     SELECT
         entity_row_unique_id,
         {% for feature in featureview.features %}
-            {{ featureview.name }}__{{ feature }},
+            {% if full_feature_names %}{{ featureview.name }}__{{feature}}{% else %}{{ feature }}{% endif %},
         {% endfor %}
     FROM {{ featureview.name }}__cleaned
 ) USING (entity_row_unique_id)
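The Jinja template change above branches the SELECT alias on the full_feature_names flag. A dependency-free sketch of what the rendered alias list looks like under each setting (the helper name feature_select_clause is hypothetical, and plain string formatting stands in for the Jinja rendering):

```python
from typing import List


def feature_select_clause(view_name: str, features: List[str],
                          full_feature_names: bool = False) -> str:
    """Render the per-feature "<col> as <alias>" list the BigQuery
    point-in-time template now produces, under either naming mode."""
    aliases = [
        f"{feat} as {view_name}__{feat}" if full_feature_names else f"{feat} as {feat}"
        for feat in features
    ]
    return ", ".join(aliases)


print(feature_select_clause("driver_hourly_stats", ["conv_rate", "acc_rate"]))
# conv_rate as conv_rate, acc_rate as acc_rate
print(feature_select_clause("driver_hourly_stats", ["conv_rate", "acc_rate"], True))
# conv_rate as driver_hourly_stats__conv_rate, acc_rate as driver_hourly_stats__acc_rate
```

Because the same flag is threaded through get_feature_view_query_context, build_point_in_time_query, and the template context, both the aliasing subquery and the final join SELECT stay consistent for a given retrieval call.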
